> Wouldn't renaming the existing "surrogates" handler be an incompatible
> change, and thus inappropriate?
No - it's new in Python 3.1.
So what do you think about Antoine's proposal?
Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
htt
On approximately 5/6/2009 10:53 PM, came the following characters from
the keyboard of Martin v. Löwis:
The error handler designed with utf-8 in mind has no name in the encode
direction and is called "utf_8b_decoder_invalid_bytes" in the decode
direction. By your reasoning, *that* should be its
>> So are you proposing that I should rename the PEP 383 handler
>> to "utf_8b_encoder_invalid_codepoints"?
>
>
> No, he's saying that your algorithm for choosing the PEP 383 handler
> should have come up with that name, rather than utf8b. But since PEP
> 383 applies to other codecs besides UTF-
> By the way, what are the ASCII characters that are not supported by
> Shift-JIS?
> Not many I suppose? (if I read the Wikipedia entry correctly, it's only the
> backslash and the tilde).
The problem with this encoding is that bytes below 128 appear as second
bytes of a two-byte encoding:
py>
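The session above is cut off. Here is a minimal illustration of the same point, sketched with Python 3's bundled shift_jis codec rather than copied from the truncated session. The katakana character ソ (U+30BD) encodes to lead byte 0x83 followed by 0x5C, the ASCII backslash:

```python
# The katakana character SO (U+30BD) is a classic example of a Shift-JIS
# two-byte sequence whose second (trail) byte falls in the ASCII range.
encoded = "\u30bd".encode("shift_jis")
print(encoded)                  # b'\x83\\'
print(encoded[1] == ord("\\"))  # True: the trail byte is 0x5C, ASCII backslash
```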
> The error handler designed with utf-8 in mind has no name in the encode
> direction and is called "utf_8b_decoder_invalid_bytes" in the decode
> direction. By your reasoning, *that* should be its name in Python. The
> encoding error handler would then be named analogously
> "utf_8b_encoder_inva
Michael Urman wrote:
> On Wed, May 6, 2009 at 15:42, "Martin v. Löwis" wrote:
>> Despite there being also an error handler called "surrogates".
>
> Not that I have to be, but I'm not sold on the previous UTF-8 codec
> behavior becoming an error handler of the name "surrogates" for two
> reasons (
On approximately 5/6/2009 6:06 PM, came the following characters from
the keyboard of M.-A. Lemburg:
Martin, please stop being silly and just change the name.
Yes, please. If indeed Marc-Andre invented the codec business as he
claims, he would be an appropriate person to give a fiat name t
Martin v. Löwis wrote:
Are you serious?
Are you? ;-? You are the one naming a codec-agnostic error handler (if
I understand correctly, and correct me if I do not) after a particular
codec, and denying that that could cause confusion. See other message.
I can only repeat what I said before: I
"Martin v. Löwis" writes:
> > Now, with Python's file system encoding == UTF-8 or any packed EUC,
> > and more than a handful of Shift JIS or Big5 characters in file names,
> > one is *almost certain* to encounter ASCII as the second byte of a
> > multibyte sequence. PEP 383 can't handle this
On behalf of the Python development team, I'm thrilled to announce the first and
only beta release of Python 3.1.
Python 3.1 focuses on the stabilization and optimization of features and changes
Python 3.0 introduced. For example, the new I/O system has been rewritten in C
for speed. File system
Some of my messages appear not to have gotten through.
--
Regards,
Benjamin
Martin v. Löwis wrote:
The name "utf8b" suggested in the PEP is not in line with the codec
design
>>> Where is that design documented, and how exactly does the name violate
>>> the design (chapter and verse, please).
>> Martin, I designed the whole Python codec machinery
>
> Not true. PEP 29
On Wed, May 6, 2009 at 15:42, "Martin v. Löwis" wrote:
> Despite there being also an error handler called "surrogates".
Not that I have to be, but I'm not sold on the previous UTF-8 codec
behavior becoming an error handler of the name "surrogates" for two
reasons (I do respect the obvious PBP arg
Eric Smith wrote:
Mark: I've reviewed this and it looks okay to me.
Thanks Eric - I've now applied that patch. As you mentioned in a
followup to the bug:
| Thanks for looking at this, Mark. If we could only assign issues to
| Python 3.2 and 3.3 to change the pending deprecation warning to a
Martin v. Löwis writes:
> py> b'\xed\xa0\x80'.decode("utf-8","surrogates")
> '\ud800'
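The call quoted above still works on current Python 3, where the handler called "surrogates" here is spelled surrogatepass. That name is an assumption about a modern interpreter, not the 3.1 beta under discussion. A minimal sketch showing that strict UTF-8 rejects the encoded surrogate, while the pass-through handler accepts it and round-trips it:

```python
# b'\xed\xa0\x80' is the UTF-8-style encoding of the lone surrogate U+D800,
# which the strict UTF-8 codec rejects.
data = b"\xed\xa0\x80"

try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)

# The "surrogates" handler from this thread shipped as "surrogatepass".
text = data.decode("utf-8", "surrogatepass")
print(ascii(text))                                    # '\ud800'
print(text.encode("utf-8", "surrogatepass") == data)  # True
```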
The point is, "surrogates" does not mean anything intuitive for an /error
handler/. You seem to be the only one who finds this name explicit enough,
perhaps because you chose it.
Most other handlers
> I qualify with a). I believe I understand c) but, as explained in my
> other post, I do not think your reason applies. In fact, I think
> concern for naming rights might suggest that you *not* reuse the name
> for something different. I would have to learn more about the existing
> 'surrogates'
Antoine Pitrou wrote:
Martin v. Löwis writes:
Despite there being also an error handler called "surrogates".
People, perhaps we could end all the bikeshedding and call one of those handlers
"surrogates-pass" and the other "surrogates-escape", which sounds quite faithful
to what t
>> Are you serious?
>
> Are you? ;-? You are the one naming a codec-agnostic error handler (if
> I understand correctly, and correct me if I do not) after a particular
> codec, and denying that that could cause confusion. See other message.
I can only repeat what I said before: I call it utf8b
Martin v. Löwis wrote:
Antoine Pitrou wrote:
Martin v. Löwis writes:
Despite there being also an error handler called "surrogates".
People, perhaps we could end all the bikeshedding and call one of those handlers
"surrogates-pass" and the other "surrogates-escape", which sounds q
Martin v. Löwis wrote:
Because utf8b (or, perhaps "UTF-8b") is the official name for this
algorithm:
http://hyperreal.org/~est/utf-8b/
Thank you for the link. It starts:
"This directory contains a C implementation of a UTF-8b codec.
A Python codec based on it is provided as well."
'RTF-8b' c
2009/5/6 Antoine Pitrou :
> Martin v. Löwis writes:
>>
>> Despite there being also an error handler called "surrogates".
>
> People, perhaps we could end all the bikeshedding and call one of those
> handlers
> "surrogates-pass" and the other "surrogates-escape", which sounds quite
>
Martin v. Löwis wrote:
+1 for "surrogate" as the name for the error handler.
+1 from me also
Despite there being also an error handler called "surrogates".
Given that additional information which MAL apparently omitted, I would
revise.
Are you serious?
Are you? ;-? You are the one
Antoine Pitrou wrote:
> Martin v. Löwis writes:
>> Despite there being also an error handler called "surrogates".
>
> People, perhaps we could end all the bikeshedding and call one of those
> handlers
> "surrogates-pass" and the other "surrogates-escape", which sounds quite
> faith
> But first, it should be stopped by any of several
> standard precautions. For example, applying os.path.realpath (come to
> think of it, PEP 383 should say something about realpath, shouldn't
> it?)
Why do you think so? I think the existing documentation of realpath
is correct and complete.
>
Martin v. Löwis writes:
>
> Despite there being also an error handler called "surrogates".
People, perhaps we could end all the bikeshedding and call one of those handlers
"surrogates-pass" and the other "surrogates-escape", which sounds quite faithful
to what they actually /do/?
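The difference between the two behaviours Antoine wants to name shows up on a single invalid byte. This sketch assumes current Python 3, where the handlers ended up as surrogateescape and surrogatepass:

```python
# An invalid byte that does not encode any code point at all.
data = b"abc\xff"

# "Escape" behaviour (shipped as surrogateescape): the bad byte is carried
# through as U+DC00 + byte value and can be restored on encoding.
escaped = data.decode("utf-8", "surrogateescape")
print(ascii(escaped))                                      # 'abc\udcff'
print(escaped.encode("utf-8", "surrogateescape") == data)  # True

# "Pass" behaviour (shipped as surrogatepass) only accepts encoded surrogates
# such as b'\xed\xa0\x80'; any other invalid byte still raises.
rejected = False
try:
    data.decode("utf-8", "surrogatepass")
except UnicodeDecodeError:
    rejected = True
print("surrogatepass rejected b'\\xff':", rejected)        # True
```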
R
> Is it only usable with utf8 as an encoding?
No, it applies to any codec which potentially cannot decode
all bytes >127.
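For example, the ASCII codec cannot decode any byte above 127. This sketch uses the PEP 383 handler under its released name, surrogateescape:

```python
# A name encoded in Latin-1 but decoded with the ASCII codec.
raw = "caf\u00e9".encode("latin-1")            # b'caf\xe9'
name = raw.decode("ascii", "surrogateescape")
print(ascii(name))                             # 'caf\udce9'

# Encoding with the same codec and handler restores the original bytes.
print(name.encode("ascii", "surrogateescape") == raw)  # True
```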
Regards,
Martin
Terry Reedy wrote:
> Glenn Linderman wrote:
>> On approximately 5/6/2009 3:08 AM, came the following characters from
>> the keyboard of MRAB:
>>> M.-A. Lemburg wrote:
Martin v. Löwis wrote:
>>
>>> Judging by the existing names, I think that 'surrogate' would be
>>> reasonable. It already conta
> Judging by the existing names, I think that 'surrogate' would be
> reasonable
MAL's list of existing names is incomplete. "surrogates" is already
an existing name, also, and it means something different (similar,
but different).
Regards,
Martin
> I'm sorry for the lack of clarity of my posts, but somehow you're
> completely missing the point. The point is precisely that Python
> *won't* use Shift JIS as the file system encoding (if it did there
> would be no problem with reading Shift JIS), but the people who
> created the media *did*.
>
>>> The name "utf8b" suggested in the PEP is not in line with the codec
>>> design
>> Where is that design documented, and how exactly does the name violate
>> the design (chapter and verse, please).
>
> Martin, I designed the whole Python codec machinery
Not true. PEP 293 was written and designed by
On approximately 5/6/2009 12:18 PM, came the following characters from
the keyboard of Zooko Wilcox-O'Hearn:
On May 6, 2009, at 10:54 AM, Antoine Pitrou wrote:
Zooko Wilcox-O'Hearn writes:
I'm not thinking of API compatibility as much as data compatibility
-- someone used Python
On May 6, 2009, at 10:54 AM, Antoine Pitrou wrote:
Zooko Wilcox-O'Hearn writes:
I'm not thinking of API compatibility as much as data
compatibility -- someone used Python 3.1 to write down some
filenames, and now a few years later they are trying to use the
latest and greates
Glenn Linderman wrote:
On approximately 5/6/2009 3:08 AM, came the following characters from
the keyboard of MRAB:
M.-A. Lemburg wrote:
Martin v. Löwis wrote:
Judging by the existing names, I think that 'surrogate' would be
reasonable. It already contains the meaning of substitute, it's not
On approximately 5/6/2009 12:53 AM, came the following characters from
the keyboard of Martin v. Löwis:
Sorry! I suggest substituting the paragraph above for the paragraph
which begins "The encode error handler interface presently requires..."
at line 129.
Ah, ok. This was Glenn Linderman's te
On approximately 5/6/2009 3:08 AM, came the following characters from
the keyboard of MRAB:
M.-A. Lemburg wrote:
Martin v. Löwis wrote:
Judging by the existing names, I think that 'surrogate' would be
reasonable. It already contains the meaning of substitute, it's not too
long, and the codes
On approximately 5/6/2009 6:33 AM, came the following characters from
the keyboard of Stephen J. Turnbull:
"Martin v. Löwis" writes:
> In any case, Python 3.1b1 may get released today, so it's way too late
> for new features in the PEP. They can wait for Python 3.2.
You have convinced me that
Zooko Wilcox-O'Hearn writes:
>
> I'm not thinking of API compatibility as much as
> data compatibility -- someone used Python 3.1 to write down some
> filenames, and now a few years later they are trying to use the
> latest and greatest Python release to read those filenames...
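A sketch of where the handler choice leaks into stored data. It assumes current Python 3 handler names, and the file name is made up. A str produced by the escape handler cannot be written out with strict UTF-8, so any program that persists file names has to pick a handler explicitly:

```python
# A file name that is not valid UTF-8, decoded the PEP 383 way.
name = b"report-\xff.txt".decode("utf-8", "surrogateescape")

try:
    # Strict UTF-8 fails: the lone surrogate U+DCFF has no valid encoding.
    stored = name.encode("utf-8")
except UnicodeEncodeError:
    # Fall back to restoring the original bytes explicitly.
    stored = name.encode("utf-8", "surrogateescape")

print(stored)  # b'report-\xff.txt'
```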
John Millikin wrote:
> In Python 2, PyMapping_Check will return 0 for list objects. In Python
> 3, it returns 1. Obviously, this makes it rather difficult to
> differentiate between mappings and other sized iterables. In addition,
> it differs from the behavior of the ``collections.Mapping`` ABC --
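At the Python level, the ABC mentioned above does tell lists and dicts apart, even though both types support subscription through __getitem__. In current Python 3 it is spelled collections.abc.Mapping, because the collections.Mapping alias was removed in 3.10. A minimal check:

```python
import collections.abc

for obj in ([1, 2, 3], {"a": 1}, "text"):
    print(type(obj).__name__,
          hasattr(type(obj), "__getitem__"),         # True for all three
          isinstance(obj, collections.abc.Mapping))  # True only for the dict
```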
On May 6, 2009, at 5:39 AM, Stephen J. Turnbull wrote:
Now, with Python's file system encoding == UTF-8 or any packed EUC,
and more than a handful of Shift JIS or Big5 characters in file names,
one is *almost certain* to encounter ASCII as the second byte of a
multibyte sequence. PEP 383 can't h
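The Shift-JIS case can be reproduced with a single character. This sketch assumes a UTF-8 file system encoding and the PEP 383 handler under its released name, surrogateescape (utf8b in this thread). It shows only the decoding step, not any particular attack:

```python
# Shift-JIS bytes for a file name (katakana SO), as they might appear on
# media written by a Shift-JIS system.
raw = "\u30bd".encode("shift_jis")   # b'\x83\\'

# Decode as if the file system encoding were UTF-8, using the PEP 383 handler.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))                   # '\udc83\\'

# The lead byte becomes a lone surrogate, but the trail byte 0x5C comes
# through as a genuine backslash character.
print("\\" in name)                  # True

# Round-tripping still restores the original bytes.
print(name.encode("utf-8", "surrogateescape") == raw)  # True
```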
On May 6, 2009, at 7:33 AM, Stephen J. Turnbull wrote:
You have convinced me that the PEP should wait as well.
In its current form it is incomplete and dangerous.
+1 on delaying PEP 383
I think PEP 383 is a good idea in principle, but I'm still struggling
to understand it myself, and it se
On Wed, 6 May 2009 at 13:40, Antoine Pitrou wrote:
Stephen J. Turnbull writes:
Nothing is lost compared to 'strict', true, but under the PEP as it is
a large fraction of Shift JIS and Big5 filenames cannot be read under
ASCII-compatible file system encodings using 'utf8b'.
You sh
Stephen J. Turnbull writes:
>
> Nothing is lost compared to 'strict', true, but under the PEP as it is
> a large fraction of Shift JIS and Big5 filenames cannot be read under
> ASCII-compatible file system encodings using 'utf8b'.
You should really be more specific. I'm not sure abou
"Martin v. Löwis" writes:
> > Yeah, yeah, this is the same old same old from PEP 3131. Anything
> > that handles the various attacks based on ASCII-alike characters
> > should at least rule out invalid Unicode, too!
> >
> > And where is this U+DC2F supposed to be coming from, anyway? The
Lino Mastrodomenico writes:
> It's a known problem with Shift-JIS and was fixed in UTF-8.
It was fixed in EUC before Shift-JIS was invented by Microsoft or Big5
was invented by the Taiwanese clone makers. Guido's not the only
language designer with a time machine
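The point can be made concrete with Python's bundled codecs (a sketch using one sample character). In EUC-JP and UTF-8 every byte of a multibyte character has the high bit set, whereas Shift-JIS can produce a trail byte in the ASCII range:

```python
ch = "\u30bd"  # katakana SO
for codec in ("shift_jis", "euc_jp", "utf-8"):
    data = ch.encode(codec)
    # True when no byte of the encoded character lies in the ASCII range.
    print(codec, data, all(b >= 0x80 for b in data))
```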
On Wed, May 6, 2009 at 09:31, "Martin v. Löwis" wrote:
> They *are* separate namespaces; that's guaranteed by the implementation.
Yes. But utf8b *sounds like* an encoding. When it isn't. I sure
thought it was when it was first mentioned. I agree that it would be
better to find another name.
'utf
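The separate registries can be checked directly with the codecs module. This is a sketch for current Python 3. The name utf8b is registered here purely for illustration, because no handler by that name ships today:

```python
import codecs

# Error handlers live in their own registry, separate from codecs.
handler = codecs.lookup_error("surrogateescape")
print(callable(handler))  # True

# Registering a handler under a codec-like name does not create a codec.
codecs.register_error("utf8b", handler)
try:
    codecs.lookup("utf8b")
    is_codec = True
except LookupError:
    is_codec = False
print(is_codec)  # False

# The name is still usable wherever an "errors" argument is accepted.
print(ascii(b"\xff".decode("utf-8", "utf8b")))  # '\udcff'
```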
2009/5/6 Antoine Pitrou :
> By the way, what are the ASCII characters that are not supported by
> Shift-JIS?
> Not many I suppose? (if I read the Wikipedia entry correctly, it's only the
> backslash and the tilde).
The biggest problem with Shift-JIS is that a perfectly valid unicode
character ab
MRAB writes:
>
> Judging by the existing names, I think that 'surrogate' would be
> reasonable. It already contains the meaning of substitute,
Only if you are a native English-speaker I suppose... For me it's just a
technical term denoting a certain class of unicode code poi
M.-A. Lemburg wrote:
Martin v. Löwis wrote:
The name "utf8b" suggested in the PEP is not in line with the codec
design
Where is that design documented, and how exactly does the name violate
the design (chapter and verse, please).
Martin, I designed the whole Python codec machinery, so even if
thi
Martin v. Löwis wrote:
>> The name "utf8b" suggested in the PEP is not in line with the codec
>> design
>
> Where is that design documented, and how exactly does the name violate
> the design (chapter and verse, please).
Martin, I designed the whole Python codec machinery, so even if
this is not expl
"Martin v. Löwis" writes:
> I fail to see how this could ever matter. If, by "media", you mean
> things like removable disks, and the file name encoding used on them,
> it's fairly irrelevant for the PEP, since Python won't start using
> Shift JIS as its file system encoding just because that'
Martin v. Löwis writes:
>
> > I don't personally care (I already was aware of UTF-8B), but there are
> > plenty of others who do.
>
> I think it is a fairly bad name, because it is easy to confuse it with
> the "surrogates" error handler (unless you suggest to rename that also).
I
Hello,
I need some help on http://bugs.python.org/issue5941
The bug is quite simple: the Distutils unixcompiler used to set the
archiver command to "ar -rc".
For quite a while now, this behavior has changed in order to be able
to customize the compiler behavior from
the environment. That introdu
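One way to keep the old default while still honoring the environment is sketched below under stated assumptions. The helper name archiver_command and the variables AR and ARFLAGS are hypothetical and illustrative only. This is not a description of what distutils actually does:

```python
import os
import shlex


def archiver_command(environ=None):
    """Hypothetical helper: build an archiver command line.

    Falls back to the historical "ar -rc" default. AR and ARFLAGS are
    assumed environment variable names, used here only for illustration.
    """
    env = os.environ if environ is None else environ
    ar = env.get("AR", "ar")
    flags = env.get("ARFLAGS", "-rc")
    # Split each part so values with extra words still form a valid argv.
    return shlex.split(ar) + shlex.split(flags)


print(archiver_command({}))                             # ['ar', '-rc']
print(archiver_command({"AR": "x86_64-linux-gnu-ar"}))  # ['x86_64-linux-gnu-ar', '-rc']
```

Keeping the flags separate from the program name means overriding one does not silently drop the other.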
> Yeah, yeah, this is the same old same old from PEP 3131. Anything
> that handles the various attacks based on ASCII-alike characters
> should at least rule out invalid Unicode, too!
>
> And where is this U+DC2F supposed to be coming from, anyway? The
> user's *local* environment or the user's
> > > Second, I suggest "surrogate-replace" as the name of the error handler
> > > rather than "utf8b".
> >
> > I think this is bike-shedding.
>
> I don't personally care (I already was aware of UTF-8B), but there are
> plenty of others who do.
I think it is a fairly bad name, because it is
Stephen J. Turnbull wrote:
> "Martin v. Löwis" writes:
> > > It occurs to me that the PEP maybe should say that it is an error
> > > to have your POSIX locale set to UTF-16 or something like that.
> >
> > No. It is *impossible* to have UTF-16 as the locale character set,
> > not an er
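The sketch below shows one concrete reason such a locale cannot work. It is an illustration only, not Martin's own argument, which is cut off above. UTF-16 encodes ordinary ASCII characters with a zero byte, and POSIX file names are NUL-terminated C strings:

```python
# In UTF-16, ASCII characters are encoded with a zero byte, so the encoded
# form of an ordinary name contains NUL bytes.
encoded = "abc".encode("utf-16-le")
print(encoded)             # b'a\x00b\x00c\x00'
print(b"\x00" in encoded)  # True

# A C string ends at the first NUL byte, so such a name would be cut short
# after its first character.
print(encoded.split(b"\x00", 1)[0])  # b'a'
```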
> The name "utf8b" suggested in the PEP is not in line with the codec
> design
Where is that design documented, and how exactly does the name violate
the design (chapter and verse, please).
> Error handlers and codecs are two different things, so the namespaces
> need to be clearly separate.
They *a