Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread Stephen J. Turnbull
"Martin v. Löwis" writes: > Done: the Python-Version header already clarifies that point. Ah, OK. I wish my day job required reading more PEPs so I'd be more familiar with these formalities. :-) > > Second, I suggest "surrogate-replace" as the name of the error handler > > rather than "utf8b

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread Stephen J. Turnbull
Lino Mastrodomenico writes: > 2009/5/5 Stephen J. Turnbull : > > Third, it is not clear to me why non-decodable ASCII should be an > > error. > > The PEP originally allowed the conversion to U+DCxx of bytes below 128 > that cannot be decoded by the encoding used, but this creates > potentia

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread Stephen J. Turnbull
"Martin v. Löwis" writes: > > It occurs to me that the PEP maybe should say that it is an error > > to have your POSIX locale set to UTF-16 or something like that. > > No. It is *impossible* to have UTF-16 as the locale character set, > not an error. Your statement is like saying "it

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread M.-A. Lemburg
Martin v. Löwis wrote: >> I have three substantive comments. First, although consequences for >> Python 3 byte interfaces (ie, "none") are explicitly stated, as far as >> I can see this PEP could apply to Python 2 as well. I don't think >> it's intended that way. Either way, I think you should c

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread Martin v. Löwis
> It occurs to me that the PEP maybe should say that it is an error > to have your POSIX locale set to UTF-16 or something like that. No. It is *impossible* to have UTF-16 as the locale character set, not an error. Your statement is like saying "it is an error to breathe in the vacuum". I

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread Martin v. Löwis
> I have three substantive comments. First, although consequences for > Python 3 byte interfaces (ie, "none") are explicitly stated, as far as > I can see this PEP could apply to Python 2 as well. I don't think > it's intended that way. Either way, I think you should clarify that > point. Done:

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread Martin v. Löwis
> > > Perhaps. However, utf-8b doesn't really have to do anything with utf-8 - > > > it's an algorithm based on 16-bit or 32-bit code points. > > I don't understand this phrasing. The algorithm is only applicable to > ASCII-compatible octet streams. It results in code points by a simple > disp

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread Lino Mastrodomenico
2009/5/5 Stephen J. Turnbull : > Third, it is not clear to me why non-decodable ASCII should be an > error. The PEP originally allowed the conversion to U+DCxx of bytes below 128 that cannot be decoded by the encoding used, but this creates potential security problems. See:
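
For readers following the U+DCxx discussion, here is a minimal sketch of the mapping being debated. It assumes a released Python 3 (3.1 or later), where the handler is registered as "surrogateescape" rather than the "utf8b" name used in this thread; the sample file name is made up. Only bytes 128 and above are escaped, which is consistent with the restriction Lino describes.

```python
# Sketch only: "surrogateescape" is the name the handler has in released
# Python 3; the thread above still calls it "utf8b".
raw = b"caf\xe9.txt"  # hypothetical Latin-1 file name, not valid UTF-8

# The undecodable byte 0xE9 is mapped to the lone surrogate U+DCE9.
name = raw.decode("utf-8", errors="surrogateescape")
escaped = [hex(ord(ch)) for ch in name if 0xDC80 <= ord(ch) <= 0xDCFF]

# Encoding with the same handler restores the original bytes exactly.
round_trip = name.encode("utf-8", errors="surrogateescape")
```

The round trip is the point of the scheme: the application can carry the name around as a str and still hand the identical bytes back to the operating system.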

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread Stephen J. Turnbull
MRAB writes: > [snip] > It might be slightly OT, but sometimes strict UTF-8 encoding is violated > by encoding U+0000 using 2 bytes (0xC0 0x80) so that 0x00 can be used as > a terminator. I think I read that Microsoft sometimes does this. Nice hack! as long as you don't let it escape. But if
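
A short sketch of how the two-byte form behaves in Python 3, assuming the released handler name "surrogateescape". The strict UTF-8 codec rejects the sequence, while the escaping handler carries each byte through as a lone surrogate so the original bytes can be restored. This is illustration only, not a decoder for the variant encoding MRAB describes.

```python
overlong_nul = b"\xc0\x80"  # the two-byte form of U+0000 mentioned above

# The strict UTF-8 codec refuses the overlong sequence.
try:
    overlong_nul.decode("utf-8")
    strict_accepts = True
except UnicodeDecodeError:
    strict_accepts = False

# With the escaping handler, each byte becomes one lone surrogate.
escaped = overlong_nul.decode("utf-8", errors="surrogateescape")
restored = escaped.encode("utf-8", errors="surrogateescape")
```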

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread MRAB
Stephen J. Turnbull wrote: MRAB writes: > > I don't think "people shouldn't be using non-ASCII-compatible > > encodings for locale encodings" is a sufficient rationale for a hard > > error here. I mean, of course they *should* be using UTF-8. Maybe > > Python 3.1 should just go ahead and e

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread Stephen J. Turnbull
MRAB writes: > > I don't think "people shouldn't be using non-ASCII-compatible > > encodings for locale encodings" is a sufficient rationale for a hard > > error here. I mean, of course they *should* be using UTF-8. Maybe > > Python 3.1 should just go ahead and error on any other encoding on

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread Stephen J. Turnbull
Zooko O'Whielacronx writes: > How would an application make sure that they were producing only > valid unicode? That's very difficult. There are a couple of sources that I can think of, in Python: C modules, chr(), \u literals, and now codecs with the 'utf8b'. There may be others. You'd need

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread MRAB
Stephen J. Turnbull wrote: "Martin v. Löwis" writes: > I've updated the PEP accordingly. I have three substantive comments. First, although consequences for Python 3 byte interfaces (ie, "none") are explicitly stated, as far as I can see this PEP could apply to Python 2 as well. I don't thin

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread Zooko O'Whielacronx
On Tue, May 5, 2009 at 8:57 AM, Stephen J. Turnbull wrote: > > 2.  The specification should state, and the discussion emphasize, that >    strings which were produced by surrogate replacement *must not* be >    used in data interchange with systems that do not specifically >    accept such strings
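
One way an application might check a string before interchange is to try a strict UTF-8 encode, which refuses lone surrogates. This is a sketch under assumptions: is_interchange_safe is a hypothetical helper name, and the check only catches lone surrogates, not every way a string could be unsuitable for another system.

```python
def is_interchange_safe(text: str) -> bool:
    """Return True if text encodes as strict UTF-8 (no lone surrogates)."""
    try:
        text.encode("utf-8")  # default errors="strict"
    except UnicodeEncodeError:
        return False
    return True


clean = "report.txt"
# A string produced by surrogate replacement (handler name as released in 3.1+).
escaped = b"caf\xe9".decode("utf-8", errors="surrogateescape")
```

Here is_interchange_safe(clean) returns True and is_interchange_safe(escaped) returns False, so escaped strings can be stopped at the boundary instead of leaking into other systems.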

[Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread Stephen J. Turnbull
"Martin v. Löwis" writes: > I've updated the PEP accordingly. I have three substantive comments. First, although consequences for Python 3 byte interfaces (ie, "none") are explicitly stated, as far as I can see this PEP could apply to Python 2 as well. I don't think it's intended that way. Ei

[Python-Dev] [Fwd: [Python-checkins] r72331 - python/branches/py3k/Modules/posixmodule.c]

2009-05-05 Thread Eric Smith
Modules/posixmodule.c now compiles for me, but I get a Bus Error in test_lchflags when running test_posixmodule on Mac OS X 10.5. I'll open a release blocker bug on this. Original Message Subject: [Python-checkins] r72331 - python/branches/py3k/Modules/posixmodule.c Date: Tu

Re: [Python-Dev] using help function in Py3k

2009-05-05 Thread Daniel Stutzbach
On Tue, May 5, 2009 at 5:41 AM, s|s wrote: > LookupError: unknown encoding: uft-8 > uft-8? Looks like a variation of Issue 4540 (or a duplicate? I can't tell) -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC
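
The error text can be reproduced with the standard codecs registry, which raises LookupError for a name it does not know. A small sketch follows; encoding_exists is a hypothetical helper, and this says nothing about where the misspelled name came from in the reporter's setup.

```python
import codecs


def encoding_exists(name: str) -> bool:
    """Return True if the codec registry recognises the encoding name."""
    try:
        codecs.lookup(name)
    except LookupError:  # e.g. "unknown encoding: uft-8"
        return False
    return True
```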

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread Stephen J. Turnbull
M.-A. Lemburg writes: > On 2009-05-03 19:39, Martin v. Löwis wrote: > >> If the error handler is supposed to be used for codecs other than utf-8, > >> perhaps it should be renamed something more generic, e.g. "surrogate-escape"? > > > > Perhaps. However, utf-8b doesn't really have to do anything

Re: [Python-Dev] using help function in Py3k

2009-05-05 Thread Aahz
On Tue, May 05, 2009, s|s wrote: > > I ran Python 3.0 for the first time. I used help() function and wrote > "modules hash". It issues an error. Please file a report on bugs.python.org -- Aahz (a...@pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize cor

[Python-Dev] using help function in Py3k

2009-05-05 Thread s|s
Hello, I ran Python 3.0 for the first time. I used help() function and wrote "modules hash". It issues an error. Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/ss/eproj/xapian/INST//lib/python3.0/site.py", line 427, in __call__ return pydoc.help(*args, **kwds) File

Re: [Python-Dev] Proposed: add support for UNC paths to all functions in ntpath

2009-05-05 Thread Eric Smith
Mark Hammond wrote: Is that enough consensus for it to go in? If so, are there any core developers who could help me get it in before the 3.1 feature freeze? The patch should be in good shape; it has unit tests and updated documentation. I've taken the liberty of explicitly CCing Martin jus

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread Terry Reedy
M.-A. Lemburg wrote: On 2009-05-03 19:39, Martin v. Löwis wrote: If the error handler is supposed to be used for codecs other than utf-8, perhaps it should be renamed something more generic, e.g. "surrogate-escape"? Perhaps. However, utf-8b doesn't really have to do anything with utf-8 - it's an a

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread M.-A. Lemburg
On 2009-05-03 19:39, Martin v. Löwis wrote: >> If the error handler is supposed to be used for codecs other than utf-8, >> perhaps it should be renamed something more generic, e.g. "surrogate-escape"? > > Perhaps. However, utf-8b doesn't really have to do anything with utf-8 - > it's an algorithm bas

Re: [Python-Dev] Proposed: drop unnecessary "context" pointer from PyGetSetDef

2009-05-05 Thread Larry Hastings
Mark Dickinson wrote: This doesn't sound right. The functions in the third party code will get compiled with the wrong signature, so they can crash (or behave unexpectedly) when called by Python. Yes, of course the signature of the getters and setters changes. Please ignore me. :-) If t