Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-30 Thread Thomas Breuel
> > Not for me (I am using Python 2.6.2). > > >>> f = open(chr(255), 'w') > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > IOError: [Errno 22] invalid mode ('w') or filename: '\xff' > >>> You can get the same error on Linux: $ python Python 2.6.2 (release26-maint, Apr 19 2009, 01:5

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-30 Thread Thomas Breuel
> > What's an analogous failure? Or, rather, why would a failure analogous > to the one I got when using System.IO.DirectoryInfo ever exist in > Python? Mono.Unix uses an encoder and a decoder that knows about special quoting rules. System.IO uses a different encoder and decoder because it's a r

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-30 Thread Thomas Breuel
> > > > "The upshot to all this is that Mono.Unix and Mono.Unix.Native can list, > > access, and open all files on your filesystem, regardless of encoding." > > I think this is misleading. With Mono 2.0.1, I get This has nothing to do with how Mono quotes. The reason for this is that Mono quotes

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-30 Thread Thomas Breuel
> > And then it goes on to say: "You won't be able to pass non-Unicode > filenames as command-line arguments."(*) Not only that, but you can't > reliably use such files with System.IO (whatever that is, but it > sounds pretty basic). This support is only available "within the > Mono.Unix and Mono

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-30 Thread Thomas Breuel
> > Java is not capable of doing that. Mono, as I keep pointing out, is. It > uses NULLs to escape invalid UNIX filenames. Please see: > > http://go-mono.com/docs/index.aspx?link=T%3AMono.Unix.UnixEncoding > > "The upshot to all this is that Mono.Unix and Mono.Unix.Native can list, > access, and

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-30 Thread Thomas Breuel
On Thu, Apr 30, 2009 at 12:32, "Martin v. Löwis" wrote: > > OK, so what's wrong with os.listdir() and similar functions returning a > > unicode string for strings that correctly encode/decode, and with byte > > strings for strings that are not valid unicode? > > See http://bugs.python.org/issue31
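The alternative quoted in this message (text for names that decode cleanly, raw bytes for names that do not) can be sketched roughly as follows. This is only an illustration in Python 3, not code from the thread: `decode_or_keep` and `listdir_mixed` are hypothetical helper names, and UTF-8 as the file system encoding is an assumption.

```python
import os


def decode_or_keep(raw_names, encoding="utf-8"):
    """Return str for names that decode cleanly, and the raw bytes otherwise.

    Hypothetical helper; the encoding default is an assumption.
    """
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Not valid in the assumed encoding: hand back the bytes untouched.
            result.append(raw)
    return result


def listdir_mixed(path, encoding="utf-8"):
    """List a directory as bytes, then decode each entry where possible."""
    return decode_or_keep(os.listdir(os.fsencode(path)), encoding)


names = decode_or_keep([b"ok.txt", b"caf\xe9.txt"])
print(names)
```

Callers of such a function have to handle two result types, which is the trade-off debated in the rest of the thread.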

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-30 Thread Thomas Breuel
> > > Since both have had to deal with this, have you looked at what they > > actually do before proposing PEP 383? What did you find? > > See > > http://mail.python.org/pipermail/python-3000/2007-September/010450.html > Thanks, that's very useful. > > Why did you choose an incompatible approac

Re: [Python-Dev] what Windows and Linux really do Re: PEP 383 (again)

2009-04-30 Thread Thomas Breuel
On Thu, Apr 30, 2009 at 10:21, "Martin v. Löwis" wrote: > Thomas Breuel wrote: > > Given the stated rationale of PEP 383, I was wondering what Windows > > actually does. So, I created some ISO8859-15 and ISO8859-8 encoded file > > names on a device, plugged the

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-30 Thread Thomas Breuel
> > Yes. Now think about the implications. This means that adopting PEP > > 383 will make IronPython and Jython running on UNIX intrinsically > > incompatible with CPython running on UNIX, and there's no way to fix > that. > > *Not* adopting the PEP will also make CPython and IronPython > incompa

[Python-Dev] what Windows and Linux really do Re: PEP 383 (again)

2009-04-30 Thread Thomas Breuel
Given the stated rationale of PEP 383, I was wondering what Windows actually does. So, I created some ISO8859-15 and ISO8859-8 encoded file names on a device, plugged them into my Windows Vista machine, and fired up Python 3.0. First, os.listdir("f:") returns a list of strings for those file name

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-29 Thread Thomas Breuel
On Thu, Apr 30, 2009 at 05:40, Curt Hagenlocher wrote: > IronPython will inherit whatever behavior Mono has implemented. The > Microsoft CLR defines the native string type as UTF-16 and all of the > managed APIs for things like file names and environmental variables > operate on UTF-16 strings --

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Thomas Breuel
On Wed, Apr 29, 2009 at 23:03, Terry Reedy wrote: > Thomas Breuel wrote: > >> >>Sure. However, that requires you to provide meaningful, reproducible >>counter-examples, rather than a stenographic formulation that might >>hint some problem you apparentl

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-29 Thread Thomas Breuel
> > The whole purpose of PEP 383 is to send the exact same bytes that were > read from the OS back to the OS => violating (2) (for whatever the > apparent system file-encoding is, not limited to UTF-8), It's fine to read a file name from a file system and write the same file back as the same raw

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Thomas Breuel
> Sure. However, that requires you to provide meaningful, reproducible > counter-examples, rather than a stenographic formulation that might > hint some problem you apparently see (which I believe is just not > there). Well, here's another one: PEP 383 would disallow UTF-8 encodings of half surro

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Thomas Breuel
On Wed, Apr 29, 2009 at 07:45, "Martin v. Löwis" wrote: > Your claim was > that PEP 383 may have unfortunate effects on Windows, No, I simply think that PEP 383 is not sufficiently specified to be able to tell. > and I'm telling > you that it won't, because the behavior of Python on Windows w

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Thomas Breuel
> > It cannot crash Python; it can only crash > hypothetical third-party programs or libraries with deficient error > checking and > unreasonable assumptions about input data. The error checking isn't necessarily deficient. For example, a safe and legitimate thing to do is for third party librar

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Thomas Breuel
> > On Windows, the Wide APIs are already used throughout the code base, > e.g. SetEnvironmentVariableW/_wenviron. If you need to find out the > specific API for a specific functionality, please read the source code. > [...] > No, I don't assume that. I assume that all functions are strictly > ava

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Thomas Breuel
On Tue, Apr 28, 2009 at 20:45, "Martin v. Löwis" wrote: > > Furthermore, I don't believe that PEP 383 works consistently on Windows, > > What makes you say that? PEP 383 will have no effect on Windows, > compared to the status quo, whatsoever. > That's what you believe, but it's not clear to me

[Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-28 Thread Thomas Breuel
I think we should break up this problem into several parts: (1) Should the default UTF-8 decoder fail if it gets an illegal byte sequence? It's probably OK for the default decoder to be lenient in some way (see below). (2) Should the default UTF-8 encoder for file system operations be allowed to

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Thomas Breuel
> > However, it is "mission creep": Martin didn't volunteer to > write a PEP for it, he volunteered to write a PEP to solve the > "roundtrip the value of os.listdir()" problem. And he succeeded, up > to some minor details. Yes, it solves that problem. But that doesn't come without cost. Most i

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-28 Thread Thomas Breuel
> > Yep, that's the problem. Lots of theoretical problems no one has ever > encountered > brought up against a PEP which resolves some actual problems people > encounter on > a regular basis. How can you bring up practical problems against something that hasn't been implemented? The fact that no

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Thomas Breuel
On Tue, Apr 28, 2009 at 11:00, Oleg Broytmann wrote: > On Tue, Apr 28, 2009 at 10:37:45AM +0200, Thomas Breuel wrote: > > Returning an error for an incorrect encoding doesn't make > > internationalization harder, it makes it easier because it makes > debugging >

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Thomas Breuel
> > >Until it's hard there will be no internationalization. A fact of life, > damn it. Programmers are lazy, and have many problems to solve. PEP 383 doesn't make it any easier; it just turns one set of problems into another. Actually, it makes it worse, since any problems that show up now s

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Thomas Breuel
> > Therefore, when Python encounters path names on a file system > > that are not consistent with the (assumed) encoding for that file > > system, Python should raise an error. > > This is what happens currently, and users are quite unhappy about it. We need to keep "users" and "programmers" dis

[Python-Dev] PEP 383 (again)

2009-04-27 Thread Thomas Breuel
I thought PEP-383 was a fairly neat approach, but after thinking about it, I now think that it is wrong. PEP-383 attempts to represent non-UTF-8 byte sequences in Unicode strings in a reversible way. But how do those non-UTF-8 byte sequences get into those path names in the first place? Most lik
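For readers unfamiliar with the mechanism under discussion, here is a minimal sketch of the reversible representation as it exists in later Python 3 releases, where PEP 383 is exposed as the `surrogateescape` error handler. The file name is made up for illustration, and UTF-8 as the file system encoding is an assumption.

```python
# A Latin-1 encoded name; the byte 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# Decoding with surrogateescape maps each undecodable byte 0xNN
# to the lone surrogate U+DCNN instead of raising an error.
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
roundtrip = name.encode("utf-8", errors="surrogateescape")

# A strict encoder rejects the lone surrogate, so the string cannot be
# passed unchanged to code that expects well-formed Unicode.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The last step illustrates the kind of concern raised in this thread: the escaped string round-trips to the OS, but strict consumers elsewhere may refuse it.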