Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
Thomas Breuel wrote: > On Thu, Apr 30, 2009 at 05:40, Curt Hagenlocher > wrote: > > IronPython will inherit whatever behavior Mono has implemented. The > Microsoft CLR defines the native string type as UTF-16 and all of the > managed APIs for things like f

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
Jeroen Ruigrok van der Werven wrote: > -On [20090430 07:18], "Martin v. Löwis" (mar...@v.loewis.de) wrote: >> Suppose I create a new directory, and run the following script >> in 3.x: >> >> py> open("x","w").close() >> py> open(b"\xff","w").close() >> py> os.listdir(".") >> ['x'] > > That is actua

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-29 Thread Thomas Breuel
On Thu, Apr 30, 2009 at 05:40, Curt Hagenlocher wrote: > IronPython will inherit whatever behavior Mono has implemented. The > Microsoft CLR defines the native string type as UTF-16 and all of the > managed APIs for things like file names and environmental variables > operate on UTF-16 strings --

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Thomas Breuel
On Wed, Apr 29, 2009 at 23:03, Terry Reedy wrote: > Thomas Breuel wrote: > >> >>Sure. However, that requires you to provide meaningful, reproducible >>counter-examples, rather than a stenographic formulation that might >>hint some problem you apparently see (which I believe is just no

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-29 Thread Jeroen Ruigrok van der Werven
-On [20090430 07:18], "Martin v. Löwis" (mar...@v.loewis.de) wrote: >Suppose I create a new directory, and run the following script >in 3.x: > >py> open("x","w").close() >py> open(b"\xff","w").close() >py> os.listdir(".") >['x'] That is actually a regression in 3.x: Python 2.6.1 (r261:67515, Mar

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Martin v. Löwis
> Thanks for clarifying the Windows behavior, here. A little more > clarification in the PEP could have avoided lots of discussion. It > would seem that a PEP, proposed to modify a poorly documented (and > therefore likely poorly understood) area, should be educational about > the status quo, as

Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
> Perhaps not a full description of the status quo, but the PEP definitely > needs a good summary I completely agree, and believe that the PEP *does* have a good summary - it has both an abstract, and a rationale, and both say exactly what I want them to say. If people want them to say different t

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
Curt Hagenlocher wrote: > On Wed, Apr 29, 2009 at 8:16 PM, Thomas Breuel wrote: >> Also, what are Jython and IronPython supposed to do on UNIX? Can they >> implement these semantics at all? > > IronPython will inherit whatever behavior Mono has implemented. The > Microsoft CLR defines the native

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
> I don't understand the proposal and issues. I see a lot of people > claiming that they do, and then spending all their time either > talking past each other, or disagreeing. If everyone who claims they > understand the issues actually does, why is it so hard to reach a > consensus? Because t

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Martin v. Löwis
> How do I get a printable unicode version of these path strings if they > contain non-unicode data? Define "printable". One way would be to use a regular expression, replacing all codes in a certain range with a question mark. > I'm guessing that an app has to understand that filenames come in tw
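Martin's regular-expression suggestion could look roughly like this. It is a sketch that assumes the "certain range" means U+DC80..U+DCFF, the half-surrogate block the PEP decodes undecodable bytes into. `printable` is a hypothetical helper name, and the sample name is produced with Python's `surrogateescape` error handler, the form in which PEP 383 eventually shipped (Python 3.1).

```python
import re

# Half surrogates U+DC80..U+DCFF mark bytes that could not be decoded.
_ESCAPED = re.compile("[\udc80-\udcff]")

def printable(name: str) -> str:
    """Hypothetical helper: replace each escaped byte with '?'."""
    return _ESCAPED.sub("?", name)

# b"caf\xe9.txt" is Latin-1 data; 0xE9 on its own is not valid UTF-8.
name = b"caf\xe9.txt".decode("utf-8", "surrogateescape")
print(printable(name))  # caf?.txt
```

The regex only hides the escaped bytes for display; the original `name` should still be kept for passing back to the OS.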

Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
> But I shouldn't have to guess. The PEP should explain how these things > are useful. The discussion section could be extended with use cases for > both the encode and decode cases. See PEP 293. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Terry Reedy
Glenn Linderman wrote: On approximately 4/29/2009 1:28 PM, came the following characters from So where is the ambiguity here? None. But not everyone can read all the Python source code to try to understand it; they expect the documentation to help them avoid that. Because the documentatio

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-29 Thread Steven D'Aprano
On Thu, 30 Apr 2009 01:16:20 pm Thomas Breuel wrote: > And that's why I think this proposal should be shelved for a while > until people have had more time to try to understand the issues and > also come up with alternative proposals.  Once this is adopted and > implemented in C-Python, Python is s

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-29 Thread Curt Hagenlocher
On Wed, Apr 29, 2009 at 8:16 PM, Thomas Breuel wrote: > > Also, what are Jython and IronPython supposed to do on UNIX?  Can they > implement these semantics at all? IronPython will inherit whatever behavior Mono has implemented. The Microsoft CLR defines the native string type as UTF-16 and all o

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-29 Thread Thomas Breuel
> > The whole purpose of PEP 383 is to send the exact same bytes that were > read from the OS back to the OS => violating (2) (for whatever the > apparent system file-encoding is, not limited to UTF-8), It's fine to read a file name from a file system and write the same file back as the same raw

Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Aahz
On Wed, Apr 29, 2009, "Martin v. Löwis" wrote: > > I'm at a loss how to make the text more clear than it already is. I'm > really not good at writing long essays, with a lot of > explanatory-but-non-normative text. I also think that explanations do > not belong in the section titled specification,

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Aahz
On Thu, Apr 30, 2009, Cameron Simpson wrote: > > The lengthy discussion mostly revolves around: > > - Glenn points out that strings that came _not_ from listdir, and that are > _not_ well-formed unicode (== "have bare surrogates in them") but that > were intended for use as filenames wil

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Cameron Simpson
On 29Apr2009 23:41, Barry Scott wrote: > On 22 Apr 2009, at 07:50, Martin v. Löwis wrote: >> If the locale's encoding is UTF-8, the file system encoding is set to >> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes >> (which must be >= 0x80) into half surrogate codes U+DC80..U

Re: [Python-Dev] Proposed: add support for UNC paths to all functions in ntpath

2009-04-29 Thread Eric Smith
Michael Foord wrote: Larry Hastings wrote: I've written a patch for Python 3.1 that changes os.path so it handles UNC paths on Windows: http://bugs.python.org/issue5799 +1 for the feature. I have to deal with Windows networks from time to time and this would be useful. +1 from me, too

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Barry Scott
On 22 Apr 2009, at 07:50, Martin v. Löwis wrote: If the locale's encoding is UTF-8, the file system encoding is set to a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF. Forgive me if this has been covered.
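The decoding rule quoted from the PEP can be tried directly in later Python 3 releases, where the proposed "utf-8b" codec became the `surrogateescape` error handler on the ordinary `utf-8` codec. A minimal sketch of the mapping and its round trip:

```python
# Each byte that is not part of a valid UTF-8 sequence (always >= 0x80)
# is decoded to the lone surrogate U+DC00 + byte, i.e. U+DC80..U+DCFF.
raw = b"abc\xff\x80def"
text = raw.decode("utf-8", "surrogateescape")
print(ascii(text))  # 'abc\udcff\udc80def'

# Encoding with the same handler turns those surrogates back into the
# original bytes, so the round trip is lossless.
assert text.encode("utf-8", "surrogateescape") == raw

# Well-formed UTF-8 is decoded normally and never escaped.
assert "\u00e9".encode("utf-8").decode("utf-8", "surrogateescape") == "\u00e9"
```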

Re: [Python-Dev] Proposed: add support for UNC paths to all functions in ntpath

2009-04-29 Thread Michael Foord
Larry Hastings wrote: I've written a patch for Python 3.1 that changes os.path so it handles UNC paths on Windows: http://bugs.python.org/issue5799 +1 for the feature. I have to deal with Windows networks from time to time and this would be useful. Michael In a Windows path string,

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Cameron Simpson
On 29Apr2009 22:14, Stephen J. Turnbull wrote: | Baptiste Carvello writes: | > By contrast, if the new utf-8b codec would *supercede* the old one, | > \udcxx would always mean raw bytes (at least on UCS-4 builds, where | > surrogates are unused). Thus ambiguity could be avoided. | | Unfortunat

Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Glenn Linderman
On approximately 4/29/2009 1:06 PM, came the following characters from the keyboard of Martin v. Löwis: > Thanks, fixed. Thanks for your fixes. They are helpful. I'm at a loss how to make the text more clear than it already is. I'm really not good at writing long essays, with a lot of expl

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Cameron Simpson
On 29Apr2009 17:03, Terry Reedy wrote: > Thomas Breuel wrote: >> Sure. However, that requires you to provide meaningful, reproducible >> counter-examples, rather than a stenographic formulation that might >> hint some problem you apparently see (which I believe is just not >> there

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
> The whole purpose of PEP 383 is to send the exact same bytes that were > read from the OS back to the OS => violating (2) (for whatever the > apparent system file-encoding is, not limited to UTF-8), and that has > overwhelmingly popular support. > > Note that this won't happen automatically, eit

Re: [Python-Dev] string to float containing whitespace

2009-04-29 Thread Martin v. Löwis
s...@pobox.com wrote: > Someone please tell me I'm not going mad. I could have sworn that once upon > a time attempting to convert numeric strings to ints or floats if they > contained whitespace raised an exception. As far back as 1.5.2 it appears > that float(), string.atof() and string.atoi()

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Glenn Linderman
On approximately 4/29/2009 1:28 PM, came the following characters from the keyboard of Martin v. Löwis: C. File on disk with the invalid surrogate code, accessed via the str interface, no decoding happens, matches in memory the file on disk with the byte that translates to the same surrogate, acc

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Terry Reedy
Thomas Breuel wrote: Sure. However, that requires you to provide meaningful, reproducible counter-examples, rather than a stenographic formulation that might hint some problem you apparently see (which I believe is just not there). Well, here's another one: PEP 383 would disall

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Terry Reedy
Glenn Linderman wrote: On approximately 4/29/2009 4:36 AM, came the following characters from the keyboard of Cameron Simpson: On 29Apr2009 02:56, Glenn Linderman wrote: os.listdir(b"") I find that on my Windows system, with all ASCII path file names, that I get quite different results wh

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Martin v. Löwis
> So while out of scope of the PEP, I don't think it's at all > artificial. Sure - but I see this as the same case as "the file got renamed". If you have a LRU list in your app, and a file gets renamed, then the LRU list breaks (unless you also store the inode number in the LRU list, and lookup th

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Martin v. Löwis
>>> C. File on disk with the invalid surrogate code, accessed via the >>> str interface, no decoding happens, matches in memory the file on disk >>> with the byte that translates to the same surrogate, accessed via the >>> bytes interface. Ambiguity. >> What does that mean? What sp

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Martin v. Löwis
> Sure. However, that requires you to provide meaningful, reproducible > counter-examples, rather than a stenographic formulation that might > hint some problem you apparently see (which I believe is just not > there). > > > Well, here's another one: PEP 383 would disallow UTF-8 e

Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
> In the first paragraph, you should make it clear that Python 3.0 does > not use the Windows bytes interfaces, if it doesn't. "Python uses > *only* the wide character APIs..." would suffice. That's not quite exact. It uses both ANSI and Wide APIs - depending on whether you pass bytes as input or
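Martin's point that the interface follows the argument type is visible from the caller's side: `os.listdir` returns names of the same type as the path it is given. The sketch below checks only that type behaviour; the Windows API used underneath has changed since this thread (Python 3.6, via PEP 529, routes bytes paths through UTF-8 and the wide-character API rather than the ANSI one).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one plain ASCII file name.
    open(os.path.join(tmp, "x.txt"), "w").close()

    names_str = os.listdir(tmp)                 # str path   -> list of str
    names_bytes = os.listdir(os.fsencode(tmp))  # bytes path -> list of bytes

print(names_str)    # ['x.txt']
print(names_bytes)  # [b'x.txt']
```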

[Python-Dev] Proposed: add support for UNC paths to all functions in ntpath

2009-04-29 Thread Larry Hastings
I've written a patch for Python 3.1 that changes os.path so it handles UNC paths on Windows: http://bugs.python.org/issue5799 In a Windows path string, a UNC path functions *exactly* like a drive letter. This patch means that the Python path split/join functions treat them as if they were
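In current CPython the UNC handling discussed here is part of `ntpath`, so a `\\server\share` prefix comes back as the drive, just as `C:` does. A small sketch (`ntpath` can be imported on any platform; the server and share names are made up, and today's behaviour is not necessarily identical to the original patch):

```python
import ntpath  # Windows path rules, usable on any OS

drive, rest = ntpath.splitdrive(r"\\server\share\dir\file.txt")
print(drive)  # \\server\share
print(rest)   # \dir\file.txt

# A drive-letter path splits the same way.
print(ntpath.splitdrive(r"C:\dir\file.txt"))  # ('C:', '\\dir\\file.txt')
```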

Re: [Python-Dev] Installing Python 2.5.4 from Source under Windows

2009-04-29 Thread Paul Franz
Ok. I will ask on the python-list. Paul Franz Aahz wrote: On Wed, Apr 29, 2009, Paul Franz wrote: I have looked and looked and looked. But I cannot find any directions on how to install the version of Python built using Microsoft's compiler. It builds. I get the dlls and the exe's. But

Re: [Python-Dev] Installing Python 2.5.4 from Source under Windows

2009-04-29 Thread Aahz
On Wed, Apr 29, 2009, Paul Franz wrote: > > I have looked and looked and looked. But I cannot find any directions > on how to install the version of Python built using Microsoft's > compiler. It builds. I get the dlls and the exe's. But there is no > documentation that says how to install wh

[Python-Dev] Installing Python 2.5.4 from Source under Windows

2009-04-29 Thread Paul Franz
I have looked and looked and looked. But I cannot find any directions on how to install the version of Python built using Microsoft's compiler. It builds. I get the dlls and the exe's. But there is no documentation that says how to install what has been built. I have read every readme and stop

Re: [Python-Dev] string to float containing whitespace

2009-04-29 Thread skip
Amaury> You are maybe referring to the Decimal constructor: Amaury> decimal.Decimal(" 123") Amaury> fails with 2.5, but works with 2.6. (issue 1780) Highly unlikely, since my recollection is from way back in the early days. Also, I have yet to actually use the decimal module. :-/

Re: [Python-Dev] string to float containing whitespace

2009-04-29 Thread Amaury Forgeot d'Arc
Hi, 2009/4/29 : > Someone please tell me I'm not going mad.  I could have sworn that once upon > a time attempting to convert numeric strings to ints or floats if they > contained whitespace raised an exception.  As far back as 1.5.2 it appears > that float(), string.atof() and string.atoi() allo

[Python-Dev] string to float containing whitespace

2009-04-29 Thread skip
Someone please tell me I'm not going mad. I could have sworn that once upon a time attempting to convert numeric strings to ints or floats if they contained whitespace raised an exception. As far back as 1.5.2 it appears that float(), string.atof() and string.atoi() allow whitespace. Maybe I'm t
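For reference, this is how current Python 3 behaves for the cases raised in this thread: surrounding whitespace is accepted by `float()`, `int()` and, per Amaury's note on issue 1780, `decimal.Decimal`, while whitespace inside the number is still rejected. A quick check:

```python
from decimal import Decimal

# Leading and trailing whitespace is stripped before parsing.
print(float("  1.5\n"))  # 1.5
print(int("\t42 "))      # 42
print(Decimal(" 123"))   # 123

# Whitespace inside the number is still a ValueError.
try:
    float("1 5")
except ValueError as exc:
    print("rejected:", exc)
```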

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Stephen J. Turnbull
"Martin v. Löwis" writes: > I find the case pretty artificial, though: if the locale encoding > changes, all file names will look incorrect to the user, so he'll > quickly switch back, or rename all the files. It's not necessarily the case that the locale encoding changes, but rather the name

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Stephen J. Turnbull
Baptiste Carvello writes: > By contrast, if the new utf-8b codec would *supercede* the old one, > \udcxx would always mean raw bytes (at least on UCS-4 builds, where > surrogates are unused). Thus ambiguity could be avoided. Unfortunately, that's false. It could have come from a literal strin
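Stephen's objection can be reproduced with the `surrogateescape` error handler standing in for the proposed utf-8b codec: a lone surrogate typed as a literal and one produced by decoding raw bytes are the same `str`, so the origin cannot be recovered afterwards.

```python
# A str built from a literal lone surrogate...
from_literal = "report\udcff.txt"

# ...and a str produced by decoding an undecodable byte...
from_bytes = b"report\xff.txt".decode("utf-8", "surrogateescape")

# ...are indistinguishable once they exist as str objects.
assert from_literal == from_bytes

# Encoding either one with the escape handler yields the raw byte 0xFF,
# even though the literal never came from the OS.
assert from_literal.encode("utf-8", "surrogateescape") == b"report\xff.txt"
```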

[Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-29 Thread Stephen J. Turnbull
Thomas Breuel writes: > PEP 383 violated (2), and I think that's a bad thing. The whole purpose of PEP 383 is to send the exact same bytes that were read from the OS back to the OS => violating (2) (for whatever the apparent system file-encoding is, not limited to UTF-8), and that has overwhelmi

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Glenn Linderman
On approximately 4/29/2009 4:36 AM, came the following characters from the keyboard of Cameron Simpson: On 29Apr2009 02:56, Glenn Linderman wrote: os.listdir(b"") I find that on my Windows system, with all ASCII path file names, that I get quite different results when I pass os.listdir an

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Glenn Linderman
On approximately 4/29/2009 4:07 AM, came the following characters from the keyboard of R. David Murray: On Tue, 28 Apr 2009 at 20:29, Glenn Linderman wrote: On approximately 4/28/2009 7:40 PM, came the following characters from the keyboard of R. David Murray: On Tue, 28 Apr 2009 at 13:37, Gle

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Cameron Simpson
On 29Apr2009 02:56, Glenn Linderman wrote: > os.listdir(b"") > > I find that on my Windows system, with all ASCII path file names, that I > get quite different results when I pass os.listdir an empty str vs an > empty bytes. > > Rather than keep you guessing, I get the root directory contents

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread R. David Murray
On Tue, 28 Apr 2009 at 20:29, Glenn Linderman wrote: On approximately 4/28/2009 7:40 PM, came the following characters from the keyboard of R. David Murray: On Tue, 28 Apr 2009 at 13:37, Glenn Linderman wrote: > C. File on disk with the invalid surrogate code, accessed via the str > interfac

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Glenn Linderman
On approximately 4/29/2009 12:29 AM, came the following characters from the keyboard of Martin v. Löwis: C. File on disk with the invalid surrogate code, accessed via the str interface, no decoding happens, matches in memory the file on disk with the byte that translates to the same surrogate, ac

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Glenn Linderman
On approximately 4/29/2009 12:38 AM, came the following characters from the keyboard of Baptiste Carvello: Glenn Linderman wrote: 3. When an undecodable byte 0xPQ is found, decode to the escape codepoint, followed by codepoint U+01PQ, where P and Q are hex digits. The problem with this

Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Antoine Pitrou
Thomas Breuel writes: > > The error checking isn't necessarily deficient. For example, a safe and legitimate thing to do is for third party libraries to throw a C++ exception, raise a Python exception, or delete the half surrogate. Do you have any concrete examples of this behaviour?

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Thomas Breuel
> Sure. However, that requires you to provide meaningful, reproducible > counter-examples, rather than a stenographic formulation that might > hint some problem you apparently see (which I believe is just not > there). Well, here's another one: PEP 383 would disallow UTF-8 encodings of half surro
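Thomas's example concerns byte sequences such as `ED B3 BF`, the (invalid) UTF-8 encoding of the half surrogate U+DCFF. The sketch below shows how current CPython treats them: strict UTF-8 refuses to produce them, the separate `surrogatepass` handler will, and under the escape scheme each of the three bytes is escaped individually rather than decoded to U+DCFF.

```python
lone = "\udcff"  # a half surrogate, as produced for the byte 0xFF

# Strict UTF-8 refuses to encode a lone surrogate.
try:
    lone.encode("utf-8")
except UnicodeEncodeError:
    print("strict utf-8: refused")

# "surrogatepass" writes the 3-byte form a lax encoder would emit.
encoded = lone.encode("utf-8", "surrogatepass")
print(encoded)  # b'\xed\xb3\xbf'

# Read back with the escape scheme, each byte is escaped on its own
# instead of being decoded to U+DCFF, and still round-trips exactly.
decoded = encoded.decode("utf-8", "surrogateescape")
print(ascii(decoded))  # '\udced\udcb3\udcbf'
assert decoded.encode("utf-8", "surrogateescape") == encoded
```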

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Baptiste Carvello
Glenn Linderman wrote: If there is going to be a required transformation from de novo strings to funny-encoded strings, then why not make one that people can actually see and compare and decode from the displayable form, by using displayable characters instead of lone surrogates? The

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Baptiste Carvello
Lino Mastrodomenico wrote: Only for the new utf-8b encoding (if Martin agrees), while the existing utf-8 is fine as is (or at least waaay outside the scope of this PEP). This is questionable. This would have the consequence that \udcxx in a python string would sometimes mean a surrogate,

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Hrvoje Niksic
Zooko O'Whielacronx wrote: If you switch to iso8859-15 only in the presence of undecodable UTF-8, then you have the same round-trip problem as the PEP: both b'\xff' and b'\xc3\xbf' will be converted to u'\u00ff' without a way to unambiguously recover the original file name. Why do you say
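The collision Zooko describes is easy to demonstrate with a hypothetical UTF-8-then-ISO-8859-15 fallback decoder (`fallback_decode` is an illustrative name, not a proposed API):

```python
def fallback_decode(raw: bytes) -> str:
    """Hypothetical scheme: decode as UTF-8, else fall back to ISO-8859-15."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso8859-15")

name_a = fallback_decode(b"\xff")      # not valid UTF-8, so ISO-8859-15 gives U+00FF
name_b = fallback_decode(b"\xc3\xbf")  # valid UTF-8 encoding of U+00FF

# Two different on-disk names map to the same str; the original bytes
# cannot be recovered from the decoded name alone.
print(name_a == name_b == "\u00ff")  # True
```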

Re: [Python-Dev] Python-Dev PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Cameron Simpson
On 29Apr2009 08:27, Martin v. Löwis wrote: | > I would like utility functions to perform: | > os-bytes->funny-encoded | > funny-encoded->os-bytes | > or explicit example code snippets for same in the PEP text. | | Done! Thanks! -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/
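The two conversions Cameron asked for amount to one codec call each. A sketch with hypothetical helper names, assuming a UTF-8 file system encoding; later Python releases (3.2 and up) provide `os.fsdecode` and `os.fsencode`, which do the same job using the actual file system encoding.

```python
# Hypothetical helper names; assumes the file system encoding is UTF-8.
FS_ENCODING = "utf-8"

def os_bytes_to_str(raw: bytes) -> str:
    """os-bytes -> funny-encoded str (undecodable bytes become U+DCxx)."""
    return raw.decode(FS_ENCODING, "surrogateescape")

def str_to_os_bytes(name: str) -> bytes:
    """funny-encoded str -> the original os-bytes."""
    return name.encode(FS_ENCODING, "surrogateescape")

# Every possible single byte survives the round trip.
for value in range(256):
    raw = bytes([value])
    assert str_to_os_bytes(os_bytes_to_str(raw)) == raw
```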

Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Glenn Linderman
On approximately 4/29/2009 12:17 AM, came the following characters from the keyboard of Martin v. Löwis: OK, so you are saying that under PEP 383, utf-8b wouldn't be used anywhere on Windows by default. That's not clear from your proposal. You didn't read it carefully enough. The first three p

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Baptiste Carvello
Glenn Linderman wrote: 3. When an undecodable byte 0xPQ is found, decode to the escape codepoint, followed by codepoint U+01PQ, where P and Q are hex digits. The problem with this strategy is: paths are often sliced, so your 2 codepoints could get separated. The good thing with the PEP'
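Baptiste's slicing objection can be illustrated by implementing the two-code-point scheme as a custom decode error handler. The escape code point is not given in the quoted text, so U+E000 below is a hypothetical stand-in, as is the handler name `escape_pair_sketch`; the PEP's single-surrogate form is shown alongside for comparison.

```python
import codecs

ESC = "\ue000"  # hypothetical escape code point (not specified in the thread)

def escape_pair(exc: UnicodeDecodeError):
    # Emit ESC followed by U+01PQ for every undecodable byte 0xPQ.
    bad = exc.object[exc.start:exc.end]
    return "".join(ESC + chr(0x100 + byte) for byte in bad), exc.end

codecs.register_error("escape_pair_sketch", escape_pair)

raw = b"dir\xff/file"
two_cp = raw.decode("utf-8", "escape_pair_sketch")  # 'dir\ue000\u01ff/file'
one_cp = raw.decode("utf-8", "surrogateescape")     # 'dir\udcff/file'

# Cut each string after four code points.
print(ascii(two_cp[:4]))  # 'dir\ue000' (escape left without its payload)
print(ascii(one_cp[:4]))  # 'dir\udcff' (still a complete escaped byte)
print(one_cp[:4].encode("utf-8", "surrogateescape"))  # b'dir\xff'
```

With one code point per byte, any slice still contains only whole escapes; with two, a cut can strand the escape marker or leave U+01FF looking like an ordinary character.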

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Martin v. Löwis
> C. File on disk with the invalid surrogate code, accessed via the str > interface, no decoding happens, matches in memory the file on disk > with > the byte that translates to the same surrogate, accessed via the bytes > interface. Ambiguity. Is that an alternative to A

Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
> OK, so you are saying that under PEP 383, utf-8b wouldn't be used > anywhere on Windows by default. That's not clear from your proposal. You didn't read it carefully enough. The first three paragraphs of the "Specification" section make that clear. Regards, Martin