Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-29 Thread Adam Olsen
On Tue, Sep 30, 2008 at 12:22 AM, Georg Brandl <[EMAIL PROTECTED]> wrote: > Victor Stinner wrote: >> On Monday 29 September 2008 18:45:28, Georg Brandl wrote: >>> If I had to choose, I'd still argue for the modified UTF-8 as filesystem >>> encoding (if it were UTF-8 otherwise), despite

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-29 Thread Georg Brandl
Victor Stinner wrote: > On Monday 29 September 2008 18:45:28, Georg Brandl wrote: >> If I had to choose, I'd still argue for the modified UTF-8 as filesystem >> encoding (if it were UTF-8 otherwise), despite possible surprises when a >> such-encoded filename escapes from Python. > > I

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-29 Thread Martin v. Löwis
> Change the default file system encoding to store bytes in Unicode is like > introducing a new Python type: . Exactly. Seems like the best solution to me, despite your polemics. Regards, Martin ___ Python-3000 mailing list Python-3000@python.org http

Re: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-29 Thread Brett Cannon
On Mon, Sep 29, 2008 at 5:47 PM, Victor Stinner <[EMAIL PROTECTED]> wrote: > Hi, > > See attached patch: python3_bytes_filename.patch > Patches should go on the tracker, not the mailing list. Otherwise it will just get lost. -Brett

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-29 Thread Stephen J. Turnbull
James Y Knight writes: > On Sep 29, 2008, at 3:32 AM, Adam Olsen wrote: > > UTF-8b doesn't work as intended. It produces an invalid unicode > > object (garbage surrogates) that cannot be used with external APIs or > > libraries that require unicode. > > I'd be interested to hear more detai

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-29 Thread Stephen J. Turnbull
Guido van Rossum writes: > On Mon, Sep 29, 2008 at 4:29 PM, Victor Stinner > <[EMAIL PROTECTED]> wrote: > > It would be hard for a newbie programmer to understand why he's > > unable to find his very important file ("important r?port.doc") > > using os.listdir(). > *Every* failure in this s

[Python-3000] Patch for an initial support of bytes filename in Python3

2008-09-29 Thread Victor Stinner
Hi, See attached patch: python3_bytes_filename.patch Using the patch, you will get: - open() supports bytes - listdir(unicode) -> only unicode, *skip* invalid filenames (as asked by Guido) - remove os.getcwdu() - create os.getcwdb() -> bytes - glob.glob() supports bytes - fnmatch.filter()
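
For readers who want to try the bytes-based calls named in this patch summary, here is a minimal sketch against Python 3 as it was eventually released. It is not taken from python3_bytes_filename.patch; the file name `report.doc` and the helper `list_both_ways` are made up for illustration. It only shows that `open()`, `os.listdir()` and `os.getcwdb()` accept or return bytes, and it does not reproduce the patch's behaviour of skipping undecodable names.

```python
import os
import tempfile


def list_both_ways(directory):
    """Return (str_names, bytes_names) for the same directory.

    Hypothetical helper: os.listdir() returns str entries for a str
    path and bytes entries for a bytes path.
    """
    str_names = sorted(os.listdir(directory))
    bytes_names = sorted(os.listdir(os.fsencode(directory)))
    return str_names, bytes_names


with tempfile.TemporaryDirectory() as tmp:
    # open() accepts a bytes path as well as a str path.
    bytes_path = os.path.join(os.fsencode(tmp), b"report.doc")
    with open(bytes_path, "wb") as f:
        f.write(b"data")
    str_names, bytes_names = list_both_ways(tmp)

# os.getcwdb() is the bytes counterpart of os.getcwd().
cwd_bytes = os.getcwdb()
```

The same directory is listed twice so the only difference between the two results is the type of the entries.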

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-29 Thread Victor Stinner
On Tuesday 30 September 2008 01:31:45, Adam Olsen wrote: > The alternative is not be valid unicode, but since we can't use such > objects with external libs, can't even print them, we might as well > call them something else. We already have a name for that: bytes. :-)

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-29 Thread Adam Olsen
On Mon, Sep 29, 2008 at 5:29 PM, Victor Stinner <[EMAIL PROTECTED]> wrote: > On Monday 29 September 2008 19:06:01, Guido van Rossum wrote: >> >> - listdir(unicode) -> unicode and raise an error on invalid filename >> >> I know I keep flipflopping on this one, but the more I think about

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-29 Thread Martin v. Löwis
> import os
> import os.path
> import sys
> if os.path.supports_unicode_filenames:
>     cwd = getcwd()
> else:
>     cwd = getcwdb()
> encoding = sys.getfilesystemencoding()
> for filename in os.listdir(cwd):
>     if os.path.supports_unicode_filenames:
>         text = str(fil

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-29 Thread Victor Stinner
On Monday 29 September 2008 18:45:28, Georg Brandl wrote: > If I had to choose, I'd still argue for the modified UTF-8 as filesystem > encoding (if it were UTF-8 otherwise), despite possible surprises when a > such-encoded filename escapes from Python. If I understand correctly this sol

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-29 Thread Adam Olsen
On Mon, Sep 29, 2008 at 5:33 PM, James Y Knight <[EMAIL PROTECTED]> wrote: > On Sep 29, 2008, at 7:23 PM, Adam Olsen wrote: >> >> An ugly hack, but more correct than UTF-8b or any similar attempt to >> do "unicode but not quite unicode"; either it's lossy, or it's not >> unicode. There's no in bet

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-29 Thread Guido van Rossum
On Mon, Sep 29, 2008 at 4:29 PM, Victor Stinner <[EMAIL PROTECTED]> wrote: > On Monday 29 September 2008 19:06:01, Guido van Rossum wrote: >> >> - listdir(unicode) -> unicode and raise an error on invalid filename >> >> I know I keep flipflopping on this one, but the more I think about

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-29 Thread Adam Olsen
On Mon, Sep 29, 2008 at 5:31 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: >>> ISTM that 8859-1 is all about decoding, so I don't understand why >>> you say it is a way not to decode. >> >> 8859-1 has no invalid bytes and is a 1-to-1 mapping. If you have an >> API that always returns unicode bu

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-29 Thread James Y Knight
On Sep 29, 2008, at 7:23 PM, Adam Olsen wrote: An ugly hack, but more correct than UTF-8b or any similar attempt to do "unicode but not quite unicode"; either it's lossy, or it's not unicode. There's no in between. Promoting the use of 8859-1 to decode mostly-utf-8 data seems like a very poo
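
The objection in this message can be made concrete with a small sketch. The file name `réport.doc` is an invented example, not one from the thread: decoding UTF-8 bytes as ISO-8859-1 never fails, but the resulting text is not what the user typed, even though the original bytes can still be recovered.

```python
# Invented example name containing one non-ASCII character.
name = "r\u00e9port.doc"
utf8_bytes = name.encode("utf-8")            # b'r\xc3\xa9port.doc'

# Decoding as ISO-8859-1 always succeeds, but each UTF-8 byte
# becomes its own character, so the text is misread.
misread = utf8_bytes.decode("iso-8859-1")

# The bytes are still recoverable, so nothing is lost at the byte level.
restored = misread.encode("iso-8859-1").decode("utf-8")
```

Here `misread` is one character longer than `name`, which is the kind of surprise a user would see if such a string were displayed.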

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-29 Thread Martin v. Löwis
>> ISTM that 8859-1 is all about decoding, so I don't understand why >> you say it is a way not to decode. > > 8859-1 has no invalid bytes and is a 1-to-1 mapping. If you have an > API that always returns unicode but accepts an encoding you can use > it, then reencode using 8859-1 to get back the
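
The quoted claim that 8859-1 has no invalid bytes and is a 1-to-1 mapping can be checked directly. This is only a sketch of that property in Python 3, not code from the thread: every byte value decodes to the code point with the same number, and encoding returns the original bytes.

```python
# Every possible byte value, 0x00 through 0xFF.
every_byte = bytes(range(256))

# ISO-8859-1 maps byte N to code point U+00NN, so decoding cannot fail.
as_text = every_byte.decode("iso-8859-1")
code_points = [ord(ch) for ch in as_text]

# Re-encoding with the same codec yields the original bytes.
round_tripped = as_text.encode("iso-8859-1")
```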

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-29 Thread Victor Stinner
On Monday 29 September 2008 19:06:01, Guido van Rossum wrote: > >> - listdir(unicode) -> unicode and raise an error on invalid filename > > I know I keep flipflopping on this one, but the more I think about it > the more I believe it is better to drop those names than to raise an > exce
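
The two policies being weighed here, dropping undecodable names versus raising, can be sketched as one function. `decode_names` and its `on_error` parameter are hypothetical names invented for this illustration; they are not part of `os.listdir()` or of any patch in this thread, and the sample names are made up.

```python
def decode_names(raw_names, encoding="utf-8", on_error="skip"):
    """Decode directory entries given as bytes (hypothetical helper).

    on_error="skip"  drops names that do not decode, the option the
                     quoted text leans towards.
    on_error="raise" lets UnicodeDecodeError propagate, as in the
                     proposal being replied to.
    """
    if on_error not in ("skip", "raise"):
        raise ValueError("on_error must be 'skip' or 'raise'")
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            if on_error == "raise":
                raise
            # on_error == "skip": silently leave this entry out.
    return decoded


# b"\xe9" is a Latin-1 'é', which is not valid UTF-8 in this position.
names = [b"notes.txt", b"important r\xe9port.doc"]
visible = decode_names(names)
```

With the default policy the second file simply does not appear in `visible`, which is the trade-off the thread is arguing about.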

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-29 Thread Adam Olsen
On Mon, Sep 29, 2008 at 5:14 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: > Adam Olsen wrote: >> There's no solution except to not >> decode, and 8859-1 is the way to do that. > > I think you need to elaborate on that. What does ISO-8859-1 have to do > with a Python datatype in this context: which

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-29 Thread Martin v. Löwis
Adam Olsen wrote: > There's no solution except to not > decode, and 8859-1 is the way to do that. I think you need to elaborate on that. What does ISO-8859-1 have to do with a Python datatype in this context: which datatype, and what algorithm on it are you specifically referring to? When I do (in 2.

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-29 Thread Martin v. Löwis
> The default behaviour should be to use unicode and raise an error if > conversion to unicode fails. It should also be possible to use bytes using > bytes arguments and optional arguments (for getcwd). I'm still opposed to allowing bytes as file names at all in 3k. Python should really strive

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-29 Thread James Y Knight
On Sep 29, 2008, at 6:17 PM, Adam Olsen wrote: I suspect linux will eventually take this route as well. If ext3 had an option for UTF-8 validation I know I'd want it on. That'd move the error to the program creating bogus file names, rather than those trying to read, display, and manage them.

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-29 Thread Adam Olsen
On Mon, Sep 29, 2008 at 11:06 AM, Guido van Rossum <[EMAIL PROTECTED]> wrote: > On Mon, Sep 29, 2008 at 9:45 AM, Georg Brandl <[EMAIL PROTECTED]> wrote: > >> This approach (changing all path-handling functions to accept either bytes >> or string, but not both) is doomed in my eyes. First, there are

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-29 Thread Adam Olsen
On Mon, Sep 29, 2008 at 10:00 AM, Victor Stinner <[EMAIL PROTECTED]> wrote: > On Monday 29 September 2008 17:16:47, Steven Bethard wrote: >> > - getcwd() -> unicode >> > - getcwd(bytes=True) -> bytes >> >> Please let's not introduce boolean flags like this. How about >> ``getcwdb`` in

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-29 Thread Adam Olsen
On Mon, Sep 29, 2008 at 5:12 AM, Antoine Pitrou <[EMAIL PROTECTED]> wrote: > Adam Olsen gmail.com> writes: >> >> UTF-8b doesn't work as intended. It produces an invalid unicode >> object (garbage surrogates) that cannot be used with external APIs or >> libraries that require unicode. > > At least

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-29 Thread Guido van Rossum
> Victor Stinner wrote: (Thanks Victor for moving this to the list. Having a discussion in the tracker is really painful, I find.) >> POSIX OS >> >> >> The default behaviour should be to use unicode and raise an error if >> conversion to unicode fails. It should also be possible to use

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-29 Thread Georg Brandl
Victor Stinner wrote: > POSIX OS > > > The default behaviour should be to use unicode and raise an error if > conversion to unicode fails. It should also be possible to use bytes using > bytes arguments and optional arguments (for getcwd). > > - listdir(unicode) -> unicode and rais

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-29 Thread James Y Knight
On Sep 29, 2008, at 3:32 AM, Adam Olsen wrote: On Sun, Sep 28, 2008 at 10:43 PM, James Y Knight <[EMAIL PROTECTED]> wrote: [1] UTF-8b has a similar property to 8859-1, in that all byte strings can be successfully round-tripped. It's not currently implemented in python core, but it's a pretty

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-29 Thread Victor Stinner
On Monday 29 September 2008 17:16:47, Steven Bethard wrote: > > - getcwd() -> unicode > > - getcwd(bytes=True) -> bytes > > Please let's not introduce boolean flags like this. How about > ``getcwdb`` in parallel with the old ``getcwdu``? Yeah, you're right. So I wrote a new patch: os_

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-29 Thread Steven Bethard
On Mon, Sep 29, 2008 at 6:07 AM, Victor Stinner <[EMAIL PROTECTED]> wrote: > The default behaviour should be to use unicode and raise an error if > conversion to unicode fails. It should also be possible to use bytes using > bytes arguments and optional arguments (for getcwd). > > - listdir(unicod

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-29 Thread Victor Stinner
Patches are already available in issue #3187 (os.listdir): On Monday 29 September 2008 14:07:55, Victor Stinner wrote: > - listdir(unicode) -> unicode and raise an error on invalid filename Need raise_decoding_errors.patch (don't clear Unicode error > - listdir(bytes) -> bytes Al

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-29 Thread Victor Stinner
On Monday 29 September 2008 06:43:55, you wrote: > It will make users happy, and it's simple enough to implement for > python 3.0. I dislike your argument. A "quick and dirty hack" is always faster to implement than a real solution, but we may hit new issues later if we don't choose the

[Python-3000] New proposition for Python3 bytes filename issue

2008-09-29 Thread Victor Stinner
Hi, After reading the previous discussion, here is a new proposition. Python 2.x and Windows are not affected by this issue. Only Python3 on POSIX (e.g. Linux or *BSD) is affected. Some systems are broken, but Python has to be able to open/copy/move/remove files with an "invalid filename". The i

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-29 Thread Antoine Pitrou
Adam Olsen gmail.com> writes: > > UTF-8b doesn't work as intended. It produces an invalid unicode > object (garbage surrogates) that cannot be used with external APIs or > libraries that require unicode. At least it works with all Python operations supported by the unicode type (methods, concat
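
Both sides of this exchange can be observed with the `surrogateescape` error handler that Python later gained through PEP 383. It is used here only as a stand-in for the UTF-8b idea, not as the implementation discussed in the thread, and the byte string is an invented example. String operations accept the escaped text, while a strict encoder of the kind an external library might use rejects it.

```python
# b"\xe9" is not valid UTF-8 here, so it is escaped as the lone
# surrogate U+DCE9 instead of raising an error.
escaped = b"caf\xe9.txt".decode("utf-8", errors="surrogateescape")

# Ordinary str operations (methods, concatenation) still work.
joined = "dir/" + escaped.upper()

# A strict UTF-8 encode refuses the lone surrogate.
try:
    escaped.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```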

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-29 Thread Adam Olsen
On Sun, Sep 28, 2008 at 10:43 PM, James Y Knight <[EMAIL PROTECTED]> wrote: > [1] UTF-8b has a similar property to 8859-1, in that all byte strings can be > successfully round-tripped. It's not currently implemented in python core, > but it's a pretty trivial encoding, and is available under the BS
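
The round-trip property claimed for UTF-8b can be tried out with the `surrogateescape` error handler added to Python in 3.1 by PEP 383, which follows the same idea. This sketch assumes that handler is an acceptable stand-in; it is not the BSD-licensed implementation mentioned in the quoted text, and `to_text`, `to_bytes` and the sample names are invented for illustration.

```python
def to_text(raw):
    """Decode bytes as UTF-8, escaping undecodable bytes as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def to_bytes(text):
    """Inverse of to_text: turn escaped surrogates back into raw bytes."""
    return text.encode("utf-8", errors="surrogateescape")


samples = [
    b"plain.txt",                        # ASCII
    "r\u00e9port.doc".encode("utf-8"),   # valid UTF-8
    b"r\xe9port.doc",                    # Latin-1 byte, invalid as UTF-8
    bytes(range(1, 256)),                # every non-NUL byte value
]

all_round_trip = all(to_bytes(to_text(raw)) == raw for raw in samples)
```

Valid UTF-8 decodes to ordinary text, and only the undecodable bytes turn into surrogates, which is what distinguishes this approach from decoding everything as 8859-1.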