Re: [Python-3000] Filename: unicode normalization

2008-09-30 Thread Martin v. Löwis
> Bad news: it looks like Linux doesn't normalize filenames. So if you used NFC > to create a file, you have to reuse NFC to open your file (and the same for > NFD). That's not news to me. Of course it does: Unix is completely agnostic of encodings in file APIs. On the implementation level, it's

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
> However, Martin, I can promise you that I will _never_ ask for any > convenience functions related to bytes as a result of this decision. :-) Regards, Martin ___ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/p

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Terry Reedy
Martin v. Löwis wrote: Guido van Rossum wrote: However the *proposed* behavior (returns bytes if the arg was bytes, and returns str when the arg was str) is IMO sane, and no different than the polymorphism found in len() or many builtin operations. My concern still is that it brings the bytes

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread James Y Knight
On Sep 30, 2008, at 10:06 PM, [EMAIL PROTECTED] wrote: However, Martin, I can promise you that I will _never_ ask for any convenience functions related to bytes as a result of this decision. I want bytes to come back from filesystem APIs because I intend to have a wrapper layer which knows

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Adam Olsen
On Tue, Sep 30, 2008 at 8:06 PM, <[EMAIL PROTECTED]> wrote: > The proposal of using U+ seems like it would have been almost the same > from such a wrapper's perspective, except (A) people using the filesystem > APIs without the benefit of such a wrapper would have been even more > screwed, and

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Greg Ewing
James Y Knight wrote: Since from what I've tried, things seem to work, I'd really like to know what precisely does fail from the opponents of utf-8b. Seems like what will fail is taking one of these utf-8b decoded names and passing it to some external library that uses it as a filename withou

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread James Y Knight
On Sep 30, 2008, at 5:51 PM, Martin v. Löwis wrote: While I can sympathize with people having non-ASCII file names on their disks, I can't sympathize with this example. Normal users just don't put \x90 into their command lines, and those who do deserve the error message they get. That's just

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Victor Stinner
On Wednesday 01 October 2008 00:28:22, Martin v. Löwis wrote: > I don't think we will manage to release Python 3.0 this year if that > change is to be implemented. And then, I don't think the release manager > will agree to such a delay. The minimum change is to disallow bytes/str mix:
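
As an illustration of what "disallow bytes/str mix" can look like in practice, here is a minimal sketch. It shows how os.path.join behaves in current Python 3 releases, which is an assumption of this note and not necessarily the patch discussed in the message; the path components are made up.

```python
import os.path

# Components of a single type are accepted, and the result keeps that type.
print(os.path.join("dir", "name"))    # str in, str out
print(os.path.join(b"dir", b"name"))  # bytes in, bytes out

# Mixing bytes and str components is refused instead of being guessed at.
try:
    os.path.join(b"dir", "name")
except TypeError as exc:
    print("rejected:", exc)
```

Refusing the mix keeps the caller in charge of choosing an encoding, rather than having the library pick one silently.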

Re: [Python-3000] Filename: unicode normalization

2008-09-30 Thread Guido van Rossum
Martin answered a similar question from Jack Jansen in another thread. OSX doesn't normalize either. It's unlikely to confuse users in practice. On Tue, Sep 30, 2008 at 4:11 PM, Victor Stinner <[EMAIL PROTECTED]> wrote: > Since it's hard to follow the filename thread on two mailing lists, I'm > sta

[Python-3000] Filename: unicode normalization

2008-09-30 Thread Victor Stinner
Since it's hard to follow the filename thread on two mailing lists, I'm starting a new thread, on python-3000 only, about Unicode normalization of filenames. Bad news: it looks like Linux doesn't normalize filenames. So if you used NFC to create a file, you have to reuse NFC to open your file
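
To make the NFC/NFD point concrete, here is a small sketch using the standard unicodedata module. The file name is a sample chosen for illustration; the code only compares strings and bytes and does not touch any filesystem.

```python
import unicodedata

name = "\u00fcmlaut.txt"  # sample name with a precomposed u-umlaut
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

# Both forms look the same to a reader, but they are different strings.
print(nfc == nfd)           # False
print(len(nfc), len(nfd))   # 10 11

# They also encode to different bytes, so a filesystem that compares
# names byte by byte sees two distinct names.
print(nfc.encode("utf-8"))  # b'\xc3\xbcmlaut.txt'
print(nfd.encode("utf-8"))  # b'u\xcc\x88mlaut.txt'

# Normalizing both sides to one form before comparing makes them equal.
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```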

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread James Y Knight
On Sep 30, 2008, at 6:21 PM, Martin v. Löwis wrote: IOW, Java hasn't solved the problem in the last 10 years. Java is already really bad at being a small little language to write cooperating tools in. I'd never even attempt to write a little pipeline filter in Java -- I've already pretty mu

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 3:21 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: >>> My concern still is that it brings the bytes type into the status of >>> another character string type, which is really bad, and will require >>> further modifications to Python for the lifetime of 3.x. >> >> I'd like

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
> How does windows (and Python on windows) handle NFC versus NFD issues? That's left to the application. > Can I have two files called "ümlaut.txt", one in NFD and one NFC form? Yes, you can. It sounds confusing, but only in a theoretical way. You never have combining characters on Windows (at

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
> Yes! If there is a byte-string access method for Windows, pretty please > make it decode from UTF-8 internally and call the Unicode version of the > Windows APIs. The non-unicode windows APIs are pretty much just broken > -- Ideally, Python should never be calling those. I don't think we will ma

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Antoine Pitrou
On Tuesday 30 September 2008 at 23:33 +0200, "Martin v. Löwis" wrote: > > By the way, doesn't all this controversy yearn for a PEP? > > There must be a solution for 3.0 (which *could* be "it's a bug, > don't use Python 3.0 on such broken systems"); we can't wait for > a PEP to resolve this issue f

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
>> My concern still is that it brings the bytes type into the status of >> another character string type, which is really bad, and will require >> further modifications to Python for the lifetime of 3.x. > > I'd like to understand why this is "really bad". I thought it was by > design that the str

Re: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 11:47 AM, <[EMAIL PROTECTED]> wrote: > > On 05:56 pm, [EMAIL PROTECTED] wrote: >> >> On Tue, Sep 30, 2008 at 10:59 AM, <[EMAIL PROTECTED]> wrote: >>> >>> On 02:32 pm, [EMAIL PROTECTED] wrote: > >>> In the absence of a 2.6 getcwdb, perhaps the fixer could just drop the >>>

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Nick Coghlan
Guido van Rossum wrote: > On Tue, Sep 30, 2008 at 2:31 PM, Nick Coghlan <[EMAIL PROTECTED]> wrote: >> I'm also starting to wonder if allowing mixed types might be the way to >> go for these interfaces - leaving the bytes objects in place if the >> Unicode decode operation fails. > > No, no, no

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread James Y Knight
On Sep 30, 2008, at 5:40 PM, Martin v. Löwis wrote: On Windows, we might reject bytes filenames for all file operations: open(), unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError) Since I've seen no objections to this yet: please no. If we offer a "lower-level" bytes filename

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Martin v. Löwis
> $ ./python -c "import sys; print(sys.argv)" "$(echo -e 'filename\x90\x90')" > Could not convert argument 3 to str > $ ./python -c "import os; print(os.environ['DUMMY'])" > Traceback (most recent call last): > File "", line 1, in > File "/home/ncoghlan/devel/py3k/Lib/os.py", line 389, in __ge

Re: [Python-3000] Request for documentation: PyModuleDef

2008-09-30 Thread Martin v. Löwis
Jan Althaus wrote: > Please correct me if I'm wrong, but it doesn't seem like there is a full > documentation of PyModuleDef's members available? That's most likely the case, yes. > While some of them are intuitive, others aren't. The usage of m_size in > particular isn't clear to me. See PEP 31

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
> Oh, ok. I had assumed Windows just uses a fixed encoding without the problem > of misencoded filenames. It's the other way 'round: On Windows, Unicode file names are the natural choice, and byte strings have limitations. In a sense, Windows got it right - but then, they started later. Unix misse

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
>> On Windows, we might reject bytes filenames for all file operations: open(), >> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError) > > Since I've seen no objections to this yet: please no. If we offer a > "lower-level" bytes filename API, it should work for all platforms. Unfo

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 2:31 PM, Nick Coghlan <[EMAIL PROTECTED]> wrote: > I'm also starting to wonder if allowing mixed types might be the way to > go for these interfaces - leaving the bytes objects in place if the > Unicode decode operation fails. No, no, no! -- --Guido van Rossum (home p

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 1:29 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: > Guido van Rossum wrote: >> However >> the *proposed* behavior (returns bytes if the arg was bytes, and >> returns str when the arg was str) is IMO sane, and no different than >> the polymorphism found in len() or many b

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Marcin 'Qrczak' Kowalczyk
2008/9/30 Glenn Linderman <[EMAIL PROTECTED]>: > So the problem is that a Unicode file system interface can't deal with > non-UTF-8 byte streams as file names. > > So it seems there are four suggested approaches, all of which have aspects > that are inconvenient. Let's not forget what happens whe

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
> By the way, doesn't all this controversy yearn for a PEP? There must be a solution for 3.0 (which *could* be "it's a bug, don't use Python 3.0 on such broken systems"); we can't wait for a PEP to resolve this issue for 3.0. Most likely, the solution for 3.0 arrives through BDFL pronouncement, i

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Nick Coghlan
James Y Knight wrote: > Those aren't good behaviors, and can't be solved simply by pretending > certain files don't exist. A couple of output comparisons for two of James's examples (system Python is 2.5.3, the Python : $ python -V Python 2.5.2 $ python -c "import sys; print sys.argv" "$(echo -e

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 1:12 PM, Terry Reedy <[EMAIL PROTECTED]> wrote: > Terry Reedy wrote: >> >> Guido van Rossum wrote: > >>> I'm not sure either way. I've heard it claimed that Windows filesystem >>> APIs use Unicode natively. Does Python 3.0 on Windows currently >>> support filenames expressed a

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 1:04 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: > Guido van Rossum wrote: >> On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> >> wrote: Change the default file system encoding to store bytes in Unicode is like introducing a new Python

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 12:42 PM, Terry Reedy <[EMAIL PROTECTED]> wrote: > Guido van Rossum wrote: >> >> On Tue, Sep 30, 2008 at 11:13 AM, Georg Brandl <[EMAIL PROTECTED]> wrote: >>> >>> Victor Stinner wrote: On Windows, we might reject bytes filenames for all file operations: open

[Python-3000] Request for documentation: PyModuleDef

2008-09-30 Thread Jan Althaus
Please correct me if I'm wrong, but it doesn't seem like there is a full documentation of PyModuleDef's members available? While some of them are intuitive, others aren't. The usage of m_size in particular isn't clear to me. I understand this is the size of additional per-interpreter storage,

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
> I'm not sure either way. I've heard it claimed that Windows filesystem > APIs use Unicode natively. Does Python 3.0 on Windows currently > support filenames expressed as bytes? Yes, it does (at least, os.open, os.stat support them, builtin open doesn't). > Are they encoded first before > passing

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
Guido van Rossum wrote: > However > the *proposed* behavior (returns bytes if the arg was bytes, and > returns str when the arg was str) is IMO sane, and no different than > the polymorphism found in len() or many builtin operations. My concern still is that it brings the bytes type into the statu

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
> I didn't get an answer to my question: what is the result characters) stored in unicode> + ? I guess that the result is > instead of raising an error > (invalid types). So again: why introducing a new type instead of reusing > existing Python types? I didn't mean to introduce a new data typ

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Terry Reedy
Terry Reedy wrote: Guido van Rossum wrote: I'm not sure either way. I've heard it claimed that Windows filesystem APIs use Unicode natively. Does Python 3.0 on Windows currently support filenames expressed as bytes? Are they encoded first before passing to the Unicode APIs? Using what encoding?

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Antoine Pitrou
Martin v. Löwis writes: > > True. I try to outweigh the need for simplicity in the API against the > need to support all cases. So I see two solutions: > > a) (...) > > b) (...) By the way, doesn't all this controversy yearn for a PEP?

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Martin v. Löwis
Guido van Rossum wrote: > On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: >>> Change the default file system encoding to store bytes in Unicode is like >>> introducing a new Python type: . >> Exactly. Seems like the best solution to me, despite your polemics. > > Mar

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Marcin 'Qrczak' Kowalczyk
2008/9/30 Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]>: > I've experimentally implemented (not for Python) a different escaping > scheme with a similar goal as UTF-8b: undecodable bytes are prefixed > with U+ instead of being converted to unpaired surrogates, and > '\x00' decodes as U+ U+

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Terry Reedy
Guido van Rossum wrote: On Tue, Sep 30, 2008 at 11:13 AM, Georg Brandl <[EMAIL PROTECTED]> wrote: Victor Stinner wrote: On Windows, we might reject bytes filenames for all file operations: open(), unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError) Since I've seen no objection

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Georg Brandl
Guido van Rossum wrote: > On Tue, Sep 30, 2008 at 11:13 AM, Georg Brandl <[EMAIL PROTECTED]> wrote: >> Victor Stinner wrote: >>> On Windows, we might reject bytes filenames for all file operations: open(), >>> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError) >> >> Since I've s

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread James Y Knight
On Sep 30, 2008, at 12:57 PM, Guido van Rossum wrote: And again: if utf-8b isn't acceptable, because it does break things in some unknown-to-me way, I really can't imagine anything working but just going back to byte-string access as the only API. It's really not okay for the "obvious" API

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 11:13 AM, Georg Brandl <[EMAIL PROTECTED]> wrote: > Victor Stinner wrote: >> On Windows, we might reject bytes filenames for all file operations: open(), >> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError) > > Since I've seen no objections to this yet: pl

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Georg Brandl
Victor Stinner wrote: > Hi, > > After reading the previous discussion, here is a new proposition. > > Python 2.x and Windows are not affected by this issue. Only Python 3 on POSIX > (e.g. Linux or *BSD) is affected. > > Some systems are broken, but Python has to be able to open/copy/move/remove

Re: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 10:59 AM, <[EMAIL PROTECTED]> wrote: > On 02:32 pm, [EMAIL PROTECTED] wrote: >> If 2.6 weren't pretty much released already I'd ask to add >> os.getcwdb() there, as an alias for os.getcwd(), and add a 2to3 fixer >> that converts os.getcwdu() to os.getcwd(), leaves os.getcwd

Re: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 10:41 AM, Bill Janssen <[EMAIL PROTECTED]> wrote: > Guido van Rossum <[EMAIL PROTECTED]> wrote: >> On Tue, Sep 30, 2008 at 8:47 AM, Bill Janssen <[EMAIL PROTECTED]> wrote: >> > Victor Stinner <[EMAIL PROTECTED]> wrote: >> > >> >> - listdir(unicode) -> only unicode, *skip* i

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread M.-A. Lemburg
On 2008-09-30 18:46, Guido van Rossum wrote: > On Tue, Sep 30, 2008 at 8:20 AM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote: >> In the end, I think it's better not to be clever and just return >> the filenames that cannot be decoded as bytes objects in os.listdir(). > > Unfortunately that's going to b

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread James Y Knight
On Sep 30, 2008, at 1:37 PM, Marcin 'Qrczak' Kowalczyk wrote: I've experimentally implemented (not for Python) a different escaping scheme with a similar goal as UTF-8b: undecodable bytes are prefixed with U+ instead of being converted to unpaired surrogates, and '\x00' decodes as U+ U+00

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 10:28 AM, Georg Brandl <[EMAIL PROTECTED]> wrote: >> How can it *regularly* drive you crazy when "the majority of file names >> [...] encoded correctly" (as you assert above)? > > Because Office files are a) often named with long, seemingly descriptive > filenames, which inva

Re: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread Bill Janssen
Guido van Rossum <[EMAIL PROTECTED]> wrote: > On Tue, Sep 30, 2008 at 8:47 AM, Bill Janssen <[EMAIL PROTECTED]> wrote: > > Victor Stinner <[EMAIL PROTECTED]> wrote: > > > >> - listdir(unicode) -> only unicode, *skip* invalid filenames > >>(as asked by Guido) > > > > Is there an option listdir(

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Marcin 'Qrczak' Kowalczyk
2008/9/30 James Y Knight <[EMAIL PROTECTED]>: u'\udc90\udc90'.encode('utf-8') > '\xed\xb2\x90\xed\xb2\x90' This is wrong: UTF-8 (like other UTF-x) encodes Unicode scalar values, not Unicode code points, i.e. surrogates as such are unencodable. '\xed\xb2\x90' is invalid UTF-8. I've experimen
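
The claim that surrogates are not encodable in UTF-8 can be checked with a short sketch. It reflects the strict utf-8 codec of current Python 3 releases, which is an assumption of this note; the interpreters used in the thread behaved differently, as the quoted output shows.

```python
lone = "\udc90\udc90"  # two unpaired low surrogates

# The strict codec refuses to encode unpaired surrogates.
try:
    lone.encode("utf-8")
except UnicodeEncodeError as exc:
    print("encode refused:", exc.reason)

# The byte sequence quoted in the message is likewise refused on decode.
try:
    b"\xed\xb2\x90".decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode refused:", exc.reason)
```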

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Terry Reedy
Guido van Rossum wrote: On Mon, Sep 29, 2008 at 8:55 PM, Terry Reedy <[EMAIL PROTECTED]> wrote: On Monday 29 September 2008 19:06:01, Guido van Rossum wrote: I know I keep flipflopping on this one, but the more I think about it the more I believe it is better to drop those names than

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 9:20 AM, James Y Knight <[EMAIL PROTECTED]> wrote: > > On Sep 29, 2008, at 11:11 PM, Stephen J. Turnbull wrote: > >>> Except...that one over there. That's the whole point of UTF-8b: >>> correctly encoded names get decoded correctly and readably, and the >>> other cases get d

Re: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 8:47 AM, Bill Janssen <[EMAIL PROTECTED]> wrote: > Victor Stinner <[EMAIL PROTECTED]> wrote: > >> - listdir(unicode) -> only unicode, *skip* invalid filenames >>(as asked by Guido) > > Is there an option listdir(bytes) which will return *all* filenames (as > byte sequen

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 8:20 AM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote: > In the end, I think it's better not to be clever and just return > the filenames that cannot be decoded as bytes objects in os.listdir(). Unfortunately that's going to break most code that is using os.listdir(), so it's ha

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread James Y Knight
On Sep 29, 2008, at 11:11 PM, Stephen J. Turnbull wrote: Except...that one over there. That's the whole point of UTF-8b: correctly encoded names get decoded correctly and readably, and the other cases get decoded into something unique that cannot possibly conflict. Sure. But there are lots o

Re: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread glyph
On 12:47 am, [EMAIL PROTECTED] wrote: This is the most sane contribution I've seen so far :). See attached patch: python3_bytes_filename.patch Using the patch, you will get: - open() support bytes - listdir(unicode) -> only unicode, *skip* invalid filenames (as asked by Guido) Forgive me fo

Re: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread Bill Janssen
Victor Stinner <[EMAIL PROTECTED]> wrote: > - listdir(unicode) -> only unicode, *skip* invalid filenames >(as asked by Guido) Is there an option listdir(bytes) which will return *all* filenames (as byte sequences)? Otherwise, this seems troubling to me; *something* should be returned for f

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread M.-A. Lemburg
On 2008-09-30 16:05, Guido van Rossum wrote: > On Tue, Sep 30, 2008 at 3:31 AM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote: >> On 2008-09-30 08:00, Martin v. Löwis wrote: Change the default file system encoding to store bytes in Unicode is like introducing a new Python type: . >>> Exactly. S

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread James Y Knight
On Sep 29, 2008, at 7:50 PM, Adam Olsen wrote: I'd rather the 1% of cases that need to handle bad file names make an explicit effort to do so, via alternate byte APIs or (if necessary) the 8859-1 hack. So are you okay with python failing to run properly if the current directory has strange by

Re: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 6:21 AM, <[EMAIL PROTECTED]> wrote: > On 12:47 am, [EMAIL PROTECTED] wrote: > > This is the most sane contribution I've seen so far :). Thanks. I'll review it later today (after coffee+breakfast :) and will apply it assuming the code is reasonably sane, otherwise I'll go a

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 3:31 AM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote: > On 2008-09-30 08:00, Martin v. Löwis wrote: >>> Change the default file system encoding to store bytes in Unicode is like >>> introducing a new Python type: . >> >> Exactly. Seems like the best solution to me, despite your

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Victor Stinner
On Tuesday 30 September 2008 15:53:09, Guido van Rossum wrote: > On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: > >> Change the default file system encoding to store bytes in Unicode is > >> like introducing a new Python type: . > > > > Exactly. Seems lik

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Tue, Sep 30, 2008 at 2:28 AM, Antoine Pitrou <[EMAIL PROTECTED]> wrote: > Adam Olsen writes: >> >> The only way to display that file would be to transform it into some >> other valid unicode string. However, as that string is already valid, >> you've just made any files named after

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Mon, Sep 29, 2008 at 11:22 PM, Georg Brandl <[EMAIL PROTECTED]> wrote: > No, that was not what I meant (although it is another possibility). As I > wrote, > Martin's proposal that I support here is using the modified UTF-8 codec that > successfully roundtrips otherwise invalid UTF-8 data. I th

Re: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3

2008-09-30 Thread Victor Stinner
Hi, > This is the most sane contribution I've seen so far :). Oh thanks. > Do I understand properly that (listdir(bytes) -> bytes)? Yes, os.listdir(bytes)->bytes. It's already the current behaviour. But with Python3 trunk, os.listdir(str) -> str ... or bytes (if unicode conversion fails). >
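
The "bytes in, bytes out" behaviour described here can be sketched as follows. The example uses a temporary directory and a made-up file name, and it relies on os.fsencode, which exists in current Python 3 releases but postdates this thread; it is not the patch itself.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one sample file so the directory is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()

    as_str = os.listdir(tmp)                 # str argument, str results
    as_bytes = os.listdir(os.fsencode(tmp))  # bytes argument, bytes results

print(as_str)    # ['example.txt']
print(as_bytes)  # [b'example.txt']
```

The result type follows the argument type, so code that needs every name, decodable or not, can ask for bytes explicitly.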

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: >> Change the default file system encoding to store bytes in Unicode is like >> introducing a new Python type: . > > Exactly. Seems like the best solution to me, despite your polemics. Martin, I don't understand why you

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread Guido van Rossum
On Mon, Sep 29, 2008 at 8:55 PM, Terry Reedy <[EMAIL PROTECTED]> wrote: > >> On Monday 29 September 2008 19:06:01, Guido van Rossum wrote: > >>> I know I keep flipflopping on this one, but the more I think about it >>> the more I believe it is better to drop those names than to raise an

Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

2008-09-30 Thread Antoine Pitrou
On Monday 29 September 2008 at 17:50 -0600, Adam Olsen wrote: > It's correct in the sense that it can roundtrip all filenames. UTF-8b > is lossy, so certain filenames are not roundtripped properly. Why do you say UTF-8b is lossy? From what I've read it claims to be lossless (i.e. the range of ch
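
The round trip in question can be tried out with the surrogateescape error handler that later Python 3 releases provide. Treating that handler as equivalent to the UTF-8b scheme discussed in this thread is an assumption of this sketch, and the byte string is a sample.

```python
raw = b"filename\x90\x90"  # sample bytes that are not valid UTF-8

# Each undecodable byte 0xNN is mapped to the lone surrogate U+DCNN.
decoded = raw.decode("utf-8", errors="surrogateescape")
print(ascii(decoded))  # 'filename\udc90\udc90'

# Encoding with the same handler restores the original bytes exactly.
restored = decoded.encode("utf-8", errors="surrogateescape")
print(restored == raw)  # True
```

The sketch only shows that undecodable bytes survive the trip; it does not settle whether a string that already contains such surrogates can collide with an escaped name, which is the objection raised in the quoted text.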

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Adam Olsen
On Tue, Sep 30, 2008 at 5:24 AM, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote: > Adam Olsen writes: > > > [1] You could argue that Unicode should add new scalars to handle all > > currently invalid UTF-8 sequences. > > AFAIK there are about 2^31 of these, though! They've promised to never alloc

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Adam Olsen
On Tue, Sep 30, 2008 at 3:28 AM, Antoine Pitrou <[EMAIL PROTECTED]> wrote: > Adam Olsen writes: >> >> The only way to display that file would be to transform it into some >> other valid unicode string. However, as that string is already valid, >> you've just made any files named after

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Stephen J. Turnbull
Adam Olsen writes: > [1] You could argue that Unicode should add new scalars to handle all > currently invalid UTF-8 sequences. AFAIK there are about 2^31 of these, though!

Re: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

2008-09-30 Thread M.-A. Lemburg
On 2008-09-30 08:00, Martin v. Löwis wrote: >> Change the default file system encoding to store bytes in Unicode is like >> introducing a new Python type: . > > Exactly. Seems like the best solution to me, despite your polemics. Not a bad idea... have os.listdir() return Unicode subclasses that

Re: [Python-3000] New proposition for Python3 bytes filename issue

2008-09-30 Thread Antoine Pitrou
Adam Olsen writes: > > The only way to display that file would be to transform it into some > other valid unicode string. However, as that string is already valid, > you've just made any files named after it impossible to open. Not if those valid sequences are also properly escaped t