Re: [Python-Dev] PEP 383 (again)

Hrvoje Niksic Tue, 28 Apr 2009 05:41:47 -0700

Lino Mastrodomenico wrote:

Let's suppose that I use Python 2.x or something else to create a file
with name b'\xff'. My (Linux) system has a sane configuration and the
filesystem encoding is UTF-8, so it's an invalid name but the kernel
will blindly accept it anyway.


With this PEP, Python 3.1 listdir() will convert b'\xff' to the string '\udcff'.
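The mapping described above can be sketched as follows. This assumes a Python 3 interpreter where the PEP 383 error handler is available under the name `surrogateescape`, the name it shipped with; the draft under discussion may call it something else. Each undecodable byte 0xXX becomes the lone surrogate U+DCXX, and encoding with the same handler restores the byte:

```python
# Sketch: PEP 383-style decoding of an undecodable filename byte.
# Assumes Python 3 with the built-in "surrogateescape" error handler.
raw_name = b"\xff"  # not valid as UTF-8

# Decoding escapes the bad byte 0xff as the lone surrogate U+DCFF.
text_name = raw_name.decode("utf-8", "surrogateescape")
print(repr(text_name))  # '\udcff'

# Encoding with the same handler maps U+DCFF back to the original byte.
restored = text_name.encode("utf-8", "surrogateescape")
print(restored == raw_name)  # True
```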


One question that really bothers me about this proposal is the following:

Assume a UTF-8 locale. A file named b'\xff', being an invalid UTF-8 sequence, will be converted to the half-surrogate '\udcff'. However, a file named b'\xed\xb3\xbf', a valid[1] UTF-8 sequence, will also be converted to '\udcff'. Those are quite different POSIX pathnames; how will Python know which one it was when I later pass '\udcff' to open()?
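Whether the two names actually collide can be checked on a given interpreter. A minimal sketch, assuming Python 3 and its `surrogateescape` handler rather than the Python 2.5 codec mentioned in the footnote; `decode_name` is a hypothetical helper for illustration only. Python 3's UTF-8 codec rejects encoded surrogates such as b'\xed\xb3\xbf' (the 3-byte UTF-8 form of U+DCFF), so each of its bytes is escaped individually:

```python
# Sketch: compare how two distinct POSIX byte names decode under PEP 383 rules.
# Assumes Python 3, whose strict UTF-8 codec rejects encoded surrogates.

def decode_name(raw: bytes) -> str:
    """Hypothetical helper: decode a filename with surrogateescape."""
    return raw.decode("utf-8", "surrogateescape")

invalid_byte = b"\xff"
encoded_surrogate = b"\xed\xb3\xbf"  # UTF-8 encoding of U+DCFF

# Strict decoding refuses the encoded surrogate.
try:
    encoded_surrogate.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

name_a = decode_name(invalid_byte)
name_b = decode_name(encoded_surrogate)
print(strict_ok)          # False on Python 3
print(name_a == name_b)   # False: the three bytes are escaped separately

# Both names survive a round trip back to their original bytes.
print(name_a.encode("utf-8", "surrogateescape") == invalid_byte)
print(name_b.encode("utf-8", "surrogateescape") == encoded_surrogate)
```

The point the sketch illustrates: the ambiguity disappears only if the strict UTF-8 decoder refuses encoded surrogates, so that b'\xed\xb3\xbf' is itself treated as three undecodable bytes.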


A poster hinted at this question, but I haven't seen it answered yet.


[1] I'm assuming that it's valid UTF-8 because it passes through Python 2.5's '\xed\xb3\xbf'.decode('utf-8'). I don't claim to be a UTF-8 expert.

_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
