Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Zooko O'Whielacronx Tue, 28 Apr 2009 12:06:31 -0700

On Apr 28, 2009, at 6:46 AM, Hrvoje Niksic wrote:

Are you proposing to unconditionally encode file names as iso8859-15, or to do so only when undecodable bytes are encountered?

For what it is worth, what we have previously planned to do for the Tahoe project is the second of these -- decode using some 1-byte encoding such as iso-8859-1, iso-8859-15, or windows-1252 only in the case that attempting to decode the bytes using the local alleged encoding failed.

If you switch to iso8859-15 only in the presence of undecodable UTF-8, then you have the same round-trip problem as the PEP: both b'\xff' and b'\xc3\xbf' will be converted to u'\u00ff' without a way to unambiguously recover the original file name.


Why do you say that?  It seems to work as I expected here:

>>> '\xff'.decode('iso-8859-15')
u'\xff'
>>> '\xc3\xbf'.decode('iso-8859-15')
u'\xc3\xbf'
>>>
>>>
>>>
>>> '\xff'.decode('cp1252')
u'\xff'
>>> '\xc3\xbf'.decode('cp1252')
u'\xc3\xbf'
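
For comparison, here is a minimal sketch of the conditional scheme described in the quoted text, next to the unconditional decoding used in the session above. It is written for Python 3 and assumes the local alleged encoding is UTF-8. The helper name decode_filename and its parameters are hypothetical and not taken from Tahoe or the PEP.

```python
def decode_filename(raw, local_encoding="utf-8", fallback="iso-8859-15"):
    """Hypothetical helper: try the local alleged encoding first and
    fall back to a 1-byte encoding only if that attempt fails."""
    try:
        return raw.decode(local_encoding)
    except UnicodeDecodeError:
        return raw.decode(fallback)


# Unconditional decoding, as in the session above: the two byte
# strings map to two different unicode strings.
always_a = b"\xff".decode("iso-8859-15")
always_b = b"\xc3\xbf".decode("iso-8859-15")
print(always_a != always_b)

# Conditional decoding: b"\xff" is not valid UTF-8, so it takes the
# fallback path, while b"\xc3\xbf" is valid UTF-8 and is decoded as such.
cond_a = decode_filename(b"\xff")
cond_b = decode_filename(b"\xc3\xbf")
print(cond_a == cond_b)
```

Under these assumptions the two cases differ: decoding everything as iso-8859-15 keeps the two names distinct, whereas the fallback-only scheme sends both names to the same unicode string.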

Regards,

Zooko
