On Mon, Sep 29, 2008 at 5:33 PM, James Y Knight <[EMAIL PROTECTED]> wrote:
> On Sep 29, 2008, at 7:23 PM, Adam Olsen wrote:
>>
>> An ugly hack, but more correct than UTF-8b or any similar attempt to
>> do "unicode but not quite unicode"; either it's lossy, or it's not
>> unicode.  There's no in between.
>
> Promoting the use of 8859-1 to decode mostly-utf-8 data seems like a very
> poor way forward. I don't see how you can claim it's more correct. It's
> correct in no case except for pure ASCII on a utf-8 system.

It's correct in the sense that it can roundtrip all filenames.  UTF-8b
is lossy, so certain filenames are not roundtripped properly.

It doesn't let you print correctly, but neither would an API that
returns bytes.  8859-1 is just a hack for when you want bytes, but the
API only allows unicode.
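
To make the roundtrip claim concrete, here is a minimal sketch in Python 3.
The helper name roundtrip_latin1 and the sample filename are made up for
illustration. It only shows that every byte value maps to exactly one code
point under 8859-1 and back, and that the decoded text is not what a UTF-8
user would expect to see printed.

```python
def roundtrip_latin1(raw: bytes) -> bytes:
    """Decode bytes as ISO 8859-1 and encode them back (hypothetical helper)."""
    text = raw.decode("latin-1")   # never fails: each byte maps to U+0000..U+00FF
    return text.encode("latin-1")  # exact inverse of the decode above


# Every possible byte value survives the trip unchanged.
all_bytes = bytes(range(256))
assert roundtrip_latin1(all_bytes) == all_bytes

# A UTF-8 encoded name (illustrative) roundtrips too, but the intermediate
# string is mojibake rather than the intended text.
name = b"caf\xc3\xa9.txt"
as_text = name.decode("latin-1")
assert as_text != name.decode("utf-8")
assert as_text.encode("latin-1") == name
```

So the string is only an opaque carrier for the bytes; it is not suitable
for display.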


> I still like the UTF-8b proposal, but if you want to push against that, I
> don't see any sensible alternative but to move back towards a bytestring
> API. Having two parallel APIs or a mixture of data types is confusing, so,
> just toss the Unicode APIs entirely. That'd be much much nicer than having
> everyone use 8859-1, incorrectly, for their platform encoding.

As a user, I expect all file names to be printable.  That requires
unicode, and any program that creates filenames with arbitrary
bytestrings is just broken.  Not all operating systems enforce this
yet, but returning only bytes would mean we have to explicitly decode
in the 99% of cases where we'd happily assume it's correct unicode.

I'd rather the 1% of cases that need to handle bad file names make an
explicit effort to do so, via alternate byte APIs or (if necessary)
the 8859-1 hack.
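
As a rough sketch of what that explicit effort could look like in Python 3:
the function name decode_filename, the assumption that the platform encoding
is UTF-8, and the returned flag are all hypothetical, not an existing API.

```python
from typing import Tuple


def decode_filename(raw: bytes) -> Tuple[str, bool]:
    """Return (text, is_clean) for a raw filename (hypothetical helper).

    Assumes the platform encoding is UTF-8. Names that are valid UTF-8
    come back as ordinary text; anything else falls back to the 8859-1
    hack so the original bytes can still be recovered.
    """
    try:
        return raw.decode("utf-8"), True      # the common, well-formed case
    except UnicodeDecodeError:
        return raw.decode("latin-1"), False   # opaque carrier, not for display


good, good_ok = decode_filename(b"caf\xc3\xa9.txt")
bad, bad_ok = decode_filename(b"caf\xe9.txt")  # lone 0xE9 is not valid UTF-8

# Only the fallback case needs the extra step to get the bytes back.
assert bad.encode("latin-1") == b"caf\xe9.txt"
```

The caller has to keep the flag around, since the string alone does not say
which encoding produced it.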


> On Windows, the platform-native Unicode strings could simply be encoded into
> utf-8 when entering Python-land, and decoded back to Unicode when leaving
> pythonland, to keep the API consistently bytestring oriented on both
> platforms.


-- 
Adam Olsen, aka Rhamphoryncus
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000