Re: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

Adam Olsen Mon, 29 Sep 2008 14:57:55 -0700

On Mon, Sep 29, 2008 at 5:12 AM, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> Adam Olsen <rhamph <at> gmail.com> writes:
>>
>> UTF-8b doesn't work as intended.  It produces an invalid unicode
>> object (garbage surrogates) that cannot be used with external APIs or
>> libraries that require unicode.
>
> At least it works with all Python operations supported by the unicode type
> (methods, concatenation, etc.) without any bad surprise. That feeding it to
> e.g. PyGTK may give bogus results is another problem.
>
>> If you don't need unicode then your
>> code should state so explicitly, and 8859-1 is ideal there.
>
> But then you can say bye-bye to proper representation (e.g. using print()) of
> even valid filenames.


You can't print a UTF-8b string either.  Printing requires converting the
unicode object to UTF-8 (or whatever the output encoding is), and the
unicode object isn't valid, so you'd get an exception[1].
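
A minimal sketch of that failure, assuming a current CPython 3, where the
"surrogateescape" error handler plays the role of UTF-8b (it postdates this
thread, and the filename bytes are made up for illustration):

```python
# Hypothetical filename stored as Latin-1 bytes; 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# UTF-8b style decoding: the undecodable byte becomes a lone surrogate.
name = raw.decode("utf-8", "surrogateescape")

# A strict UTF-8 encode (what writing to a UTF-8 stream does) rejects it.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Only the matching error handler gets the original bytes back.
round_trip = name.encode("utf-8", "surrogateescape")
```

Ordinary string operations on name work, but anything that encodes strictly
raises.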

The same applies to all other hacks (such as PUA scalars).  Either the
scalar value already has an expected behaviour, in which case decoding
is lossy and re-encoding replaces the correct behaviour, or it's not a
valid scalar value, in which case it can't be used with any external API
that requires conformant unicode.  There's no solution except to not
decode, and 8859-1 is the way to do that.
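
A small sketch of the 8859-1 approach, assuming Python 3 and a made-up
filename.  Every byte maps to a valid scalar value in U+0000..U+00FF, so the
string round-trips and can be encoded strictly, although a UTF-8 name shows
up as mojibake if displayed:

```python
# Hypothetical filename: the UTF-8 bytes for "cafe" with an acute accent.
raw = b"caf\xc3\xa9.txt"

# "Not decoding": each byte becomes the code point with the same number.
text = raw.decode("iso-8859-1")

# All code points are valid scalars, so a strict UTF-8 encode succeeds.
as_utf8 = text.encode("utf-8")

# Encoding back to 8859-1 recovers the original bytes exactly.
back = text.encode("iso-8859-1")

# The round trip holds for every possible byte value.
all_bytes = bytes(range(256))
lossless = all_bytes.decode("iso-8859-1").encode("iso-8859-1") == all_bytes
```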


[1] Python's UTF codecs are broken in a couple of respects, including the
fact that Python itself uses CESU-8(!).  See
http://bugs.python.org/issue3297 and http://bugs.python.org/issue3672


-- 
Adam Olsen, aka Rhamphoryncus