On Sep 29, 2008, at 3:32 AM, Adam Olsen wrote:
> On Sun, Sep 28, 2008 at 10:43 PM, James Y Knight <[EMAIL PROTECTED]> wrote:
>> [1] UTF-8b has a similar property to 8859-1, in that all byte strings can be
>> successfully round-tripped. It's not currently implemented in Python core,
>> but it's a pretty trivial encoding, and is available under the BSD license,
>> see below.
> UTF-8b doesn't work as intended. It produces an invalid Unicode
> object (garbage surrogates) that cannot be used with external APIs or
> libraries that require Unicode.
I'd be interested to hear more detail on what you expect the practical
ramifications of this to be. It doesn't sound likely to be a problem
to me.
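
For concreteness, here is a minimal sketch of the failure mode being described. It assumes Python 3.1 or later, whose "surrogateescape" error handler (PEP 383) postdates this thread and behaves like UTF-8b for this purpose; it is not the BSD-licensed implementation mentioned above. The file name and the helper strict_utf8_ok are hypothetical.

```python
# Assumption: Python 3.1+, using the "surrogateescape" error handler as a
# stand-in for UTF-8b. The file name is made up for illustration.
raw = b"caf\xe9.txt"  # Latin-1 bytes; 0xE9 is not valid UTF-8 here

# The undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
name = raw.decode("utf-8", errors="surrogateescape")


def strict_utf8_ok(s: str) -> bool:
    """Return True if s can be encoded by a strict UTF-8 encoder."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


print(repr(name))
print(strict_utf8_ok(name))        # lone surrogate is rejected
print(strict_utf8_ok("café.txt"))  # ordinary text is accepted
```

Whether this matters in practice depends on how often such strings reach a strict consumer, which is the point under dispute.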
> If you don't need Unicode then your code should state so explicitly, and
> 8859-1 is ideal there.
But I *do* want Unicode. ALL my filenames are encoded in UTF-8.
Except... that one over there. That's the whole point of UTF-8b:
correctly encoded names get decoded correctly and readably, and the
other cases get decoded into something unique that cannot possibly
conflict.
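
The round-trip property can be sketched as follows. This again assumes Python 3.1 or later and uses the "surrogateescape" error handler (PEP 383) as an approximation of UTF-8b, not the implementation referenced earlier in the thread. The helpers decode_name and encode_name and the sample file names are hypothetical.

```python
# Assumption: Python 3.1+, with "surrogateescape" standing in for UTF-8b.
def decode_name(raw: bytes) -> str:
    """Decode a file name; bytes that are not valid UTF-8 are escaped."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_name(name: str) -> bytes:
    """Invert decode_name, restoring any escaped bytes exactly."""
    return name.encode("utf-8", errors="surrogateescape")


good = "naïve.txt".encode("utf-8")  # correctly encoded UTF-8
bad = b"na\xefve.txt"               # same name in Latin-1, invalid as UTF-8

for raw in (good, bad):
    name = decode_name(raw)
    # Every byte string survives the trip unchanged.
    assert encode_name(name) == raw
    print(repr(name))

# The two names decode to different strings, so they do not collide.
assert decode_name(good) != decode_name(bad)
```

The well-formed name comes back as readable text, while the stray byte is mapped into the range U+DC80 to U+DCFF, which a strict UTF-8 decoder never produces.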
James
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000