On Sep 29, 2008, at 3:32 AM, Adam Olsen wrote:
On Sun, Sep 28, 2008 at 10:43 PM, James Y Knight <[EMAIL PROTECTED]> wrote:
[1] UTF-8b has a similar property to 8859-1, in that all byte strings can be successfully round-tripped. It's not currently implemented in Python core, but it's a pretty trivial encoding, and is available under the BSD license; see below.

UTF-8b doesn't work as intended.  It produces an invalid unicode
object (garbage surrogates) that cannot be used with external APIs or
libraries that require unicode.

I'd be interested to hear more detail on what you expect the practical ramifications of this to be. It doesn't sound likely to be a problem to me.

If you don't need unicode then your code should state so explicitly,
and 8859-1 is ideal there.

But I *do* want Unicode. ALL my filenames are encoded in UTF-8. Except... that one over there. That's the whole point of UTF-8b: correctly encoded names get decoded correctly and readably, and the other cases get decoded into something unique that cannot possibly conflict with any correctly encoded name.
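
To make that concrete, here is a minimal sketch of the behavior described above. It is not the BSD-licensed UTF-8b codec mentioned earlier; it assumes the `surrogateescape` error handler that later Python 3 releases provide as a stand-in, and the helper names `decode_name` and `encode_name` are made up for illustration.

```python
# Sketch only: "surrogateescape" is used here as a stand-in for UTF-8b.
# decode_name / encode_name are hypothetical helpers, not an existing API.

def decode_name(raw: bytes) -> str:
    # Valid UTF-8 decodes normally; each undecodable byte 0xNN becomes
    # the lone surrogate U+DCNN.
    return raw.decode("utf-8", errors="surrogateescape")

def encode_name(name: str) -> bytes:
    # Reverses decode_name: U+DCNN surrogates turn back into byte 0xNN.
    return name.encode("utf-8", errors="surrogateescape")

good = "caf\u00e9.txt".encode("utf-8")   # correctly encoded UTF-8 name
bad = b"caf\xe9.txt"                     # Latin-1 bytes, invalid as UTF-8

good_name = decode_name(good)
bad_name = decode_name(bad)

# The well-formed name comes out readable, the odd one round-trips,
# and the two cannot collide.
readable = good_name == "caf\u00e9.txt"
round_trips = encode_name(bad_name) == bad
distinct = good_name != bad_name

# A strict encoder refuses the escaped name, because it holds a lone surrogate.
try:
    bad_name.encode("utf-8")
    strict_rejected = False
except UnicodeEncodeError:
    strict_rejected = True

print(readable, round_trips, distinct, strict_rejected)
```

The last check is the other side of the argument: the escaped name carries a lone surrogate, so anything that insists on strictly valid Unicode will refuse it unless it goes back through the same escaping encoder.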

James
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000