On Sun, Dec 7, 2008 at 2:35 AM, Hagen Fürstenau <[EMAIL PROTECTED]> wrote:
>>> As far as I can see all Python Unicode strings can be encoded to UTF-8,
>>> even things like lone surrogates because Python doesn't care about them.
>>> So both the Unicode API and the binary API would be fail-safe on Windows.
>>
>> Python is broken and needs to be fixed.
>>
>> http://bugs.python.org/issue3672
>> http://bugs.python.org/issue3297
>
> But the question of whether Python should care about lone surrogates or
> not is at best tangential to the issue at hand.  If you have lone
> surrogates in the Unicode API (and didn't raise an exception on the way
> getting there), then the sensible thing is to encode them into lone
> UTF-8 surrogates.  Even if you wanted to prevent lone surrogates,
> encoding to UTF-8 for the binary API would not be the place to enforce it.

No.  Unicode *requires* lone surrogates to be treated as errors in UTF-8.
If you want to pass them through, you're creating a custom encoding...
which you might argue for in this case, but it needs to be kept clearly
separate from real UTF-8.
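For illustration, here is a minimal sketch of that distinction as it looks in modern Python 3. It assumes Python 3.1 or later and does not reflect the interpreter behaviour under discussion in this thread. The default strict UTF-8 codec rejects a lone surrogate in both directions. Passing one through requires the explicitly named `surrogatepass` error handler, which keeps the non-conforming "UTF-8 with surrogates" variant separate from real UTF-8.

```python
# A lone high surrogate: a legal code point in a Python str,
# but not a Unicode scalar value, so strict UTF-8 cannot represent it.
lone = "\ud800"

# Strict UTF-8 encoding (default 'strict' error handler) refuses it.
try:
    lone.encode("utf-8")
except UnicodeEncodeError as exc:
    strict_error = exc.reason
else:
    strict_error = None

# The 'surrogatepass' handler opts in to encoding the surrogate as
# if it were an ordinary code point; this must be requested explicitly.
passed = lone.encode("utf-8", "surrogatepass")

# Strict decoding likewise rejects the 3-byte surrogate sequence...
try:
    passed.decode("utf-8")
except UnicodeDecodeError:
    strict_decode_ok = False
else:
    strict_decode_ok = True

# ...while decoding with 'surrogatepass' round-trips it.
roundtrip = passed.decode("utf-8", "surrogatepass")

print(strict_error)        # surrogates not allowed
print(passed)              # b'\xed\xa0\x80'
print(strict_decode_ok)    # False
print(roundtrip == lone)   # True
```

Because the pass-through behaviour has its own name, code that asks for plain "utf-8" never silently emits or accepts surrogate bytes.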


-- 
Adam Olsen, aka Rhamphoryncus
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
