On Sun, Dec 7, 2008 at 2:35 AM, Hagen Fürstenau <[EMAIL PROTECTED]> wrote:
>>> As far as I can see all Python Unicode strings can be encoded to UTF-8,
>>> even things like lone surrogates because Python doesn't care about them.
>>> So both the Unicode API and the binary API would be fail-safe on Windows.
>>
>> Python is broken and needs to be fixed.
>>
>> http://bugs.python.org/issue3672
>> http://bugs.python.org/issue3297
>
> But the question of whether Python should care about lone surrogates or
> not is at best tangential to the issue at hand. If you have lone
> surrogates in the Unicode API (and didn't raise an exception on the way
> getting there), then the sensible thing is to encode them into lone
> UTF-8 surrogates. Even if you wanted to prevent lone surrogates,
> encoding to UTF-8 for the binary API would not be the place to enforce it.

No. Unicode *requires* them to be treated as errors. If you want to pass them through then you're creating a custom encoding... which you might argue for in this case, but it needs to be clearly separate from the real UTF-8.

-- 
Adam Olsen, aka Rhamphoryncus
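The distinction drawn above can be seen directly in a current interpreter. This is a minimal sketch assuming CPython 3.1 or later. The 2008-era interpreters discussed in the thread behaved differently, as the quoted text notes. Here the strict `"utf-8"` codec rejects a lone surrogate. The opt-in `"surrogatepass"` error handler produces the pass-through bytes Hagen describes, and the handler has to be named explicitly at each call site. The string `"a\ud800b"` is just an illustrative input.

```python
# Assumes CPython 3.1+; interpreters from the time of this thread behaved differently.
lone = "a\ud800b"  # U+D800 is a high surrogate with no low surrogate after it

# 1. Real UTF-8 (default "strict" error handler) refuses the lone surrogate.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# 2. Passing it through is an explicit opt-in: "surrogatepass" writes the
#    surrogate code point using the 3-byte UTF-8 bit pattern, which is
#    NOT a well-formed UTF-8 sequence.
passed = lone.encode("utf-8", "surrogatepass")

# 3. A strict UTF-8 decoder therefore rejects those bytes as well...
try:
    passed.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# ...and only the same non-standard handler restores the original string.
roundtrip = passed.decode("utf-8", "surrogatepass")

print(strict_ok)          # False
print(passed)             # b'a\xed\xa0\x80b'
print(strict_decode_ok)   # False
print(roundtrip == lone)  # True
```

Because the pass-through behaviour only appears when `"surrogatepass"` is requested by name, this is one way of keeping such a variant "clearly separate from the real UTF-8".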