On Sep 29, 2008, at 7:23 PM, Adam Olsen wrote:
> An ugly hack, but more correct than UTF-8b or any similar attempt to
> do "unicode but not quite unicode"; either it's lossy, or it's not
> unicode.  There's no in between.

Promoting the use of 8859-1 to decode mostly-UTF-8 data seems like a very poor way forward. I don't see how you can claim it's more correct: it's correct in no case except pure ASCII on a UTF-8 system.
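A small Python 3 sketch of that point: decoding UTF-8 bytes as 8859-1 (`latin-1`) never fails and round-trips losslessly, but every non-ASCII character comes out as mojibake. The file name is arbitrary example data.

```python
# A UTF-8 encoded file name with one non-ASCII character (example data).
raw = "caf\u00e9.txt".encode("utf-8")  # b'caf\xc3\xa9.txt'

# Decoding as latin-1 always succeeds and is exactly reversible...
as_latin1 = raw.decode("latin-1")
assert as_latin1.encode("latin-1") == raw

# ...but the text is wrong: each UTF-8 byte became its own character.
assert as_latin1 == "caf\u00c3\u00a9.txt"  # i.e. 'cafÃ©.txt'
assert as_latin1 != raw.decode("utf-8")

# Pure ASCII is the only case where both decodings agree.
ascii_raw = b"readme.txt"
assert ascii_raw.decode("latin-1") == ascii_raw.decode("utf-8")
```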

I still like the UTF-8b proposal, but if you want to push against that, I don't see any sensible alternative other than moving back towards a bytestring API. Having two parallel APIs or a mixture of data types is confusing, so just toss the Unicode APIs entirely. That'd be much, much nicer than having everyone incorrectly use 8859-1 as their platform encoding.

On Windows, the platform-native Unicode strings could simply be encoded into UTF-8 when entering Python-land and decoded back to Unicode when leaving Python-land, keeping the API consistently bytestring-oriented on both platforms.
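A minimal sketch of that boundary conversion, using hypothetical helper names `to_python_bytes` and `to_native`. Windows file names are UTF-16 and may contain unpaired surrogates, which strict UTF-8 rejects, so this sketch assumes Python's `surrogatepass` error handler to keep the conversion lossless; that detail goes beyond what the message specifies.

```python
def to_python_bytes(native: str) -> bytes:
    """Hypothetical: convert a native Windows (Unicode) string on the way in."""
    # surrogatepass lets unpaired UTF-16 surrogates through instead of raising.
    return native.encode("utf-8", errors="surrogatepass")


def to_native(data: bytes) -> str:
    """Hypothetical: convert a bytestring back to Unicode on the way out."""
    return data.decode("utf-8", errors="surrogatepass")


name = "r\u00e9sum\u00e9.doc"
encoded = to_python_bytes(name)
assert encoded == b"r\xc3\xa9sum\xc3\xa9.doc"
assert to_native(encoded) == name

# A lone surrogate, possible in a Windows file name, still round-trips.
odd = "bad\ud800name"
assert to_native(to_python_bytes(odd)) == odd
```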

James

_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: 
http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
