On Sep 29, 2008, at 7:23 PM, Adam Olsen wrote:
> An ugly hack, but more correct than UTF-8b or any similar attempt to do "unicode but not quite unicode"; either it's lossy, or it's not unicode. There's no in between.
Promoting the use of 8859-1 to decode mostly-UTF-8 data seems like a very poor way forward. I don't see how you can claim it's more correct. On a UTF-8 system it gives the right text in no case except pure ASCII.
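To make that concrete, here is a minimal sketch in Python 3 syntax. The filename is a made-up example, not something from this thread; the point is only that an 8859-1 decode never fails and round-trips the bytes, yet yields the wrong characters for anything outside ASCII.

```python
# Hypothetical filename, stored on disk as UTF-8 bytes.
# "caf\u00e9" is "cafe" with an acute accent on the final e.
intended = "caf\u00e9"
raw = intended.encode("utf-8")            # b'caf\xc3\xa9'

# Decoding as ISO 8859-1 always succeeds and round-trips the bytes ...
as_latin1 = raw.decode("iso-8859-1")
round_trip = as_latin1.encode("iso-8859-1")

# ... but the text is wrong: the two bytes of the accented letter
# become two separate characters (U+00C3 and U+00A9).
print(len(intended), len(as_latin1))
print(as_latin1 == intended)

# Pure-ASCII names are the only ones that agree under both decodings.
ascii_raw = b"readme.txt"
ascii_agrees = ascii_raw.decode("iso-8859-1") == ascii_raw.decode("utf-8")
```

So the bytes survive, but every consumer that treats the result as text (display, sorting, comparison against a real Unicode string) sees the wrong thing.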
I still like the UTF-8b proposal, but if you want to push against that, I don't see any sensible alternative but to move back towards a bytestring API. Having two parallel APIs or a mixture of data types is confusing, so just toss the Unicode APIs entirely. That'd be much, much nicer than having everyone use 8859-1, incorrectly, for their platform encoding.
On Windows, the platform-native Unicode strings could simply be encoded into UTF-8 when entering Python-land, and decoded back to Unicode when leaving Python-land, to keep the API consistently bytestring-oriented on both platforms.
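Roughly what I have in mind, as a sketch only. The function names below are hypothetical and are not an existing or proposed Python API, and a Python str stands in for the platform-native Windows string.

```python
# Hypothetical boundary helpers; not part of any real Python API.

def entering_python(native_name: str) -> bytes:
    """Windows side: encode the native Unicode name to UTF-8 bytes."""
    return native_name.encode("utf-8")

def leaving_python(name: bytes) -> str:
    """Windows side: decode UTF-8 bytes back to a native Unicode name."""
    return name.decode("utf-8")

# Inside Python-land the name is always a bytestring.
native = "r\u00e9sum\u00e9.txt"           # assumed example filename
inside = entering_python(native)
back_out = leaving_python(inside)

# On POSIX no conversion is needed: the OS already hands over bytes,
# so even a name that is not valid UTF-8 passes through untouched.
posix_name = b"caf\xe9.txt"               # Latin-1 bytes, invalid as UTF-8
passed_through = bytes(posix_name)
```

Code written against the bytestring API then looks the same on both platforms; only the boundary differs.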
James