On 19Aug2016 1225, Daniel Holth wrote:
> #1 sounds like a great idea. I suppose surrogatepass solves
> approximately the same problem of Rust's WTF-8, which is a way to
> round-trip bad UCS-2? https://simonsapin.github.io/wtf-8/

Yep.
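
For illustration, here is a minimal sketch of the round-trip that surrogatepass allows. The file name and the choice of lone surrogate are made up for the example; only the standard 'surrogatepass' error handler is assumed.

```python
# Hypothetical file name containing a lone surrogate, as can occur in
# ill-formed UTF-16 coming from the Windows file system APIs.
name = "bad" + "\udc80" + ".txt"

# Strict UTF-8 refuses to encode a lone surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With 'surrogatepass' the surrogate is written out as a three-byte
# sequence, and decoding with the same handler restores the original str.
raw = name.encode("utf-8", "surrogatepass")
round_tripped = raw.decode("utf-8", "surrogatepass")

print(strict_ok, raw, round_tripped == name)
```

The bytes produced this way are not valid UTF-8 for a strict decoder, which is the same trade-off WTF-8 makes.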

> #2 sounds like it would leave several problems, since mbcs is not the
> same as a normal text encoding, IIUC it depends on the active code page.
> So if your active code page is Russian you might not be able to encode
> Japanese characters into MBCS.

That's correct. In 99% (or more) of cases, mbcs is going to behave the same as what we currently have. The difference is that when we do the encoding and decoding in CPython ourselves, we can use an error handler other than 'replace' and at least prevent the _silent_ data loss.
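
A rough sketch of the difference between the two handlers. Since 'mbcs' only exists on Windows and depends on the active code page, this example assumes cp1251 (a Russian code page) as a stand-in, and the file name is hypothetical.

```python
# Hypothetical Japanese file name; cp1251 stands in for an active
# Russian code page (an assumption, 'mbcs' itself is Windows-only).
text = "\u65e5\u672c\u8a9e.txt"

# 'replace' silently turns every unencodable character into '?'.
lossy = text.encode("cp1251", "replace")

# 'strict' raises instead, so the caller at least finds out.
try:
    text.encode("cp1251", "strict")
    raised = False
except UnicodeEncodeError:
    raised = True

print(lossy, raised)
```

With 'replace', the original name cannot be recovered from the encoded bytes, and nothing signals that anything was lost.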

> Solution #2a Modify Windows so utf-8 is a valid value for the current
> MBCS code page.

Presumably a joke, but it won't happen: too many applications assume that the active code page uses one byte per character. That isn't true, but it's close enough that most of the time you never notice. (Incidentally, utf-16 has the same problem, since many applications assume that it's always one wchar_t per character and get away with it. At least with utf-8 you encounter multi-byte sequences often enough that you are basically forced to deal with them.)
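
To make the "one unit per character" assumption concrete, here is a small sketch that counts code units. The helper name utf16_units is made up for this example, and it assumes a 16-bit wchar_t as on Windows.

```python
def utf16_units(s: str) -> int:
    """Number of 16-bit code units needed to store s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2

accented = "\u00e9"        # e with acute accent, inside the BMP
emoji = "\U0001F600"       # grinning face, outside the BMP

# UTF-16 only needs a second unit for characters outside the BMP,
# so the one-unit assumption rarely gets exercised.
units = (utf16_units(accented), utf16_units(emoji))

# UTF-8 already goes multi-byte for any non-ASCII character.
utf8_bytes = (len(accented.encode("utf-8")), len(emoji.encode("utf-8")))

print(units, utf8_bytes)
```

Code that treats each 16-bit unit as a character works for the accented letter and only breaks on the emoji, which is why the bug tends to survive testing.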

Cheers,
Steve
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev