Re: [Python-Dev] File system path encoding on Windows

Steve Dower Fri, 19 Aug 2016 12:39:45 -0700

On 19Aug2016 1225, Daniel Holth wrote:

#1 sounds like a great idea. I suppose surrogatepass solves
approximately the same problem of Rust's WTF-8, which is a way to
round-trip bad UCS-2? https://simonsapin.github.io/wtf-8/


Yep.
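
As a rough illustration of that round trip, here is a minimal sketch using
the 'surrogatepass' error handler with the utf-8 codec. The file name is
made up for the example; the lone surrogate stands in for an unpaired
UTF-16 code unit of the kind Windows will accept in a path.

```python
# Hypothetical file name containing a lone (unpaired) surrogate.
name = "bad\udc80name"

# Strict UTF-8 refuses to encode a lone surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With 'surrogatepass' the surrogate is written out as an ordinary
# three-byte sequence, similar in spirit to what WTF-8 does.
encoded = name.encode("utf-8", "surrogatepass")

# Decoding with the same handler restores the original string exactly.
decoded = encoded.decode("utf-8", "surrogatepass")

print(strict_ok, encoded, decoded == name)
```

The same handler has to be used in both directions; decoding those bytes
with the default 'strict' handler raises UnicodeDecodeError.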

#2 sounds like it would leave several problems, since mbcs is not the
same as a normal text encoding, IIUC it depends on the active code page.
So if your active code page is Russian you might not be able to encode
Japanese characters into MBCS.

That's correct. In 99% (or more) of cases, mbcs is going to be the same
as what we currently have. The difference is that when we encode/decode
in CPython we can use a different handler than 'replace' and at least
prevent the _silent_ data loss.
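
To make the difference between handlers concrete, here is a small sketch.
It assumes cp1251 as a stand-in for a Russian active code page, because
the 'mbcs' codec only exists on Windows and its exact behaviour may
differ from a fixed code page codec. The file name is invented for the
example.

```python
# Hypothetical file name mixing Cyrillic and Japanese characters.
text = "файл_日本語.txt"

# 'replace' silently turns every unencodable character into '?'.
lossy = text.encode("cp1251", "replace")
roundtrip = lossy.decode("cp1251")

# 'strict' raises instead, so the data loss is at least visible.
try:
    text.encode("cp1251", "strict")
    raised = False
except UnicodeEncodeError:
    raised = True

print(roundtrip, raised)
```

With 'replace' the Japanese part of the name is gone and nothing reports
it; with 'strict' the caller gets an exception and can decide what to do.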

Solution #2a Modify Windows so utf-8 is a valid value for the current
MBCS code page.

Presumably a joke, but won't happen because too many applications assume
that the active code page is one byte per character, which it isn't, but
it's close enough that most of the time you never notice. (Incidentally,
this is also the problem with utf-16, since many applications also
assume that it's always one wchar_t per character and get away with it.
At least with utf-8 you encounter multi-byte sequences often enough that
you basically are forced to deal with them.)
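
A quick sketch of the "one unit per character" assumption, counting code
units in Python rather than C. The sample strings are arbitrary; a
16-bit wchar_t, as on Windows, corresponds to one UTF-16 code unit here.

```python
def utf16_units(s):
    # Number of 16-bit code units, i.e. wchar_t elements on Windows.
    return len(s.encode("utf-16-le")) // 2

def utf8_bytes(s):
    # Number of bytes in the UTF-8 encoding.
    return len(s.encode("utf-8"))

accented = "\u00e9"      # e with acute accent, inside the BMP
emoji = "\U0001F600"     # a character outside the BMP

# A common accented letter is already multi-byte in UTF-8,
# but still a single UTF-16 code unit.
accented_counts = (len(accented), utf16_units(accented), utf8_bytes(accented))

# A character outside the BMP needs a surrogate pair in UTF-16.
emoji_counts = (len(emoji), utf16_units(emoji), utf8_bytes(emoji))

print(accented_counts, emoji_counts)
```

Code that assumes one wchar_t per character only breaks on the second
kind of input, which is rare enough to go unnoticed, whereas code that
assumes one byte per character in utf-8 breaks on the first.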


Cheers,
Steve
