On Feb 10, 2016, at 15:11, eryk sun <eryk...@gmail.com> wrote:
> 
> On Wed, Feb 10, 2016 at 2:30 PM, Andrew Barnert via Python-Dev
> <python-dev@python.org> wrote:
>>  [^3]: Say you write a program that assumes it will only be run on
>> Shift-JIS systems, and you use CreateFileA to create a file named
>> "ハローワールド". The actual bytes you're sending are cp436 for
>> "ânâìü[âÅü[âïâh", so the file on the CD is named, in Unicode,
>> "ânâìü[âÅü[âïâh".
> 
> Unless the system default was changed or the program called
> SetFileApisToOEM, CreateFileA would decode using the ANSI codepage
> 1252, not the OEM codepage 437 (not 436), i.e.
> "ƒnƒ\x8d\x81[ƒ\x8f\x81[ƒ‹ƒh". Otherwise the example is right. But the
> transcoding strategy won't work in general. For example, if the tables
> are turned such that the ANSI codepage is 932 and the program passes a
> bytes name from codepage 1252, the user on the other end won't be able
> to transcode without error if the original bytes contained invalid
> DBCS sequences that were mapped to the default character, U+30FB. This
> transcodes as the meaningless string "\x81E". The user can replace
> that string with "--" and enjoy a nice game of hangman.

Of course there's no way to recover the actual intended filenames if that 
information was thrown out instead of being stored, but that's no different 
from the situation where the user mashed the keyboard instead of typing what 
they intended.

The point remains: the Mac strategy (which is also the Linux strategy for 
filesystems that are inherently UTF-16) always generates valid UTF-8. It 
doesn't try to magically cure mojibake, but it doesn't get in the way of the 
user manually curing it either. When the Unicode encoding is lossy, of course 
the user can't cure that, but UTF-8 isn't making it any harder.
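
To make the two cases concrete, here is a rough Python sketch. The helper 
names garble() and cure() are made up for illustration, and it uses Python's 
own codecs rather than the Windows conversion tables, so details differ: 
Python substitutes U+FFFD for invalid cp932 input, where Windows (per the 
quoted message) substitutes U+30FB.

```python
ORIGINAL = "ハローワールド"

def garble(name, wrote_as="cp932", read_as="cp437"):
    """Bytes written in one codepage, decoded as another (hypothetical helper)."""
    return name.encode(wrote_as).decode(read_as)

def cure(name, wrote_as="cp932", read_as="cp437"):
    """Manually undo garble() by reversing the two steps (hypothetical helper)."""
    return name.encode(read_as).decode(wrote_as)

# Recoverable case: cp437 assigns a character to every byte, so no
# information is lost and the user can transcode back by hand.
mojibake = garble(ORIGINAL)
print(mojibake)                    # ânâìü[âÅü[âïâh
print(cure(mojibake) == ORIGINAL)  # True

# Lossy case: cp1252 bytes that are not valid cp932 get replaced on decode.
# Python uses U+FFFD here; the 0xE9 byte is gone and cannot be recovered.
lossy = b"caf\xe9".decode("cp932", errors="replace")
print("\ufffd" in lossy)           # True

# The default character named in the quoted message, U+30FB, encodes to
# the bytes 81 45 in cp932, which is where "\x81E" comes from.
print("\u30fb".encode("cp932"))    # b'\x81E'
```

Either way, storing the name as UTF-8 in between doesn't change the outcome: 
the first case round-trips and the second was already broken before UTF-8 got 
involved.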
