On Apr 24, 2009, at 8:00 AM, Paul Moore wrote:
However, it *does* agree with the reality of Windows file systems. The fundamental problem here is that there is a strong OS disparity - for Windows, the OS uses Unicode, for POSIX, the OS uses bytes.
It's unfortunately the case that this isn't *precisely* true. Windows uses arbitrary 16-bit sequences, just as unix uses arbitrary 8-bit sequences. Neither one is required by the operating system to be a proper unicode encoding. The main difference is that there is already a widely accepted way to decode an improperly-encoded 16-bit sequence with the utf-16 codec: simply leave the lone surrogates in place.
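(To make that concrete, here's a minimal sketch in modern Python 3 -- the 'surrogatepass' error handler only gained UTF-16 support in 3.4, so this is an illustration of the idea rather than what was available at the time. The example bytes are made up: a legal-looking name a Windows file system could hand back, containing an unpaired high surrogate.)

    # An arbitrary 16-bit (UTF-16-LE) sequence with a lone high surrogate in it:
    # 'A', U+D800 (unpaired), 'B'
    data = b'\x41\x00\x00\xd8\x42\x00'

    # Strict UTF-16 decoding rejects it ...
    try:
        data.decode('utf-16-le')
    except UnicodeDecodeError as e:
        print(e)

    # ... but 'surrogatepass' keeps the lone surrogate in place,
    # so the original 16-bit sequence round-trips exactly.
    s = data.decode('utf-16-le', 'surrogatepass')
    print([hex(ord(c)) for c in s])   # ['0x41', '0xd800', '0x42']
    assert s.encode('utf-16-le', 'surrogatepass') == data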
James