Dave Angel wrote:
On 11/20/2011 04:45 PM, Steven D'Aprano wrote:
<snip>

Something in the tool chain before it reached Python has saved it using a wide (four byte) encoding, most likely UTF-16 as that is widely used by Windows and Java. With the right settings, it could take as little as opening the file in Notepad, then clicking Save.


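UTF-16 is a two-byte format (at least for characters in the Basic Multilingual Plane; anything beyond that takes a four-byte surrogate pair). That's typically what Windows uses for Unicode. It's Unices that are more likely to use a four-byte format.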

Oops, you're right of course, two bytes, not four:

py> u'M'.encode('utf-16BE')
'\x00M'

I was thinking of four hex digits:

py> u'M'.encode('utf-16BE').encode('hex')
'004d'
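
For anyone following along on Python 3, where the 'hex' codec is no longer available through str.encode, here is a rough equivalent of the sessions above. It is only a sketch: the helper name encoded_hex is made up for illustration, and the UTF-32 line is added purely to show what a four-byte encoding looks like by comparison.

```python
# Python 3 sketch of the interpreter sessions above.
# encoded_hex is a hypothetical helper, not part of any library.
import binascii


def encoded_hex(text, encoding):
    """Encode text with the named codec and return the bytes as hex digits."""
    return binascii.hexlify(text.encode(encoding)).decode('ascii')


# Two bytes, i.e. four hex digits, in UTF-16 (big-endian, no BOM).
print(len('M'.encode('utf-16-be')), encoded_hex('M', 'utf-16-be'))

# Four bytes, i.e. eight hex digits, in UTF-32 (big-endian, no BOM).
print(len('M'.encode('utf-32-be')), encoded_hex('M', 'utf-32-be'))
```

The explicit -be codecs are used so that no byte order mark is prepended; plain 'utf-16' or 'utf-32' would add one and make the byte counts look larger than they really are per character.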




--
Steven