(My Python uses UTF-16 natively; can someone with a UTF-32 Python let me know whether this behaves differently?)
>>> import codecs
>>> u'\ud800'    # part of surrogate pair
u'\ud800'
>>> codecs.utf_16_be_encode(_)[0]
'\xd8\x00'
>>> codecs.utf_16_be_decode(_)[0]
Traceback (most recent call last):
  File "<input>", line 1, in ?
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 0-1: unexpected end of data

If those bytes can't be decoded as UTF-16, then surely the codec shouldn't have allowed them to be encoded in the first place? I could understand it if the codec were trying to decode the bytes into (native) UTF-32.

On a similar note: if you are using UTF-32 natively, are you allowed to have raw surrogate escape sequences (paired or otherwise) in unicode literals?

Thanks,
John
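
P.S. For comparison, here is a minimal session sketching what I'd expect on a narrow (UTF-16) build like mine; sys.maxunicode is one way to tell the builds apart (0xffff for narrow, 0x10ffff for wide/UCS-4). On a wide build I'd guess the final comparison comes out False, since the decoder would presumably return u'\U00010000' rather than the surrogate pair:

>>> import codecs, sys
>>> hex(sys.maxunicode)            # 0xffff = narrow build, 0x10ffff = wide build
'0xffff'
>>> pair = u'\ud800\udc00'         # a properly *paired* high + low surrogate
>>> codecs.utf_16_be_encode(pair)[0]
'\xd8\x00\xdc\x00'
>>> codecs.utf_16_be_decode(_)[0] == pair
True

So the paired surrogate round-trips through the same codec; it's only the lone u'\ud800' above that fails on the way back.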