(My Python uses UTF-16 natively; can someone with a UTF-32 Python let me know whether this behaves differently?)
>>> import codecs
>>> u'\ud800'    # part of surrogate pair
u'\ud800'
>>> codecs.utf_16_be_encode(_)[0]
'\xd8\x00'
>>> codecs.utf_16_be_decode(_)[0]
Traceback (most recent call last):
  File "<input>", line 1, in ?
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 0-1: unexpected end of data

If those bytes can't be decoded as UTF-16, then surely the codec shouldn't have allowed them to be encoded in the first place? I could understand it if the codec were trying to decode the bytes into (native) UTF-32.

On a similar note: if you are using UTF-32 natively, are you allowed to have raw surrogate escape sequences (paired or otherwise) in unicode literals?

Thanks,
John
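
P.S. For comparison, here is a minimal session sketching what I'd expect on a narrow (UTF-16) build like mine; sys.maxunicode is one way to tell the builds apart (0xffff for narrow, 0x10ffff for wide/UCS-4). On a wide build I'd guess the final comparison comes out False, since the decoder would presumably return u'\U00010000' rather than the surrogate pair:

>>> import codecs, sys
>>> hex(sys.maxunicode)            # 0xffff = narrow build, 0x10ffff = wide build
'0xffff'
>>> pair = u'\ud800\udc00'         # a properly *paired* high + low surrogate
>>> codecs.utf_16_be_encode(pair)[0]
'\xd8\x00\xdc\x00'
>>> codecs.utf_16_be_decode(_)[0] == pair
True

So the paired surrogate round-trips through the same codec; it's only the lone u'\ud800' above that fails on the way back.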