STINNER Victor <victor.stin...@gmail.com> added the comment:

I ran the tests from utf16_error_handling-3.2_4.patch on Python 3.1. Two tests
fail:
 - b'\x00\xd8'.decode('utf-16le', 'replace')='\ufffd\ufffd' != '\ufffd'
 - b'\xd8\x00'.decode('utf-16be', 'replace')='\ufffd\ufffd' != '\ufffd'

I don't think these tests are correct: a UTF-16 decoder should resynchronize as
early as possible (skip the first invalid byte and restart at the following
byte), so '\ufffd\ufffd' is the correct answer.

Other examples:
 - b'\xd8\x00\x41'.decode('utf-16be', 'replace') should return '\ufffdA'
 - with the UTF-8 decoder:
   (b'\xC3' + '\xe9'.encode('utf-8')).decode('utf-8', 'replace') returns '\ufffd\xe9'
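The UTF-8 comparison can be reproduced with Python's built-in codec; this
snippet only evaluates the expression from the example and shows which bytes
are replaced.

```python
# b'\xC3' is a UTF-8 lead byte with no continuation byte after it;
# '\xe9'.encode('utf-8') is the valid two-byte sequence b'\xc3\xa9'.
data = b'\xC3' + '\xe9'.encode('utf-8')
result = data.decode('utf-8', 'replace')

# Only the stray lead byte is replaced; decoding restarts at the next byte,
# which begins the valid sequence for U+00E9.
print(ascii(result))  # '\ufffd\xe9'
```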

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue14579>
_______________________________________
_______________________________________________
Python-bugs-list mailing list