[issue12016] Wrong behavior for '\xff\n'.decode('gb2312', 'ignore')

STINNER Victor Fri, 03 Jun 2011 15:28:26 -0700

STINNER Victor <[email protected]> added the comment:

cjk_decode.patch:
 - patch *all* CJK decoders to replace only the first byte of an invalid byte 
sequence with U+FFFD. Example from the issue title: b'\xff\n'.decode('gb2312', 
'replace') now gives '�\n' instead of just '�'
 - add at least one unit test for *each* path in the decoders (sometimes it was 
really hard to see how to reach a specific path, especially in the johab 
decoder!)
 - add testcases for euc_jis_2004 and shift_jis_2004
 - factor out the "codec tests" (codectests) shared by all Japanese EUC tests 
(euc_commontests)
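
A minimal sketch of the behaviour described in the first item, assuming an 
interpreter that already includes the patch (Python 3 syntax; the variable 
names are only for illustration):

```python
# 0xFF is not a valid GB2312 lead byte, so the decoder reports an error there.
data = b'\xff\n'

# With the patch, only the invalid byte 0xFF is replaced or skipped;
# the newline that follows it is decoded normally.
replaced = data.decode('gb2312', 'replace')
ignored = data.decode('gb2312', 'ignore')

print(ascii(replaced))  # '\ufffd\n' on a patched interpreter
print(ascii(ignored))   # '\n' on a patched interpreter
```

On an unpatched interpreter the newline is swallowed together with the 
invalid byte, which is the bug this issue reports.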


Because I consider this issue a bug, I would like to apply this patch to 
2.7, 3.2 and 3.3.

----------
keywords: +patch
Added file: http://bugs.python.org/file22241/cjk_decode.patch

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue12016>
_______________________________________