[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

STINNER Victor Mon, 13 Jun 2011 06:54:13 -0700

STINNER Victor <victor.stin...@haypocalc.com> added the comment:

Patch version 3:
 - add unit tests for code pages 932, 1252, CP_UTF7 and CP_UTF8
 - fix encode/decode flags for CP_UTF7/CP_UTF8
 - fix the encoding name in UnicodeDecodeError, also support the "CP_UTF7" and 
"CP_UTF8" code page names


TODO:

 - The decoder (with errors) doesn't support multibyte characters, e.g. 
b"\xC3\xA9\xFF" is not correctly decoded using "replace" (insize is fixed to 1)
 - The encoder doesn't support surrogate pairs, but the result with UTF-8 looks 
correct
 - The UTF-7 decoder is not strict, e.g. b'[+/]' is decoded to '[]' in strict mode
 - The UTF-8 encoder is not strict, e.g. it replaces surrogates with U+FFFD
 - Use final in decode_mbcs_errors(): a multibyte character may be split 
between two chunks of INT_MAX bytes
 - Implement the optimizations suggested by Martin?
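
For reference, the behaviour expected from three of the TODO items can be sketched with Python's built-in "utf-8" codec, which runs on any platform. This is only an illustration of the target semantics, assuming the code page codec should eventually match the built-in codec; it does not exercise the patch or the Windows API.

```python
import codecs

# Multibyte sequence followed by an invalid byte: C3 A9 must decode as a
# single character (U+00E9), and only the trailing FF is replaced.
replaced = b"\xC3\xA9\xFF".decode("utf-8", "replace")
assert replaced == "\u00e9\ufffd"

# A multibyte character split between two chunks: with final=False the
# decoder keeps the incomplete lead byte buffered instead of failing.
decoder = codecs.getincrementaldecoder("utf-8")("strict")
first = decoder.decode(b"\xC3", final=False)
second = decoder.decode(b"\xA9", final=True)
assert (first, second) == ("", "\u00e9")

# A strict UTF-8 encoder rejects a lone surrogate rather than silently
# substituting U+FFFD.
try:
    "\ud800".encode("utf-8", "strict")
except UnicodeEncodeError:
    strict_rejects_surrogate = True
else:
    strict_rejects_surrogate = False
assert strict_rejects_surrogate
```

The second case is the one that matters for the final argument: a chunk boundary falling inside a character is not an error unless the input really ends there.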

----------
Added file: http://bugs.python.org/file22340/mbcs3.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12281>
_______________________________________