New submission from Serhiy Storchaka <[email protected]>:
codecs.charmap_decode behaves differently depending on whether the decoding
table is an exact str or an instance of a user-defined str subclass.
>>> import codecs
>>> print(ascii(codecs.charmap_decode(b'\x00', 'replace', '\uFFFE')))
('\ufffd', 1)
>>> class S(str): pass
...
>>> print(ascii(codecs.charmap_decode(b'\x00', 'replace', S('\uFFFE'))))
('\ufffe', 1)
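This happens because the charmap decoder (the function PyUnicode_DecodeCharmap
in Objects/unicodeobject.c) uses one algorithm for exact str objects and a
different one for every other kind of table. With an exact str, U+FFFE in the
table marks the byte as undefined, so the 'replace' handler emits U+FFFD; with
the subclass, U+FFFE is returned as an ordinary character.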
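
The split can be modelled in pure Python. This is only a rough sketch inferred
from the behaviour shown above, not a transcription of the C code:
charmap_decode_model is a hypothetical name, only the 'replace' error handler
is modelled, and the details of the generic branch are an assumption.

```python
UNDEFINED = '\ufffe'     # marker that the exact-str path treats as "no mapping"
REPLACEMENT = '\ufffd'   # what the 'replace' handler substitutes


class S(str):
    """A user-defined str subclass, as in the report."""


def charmap_decode_model(data, table):
    """Hypothetical model of the two code paths ('replace' handling only)."""
    out = []
    for byte in data:
        if type(table) is str:
            # Exact str: index directly; out of range or U+FFFE means undefined.
            ch = table[byte] if byte < len(table) else UNDEFINED
            out.append(REPLACEMENT if ch == UNDEFINED else ch)
        else:
            # Anything else: generic lookup, with no special case for U+FFFE
            # (assumed from the reported output).
            try:
                item = table[byte]
            except LookupError:
                item = None
            if item is None:
                out.append(REPLACEMENT)
            elif isinstance(item, int):
                out.append(chr(item))
            else:
                out.append(item)
    return ''.join(out), len(data)


print(ascii(charmap_decode_model(b'\x00', '\ufffe')))
print(ascii(charmap_decode_model(b'\x00', S('\ufffe'))))
```

Under this model the exact str yields U+FFFD and the subclass yields U+FFFE,
matching the session above.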
Do we need to fix this? If so, what should
`codecs.charmap_decode(b'\x00', 'replace', {0:'\uFFFE'})` return? And what
should `codecs.charmap_decode(b'\x00', 'replace', {0:0xFFFE})` return?
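
To see how a given interpreter answers these questions, the four table forms
can be probed side by side. This is a small sketch: the helper name probe and
the labels are made up for illustration, and the results for the subclass and
dict tables depend on the Python version, so no output is claimed for them.

```python
import codecs


class S(str):
    """A user-defined str subclass, as in the report."""


def probe(table):
    """Decode the single byte 0x00 with 'replace' using the given table."""
    return codecs.charmap_decode(b'\x00', 'replace', table)


# The same U+FFFE entry for byte 0, expressed in four table forms.
tables = {
    "exact str": '\ufffe',
    "str subclass": S('\ufffe'),
    "dict of str": {0: '\ufffe'},
    "dict of int": {0: 0xFFFE},
}

results = {name: probe(table) for name, table in tables.items()}
for name, result in results.items():
    print(f"{name:>12}: {ascii(result)}")

# True only if all four table forms are treated the same way.
print("consistent:", len(set(results.values())) == 1)
```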
----------
components: Interpreter Core
messages: 161054
nosy: storchaka
priority: normal
severity: normal
status: open
title: The inconsistency of codecs.charmap_decode
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue14850>
_______________________________________