[issue14850] The inconsistency of codecs.charmap_decode

Serhiy Storchaka Fri, 18 May 2012 07:46:42 -0700

New submission from Serhiy Storchaka:

codecs.charmap_decode behaves differently depending on whether the decoding 
table is an exact str or an instance of a user-defined str subclass.


>>> import codecs
>>> print(ascii(codecs.charmap_decode(b'\x00', 'replace', '\uFFFE')))
('\ufffd', 1)
>>> class S(str): pass
... 
>>> print(ascii(codecs.charmap_decode(b'\x00', 'replace', S('\uFFFE'))))
('\ufffe', 1)
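The same comparison can be run over the dictionary tables from the questions in 
this report with a short Python 3 script. This is a sketch: the helper `show` 
and the table labels are our own, and because the results depend on the 
interpreter version, no expected output is listed.

```python
import codecs


class S(str):
    """A trivial str subclass, as in the session above."""


# Decoding tables that all map byte 0x00 to U+FFFE in some form.
tables = {
    "exact str": "\uFFFE",
    "str subclass": S("\uFFFE"),
    "dict -> str": {0: "\uFFFE"},
    "dict -> int": {0: 0xFFFE},
}


def show(label, table):
    # 'replace' substitutes U+FFFD for bytes the table treats as undefined.
    result = codecs.charmap_decode(b"\x00", "replace", table)
    print(f"{label:13} {ascii(result)}")
    return result


results = {label: show(label, table) for label, table in tables.items()}
```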

This is because the charmap decoder (function PyUnicode_DecodeCharmap in 
Objects/unicodeobject.c) uses different algorithms for exact strings and for 
all other table objects, including str subclasses.
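A simplified pure-Python model of the two paths reproduces the session above. 
It is not CPython's code: the function names are hypothetical, it handles only 
the 'replace' error handler and str-valued tables, and it only assumes, from the 
observed output, that the exact-str path treats U+FFFE as an undefined entry 
while the generic path passes whatever the lookup returns through unchanged.

```python
UNDEFINED = "\uFFFE"    # entry the exact-str path treats as "no mapping"
REPLACEMENT = "\uFFFD"  # what the 'replace' handler substitutes


def decode_exact_str(data, table):
    """Model of the exact-str path: U+FFFE or a byte past the end is undefined."""
    out = []
    for byte in data:
        ch = table[byte] if byte < len(table) else UNDEFINED
        out.append(REPLACEMENT if ch == UNDEFINED else ch)
    return "".join(out), len(data)


def decode_generic(data, table):
    """Model of the generic path as observed: only a failed lookup or None
    counts as undefined; a U+FFFE entry is copied to the output."""
    out = []
    for byte in data:
        try:
            item = table[byte]
        except LookupError:  # IndexError for sequences, KeyError for dicts
            item = None
        out.append(REPLACEMENT if item is None else item)
    return "".join(out), len(data)


def charmap_decode_model(data, table):
    # Dispatch on the exact type, so a str subclass takes the generic path.
    if type(table) is str:
        return decode_exact_str(data, table)
    return decode_generic(data, table)
```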

Do we need to fix it? If so, what should `codecs.charmap_decode(b'\x00', 
'replace', {0:'\uFFFE'})` return? And what should `codecs.charmap_decode(b'\x00', 
'replace', {0:0xFFFE})` return?
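One possible consistent answer, sketched in pure Python, is to treat U+FFFE as 
"undefined" whatever the table type and whether the entry is a str or an int. 
This is a hypothetical rule for discussion, not a description of what CPython 
does; the names `lookup` and `charmap_decode_replace` are ours, and only the 
'replace' error handler is modelled.

```python
REPLACEMENT = "\uFFFD"
UNDEFINED = 0xFFFE


def lookup(table, byte):
    """Return the decoded text for *byte*, or None if it is undefined.

    Hypothetical rule: a missing entry, None, or U+FFFE (given as a str
    or as an int) means "undefined", regardless of the table's type.
    """
    try:
        item = table[byte]
    except LookupError:  # IndexError for str tables, KeyError for dicts
        return None
    if item is None:
        return None
    if isinstance(item, int):
        return None if item == UNDEFINED else chr(item)
    # str or str subclass: compare by code point, not by exact type.
    if len(item) == 1 and ord(item) == UNDEFINED:
        return None
    return str(item)


def charmap_decode_replace(data, table):
    """Decode *data* with the 'replace' policy using the rule above."""
    out = []
    for byte in data:
        ch = lookup(table, byte)
        out.append(REPLACEMENT if ch is None else ch)
    return "".join(out), len(data)
```

Under this rule the exact str, the str subclass, `{0:'\uFFFE'}` and 
`{0:0xFFFE}` all decode b'\x00' to ('\ufffd', 1).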

----------
components: Interpreter Core
messages: 161054
nosy: storchaka
priority: normal
severity: normal
status: open
title: The inconsistency of codecs.charmap_decode
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3

_______________________________________
Python tracker
<http://bugs.python.org/issue14850>
_______________________________________