New submission from Mahmoud Hashemi:
The encoding keyword argument to the Python 3 str() and Python 2 unicode()
constructors needlessly constrains the practical use of these core types.
Looking at common usage, both these constructors' primary mode is to convert
various objects into text:
>>> str(2)
'2'
But adding an encoding yields:
>>> str(2, encoding='utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: coercing to str: need bytes, bytearray or buffer-like object, int found
While the error message is fine for an experienced developer, I would like to
raise the question: is it necessary at all? Even harmlessly getting a str from
a str is punished, but leaving off encoding is fine again:
>>> str('hi', encoding='utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoding str is not supported
>>> str('hi')
'hi'
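The asymmetry can also be checked programmatically. The following is a minimal Python 3 sketch; the helper name accepts_encoding is hypothetical and exists only for this illustration. It records which argument types str() currently accepts together with an encoding:

```python
def accepts_encoding(obj, encoding='utf8'):
    """Return True if str(obj, encoding=...) succeeds, False on TypeError."""
    try:
        str(obj, encoding=encoding)
    except TypeError:
        return False
    return True

# Map each argument's type name to whether str() accepted an encoding for it.
results = {type(o).__name__: accepts_encoding(o) for o in (2, 'hi', b'hi')}
```

Only the bytes argument is accepted; the int and the str both raise TypeError.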
Merging and simplifying the two modes of these constructors would yield much
more predictable results for experienced and beginning Pythonistas alike.
Basically, the encoding argument should be ignored if the argument is already a
unicode/str instance, or if it is a non-string object. It should only be
consulted if the primary argument is a bytestring. Bytestrings already have a
.decode() method, so another, more obscure spelling of it isn't necessary.
Furthermore, despite the core nature and widespread usage of these types,
changing this behavior should break very little existing code and
understanding. unicode() and str() will simply behave as expected more often,
returning text versions of the arguments passed to them.
Appendix: To demonstrate the expected behavior of the proposed unicode/str,
here is a code snippet we've employed to sanely and safely get a text version
of an arbitrary object:
def to_unicode(obj, encoding='utf8', errors='strict'):
    # the encoding default should look at sys's value
    try:
        return unicode(obj)
    except UnicodeDecodeError:
        return unicode(obj, encoding=encoding, errors=errors)
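A rough Python 3 counterpart of the helper above might look like the sketch below. In Python 3, str(b'hi') returns the repr "b'hi'" instead of raising, so the try/except approach does not carry over and the sketch checks the type instead. The name to_text and the 'utf-8' default are assumptions made for illustration, not part of any patch:

```python
def to_text(obj, encoding='utf-8', errors='strict'):
    # Hypothetical helper: consult encoding only for bytes-like input.
    if isinstance(obj, (bytes, bytearray)):
        return obj.decode(encoding, errors)
    # str instances and non-string objects ignore encoding entirely.
    return str(obj)
```

With this shape, to_text(2), to_text('hi') and to_text(b'hi') all return text without raising, which is the interaction pattern proposed here.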
After many years of writing Python and teaching it to developers of all
experience levels, I firmly believe that this is the right interaction pattern
for Python's core text type. I'm also happy to expand on this issue, turn it
into a PEP, or submit a patch if there is interest.
----------
components: Unicode
messages: 241699
nosy: ezio.melotti, haypo, mahmoud
priority: normal
severity: normal
status: open
title: str/unicode encoding kwarg causes exceptions
type: behavior
versions: Python 2.7, Python 3.4, Python 3.5, Python 3.6
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue24019>
_______________________________________