[issue24019] str/unicode encoding kwarg causes exceptions

2015-04-23 Thread Mahmoud Hashemi

Mahmoud Hashemi added the comment:

I would urge you all to take a stronger look at usability, rather than parroting 
the current state of the design and docs. Python gained renown over the years 
for its ability to stay flexible while maturing. Focusing on purity and 
ignoring the needs of practical programmers is exactly how PEP #461 ended up 
coming into play so late.

The inflexible arguments of str make a common task, turning data into text, an 
order of magnitude harder than it needs to be.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24019
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24019] str/unicode encoding kwarg causes exceptions

2015-04-22 Thread R. David Murray

R. David Murray added the comment:

Sounds like we should close this as rejected, then.  Serhiy's point is a good 
one.  Maybe not the way we'd design the api from scratch, but it's what we've 
got and it serves a purpose.

--




[issue24019] str/unicode encoding kwarg causes exceptions

2015-04-22 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

I agree with closing this as won't fix.

It is true that the encoding keyword argument is only useful when passing in 
byte strings or (and that's also where it originated in Python 2: the default 
string type is a byte string), but even in Python 3, this is still one of the 
main uses of the str() constructor.

Note that it's not uncommon for arguments to be useful only for certain types 
of input objects. See e.g. the int() constructor's base argument for a similar 
example.
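A minimal sketch of the int() parallel, assuming CPython 3 (variable names are illustrative, and exact error messages differ between versions):

```python
# int() only consults `base` when the value is text or bytes.
parsed_str = int("ff", 16)
parsed_bytes = int(b"ff", 16)

# Passing a base together with a non-string value is rejected with a
# TypeError instead of the base being silently ignored.
try:
    int(255, 16)
except TypeError as exc:
    base_error = exc
else:
    base_error = None

print(parsed_str, parsed_bytes, type(base_error).__name__)  # 255 255 TypeError
```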

--
nosy: +lemburg




[issue24019] str/unicode encoding kwarg causes exceptions

2015-04-22 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Please don't deprecate the encoding parameter in str. It has a use case. The str 
constructor works with any bytes-like object, even those that don't have a 
decode method. It raises a more appropriate TypeError instead of 
AttributeError, so often you don't need to wrap an error.
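A short sketch of both points, assuming CPython 3; the helper error_type() is hypothetical and exists only to capture which exception each spelling raises:

```python
import array

raw = b"spam"
# bytes, bytearray, memoryview and array.array all support the buffer protocol.
candidates = [raw, bytearray(raw), memoryview(raw), array.array("B", raw)]

# str(obj, encoding) decodes every one of them...
decoded = [str(obj, "ascii") for obj in candidates]

# ...although memoryview and array.array have no .decode() method.
has_decode = [hasattr(obj, "decode") for obj in candidates]


def error_type(func):
    """Hypothetical helper: return the exception class func() raises, if any."""
    try:
        func()
    except Exception as exc:
        return type(exc)
    return None


# A wrong input type surfaces as TypeError from str(),
# but as AttributeError from the .decode() spelling.
str_error = error_type(lambda: str(42, "ascii"))
decode_error = error_type(lambda: (42).decode("ascii"))
```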

--
nosy: +serhiy.storchaka




[issue24019] str/unicode encoding kwarg causes exceptions

2015-04-22 Thread Benjamin Peterson

Changes by Benjamin Peterson benja...@python.org:


--
resolution:  -> rejected
status: open -> closed




[issue24019] str/unicode encoding kwarg causes exceptions

2015-04-22 Thread Martin Panter

Martin Panter added the comment:

I thought it might be okay to use codecs.decode() instead for those cases, 
though it doesn’t check for text encodings. And support for arbitrary 
bytes-like objects doesn’t seem to be documented (though it seems to work in 
reality).
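To illustrate the difference, a small sketch assuming Python 3.4 or later, where the "hex" codec alias exists and str() rejects non-text encodings with LookupError:

```python
import codecs

hex_data = b"68656c6c6f"

# codecs.decode() accepts any registered codec, including bytes-to-bytes
# codecs such as "hex", so the result here is bytes, not text.
via_codecs = codecs.decode(hex_data, "hex")

# str() refuses codecs that are not text encodings.
try:
    str(hex_data, "hex")
except LookupError as exc:
    str_error = exc
else:
    str_error = None

# codecs.decode() also handles bytes-like objects that lack .decode().
from_view = codecs.decode(memoryview(b"abc"), "ascii")
```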

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24019
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24019] str/unicode encoding kwarg causes exceptions

2015-04-21 Thread Mahmoud Hashemi

Mahmoud Hashemi added the comment:

Python already has one approach that fails to decode non-bytestrings: the 
.decode() method. 

This is about removing unicode barriers to entry and making the str constructor 
in Python 3 as succinctly useful as possible. There are several problems the 
helper does not solve:

1) Usage-wise, str/unicode is used to turn values into text. From a high-level 
perspective, the content does not change, only the representation format. 
Should this fundamental operation really require type inspection and explicit 
try/except blocks every single time? Or should it just work? sorted() does not 
raise an exception if the values are already sorted, why does str() raise an 
exception when the value is already a str?*

2) By and large, among developers, keyword arguments are viewed as optional 
arguments that have defaults which can be overridden. However, that is not the 
case here; str is not simply str(obj, encoding=sys.getdefaultencoding()). 
Explicitly passing the keyword argument breaks the call.

3) The helper does not help promote Python adoption when it must be copied and 
pasted into new developers' projects. It does not help break down the 
misconception that unicode is a punishing concept to be around in Python.

* This question is posed here rhetorically, but I have gotten variations on it 
from multiple Python developers in training.
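To make point 2 concrete, a small sketch assuming CPython 3 run without the -b flag (with -b, calling str() on bytes also emits a BytesWarning):

```python
import sys

data = b"abc"

# With no encoding, str() formats the object; for bytes that is the repr.
formatted = str(data)

# Spelling out the default encoding switches str() into decoding mode.
decoded = str(data, encoding=sys.getdefaultencoding())

# For non-bytes input the same keyword turns a working call into an error.
try:
    str(42, encoding=sys.getdefaultencoding())
except TypeError:
    keyword_rejected = True
else:
    keyword_rejected = False

print(formatted, decoded, keyword_rejected)  # b'abc' abc True
```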

--
versions: +Python 2.7




[issue24019] str/unicode encoding kwarg causes exceptions

2015-04-21 Thread Eric V. Smith

Eric V. Smith added the comment:

As this is an enhancement request, I've changed the versions.

I'm opposed to this change. If I pass an encoding along with a type for which 
it makes no sense, I'd prefer an error instead of silently ignoring the 
encoding.

I think your helper function is an appropriate solution to your problem.

--
components: +Interpreter Core -Unicode
nosy: +eric.smith
type: behavior -> enhancement
versions:  -Python 2.7, Python 3.6




[issue24019] str/unicode encoding kwarg causes exceptions

2015-04-21 Thread Martin Panter

Martin Panter added the comment:

I don’t think changes to Python 2 are considered here, unless they are bug 
fixes, and this does not sound like a bug fix.

For Python 3, it sounds like you are proposing that str() accept encoding 
arguments even when not decoding from bytes. It sounds like this would mask the 
error if you called str(buffer, "ascii"), and the buffer happened to be an 
integer or a list, etc., by accident. Also, this woul

It seems str() is designed to have two separate modes:

1. str(object) is basically equivalent to format(object), with a warning if 
“object” happens to be a byte string or array

2. str(object, encoding, ...) is normally equivalent to object.decode(encoding, 
...), or if that is not supported, codecs.decode(object, encoding, ...)
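The two modes described above can be checked directly. A minimal sketch, assuming CPython 3:

```python
import codecs

# Mode 1: for ordinary objects, str(obj) gives the same result as format(obj).
for value in (42, 3.5, [1, 2], None):
    assert str(value) == format(value)

text = "caf\u00e9"
payload = text.encode("utf-8")

# Mode 2: str(obj, encoding) matches obj.decode(encoding) ...
assert str(payload, "utf-8") == payload.decode("utf-8") == text

# ... and, for bytes-like objects without .decode(), codecs.decode().
view = memoryview(payload)
assert str(view, "utf-8") == codecs.decode(view, "utf-8") == text
```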

Your proposal sounds like it would make it easier to confuse these two modes. 
What should str(b"123", encoding=None) do? Why should the behaviour of 
str(file, encoding) vary depending on whether an ordinary file object or a 
memory-mapped file is passed?

IMO in a perfect Python 4 world, str() would only have a single personality 
(perhaps always returning an empty string, or a more strict conversion). Making 
formatted string representations of arbitrary objects would be left to the 
format() and repr() functions, and decoding bytes to text would be left to the 
existing decode() methods and functions, or maybe a separate str.from_bytes() 
constructor, mirroring int.from_bytes().
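Python has no str.from_bytes(). A sketch of what such a decoding-only constructor might look like as a plain function, with the hypothetical name str_from_bytes(); rejecting non-bytes-like input via memoryview() is an assumed design choice, one of several. The real int.from_bytes() it would mirror is shown for comparison:

```python
def str_from_bytes(data, encoding="utf-8", errors="strict"):
    """Hypothetical decoding-only constructor (not part of Python).

    Only bytes-like objects are accepted; text and arbitrary objects are
    rejected instead of being formatted.
    """
    try:
        view = memoryview(data)
    except TypeError:
        raise TypeError(
            "str_from_bytes() needs a bytes-like object, not "
            + type(data).__name__
        ) from None
    # bytes.decode() only accepts text encodings, like str(data, encoding).
    return view.tobytes().decode(encoding, errors)


# The existing classmethod that the name would mirror:
assert int.from_bytes(b"\x01\x00", "big") == 256
```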

--
nosy: +vadmium




[issue24019] str/unicode encoding kwarg causes exceptions

2015-04-21 Thread Mahmoud Hashemi

Mahmoud Hashemi added the comment:

Martin, it sounds that way because that is what is being proposed: Merging and 
simplifying the two modes. Given the existence of .decode() on bytestrings, 
the only objects that generally need decoding in Python 2 and 3, the existence 
of str/unicode's second mode constitutes a design bug.

Without a doubt, Python has frequently preferred convenient idioms over EAFP. 
Look at dict.get for an excellent example of defaults being used instead of 
forcing users to catch KeyErrors. That conversation could have gone a different 
way, but Python is better off having stuck to its pragmatic roots.
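The dict.get() comparison in miniature (the dictionary contents are made up for illustration):

```python
stock = {"spam": 3}

# EAFP spelling: look the key up and handle the KeyError.
try:
    eggs = stock["eggs"]
except KeyError:
    eggs = 0

# dict.get() spelling: the default is passed in, no exception handling needed.
eggs_via_get = stock.get("eggs", 0)
spam_via_get = stock.get("spam", 0)

print(eggs, eggs_via_get, spam_via_get)  # 0 0 3
```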

In answer to your questions, Martin, 1) I'd expect str(b"123", encoding=None) 
to do the same thing as str(b"123") and 2) I'd expect str(obj) behavior to 
continue to depend on whether the object passed is string-like. Python is a 
duck-typed, dynamic language, and dynamic languages are most powerful when 
their core types reflect usability. Consistency is one of the foremost factors 
of usability, and having to frequently switch between two call patterns of the 
str constructor feels inconsistent and unusable.

--




[issue24019] str/unicode encoding kwarg causes exceptions

2015-04-21 Thread R. David Murray

R. David Murray added the comment:

It does feel like the encoding argument is left over from the translation of 
the unicode constructor into the str constructor.  I wouldn't be opposed to 
deprecating it, myself, though we'd probably never remove it.  I would be 
opposed to making it work on non-bytes-like objects.

--
nosy: +r.david.murray




[issue24019] str/unicode encoding kwarg causes exceptions

2015-04-21 Thread Martin Panter

Martin Panter added the comment:

Okay, I was trying to confirm your proposal in Python 3 terms, because in 
Python 2, str has a different meaning and I was confused.

I agree that the existence of the decoding mode is a design bug, so how would 
you feel about deprecating it, at least in the documentation? I.e. in Python 3, 
deprecate usage like str(buffer, "utf-8") in favour of buffer.decode("utf-8") 
or using the codecs module directly. If this was done, it would clearly remove 
the need for an encoding parameter to str() in all cases. I would be in favour 
of deprecating the complementary bytes() and bytearray() encoding modes as well.
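For reference, a small sketch of the encoding modes of bytes() and bytearray() next to the method spellings that already exist, assuming CPython 3:

```python
text = "caf\u00e9"

# The encoding modes of bytes() and bytearray() mirror str(buffer, encoding) ...
assert bytes(text, "utf-8") == text.encode("utf-8") == b"caf\xc3\xa9"
assert bytearray(text, "utf-8") == bytearray(text.encode("utf-8"))

# ... and the method spellings a deprecation would favour already exist.
assert str(b"caf\xc3\xa9", "utf-8") == b"caf\xc3\xa9".decode("utf-8") == text

# Without an encoding, bytes() means something else entirely:
# an int argument is a length, producing zero-filled bytes.
assert bytes(3) == b"\x00\x00\x00"
```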

Do you have an example use case in Python 3 that would benefit from always 
allowing an encoding parameter? I can understand that your to_unicode() 
function could be useful in Python 2. But in Python 3, byte strings tend to 
hold raw data that is not necessarily textual at all. There are some places 
(warts in my opinion) such as the binascii module where ASCII-encoded byte 
strings are common, but I still don’t think this proposal would be very helpful 
with that.
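As an example of the binascii pattern, a sketch assuming CPython 3 and run without the -b flag:

```python
import binascii

digest = binascii.hexlify(b"hi")

# binascii returns ASCII-encoded bytes rather than text ...
assert digest == b"6869"

# ... so getting a str needs an explicit decode, spelled either way:
assert digest.decode("ascii") == str(digest, "ascii") == "6869"

# Without an encoding, str() formats the bytes object instead of decoding it.
assert str(digest) == "b'6869'"
```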

--




[issue24019] str/unicode encoding kwarg causes exceptions

2015-04-21 Thread Berker Peksag

Changes by Berker Peksag berker.pek...@gmail.com:


--
nosy: +berker.peksag




[issue24019] str/unicode encoding kwarg causes exceptions

2015-04-21 Thread Eric V. Smith

Eric V. Smith added the comment:

I agree with deprecating (in the documentation) but never removing the encoding 
argument to str() in Python 3. .decode() is the better way to convert a 
bytes-like object to a str.

Every change proposed here would be an enhancement in 2.7, and we are not 
implementing enhancements there.

--
versions: +Python 3.6 -Python 2.7, Python 3.4




[issue24019] str/unicode encoding kwarg causes exceptions

2015-04-20 Thread Mahmoud Hashemi

New submission from Mahmoud Hashemi:

The encoding keyword argument to the Python 3 str() and Python 2 unicode() 
constructors is excessively constraining to the practical use of these core 
types.

Looking at common usage, both these constructors' primary mode is to convert 
various objects into text:

>>> str(2)
'2'

But adding an encoding yields:

>>> str(2, encoding='utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: coercing to str: need bytes, bytearray or buffer-like object, int found

While the error message is fine for an experienced developer, I would like to 
raise the question: is it necessary at all? Even harmlessly getting a str from 
a str is punished, but leaving off encoding is fine again:

>>> str('hi', encoding='utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoding str is not supported
>>> str('hi')
'hi'

Merging and simplifying the two modes of these constructors would yield much 
more predictable results for experienced and beginning Pythonists alike. 
Basically, the encoding argument should be ignored if the argument is already a 
unicode/str instance, or if it is a non-string object. It should only be 
consulted if the primary argument is a bytestring. Bytestrings already have a 
.decode() method on them, another, obscurer version of it isn't necessary.

Furthermore, despite the core nature and widespread usage of these types, 
changing this behavior should break very little existing code and 
understanding. unicode() and str() will simply behave as expected more often, 
returning text versions of the arguments passed to them. 

Appendix: To demonstrate the expected behavior of the proposed unicode/str, 
here is a code snippet we've employed to sanely and safely get a text version 
of an arbitrary object:

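def to_unicode(obj, encoding='utf8', errors='strict'):
    # the encoding default should look at sys's value
    try:
        # works for unicode, ASCII-only byte strings and most other objects
        return unicode(obj)
    except UnicodeDecodeError:
        # presumably a non-ASCII byte string: decode it explicitly
        return unicode(obj, encoding=encoding, errors=errors)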
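The try/except pattern above relies on Python 2's unicode() raising UnicodeDecodeError for non-ASCII byte strings. In Python 3, str() applied to bytes without an encoding returns the repr instead of raising, so a port would need an explicit type check. A minimal sketch under those assumptions, with the hypothetical name to_text() and UTF-8 as the default (what sys.getdefaultencoding() returns on Python 3):

```python
def to_text(obj, encoding="utf-8", errors="strict"):
    """Hypothetical Python 3 counterpart of to_unicode() (not part of Python).

    The encoding is consulted only for bytes-like input, as proposed above.
    """
    if isinstance(obj, str):
        return obj  # already text: nothing to decode
    if isinstance(obj, (bytes, bytearray, memoryview)):
        return str(obj, encoding, errors)  # decode bytes-like input
    return str(obj)  # anything else: format it


# In Python 3 the Python 2 pattern would not fall back: no exception is
# raised, and the repr of the bytes object comes back instead.
assert str(b"caf\xc3\xa9") == "b'caf\\xc3\\xa9'"

assert to_text(2, encoding="utf8") == "2"
assert to_text("hi", encoding="utf8") == "hi"
assert to_text(b"caf\xc3\xa9") == "caf\u00e9"
```

Only the three built-in bytes-like types are decoded here; other buffer objects such as array.array would be formatted rather than decoded, which is a deliberate simplification of this sketch.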

After many years of writing Python and teaching it to developers of all 
experience levels, I firmly believe that this is the right interaction pattern 
for Python's core text type. I'm also happy to expand on this issue, turn it 
into a PEP, or submit a patch if there is interest.

--
components: Unicode
messages: 241699
nosy: ezio.melotti, haypo, mahmoud
priority: normal
severity: normal
status: open
title: str/unicode encoding kwarg causes exceptions
type: behavior
versions: Python 2.7, Python 3.4, Python 3.5, Python 3.6
