[Python-Dev] Re: Parameters of str(), bytes() and bytearray()

Kyle Stanley Sun, 15 Dec 2019 16:59:17 -0800

Serhiy Storchaka wrote:
> Forbids calling str() without object if encoding or errors are
> specified. It is very unlikely that this can break a real code, so I
> propose to make it an error without a deprecation period.


+1, I suspect that nobody would intentionally pass an argument to the
encoding and/or errors parameter(s) without specifying an object. Returning
an empty string from this seems like it would cover up bugs rather than be
useful in any capacity.
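
To make the point concrete, here is a minimal sketch of the behaviour under discussion, assuming a CPython release where the proposal has not been applied. The made-up codec name 'no-such-codec' is only there to illustrate that nothing is decoded or validated when no object is passed.

```python
# Behaviour described in the thread: with no object, str() ignores
# encoding and errors entirely and returns an empty string.
results = [
    str(),
    str(encoding='utf-8'),
    str(errors='strict'),
    str(encoding='utf-8', errors='strict'),
]

# Assumption for illustration: 'no-such-codec' is not a registered codec.
# It goes unnoticed because no decoding ever takes place.
results.append(str(encoding='no-such-codec'))

print(results)
```

Every call yields the same empty string, so a forgotten first argument would pass silently.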

Serhiy Storchaka wrote:
> 2. Make the first parameter of str(), bytes() and bytearray()
> positional-only.

+1, I don't think I've ever seen a single instance of code that passes the
first parameter, *object*, as a kwarg: str(object=obj). As long as the
other two parameters, *encoding* and *errors*, remain keyword arguments, I
think this would make sense.
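
For reference, a short sketch of the keyword forms that would stop working, assuming a CPython release where the first parameter is still positional-or-keyword. The variable names are illustrative only.

```python
# The first parameter can currently be passed by keyword.
as_text = str(object=42)
as_bytes = bytes(source=b'abc')
as_array = bytearray(source=b'abc')

# int() already rejects the keyword form for its first parameter,
# as noted in the original proposal.
try:
    int(x=42)
except TypeError as exc:
    int_error = str(exc)
else:
    int_error = None

print(as_text, as_bytes, as_array, int_error)
```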

Serhiy Storchaka wrote:
> 3. Make encoding required if errors is specified in str(). This will
> reduce the number of possible combinations, makes str() more similar to
> bytes() and bytearray() and simplify the mental model: if encoding is
> specified, then we decode, and the first argument must be a bytes-like
> object, otherwise we convert an object to a string using __str__.

Hmm, I think this one might require some further consideration. But I will
say that the implicit behavior is not very obvious.

Not very clear, since the 'utf-8' decoding is implicit:
>>> str(b'\xc3\xa1', errors='strict')
'á'

Makes sense, and is highly explicit:
>>> str(b'\xc3\xa1', encoding='utf-8', errors='strict')
'á'

This is also fine ('strict' is a very reasonable default for *errors*):
>>> str(b'\xc3\xa1', encoding='utf-8')
'á'


On a related note though, I'm not a fan of this behavior:
>>> str(b'\xc3\xa1')
"b'\\xc3\\xa1'"

Passing a bytes object to str() without specifying an encoding seems like a
mistake; I honestly don't see how the result ("b'\\xc3\\xa1'") would be
useful in any capacity. I would expect this to raise a TypeError instead,
similar to passing a string to bytes() without specifying an encoding:
>>> bytes('á')
...
TypeError: string argument without an encoding

I'd much prefer to see something like this:
>>> str(b'\xc3\xa1')
...
TypeError: bytes argument without an encoding

Is there some use case for returning "b'\\xc3\\xa1'" from this operation
that I'm not seeing? To me, it seems at least as confusing and pointless as
returning an empty string from str(errors='strict') or some other
combination of the *errors* and *encoding* kwargs without passing an
object.
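
The rules discussed above can be sketched as a small pure-Python wrapper. This is only an illustration under stated assumptions, not how str() is or would be implemented: proposed_str and _MISSING are hypothetical names, the error messages other than "bytes argument without an encoding" are invented, and the positional-only "/" syntax needs Python 3.8 or later.

```python
_MISSING = object()  # hypothetical sentinel: "no object was passed"


def proposed_str(obj=_MISSING, /, encoding=None, errors=None):
    """Hypothetical wrapper around str() enforcing the stricter rules."""
    if obj is _MISSING:
        # Proposal 1: encoding/errors make no sense without an object.
        if encoding is not None or errors is not None:
            raise TypeError("encoding or errors without an object")
        return ''
    # Proposal 3: errors requires an explicit encoding.
    if errors is not None and encoding is None:
        raise TypeError("errors specified without an encoding")
    if encoding is not None:
        # An encoding was given, so decode a bytes-like object.
        return str(obj, encoding=encoding,
                   errors='strict' if errors is None else errors)
    # Additional suggestion from this reply: refuse bytes without an encoding.
    if isinstance(obj, (bytes, bytearray)):
        raise TypeError("bytes argument without an encoding")
    # Otherwise convert through __str__ as usual.
    return str(obj)


print(proposed_str(b'\xc3\xa1', encoding='utf-8'))
```

Proposal 2 is covered by the "/" in the signature, which makes the first parameter positional-only, so proposed_str(obj=1) raises TypeError.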

On Sun, Dec 15, 2019 at 9:10 AM Serhiy Storchaka <[email protected]>
wrote:

> Currently str() takes up to 3 arguments. All are optional and
> positional-or-keyword. All combinations are valid:
>
> str()
> str(object=object)
> str(object=buffer, encoding=encoding)
> str(object=buffer, errors=errors)
> str(object=buffer, encoding=encoding, errors=errors)
> str(encoding=encoding)
> str(errors=errors)
> str(encoding=encoding, errors=errors)
>
> The last three are especially surprising. If you do not specify an
> object, str() ignores values of encoding and errors and returns an empty
> string.
>
> bytes() and bytearray() are more limited. Valid combinations are:
>
> bytes()
> bytes(source=object)
> bytes(source=string, encoding=encoding)
> bytes(source=string, encoding=encoding, errors=errors)
>
> I propose several changes:
>
> 1. Forbids calling str() without object if encoding or errors are
> specified. It is very unlikely that this can break a real code, so I
> propose to make it an error without a deprecation period.
>
> 2. Make the first parameter of str(), bytes() and bytearray()
> positional-only. Originally this feature was an implementation artifact:
> before 3.6 parameters of a C implemented function should be either all
> positional-only (if used PyArg_ParseTuple), or all keyword (if used
> PyArg_ParseTupleAndKeywords). So str(), bytes() and bytearray() accepted
> the first parameter by keyword. We already made similar changes for
> int(), float(), etc: int(x=42) no longer works.
>
> Unlikely str(object=object) is used in a real code, so we can skip a
> deprecation period for this change too.
>
> 3. Make encoding required if errors is specified in str(). This will
> reduce the number of possible combinations, makes str() more similar to
> bytes() and bytearray() and simplify the mental model: if encoding is
> specified, then we decode, and the first argument must be a bytes-like
> object, otherwise we convert an object to a string using __str__.
> _______________________________________________
> Python-Dev mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/[email protected]/message/YMIGWRUERUG66CKRJXDXNPCIDHRQJY6V/
> Code of Conduct: http://python.org/psf/codeofconduct/
>

_______________________________________________
Python-Dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/AJCHCKJR2M7PLB5T2JGQ7R4EPUCP6PSJ/
Code of Conduct: http://python.org/psf/codeofconduct/
