[Python-Dev] Re: Parameters of str(), bytes() and bytearray()

Guido van Rossum Mon, 16 Dec 2019 20:40:41 -0800

On Mon, Dec 16, 2019 at 12:04 PM Serhiy Storchaka <storch...@gmail.com>
wrote:


> 16.12.19 18:35, Guido van Rossum wrote:
> > On Sun, Dec 15, 2019 at 6:09 AM Serhiy Storchaka <storch...@gmail.com>
> > wrote:
> >
> >     1. Forbid calling str() without an object if encoding or errors is
> >     specified. It is very unlikely that this would break real code, so I
> >     propose making it an error without a deprecation period.
> >
> >
> > What problem are you trying to solve with this proposal? I am only -0 on
> > this, but I am wondering why bother with the churn.
>
> Initially I wanted to check the documentation and the docstring of
> str() and fix them if needed. It was inspired by the Discourse topic [1].
> I have found that, contrary to the OP's claim, the documentation is
> correct, but the docstring is not.
>

So let's fix the docstring.

> The documentation is correct (because Chris Jerdonek accurately
> documented the actual behavior in 2012 [2]), but ambiguous.
>
>      str(object='')
>      str(object=b'', encoding='utf-8', errors='strict')
>

Honestly this notation leaves a lot unsaid. Apparently the first form
allows `object` to have any type, while the second only allows it to be
bytes (or bytearray, or memoryview, or presumably anything that supports
the buffer protocol?). And it appears unnecessary to specify a default in
the first case -- then the 0-args form would only match the second pattern.
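
A rough illustration of the two forms, as a sketch against CPython 3 (the
variable names are only for illustration and do not come from the docs):

```python
data = b'abc'

# First form: any object is accepted and converted via __str__/__repr__.
as_repr = str(data)            # the repr of the bytes object, not a decode
from_int = str(42)

# Second form: encoding (or errors) is given, so the object is decoded.
decoded = str(data, encoding='utf-8')
from_bytearray = str(bytearray(data), 'ascii')
from_memoryview = str(memoryview(data), 'ascii')

# A non-bytes-like object is rejected once encoding is supplied.
try:
    str(42, encoding='utf-8')
    rejected = False
except TypeError:
    rejected = True
```

Note that str(data) with no encoding yields the repr text, which is why the
one-argument call cannot be read as an instance of the second form.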


> 0- and 1-argument calls match both signatures. Also it implies that
> str(encoding='ascii') and str(errors='ignore') are valid, and this is
> true!


And the docs spell this out clearly enough that I don't see any reason to
change it. This is a function that is *so* common that *any* tweak we make
to it will break someone's code.


> And more, str(encoding='spam') and str(errors='ham') are valid
> too, because the values of encoding and errors are ignored. I cannot
> imagine a use case for this. It looks like an implementation artifact.
>

But again one that we can't change.

At least for errors='ham', this seems to be the case for all
encoding/decoding functions -- the error handler is looked up lazily, and
an empty input string doesn't need it. b''.decode(errors="ham") acts the
same way.

In fact, it's the same for b''.decode(encoding='spam'). So str() is not
special here, and I recommend keeping it that way.
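
A small sketch of the lazy lookup described above. The outcome() helper is
hypothetical and exists only to record what happens. On a regular CPython
build the empty-input calls are expected to return '' without looking
anything up; my understanding is that newer versions validate eagerly in
development mode (-X dev) or debug builds, so the helper tolerates a
LookupError there as well:

```python
def outcome(func):
    """Return func()'s result, or 'LookupError' if the lookup fails."""
    try:
        return func()
    except LookupError as exc:
        return type(exc).__name__

# Empty input: neither the codec nor the error handler is needed.
empty_bad_errors = outcome(lambda: b''.decode(errors='ham'))
empty_bad_encoding = outcome(lambda: b''.decode(encoding='spam'))
str_bad_encoding = outcome(lambda: str(encoding='spam'))

# Non-empty input: the codec has to be looked up, so 'spam' fails.
nonempty_bad_encoding = outcome(lambda: b'abc'.decode(encoding='spam'))

# An undecodable byte forces the error handler lookup, so 'ham' fails.
invalid_byte_bad_errors = outcome(lambda: b'\xff'.decode('ascii', errors='ham'))
```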


> The docstring was left unfixed.
>
>      str(object='') -> str
>      str(bytes_or_buffer[, encoding[, errors]]) -> str
>
> It uses different names for the first parameter (which would not matter if
> it were positional-only), it requires bytes_or_buffer for decoding, and
> it requires encoding if errors is passed.
>
> So my goal is to remove glitches that are not used in real code anyway,
> and to bring the behavior closer to the original intention. If all three
> of my propositions were applied, the signatures would look like:
>
>      str(object='', /) -> str
>      str(bytes_or_buffer, /, encoding, errors='strict') -> str
>
> Almost the same as for bytes:
>
>      bytes(object=b'', /) -> bytes
>      bytes(string, /, encoding, errors='strict') -> bytes
>

bytes() and str() just aren't each other's opposite -- bytes() really only
takes str input, but str() takes any input. So there's always going to be a
discrepancy. I now think the current behavior should not change.
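
To make the asymmetry concrete, here is a minimal sketch (the Point class is
a made-up example, not something from the thread):

```python
class Point:
    def __str__(self):
        return 'Point(1, 2)'

# str() converts arbitrary objects to text...
text_from_object = str(Point())
text_from_list = str([1, 2])

# ...while bytes() refuses a str unless an encoding is named.
try:
    bytes('abc')
    needs_encoding = False
except TypeError:
    needs_encoding = True

encoded = bytes('abc', 'utf-8')
round_trip = str(encoded, 'utf-8')
```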


> [1] https://discuss.python.org/t/str-mybytes-wrong-docs/2866
> [2] https://bugs.python.org/issue13538
>
>
> >     3. Make encoding required if errors is specified in str(). This will
> >     reduce the number of possible combinations, make str() more similar to
> >     bytes() and bytearray(), and simplify the mental model: if encoding is
> >     specified, then we decode, and the first argument must be a bytes-like
> >     object; otherwise we convert an object to a string using __str__.
> >
> >
> >   I'm -0 on this. It seems that the presence of either errors= or
> > encoding= causes str() to switch to "decode bytes" semantics, and a
> > default decoding of UTF-8. That default makes sense: UTF-8 is our
> > default source encoding, and we are trending to use it as the default in
> > other places. I doubt that such calls would confuse anyone.
>
> This proposition is the one I am not sure about. On the one hand, the
> bytes() constructor requires encoding when given a string. On the other
> hand, it is optional in str.encode() and bytes.decode(). But str.encode()
> and bytes.decode() each do only one thing, so you can omit both encoding
> and errors without ambiguity.
>
> If we allow str(bytes_or_buffer, errors=errors), should we not also
> allow bytes(string, errors=errors)?
>

Not necessarily. There's an old saying in PEP 8 about foolish consistency...
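
For reference, a sketch of how the two constructors currently differ when
only errors is given (behavior as I understand it on CPython 3; the variable
names are illustrative):

```python
raw = 'caf\xe9'.encode('utf-8')

# str(): errors alone selects decoding, with UTF-8 as the default encoding.
decoded = str(raw, errors='strict')
replaced = str(b'\xff', errors='replace')

# bytes(): errors alone is not enough; the encoding must be spelled out.
try:
    bytes('caf\xe9', errors='strict')
    errors_only_ok = True
except TypeError:
    errors_only_ok = False

# The methods default both arguments, so nothing needs to be passed.
via_methods = 'caf\xe9'.encode().decode()
```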

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>

