[Python-Dev] Re: Parameters of str(), bytes() and bytearray()

Serhiy Storchaka Mon, 16 Dec 2019 12:06:52 -0800

16.12.19 18:35, Guido van Rossum wrote:

On Sun, Dec 15, 2019 at 6:09 AM Serhiy Storchaka <storch...@gmail.com> wrote:
    1. Forbid calling str() without an object if encoding or errors are
    specified. It is very unlikely that this can break real code, so I
    propose to make it an error without a deprecation period.
What problem are you trying to solve with this proposal? I am only -0 on this, but I am wondering why bother with the churn.

Initially I wanted to check the documentation and the docstring of str() and fix them if needed. This was inspired by the Discourse topic [1]. I found that, contrary to the OP's claim, the documentation is correct, but the docstring is not.

The documentation is correct (because Chris Jerdonek accurately documented the actual behavior in 2012 [2]), but ambiguous.


    str(object='')
    str(object=b'', encoding='utf-8', errors='strict')

0- and 1-argument calls match both signatures. It also implies that str(encoding='ascii') and str(errors='ignore') are valid, and this is true! Moreover, str(encoding='spam') and str(errors='ham') are valid too, because the values of encoding and errors are ignored when no object is passed. I cannot imagine a use case for this. It looks like an implementation artifact.
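
The calls discussed above can be tried with a short sketch. The helper name try_call is hypothetical and exists only for this illustration; since the outcome of the no-object calls depends on the interpreter version, the script reports what happens instead of assuming a result.

```python
def try_call(*args, **kwargs):
    """Call str() and return repr() of the result, or the exception's type name."""
    try:
        return repr(str(*args, **kwargs))
    except Exception as exc:  # report the failure instead of stopping the script
        return type(exc).__name__


# No object at all: matches the first documented signature.
print("str()                 ->", try_call())

# No object, but encoding/errors given: the questionable calls.
print("str(encoding='ascii') ->", try_call(encoding='ascii'))
print("str(errors='ignore')  ->", try_call(errors='ignore'))
print("str(encoding='spam')  ->", try_call(encoding='spam'))
print("str(errors='ham')     ->", try_call(errors='ham'))

# With a bytes object the encoding is really looked up and used.
print("str(b'abc', 'ascii')  ->", try_call(b'abc', 'ascii'))
print("str(b'abc', 'spam')   ->", try_call(b'abc', 'spam'))
```

The last two lines show the contrast: once there is something to decode, an unknown codec name such as 'spam' is rejected rather than silently ignored.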


The docstring was left unfixed.

    str(object='') -> str
    str(bytes_or_buffer[, encoding[, errors]]) -> str

It uses a different name for the first parameter (which would not matter if it were positional-only), it requires bytes_or_buffer for decoding, and it requires encoding if errors is passed.

So my goal is to remove glitches that are not used in real code in any case, and to make the behavior closer to the initial intention. If all three of my propositions are applied, the signatures would look like:


    str(object='', /) -> str
    str(bytes_or_buffer, /, encoding, errors='strict') -> str

Almost the same as for bytes:

    bytes(object=b'', /) -> bytes
    bytes(string, /, encoding, errors='strict') -> bytes
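
As a rough illustration only, the two proposed str() signatures could be emulated in pure Python as below. The names proposed_str and _MISSING are hypothetical, the error messages are invented for this sketch, and the real constructor is implemented in C; positional-only syntax requires Python 3.8 or later.

```python
_MISSING = object()  # sentinel: distinguishes "not passed" from any real value


def proposed_str(obj=_MISSING, /, encoding=_MISSING, errors=_MISSING):
    """Hypothetical emulation of the proposed str() signatures.

    str(object='', /)                                    -> convert via __str__
    str(bytes_or_buffer, /, encoding, errors='strict')   -> decode
    """
    if encoding is _MISSING:
        # Conversion mode: errors makes no sense without encoding.
        if errors is not _MISSING:
            raise TypeError("errors requires encoding")
        return '' if obj is _MISSING else str(obj)
    # Decoding mode: there must be something to decode.
    if obj is _MISSING:
        raise TypeError("encoding requires an object to decode")
    return str(obj, encoding, 'strict' if errors is _MISSING else errors)


print(repr(proposed_str()))                      # conversion, no argument
print(repr(proposed_str(42)))                    # conversion via __str__
print(repr(proposed_str(b'abc', 'ascii')))       # decoding
print(repr(proposed_str(b'ab\xff', encoding='ascii', errors='ignore')))
```

Under this sketch, proposed_str(encoding='ascii'), proposed_str(b'abc', errors='ignore') and proposed_str(object='x') all raise TypeError, which is the narrowing the three propositions aim for.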

[1] https://discuss.python.org/t/str-mybytes-wrong-docs/2866
[2] https://bugs.python.org/issue13538

    3. Make encoding required if errors is specified in str(). This will
    reduce the number of possible combinations, make str() more similar to
    bytes() and bytearray(), and simplify the mental model: if encoding is
    specified, then we decode, and the first argument must be a bytes-like
    object; otherwise we convert an object to a string using __str__.
I'm -0 on this. It seems that the presence of either errors= or encoding= causes str() to switch to "decode bytes" semantics, and a default decoding of UTF-8. That default makes sense: UTF-8 is our default source encoding, and we are trending to use it as the default in other places. I doubt that such calls would confuse anyone.

This proposition is the one I am least sure about. On one hand, the bytes() constructor requires encoding when it is given a string. On the other hand, encoding is optional in str.encode() and bytes.decode(). But str.encode() and bytes.decode() each have only one function, so you can omit both encoding and errors without ambiguity.

If we allow str(bytes_or_buffer, errors=errors), should we not also allow bytes(string, errors=errors)?
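
The asymmetry in question can be seen in a small sketch. The helper name outcome is hypothetical; the behavior shown is that of the CPython versions current when this was written, and the point of the proposal is that it might change.

```python
def outcome(func, *args, **kwargs):
    """Return the call's result, or a description of the TypeError it raised."""
    try:
        return func(*args, **kwargs)
    except TypeError as exc:
        return f"TypeError: {exc}"


text = "caf\u00e9"
data = text.encode("utf-8")  # b'caf\xc3\xa9'

# The methods have a single purpose, so both arguments may be omitted.
print(repr(data.decode()))
print(repr(text.encode()))

# str() with errors= alone switches to decoding with the UTF-8 default.
print(repr(outcome(str, data, errors="strict")))
print(repr(outcome(str, b"caf\xe9", errors="replace")))  # invalid UTF-8 input

# bytes() insists on an explicit encoding when given a string.
print(repr(outcome(bytes, text, errors="strict")))
print(repr(outcome(bytes, text, encoding="utf-8")))
```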

Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OXPP7HTFU32VXE3LMSICPB57V5KHM4PW/