16.12.19 18:35, Guido van Rossum wrote:
On Sun, Dec 15, 2019 at 6:09 AM Serhiy Storchaka <storch...@gmail.com> wrote:

    1. Forbid calling str() without object if encoding or errors are
    specified. It is very unlikely that this can break real code, so I
    propose to make it an error without a deprecation period.


What problem are you trying to solve with this proposal? I am only -0 on this, but I am wondering why bother with the churn.

Initially I wanted to check the documentation and the docstring of str() and fix them if needed. This was inspired by the Discourse topic [1]. I found that, contrary to the OP's claim, the documentation is correct, but the docstring is not.

The documentation is correct (because Chris Jerdonek accurately documented the actual behavior in 2012 [2]), but ambiguous.

    str(object='')
    str(object=b'', encoding='utf-8', errors='strict')

0- and 1-argument calls match both signatures. The second signature also implies that str(encoding='ascii') and str(errors='ignore') are valid, and this is true! Moreover, str(encoding='spam') and str(errors='ham') are valid too, because the values of encoding and errors are ignored when no object is passed. I cannot imagine a use case for this. It looks like an implementation artifact.
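
A quick way to check the calls listed above on a given interpreter is sketched below. The probe() helper is hypothetical (it is not part of the proposal), and it reports either the returned value or the exception type, since the outcome depends on the Python version in use.

```python
def probe(*args, **kwargs):
    """Call str() and report its result, or the exception type it raised."""
    try:
        return repr(str(*args, **kwargs))
    except Exception as exc:  # report the failure instead of stopping
        return type(exc).__name__


# Calls without an object argument, as discussed in the paragraph above.
no_object_calls = {
    "str()": probe(),
    "str(encoding='ascii')": probe(encoding="ascii"),
    "str(errors='ignore')": probe(errors="ignore"),
    "str(encoding='spam')": probe(encoding="spam"),
    "str(errors='ham')": probe(errors="ham"),
}

for call, outcome in no_object_calls.items():
    print(f"{call:24} -> {outcome}")
```

If the behavior described above still holds, every line shows the empty string, including the calls that name a nonexistent codec or error handler.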

The docstring, however, was never fixed.

    str(object='') -> str
    str(bytes_or_buffer[, encoding[, errors]]) -> str

It uses different names for the first parameter (which would not matter if it were positional-only), it requires bytes_or_buffer for decoding, and it requires encoding if errors is passed.
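
The mismatch between the docstring and the actual behavior can be sketched as follows. The attempt() helper and the variable names are only illustrative, and the results reflect the current behavior described in this message, which may change if the proposal is accepted.

```python
def attempt(thunk):
    """Run a zero-argument callable; return its result or the TypeError text."""
    try:
        return thunk()
    except TypeError as exc:
        return f"TypeError: {exc}"


# The real keyword name is "object", as in the documentation.
keyword_object = attempt(lambda: str(object=b"abc", encoding="ascii"))

# The docstring's name "bytes_or_buffer" is not accepted as a keyword.
keyword_docstring_name = attempt(
    lambda: str(bytes_or_buffer=b"abc", encoding="ascii")
)

# errors without encoding is accepted, although the docstring's
# nested brackets suggest that encoding must come first.
errors_only = attempt(lambda: str(b"abc", errors="strict"))

print(keyword_object)
print(keyword_docstring_name)
print(errors_only)
```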

So my goal is to remove glitches which are not used in real code in any case, and to make the behavior closer to the initial intention. If all three of my propositions are applied, the signatures would look like:

    str(object='', /) -> str
    str(bytes_or_buffer, /, encoding, errors='strict') -> str

Almost the same as for bytes:

    bytes(object=b'', /) -> bytes
    bytes(string, /, encoding, errors='strict') -> bytes
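
For comparison, here is a small sketch of how strict bytes() already is about these arguments. The attempt_bytes() helper is hypothetical and exists only to capture the error instead of raising it.

```python
def attempt_bytes(*args, **kwargs):
    """Call bytes() and return its result, or the TypeError text."""
    try:
        return bytes(*args, **kwargs)
    except TypeError as exc:
        return f"TypeError: {exc}"


with_encoding = attempt_bytes("abc", "utf-8")         # encodes the string
without_encoding = attempt_bytes("abc")               # string, no encoding
errors_only = attempt_bytes("abc", errors="strict")   # errors, no encoding
encoding_only = attempt_bytes(encoding="utf-8")       # encoding, no string

for label, value in [
    ("bytes('abc', 'utf-8')", with_encoding),
    ("bytes('abc')", without_encoding),
    ("bytes('abc', errors='strict')", errors_only),
    ("bytes(encoding='utf-8')", encoding_only),
]:
    print(f"{label:32} -> {value!r}")
```

Only the first call succeeds; the other three are rejected with TypeError, which is the strictness the proposal would bring to str().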

[1] https://discuss.python.org/t/str-mybytes-wrong-docs/2866
[2] https://bugs.python.org/issue13538


    3. Make encoding required if errors is specified in str(). This will
    reduce the number of possible combinations, make str() more similar to
    bytes() and bytearray(), and simplify the mental model: if encoding is
    specified, then we decode, and the first argument must be a bytes-like
    object; otherwise we convert an object to a string using __str__.


 I'm -0 on this. It seems that the presence of either errors= or encoding= causes str() to switch to "decode bytes" semantics, and a default decoding of UTF-8. That default makes sense: UTF-8 is our default source encoding, and we are trending to use it as the default in other places. I doubt that such calls would confuse anyone.
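
The switch to "decode bytes" semantics described above can be illustrated with a short sketch. The sample bytes and variable names are arbitrary choices for the example, and it assumes the interpreter is not run with -bb, which turns str() on a bytes object into an error.

```python
euro = b"\xe2\x82\xac"  # UTF-8 encoding of U+20AC (euro sign)

# No encoding or errors: ordinary conversion, which for bytes is its repr.
as_repr = str(euro)

# Either keyword alone switches str() to decoding, with UTF-8 as the default.
decoded_by_errors = str(euro, errors="strict")
decoded_by_encoding = str(euro, encoding="utf-8")

print(as_repr)
print(decoded_by_errors)
print(decoded_by_encoding)
```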

This is the proposition I am least sure about. On the one hand, the bytes() constructor requires encoding for encoding a string. On the other hand, encoding is optional in str.encode() and bytes.decode(). But str.encode() and bytes.decode() each have only one function, so you can omit both encoding and errors without ambiguity.
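
As a brief sketch of that last point, both methods work with encoding omitted and default to UTF-8, with or without errors (the sample strings are arbitrary):

```python
text = "na\u00efve"  # contains U+00EF, a non-ASCII character

# str.encode() has a single job, so omitting encoding is unambiguous.
encoded = text.encode()
encoded_strict = text.encode(errors="strict")

# bytes.decode() likewise; errors can be given without encoding.
# b"caf\xe9" is Latin-1 data and is not valid UTF-8.
lenient = b"caf\xe9".decode(errors="replace")

print(encoded)
print(encoded_strict)
print(lenient)
```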

If we allow str(bytes_or_buffer, errors=errors), should we not also allow bytes(string, errors=errors)?
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OXPP7HTFU32VXE3LMSICPB57V5KHM4PW/