16.12.19 18:35, Guido van Rossum wrote:
On Sun, Dec 15, 2019 at 6:09 AM Serhiy Storchaka <storch...@gmail.com>
wrote:
1. Forbid calling str() without an object if encoding or errors are
specified. It is very unlikely that this would break any real code, so
I propose making it an error without a deprecation period.
What problem are you trying to solve with this proposal? I am only -0 on
this, but I am wondering why bother with the churn.
Initially I wanted to check the documentation and the docstring of
str() and fix them if needed. It was inspired by the Discourse topic [1].
I have found that, contrary to the OP's claim, the documentation is
correct, but the docstring is not.
The documentation is correct (because Chris Jerdonek accurately
documented the actual behavior in 2012 [2]), but ambiguous.
str(object='')
str(object=b'', encoding='utf-8', errors='strict')
0- and 1-argument calls match both signatures. It also implies that
str(encoding='ascii') and str(errors='ignore') are valid, and this is
true! Moreover, str(encoding='spam') and str(errors='ham') are valid
too, because the values of encoding and errors are ignored. I cannot
imagine a use case for this. It looks like an implementation artifact.
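
The calls mentioned above can be tried with a small probe. This is only
a sketch against CPython as it behaved when this thread was written;
probe() is a hypothetical helper introduced for illustration, and a
later version that adopts the proposal could reject the first four calls.

```python
def probe(*args, **kwargs):
    """Call str() and return its result, or the exception class name on failure."""
    try:
        return str(*args, **kwargs)
    except (TypeError, LookupError) as exc:
        return type(exc).__name__


# No object given: the thread reports these as valid calls, because the
# values of encoding and errors are never looked at.
print(repr(probe(encoding='ascii')))
print(repr(probe(errors='ignore')))
print(repr(probe(encoding='spam')))
print(repr(probe(errors='ham')))

# With a non-empty bytes object the encoding name is actually looked up,
# so an unknown codec is reported as LookupError.
print(repr(probe(b'abc', encoding='ascii')))
print(repr(probe(b'abc', encoding='spam')))
```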
The docstring was left unfixed.
str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str
It uses different names for the first parameter (this would not matter
if it were positional-only), it requires bytes_or_buffer for decoding,
and it requires encoding if errors is passed.
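
A short check of where the docstring and the actual behavior part ways.
This is a sketch assuming the str() implementation current at the time
of this thread; accepts() is a hypothetical helper, not part of any
library.

```python
def accepts(**kwargs):
    """Return True if str(**kwargs) succeeds, False if it raises TypeError."""
    try:
        str(**kwargs)
    except TypeError:
        return False
    return True


# The keyword that str() really understands is 'object', as documented;
# the docstring's 'bytes_or_buffer' is not an accepted keyword name.
documented_name_works = accepts(object=b'abc', encoding='ascii')
docstring_name_works = accepts(bytes_or_buffer=b'abc', encoding='ascii')

# The docstring nests errors inside encoding, yet errors alone is accepted.
errors_alone = str(b'abc', errors='strict')

print(documented_name_works, docstring_name_works, repr(errors_alone))
```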
So my goal is to remove glitches which are not used in real code in
any case, and to make the behavior closer to the initial intention. If
all three of my proposals were applied, the signatures would look like:
str(object='', /) -> str
str(bytes_or_buffer, /, encoding, errors='strict') -> str
Almost the same as for bytes:
bytes(object=b'', /) -> bytes
bytes(string, /, encoding, errors='strict') -> bytes
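
To make the proposed signatures concrete, here is a pure-Python sketch
of a function that behaves as they describe. The name proposed_str, the
_MISSING sentinel, and the error messages are all hypothetical; the real
str() is implemented in C and this is not its code. It needs Python 3.8+
for the positional-only marker.

```python
_MISSING = object()  # hypothetical sentinel meaning "no object was passed"


def proposed_str(obj=_MISSING, /, encoding=None, errors=None):
    """Emulate str() under the proposed signatures (illustrative only)."""
    if encoding is None:
        if errors is not None:
            # errors would only be meaningful together with encoding.
            raise TypeError("errors without encoding")
        # Conversion form: str(object='', /)
        return '' if obj is _MISSING else str(obj)
    if obj is _MISSING:
        # Decoding needs something to decode.
        raise TypeError("encoding without an object to decode")
    # Decoding form: str(bytes_or_buffer, /, encoding, errors='strict')
    return str(obj, encoding, 'strict' if errors is None else errors)


print(repr(proposed_str()))
print(repr(proposed_str(b'abc')))
print(repr(proposed_str(b'abc', encoding='ascii')))
```

Under this sketch proposed_str(encoding='ascii'),
proposed_str(b'abc', errors='ignore') and proposed_str(obj=b'abc') all
raise TypeError.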
[1] https://discuss.python.org/t/str-mybytes-wrong-docs/2866
[2] https://bugs.python.org/issue13538
3. Make encoding required if errors is specified in str(). This will
reduce the number of possible combinations, make str() more similar to
bytes() and bytearray(), and simplify the mental model: if encoding is
specified, then we decode, and the first argument must be a bytes-like
object; otherwise we convert an object to a string using __str__.
I'm -0 on this. It seems that the presence of either errors= or
encoding= causes str() to switch to "decode bytes" semantics, and a
default decoding of UTF-8. That default makes sense: UTF-8 is our
default source encoding, and we are trending toward using it as the
default in other places. I doubt that such calls would confuse anyone.
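
The switch to "decode bytes" semantics can be seen with a non-ASCII
payload. A minimal sketch, assuming the behavior current when this
thread was written; the variable names are only illustrative.

```python
data = b'caf\xc3\xa9'  # UTF-8 encoding of 'caf\u00e9'

# No encoding or errors: the object is converted via __str__,
# which for bytes gives its repr-like form.
plain = str(data)

# Either keyword alone switches str() to decoding, with UTF-8 as default.
by_errors = str(data, errors='strict')
by_encoding = str(data, encoding='utf-8')

# In decode mode the first argument must be bytes-like; a str is rejected.
text_error = None
try:
    str('caf\u00e9', errors='strict')
except TypeError as exc:
    text_error = exc

print(plain, by_errors, by_encoding, type(text_error).__name__)
```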
This is the proposal I am not sure about. On the one hand, the bytes()
constructor requires encoding when it is given a string. On the other
hand, it is optional in str.encode() and bytes.decode(). But
str.encode() and bytes.decode() each have only one function, so you can
omit both encoding and errors without ambiguity.
If we allow str(bytes_or_buffer, errors=errors), should we not also
allow bytes(string, errors=errors)?
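
The asymmetry in question can be checked directly. A small sketch;
raises_type_error() is a hypothetical helper used only for this
comparison.

```python
def raises_type_error(func, *args, **kwargs):
    """Return True if func(*args, **kwargs) raises TypeError."""
    try:
        func(*args, **kwargs)
    except TypeError:
        return True
    return False


# str() accepts errors without encoding ...
str_errors_only = str(b'abc', errors='strict')

# ... while bytes() refuses a string without an explicit encoding.
bytes_errors_only_rejected = raises_type_error(bytes, 'abc', errors='strict')

# The methods have a single job, so both arguments may be omitted.
encode_no_args = 'abc'.encode()
decode_no_args = b'abc'.decode()
encode_errors_only = 'abc'.encode(errors='strict')

print(str_errors_only, bytes_errors_only_rejected,
      encode_no_args, decode_no_args, encode_errors_only)
```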