On Mon, Dec 16, 2019 at 12:04 PM Serhiy Storchaka <storch...@gmail.com> wrote:

> 16.12.19 18:35, Guido van Rossum wrote:
> > On Sun, Dec 15, 2019 at 6:09 AM Serhiy Storchaka <storch...@gmail.com> wrote:
> >
> >     1. Forbids calling str() without object if encoding or errors are
> >     specified. It is very unlikely that this can break a real code, so I
> >     propose to make it an error without a deprecation period.
> >
> >
> > What problem are you trying to solve with this proposal? I am only -0 on
> > this, but I am wondering why bother with the churn.
>
> Initially I wanted to check the documentation and the docstrings of
> str() and fix it if needed. It was inspired by the Discourse topic [1].
> I have found that in contrary to the OP's claim the documentation is
> correct, but the docstring is not.

So let's fix the docstring.

> The documentation is correct (because Chris Jerdonek accurately
> documented the actual behavior in 2012 [2]), but ambiguous.
>
>     str(object='')
>     str(object=b'', encoding='utf-8', errors='strict')

Honestly this notation leaves a lot unsaid. Apparently the first form
allows `object` to have any type, while the second only allows it to be
bytes (or bytearray, or memoryview, or presumably anything that supports
the buffer protocol?). And it appears unnecessary to specify a default in
the first case -- then the 0-args form would only match the second pattern.

> 0- and 1-argument calls match both signatures. Also it implies that
> str(encoding='ascii') and str(errors='ignore') are valid, and this is
> true!

And the docs spell this out clearly enough that I don't see any reason to
change it. This is a function that is *so* common that *any* tweak we make
to it will break someone's code.

> And more, str(encoding='spam') and str(errors='ham') are valid
> too, because the values of encoding and errors are ignored. I cannot
> imagine a use case for this. It looks like an implementation artifact.

But again one that we can't change.
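
The call forms discussed so far can be checked with a short sketch, assuming CPython 3 run in its default mode. The `Point` class and all variable names are hypothetical and only serve the illustration; the str() calls themselves are the ones quoted in the thread.

```python
# Sketch of the str() call forms discussed in the thread (CPython 3 assumed).

class Point:
    """Hypothetical class, used only to show the __str__ conversion path."""
    def __str__(self):
        return "Point(1, 2)"

# Form 1: str(object='') converts any object via __str__.
converted = str(Point())

# Form 2: passing encoding and/or errors switches to decoding a bytes-like object.
decoded = str(b"caf\xc3\xa9", encoding="utf-8")
decoded_errors_only = str(b"caf\xc3\xa9", errors="strict")  # encoding defaults to UTF-8

# Without encoding/errors, bytes go through form 1 and yield their repr.
repr_like = str(b"abc")

# No object at all: the keyword values are accepted but never consulted,
# so even unknown names produce an empty string.
empty_results = [
    str(),
    str(encoding="ascii"),
    str(errors="ignore"),
    str(encoding="spam"),
    str(errors="ham"),
]

# A non-bytes object combined with encoding= is rejected.
rejected = None
try:
    str(42, encoding="utf-8")
except TypeError as exc:
    rejected = type(exc).__name__
```

In this sketch every entry of `empty_results` is the empty string, which is the behavior the proposal wanted to turn into an error.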
At least for errors='ham', this seems to be the case for all
encoding/decoding functions -- the error handler is looked up lazily, and
an empty input string doesn't need it. b''.decode(errors="ham") acts the
same way. In fact, it's the same for b.decode(encoding='spam'). So str()
is not special here, and I recommend keeping it that way.

> The docstring is left not fixed.
>
>     str(object='') -> str
>     str(bytes_or_buffer[, encoding[, errors]]) -> str
>
> It uses different names for the first parameter (it would not matter if
> it would be positional-only), it requires bytes_or_buffer for decoding,
> it requires encoding if errors is passed.
>
> So my goal is to remove glitches which are not used in a real code in
> any case, and make the behavior closer to the initial intention. If
> apply all three my proposition, signatures would look like:
>
>     str(object='', /) -> str
>     str(bytes_or_buffer, /, encoding, errors='strict') -> str
>
> Almost the same as for bytes:
>
>     bytes(object=b'', /) -> bytes
>     bytes(string, /, encoding, errors='strict') -> bytes

bytes() and str() just aren't each other's opposite -- bytes() really only
takes str input, but str() takes any input. So there's always going to be a
discrepancy. I now think the current behavior should not change.

> [1] https://discuss.python.org/t/str-mybytes-wrong-docs/2866
> [2] https://bugs.python.org/issue13538
>
> >     3. Make encoding required if errors is specified in str(). This will
> >     reduce the number of possible combinations, makes str() more similar to
> >     bytes() and bytearray() and simplify the mental model: if encoding is
> >     specified, then we decode, and the first argument must be a bytes-like
> >     object, otherwise we convert an object to a string using __str__.
> >
> >
> > I'm -0 on this. It seems that the presence of either errors= or
> > encoding= causes str() to switch to "decode bytes" semantics, and a
> > default decoding of UTF-8.
> > That default makes sense: UTF-8 is our
> > default source encoding, and we are trending to use it as the default in
> > other places. I doubt that such calls would confuse anyone.
>
> This proposition is the one about which I am not sure. On one side, the
> bytes() constructor requires encoding for decoding. On other side, it is
> optional in str.encode() and bytes.decode(). But str.encode() and
> bytes.decode() have only one function, so you can omit both encoding and
> errors without ambiguity.
>
> If we allow str(bytes_or_buffer, errors=errors), should not we allow
> also bytes(string, errors=errors)?

Not necessarily. There's an old saying in PEP 8 about foolish consistency...

--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
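
The lazy error-handler lookup and the bytes()/str() asymmetry described in the reply can be sketched as follows. This is a minimal illustration, assuming CPython 3 in its default mode (development mode or a debug build may validate the `errors` name eagerly). The `outcome` helper and the variable names are hypothetical and not part of the thread.

```python
def outcome(call):
    """Return the call's result, or the exception class name if it raises."""
    try:
        return call()
    except Exception as exc:
        return type(exc).__name__

# Empty input: there is nothing to decode, so a bogus handler name goes unnoticed.
empty_bogus_errors = outcome(lambda: b"".decode(errors="ham"))

# Valid input: no decoding error occurs, so the handler is still never looked up.
valid_bogus_errors = outcome(lambda: b"abc".decode(errors="ham"))

# Invalid UTF-8: the decoder now needs the handler, and the lookup fails.
invalid_bogus_errors = outcome(lambda: b"\xff".decode(errors="ham"))

# bytes() insists on an explicit encoding for str input ...
bytes_no_encoding = outcome(lambda: bytes("abc"))
bytes_errors_only = outcome(lambda: bytes("abc", errors="strict"))
bytes_with_encoding = outcome(lambda: bytes("abc", encoding="utf-8"))

# ... while str() accepts errors= alone and falls back to UTF-8.
str_errors_only = outcome(lambda: str(b"abc", errors="strict"))

for name in ("empty_bogus_errors", "valid_bogus_errors", "invalid_bogus_errors",
             "bytes_no_encoding", "bytes_errors_only", "bytes_with_encoding",
             "str_errors_only"):
    print(f"{name}: {globals()[name]!r}")
```

Under those assumptions, only the call that actually hits an undecodable byte fails with a LookupError, and bytes() raises TypeError whenever a str argument arrives without an encoding.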
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/G3L5OQ24SCZC5UMJSC7J2TFDWFPCU2K7/
Code of Conduct: http://python.org/psf/codeofconduct/