I don't see any rationale in the PEP or in the python-ideas thread (admittedly I didn't read the whole thing, I just Ctrl + F-ed "subclass" there). Is this just for consistency with other methods like .casefold?
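To make the question concrete, here is a minimal sketch of the existing behavior I mean. The `MyStr` subclass is a made-up name used only for illustration; it is not taken from the PEP or the thread.

```python
# Hypothetical subclass, used only to illustrate current behavior.
class MyStr(str):
    pass

s = MyStr("Hello")

# Built-in str methods such as casefold() return a plain str,
# not an instance of the subclass.
folded = s.casefold()
print(type(folded).__name__)

# Slicing does the same unless the subclass overrides __getitem__.
tail = s[1:]
print(type(tail).__name__)
```

Both calls hand back a base `str`, so a subclass that wants to preserve its own type has to override each such operation itself.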
I can understand why you'd want it to be consistent, but I think it's misguided in this case. It adds unnecessary complexity for subclass implementers, who would need to re-implement these two additional methods, and I can see no obvious reason why this behavior would be necessary, since these methods can be implemented in terms of string slicing. Even if you wanted to use `str`-specific optimizations in C that aren't available when you are constrained to use the subclass's `__getitem__`, it's inexpensive to add a `PyUnicode_CheckExact(self)` check to hit a "fast path" that doesn't use slicing.

I think defining this in terms of string slicing makes the most sense (and, notably, slicing itself returns `str` unless `__getitem__` is explicitly overridden, so the default would be to return `str` anyway).

Either way, it would be nice to see the rationale included in the PEP somewhere.

Best,
Paul

On 3/22/20 7:16 AM, Eric V. Smith wrote:
> On 3/22/2020 1:42 AM, Nick Coghlan wrote:
>> On Sun, 22 Mar 2020 at 15:13, Cameron Simpson <c...@cskk.id.au> wrote:
>>> On 21Mar2020 12:45, Eric V. Smith <e...@trueblade.com> wrote:
>>>> On 3/21/2020 12:39 PM, Victor Stinner wrote:
>>>>> Well, if CPython is modified to implement tagged pointers and supports
>>>>> storing a short strings (a few latin1 characters) as a pointer, it may
>>>>> become harder to keep the same behavior for "x is y" where x and y are
>>>>> strings.
>>> Are you suggesting that it could become impossible to write this
>>> function:
>>>
>>>     def myself(o):
>>>         return o
>>>
>>> and not be able to rely on "o is myself(o)"? That seems... a pretty
>>> nasty breaking change for the language.
>> Other way around - because strings are immutable, their identity isn't
>> supposed to matter, so it's possible that functions that currently
>> return the exact same object in some cases may in the future start
>> returning a different object with the same value.
>>
>> Right now, in CPython, with no tagged pointers, we return the full
>> existing pointer wherever we can, as that saves us a data copy. With
>> tagged pointers, the pointer storage effectively *is* the instance, so
>> you can't really replicate that existing "copy the reference not the
>> storage" behaviour any more.
>>
>> That said, it's also possible that identity for tagged pointers would
>> be value based (similar to the effect of the small integer cache and
>> string interning), in which case the entire question would become
>> moot.
>>
>> Either way, the PEP shouldn't be specifying that a new object *must*
>> be returned, and it also shouldn't be specifying that the same object
>> *can't* be returned.
>
> Agreed. I think the PEP should say that a str will be returned (in the
> event of a subclass, assuming that's what we decide), but if the
> argument is exactly a str, that it may or may not return the original
> object.
>
> Eric
>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/JHM7T6JZU56PWYRJDG45HMRBXE3CBXMX/
> Code of Conduct: http://python.org/psf/codeofconduct/
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/RTQWEE4KZYIIXL3HK3C6IJ2ATQ6CM7PG/
Code of Conduct: http://python.org/psf/codeofconduct/