I don't see any rationale in the PEP or in the python-ideas thread
(admittedly I didn't read the whole thing; I just Ctrl + F-ed "subclass"
there). Is this just for consistency with other methods like .casefold?

I can understand why you'd want it to be consistent, but I think it's
misguided in this case. It forces subclass implementers to re-implement
these two additional methods, which is unnecessary complexity, and I
can see no obvious reason why this behavior would be necessary, since
these methods can be implemented in terms of string slicing.
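
To illustrate what "in terms of string slicing" could look like, here is a
rough pure-Python sketch. The function name `removeprefix` and the
`TaggedStr` subclass are hypothetical, made up for this example; this is
not the PEP's reference implementation.

```python
def removeprefix(s, prefix):
    # Hypothetical slicing-based version: everything goes through s[...],
    # so a subclass that overrides __getitem__ controls the return type.
    if s.startswith(prefix):
        return s[len(prefix):]
    return s[:]


class TaggedStr(str):
    # Example subclass (an assumption for illustration) whose slices
    # keep the subclass type.
    def __getitem__(self, key):
        return type(self)(super().__getitem__(key))


result = removeprefix(TaggedStr("foobar"), "foo")
print(type(result).__name__, result)
```

With this definition, the subclass author only has to get `__getitem__`
right; the two new methods would follow along without being re-implemented.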

Even if you wanted to use `str`-specific optimizations in C that aren't
available if you are constrained to use the subclass's __getitem__, it's
inexpensive to add a "PyUnicode_CheckExact(self)" check to hit a "fast
path" that doesn't use slice.
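
As a rough Python analogue of that idea (the real check would be in C;
`type(s) is str` stands in for `PyUnicode_CheckExact` here, and the names
`removeprefix_fast` and `CountingStr` are hypothetical):

```python
def removeprefix_fast(s, prefix):
    if type(s) is str:
        # Fast path for exact str: bypass any overridable __getitem__.
        if s.startswith(prefix):
            return str.__getitem__(s, slice(len(prefix), None))
        return s
    # Generic path: defer to the subclass's own slicing.
    if s.startswith(prefix):
        return s[len(prefix):]
    return s[:]


class CountingStr(str):
    # Illustrative subclass that counts how often it is sliced.
    calls = 0

    def __getitem__(self, key):
        type(self).calls += 1
        return type(self)(super().__getitem__(key))
```

Only subclass instances pay for the `__getitem__` dispatch; exact `str`
instances never touch it.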

I think defining this in terms of string slicing makes the most sense
(and, notably, slicing itself returns `str` unless explicitly overridden,
so the default would be to return `str` anyway).
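
A quick check of that default, using a throwaway subclass (`Plain` is just
a name picked for this example):

```python
class Plain(str):
    # No __getitem__ override, so slicing falls back to str's behaviour.
    pass


sliced = Plain("foobar")[3:]
print(type(sliced).__name__)  # str
```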

Either way, it would be nice to see the rationale included in the PEP
somewhere.

Best,
Paul

On 3/22/20 7:16 AM, Eric V. Smith wrote:
> On 3/22/2020 1:42 AM, Nick Coghlan wrote:
>> On Sun, 22 Mar 2020 at 15:13, Cameron Simpson <c...@cskk.id.au> wrote:
>>> On 21Mar2020 12:45, Eric V. Smith <e...@trueblade.com> wrote:
>>>> On 3/21/2020 12:39 PM, Victor Stinner wrote:
>>>>> Well, if CPython is modified to implement tagged pointers and
>>>>> supports storing short strings (a few latin1 characters) as a
>>>>> pointer, it may become harder to keep the same behavior for
>>>>> "x is y" where x and y are strings.
>>> Are you suggesting that it could become impossible to write this
>>> function:
>>>
>>>      def myself(o):
>>>          return o
>>>
>>> and not be able to rely on "o is myself(o)"? That seems... a pretty
>>> nasty breaking change for the language.
>> Other way around - because strings are immutable, their identity isn't
>> supposed to matter, so it's possible that functions that currently
>> return the exact same object in some cases may in the future start
>> returning a different object with the same value.
>>
>> Right now, in CPython, with no tagged pointers, we return the full
>> existing pointer wherever we can, as that saves us a data copy. With
>> tagged pointers, the pointer storage effectively *is* the instance, so
>> you can't really replicate that existing "copy the reference not the
>> storage" behaviour any more.
>>
>> That said, it's also possible that identity for tagged pointers would
>> be value based (similar to the effect of the small integer cache and
>> string interning), in which case the entire question would become
>> moot.
>>
>> Either way, the PEP shouldn't be specifying that a new object *must*
>> be returned, and it also shouldn't be specifying that the same object
>> *can't* be returned.
>
> Agreed. I think the PEP should say that a str will be returned (in the
> event of a subclass, assuming that's what we decide), but if the
> argument is exactly a str, that it may or may not return the original
> object.
>
> Eric
>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/JHM7T6JZU56PWYRJDG45HMRBXE3CBXMX/
> Code of Conduct: http://python.org/psf/codeofconduct/


