[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

Steven D'Aprano Mon, 23 Mar 2020 18:46:03 -0700

On Sun, Mar 22, 2020 at 10:25:28PM -0000, Dennis Sweeney wrote:

> Changes:
>     - More complete Python implementation to match what the type checking in 
> the C implementation would be
>     - Clarified that returning ``self`` is an optimization
>     - Added links to past discussions on Python-Ideas and Python-Dev
>     - Specified ability to accept a tuple of strings


I am concerned about that tuple of strings feature.

First, an implementation question: you do this when the prefix is a 
tuple:

            if isinstance(prefix, tuple):
                for option in tuple(prefix):
                    if not isinstance(option, str):
                        raise TypeError()
                    option_str = str(option)

which looks like two unnecessary copies:

1. Having confirmed that `prefix` is a tuple, you call tuple() to 
   make a copy of it in order to iterate over it. Why?

2. Having confirmed that option is a string, you call str() on
   it to (potentially) make a copy. Why?


Aside from those questions about the reference implementation, I am 
concerned about the feature itself. No other string method that returns 
a modified copy of the string takes a tuple of alternatives.

* startswith and endswith do take a tuple of (pre/suff)ixes, but they
  don't return a modified copy; they just return a True or False flag;

* replace does return a modified copy, and only takes a single 
  substring at a time;

* find/index/partition/split etc. don't accept multiple substrings 
  to search for.
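The asymmetry is easy to confirm interactively. This small sketch uses only built-in str methods; `accepts_tuple` is just a local convenience wrapper, not part of any proposal. startswith accepts a tuple of alternatives, while find, replace and partition reject one with TypeError.

```python
def accepts_tuple(call):
    """Return True if call() succeeds, False if it raises TypeError."""
    try:
        call()
    except TypeError:
        return False
    return True

s = "extraordinary"
alternatives = ("ex", "extra")

print(s.startswith(alternatives))                          # True
print(accepts_tuple(lambda: s.find(alternatives)))         # False
print(accepts_tuple(lambda: s.replace(alternatives, "")))  # False
print(accepts_tuple(lambda: s.partition(alternatives)))    # False
```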

That makes startswith/endswith the unusual ones, and we should be 
conservative before emulating them.

The difficulty here is that the notion of "cut one of these prefixes" is 
ambiguous if two or more of the prefixes match. It doesn't matter for 
startswith:

    "extraordinary".startswith(('ex', 'extra'))

since it is True whether you match left-to-right, shortest-to-longest, 
or even in random order. But for cutprefix, which prefix should be 
deleted?

Of course we can make a ruling by fiat, right now, and declare that it 
will cut the first matching prefix reading left to right, whether that's 
what users expect or not. That seems reasonable when your prefixes are 
hard-coded in the source, as above.
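One such fiat rule, "cut the first matching prefix reading left to right", can be sketched as a standalone function. The name `cutprefix_first` is hypothetical; it illustrates one possible policy, not the PEP's specification. Under this rule the result depends on the order of the tuple, which is harmless when the tuple is written out in the source but not when it arrives from elsewhere.

```python
def cutprefix_first(s, prefixes):
    """Hypothetical: remove the first prefix, scanning left to right, that s starts with."""
    if isinstance(prefixes, str):
        prefixes = (prefixes,)
    for p in prefixes:
        if s.startswith(p):
            return s[len(p):]
    return s  # no prefix matched; return the string unchanged

print(cutprefix_first("extraordinary", ("ex", "extra")))  # traordinary
print(cutprefix_first("extraordinary", ("extra", "ex")))  # ordinary
```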

But what happens here?

    prefixes = get_prefixes('user.config')
    result = mystring.cutprefix(prefixes)

Whatever decision we make -- delete the shortest match, longest match, 
first match, last match -- we're going to surprise and annoy the people 
who expected one of the other behaviours.
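To make the ambiguity concrete, here is a sketch that applies four candidate policies to the same input. The function names and the policy table are hypothetical; `prefixes` stands in for whatever a helper like `get_prefixes` might return. With this input, each policy cuts a different prefix.

```python
# Each policy picks one prefix from the list of prefixes that actually match.
POLICIES = {
    "first": lambda matches: matches[0],
    "last": lambda matches: matches[-1],
    "shortest": lambda matches: min(matches, key=len),
    "longest": lambda matches: max(matches, key=len),
}

def cut(s, prefixes, policy):
    """Hypothetical: remove one matching prefix chosen by the named policy."""
    matches = [p for p in prefixes if s.startswith(p)]
    if not matches:
        return s
    return s[len(POLICIES[policy](matches)):]

prefixes = ("extr", "ex", "extraord", "extra")
for name in POLICIES:
    print(f"{name}: {cut('extraordinary', prefixes, name)!r}")
# first: 'aordinary'
# last: 'ordinary'
# shortest: 'traordinary'
# longest: 'inary'
```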

This is why replace() still only takes a single substring to match and 
this isn't supported:

    "extraordinary".replace(('ex', 'extra'), '')

We ought to get some real-life exposure to the simple case first, before 
adding support for multiple prefixes/suffixes.
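For reference, the single-prefix case is small enough to sketch in a few lines, and a caller who genuinely needs several prefixes can then write out whichever policy they want. Both functions here are hypothetical illustrations, not a proposed API: `cutprefix` is written as a free function rather than a str method, and `strip_longest_prefix` shows one explicit caller-side choice.

```python
def cutprefix(s: str, prefix: str) -> str:
    """Hypothetical single-prefix version: remove prefix once if s starts with it."""
    if not isinstance(prefix, str):
        raise TypeError(f"prefix must be str, not {type(prefix).__name__}")
    if s.startswith(prefix):
        return s[len(prefix):]
    return s

def strip_longest_prefix(s: str, prefixes) -> str:
    """Caller-side loop that makes the 'longest match wins' policy explicit."""
    for p in sorted(prefixes, key=len, reverse=True):
        if s.startswith(p):
            return cutprefix(s, p)
    return s

print(cutprefix("extraordinary", "extra"))                    # ordinary
print(strip_longest_prefix("extraordinary", ("ex", "extra")))  # ordinary
```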


-- 
Steven
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GPXSIDLKTI6WKH5EKJWZEG5KR4AQ6P3J/
Code of Conduct: http://python.org/psf/codeofconduct/
