[Python-Dev] Re: PEP 616 -- String methods to remove prefixes and suffixes

Barney Gale Sat, 21 Mar 2020 22:48:46 -0700

My 2c on the naming:

'start' and 'end' in 'startswith' and 'endswith' are verbs, whereas we're
looking for a noun if we want to cut/strip/trim a string. You can use
'start' and 'end' as nouns for this case, but 'prefix' and 'suffix' seem a
more obvious choice in English to me.


Pathlib has `with_suffix()` and `with_name()`, which would give us
something like `without_prefix()` or `without_suffix()` in this case.

I think the name "strip", and the default (no-argument) behaviour of
stripping whitespace implies that the method is used to strip something
down to its bare essentials, like stripping a bed of its covers. Usually
you use strip() to remove whitespace and get to the real important data. I
don't think such an implication holds for removing a *specific*
prefix/suffix.

I also don't much like "strip" as the semantics are quite different - if
I'm understanding correctly, we're removing a *single* instance of a
*single* *multi-character* string. A verb like "trim" or "cut" seems
appropriate to highlight that difference.

Barney



On Fri, 20 Mar 2020 at 18:59, Dennis Sweeney <sweeney.dennis...@gmail.com>
wrote:

> Browser Link: https://www.python.org/dev/peps/pep-0616/
>
> PEP: 616
> Title: String methods to remove prefixes and suffixes
> Author: Dennis Sweeney <sweeney.dennis...@gmail.com>
> Sponsor: Eric V. Smith <e...@trueblade.com>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 19-Mar-2020
> Python-Version: 3.9
> Post-History: 30-Aug-2002
>
>
> Abstract
> ========
>
> This is a proposal to add two new methods, ``cutprefix`` and
> ``cutsuffix``, to the APIs of Python's various string objects.  In
> particular, the methods would be added to Unicode ``str`` objects,
> binary ``bytes`` and ``bytearray`` objects, and
> ``collections.UserString``.
>
> If ``s`` is one of these objects, and ``s`` has ``pre`` as a prefix, then
> ``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has
> been removed.  If ``s`` does not have ``pre`` as a prefix, an
> unchanged copy of ``s`` is returned.  In summary, ``s.cutprefix(pre)``
> is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.
>
> The behavior of ``cutsuffix`` is analogous: ``s.cutsuffix(suf)`` is
> roughly equivalent to
> ``s[:-len(suf)] if suf and s.endswith(suf) else s``.
>
>
> Rationale
> =========
>
> There have been repeated issues [#confusion]_ on the Bug Tracker
> and StackOverflow related to user confusion about the existing
> ``str.lstrip`` and ``str.rstrip`` methods.  These users are typically
> expecting the behavior of ``cutprefix`` and ``cutsuffix``, but they
> are surprised that the parameter for ``lstrip`` is interpreted as a
> set of characters, not a substring.  This repeated issue is evidence
> that these methods are useful, and the new methods allow a cleaner
> redirection of users to the desired behavior.
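
To make the surprise concrete, here is a small sketch contrasting
``str.lstrip`` with the proposed behavior. The helper ``cut_prefix`` and
the sample filename are hypothetical; the helper is written from the
rough equivalent given in the Abstract and is not the real implementation.

```python
def cut_prefix(s: str, pre: str) -> str:
    """Hypothetical stand-in for the proposed s.cutprefix(pre)."""
    return s[len(pre):] if s.startswith(pre) else s


filename = "test_settings.py"

# lstrip treats its argument as a set of characters and keeps removing
# leading characters for as long as they belong to that set.
stripped = filename.lstrip("test_")

# The prefix-based helper removes the literal substring at most once.
cut = cut_prefix(filename, "test_")

print(stripped)  # ings.py
print(cut)       # settings.py
```

The ``lstrip`` call eats into "settings" because "s", "e" and "t" are all
in the character set, which is exactly the behavior users do not expect.
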
>
> As another testimonial for the usefulness of these methods, several
> users on Python-Ideas [#pyid]_ reported frequently including similar
> functions in their own code for productivity.  The implementation
> often contained subtle mistakes regarding the handling of the empty
> string (see `Specification`_).
>
>
> Specification
> =============
>
> The builtin ``str`` class will gain two new methods with roughly the
> following behavior::
>
>     def cutprefix(self: str, pre: str, /) -> str:
>         if self.startswith(pre):
>             return self[len(pre):]
>         return self[:]
>
>     def cutsuffix(self: str, suf: str, /) -> str:
>         if suf and self.endswith(suf):
>             return self[:-len(suf)]
>         return self[:]
>
> The only difference between the real implementation and the above is
> that, as with other string methods like ``replace``, the
> methods will raise a ``TypeError`` if any of ``self``, ``pre`` or
> ``suf`` is not an instance of ``str``, and will cast subclasses of
> ``str`` to builtin ``str`` objects.
>
> Note that without the check for the truthiness of ``suf``,
> ``s.cutsuffix('')`` would be mishandled and always return the empty
> string due to the unintended evaluation of ``self[:-0]``.
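
The empty-suffix pitfall is easy to reproduce. The sketch below assumes
two hypothetical helper functions (neither is part of the proposal): one
that omits the guard and one that follows the rough specification above.

```python
def cut_suffix_buggy(s: str, suf: str) -> str:
    # Missing the truthiness check: with suf == '' this evaluates s[:-0],
    # which is the same as s[:0], i.e. the empty string.
    if s.endswith(suf):
        return s[:-len(suf)]
    return s


def cut_suffix(s: str, suf: str) -> str:
    # Mirrors the rough specification, including the `suf and` guard.
    if suf and s.endswith(suf):
        return s[:-len(suf)]
    return s


print(repr(cut_suffix_buggy("spam", "")))  # ''
print(repr(cut_suffix("spam", "")))        # 'spam'
```

The prefix case needs no such guard, since ``s[len(''):]`` is ``s[0:]``.
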
>
> Methods with the corresponding semantics will be added to the builtin
> ``bytes`` and ``bytearray`` objects.  If ``b`` is either a ``bytes``
> or ``bytearray`` object, then ``b.cutsuffix()`` and ``b.cutprefix()``
> will accept any bytes-like object as an argument.
>
> Note that the ``bytearray`` methods return a copy of ``self``; they do
> not operate in place.
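
A rough sketch of the binary case, assuming a hypothetical helper
``cut_prefix_bytes`` built on the existing ``startswith`` method (which
already accepts bytes-like arguments). It only illustrates the described
semantics; it is not the proposed C implementation.

```python
def cut_prefix_bytes(b, pre):
    """Hypothetical stand-in for b.cutprefix(pre) on bytes/bytearray."""
    if b.startswith(pre):
        return b[len(pre):]
    return b[:]


header = bytearray(b"HTTP/1.1 200 OK")

# The argument may be any bytes-like object, e.g. bytes or memoryview.
from_bytes = cut_prefix_bytes(header, b"HTTP/")
from_view = cut_prefix_bytes(header, memoryview(b"HTTP/"))

# Slicing a bytearray builds a new object, so the original is untouched
# even when nothing is removed.
unchanged = cut_prefix_bytes(header, b"FTP/")

print(from_bytes)           # bytearray(b'1.1 200 OK')
print(unchanged is header)  # False
```
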
>
> The following behavior is considered a CPython implementation detail
> and is not guaranteed by this specification::
>
>     >>> x = 'foobar' * 10**6
>     >>> x.cutprefix('baz') is x is x.cutsuffix('baz')
>     True
>     >>> x.cutprefix('') is x is x.cutsuffix('')
>     True
>
> That is, for CPython's immutable ``str`` and ``bytes`` objects, the
> methods return the original object when the affix is not found or if
> the affix is empty.  Because these types test for equality using
> shortcuts for identity and length, the following equivalent
> expressions are evaluated at approximately the same speed, for any
> ``str`` objects (or ``bytes`` objects) ``x`` and ``y``::
>
>     >>> (True, x[len(y):]) if x.startswith(y) else (False, x)
>     >>> (True, z) if x != (z := x.cutprefix(y)) else (False, x)
>
>
> The two methods will also be added to ``collections.UserString``,
> where they rely on the implementation of the new ``str`` methods.
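
For illustration only, delegation from a ``UserString`` to the underlying
``str`` might look roughly like the sketch below. The subclass name
``CuttableString`` is hypothetical, and because ``str`` has no
``cutprefix`` yet, the slicing equivalents from the Abstract stand in for
the new ``str`` methods.

```python
from collections import UserString


class CuttableString(UserString):
    """Hypothetical subclass; not the real collections.UserString patch."""

    def cutprefix(self, pre):
        # Unwrap a UserString argument so the check runs on plain str.
        if isinstance(pre, UserString):
            pre = pre.data
        data = self.data
        result = data[len(pre):] if data.startswith(pre) else data
        return self.__class__(result)

    def cutsuffix(self, suf):
        if isinstance(suf, UserString):
            suf = suf.data
        data = self.data
        result = data[:-len(suf)] if suf and data.endswith(suf) else data
        return self.__class__(result)


name = CuttableString("test_parser.py")
trimmed = name.cutprefix("test_").cutsuffix(".py")
print(type(trimmed).__name__, trimmed)  # CuttableString parser
```
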
>
>
> Motivating examples from the Python standard library
> ====================================================
>
> The examples below demonstrate how the proposed methods can make code
> one or more of the following:
>
>     Less fragile:
>         The code will not depend on the user to count the length of a
>         literal.
>     More performant:
>         The code does not require a call to the Python built-in
>         ``len`` function.
>     More descriptive:
>         The methods give a higher-level API for code readability, as
>         opposed to the traditional method of string slicing.
>
>
> refactor.py
> -----------
>
> - Current::
>
>     if fix_name.startswith(self.FILE_PREFIX):
>         fix_name = fix_name[len(self.FILE_PREFIX):]
>
> - Improved::
>
>     fix_name = fix_name.cutprefix(self.FILE_PREFIX)
>
>
> c_annotations.py
> ----------------
>
> - Current::
>
>     if name.startswith("c."):
>         name = name[2:]
>
> - Improved::
>
>     name = name.cutprefix("c.")
>
>
> find_recursionlimit.py
> ----------------------
>
> - Current::
>
>     if test_func_name.startswith("test_"):
>         print(test_func_name[5:])
>     else:
>         print(test_func_name)
>
> - Improved::
>
>     print(test_func_name.cutprefix("test_"))
>
> deccheck.py
> -----------
>
> This is an interesting case because the author chose to use the
> ``str.replace`` method in a situation where only a prefix was
> intended to be removed.
>
> - Current::
>
>     if funcname.startswith("context."):
>         self.funcname = funcname.replace("context.", "")
>         self.contextfunc = True
>     else:
>         self.funcname = funcname
>         self.contextfunc = False
>
> - Improved::
>
>     if funcname.startswith("context."):
>         self.funcname = funcname.cutprefix("context.")
>         self.contextfunc = True
>     else:
>         self.funcname = funcname
>         self.contextfunc = False
>
> - Arguably further improved::
>
>     self.contextfunc = funcname.startswith("context.")
>     self.funcname = funcname.cutprefix("context.")
>
>
> test_i18n.py
> ------------
>
> - Current::
>
>     if creationDate.endswith('\\n'):
>         creationDate = creationDate[:-len('\\n')]
>
> - Improved::
>
>     creationDate = creationDate.cutsuffix('\\n')
>
>
> shared_memory.py
> ----------------
>
> - Current::
>
>     reported_name = self._name
>     if _USE_POSIX and self._prepend_leading_slash:
>         if self._name.startswith("/"):
>             reported_name = self._name[1:]
>     return reported_name
>
> - Improved::
>
>     if _USE_POSIX and self._prepend_leading_slash:
>         return self._name.cutprefix("/")
>     return self._name
>
>
> build-installer.py
> ------------------
>
> - Current::
>
>     if archiveName.endswith('.tar.gz'):
>         retval = os.path.basename(archiveName[:-7])
>         if ((retval.startswith('tcl') or retval.startswith('tk'))
>                 and retval.endswith('-src')):
>             retval = retval[:-4]
>
> - Improved::
>
>     if archiveName.endswith('.tar.gz'):
>         retval = os.path.basename(archiveName[:-7])
>         if retval.startswith(('tcl', 'tk')):
>             retval = retval.cutsuffix('-src')
>
> Depending on personal style, ``archiveName[:-7]`` could also be
> changed to ``archiveName.cutsuffix('.tar.gz')``.
>
>
> test_core.py
> ------------
>
> - Current::
>
>     if output.endswith("\n"):
>         output = output[:-1]
>
> - Improved::
>
>     output = output.cutsuffix("\n")
>
>
> cookiejar.py
> ------------
>
> - Current::
>
>     def strip_quotes(text):
>         if text.startswith('"'):
>             text = text[1:]
>         if text.endswith('"'):
>             text = text[:-1]
>         return text
>
> - Improved::
>
>     def strip_quotes(text):
>         return text.cutprefix('"').cutsuffix('"')
>
> - Current::
>
>     if line.endswith("\n"): line = line[:-1]
>
> - Improved::
>
>     line = line.cutsuffix("\n")
>
>
> fixdiv.py
> ---------
>
> - Current::
>
>     def chop(line):
>         if line.endswith("\n"):
>             return line[:-1]
>         else:
>             return line
>
> - Improved::
>
>     def chop(line):
>         return line.cutsuffix("\n")
>
>
> test_concurrent_futures.py
> --------------------------
>
> In the following example, the meaning of the code changes slightly,
> but in context, it behaves the same.
>
> - Current::
>
>     if name.endswith(('Mixin', 'Tests')):
>         return name[:-5]
>     elif name.endswith('Test'):
>         return name[:-4]
>     else:
>         return name
>
> - Improved::
>
>     return name.cutsuffix('Mixin').cutsuffix('Tests').cutsuffix('Test')
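
The "changes slightly" can be made concrete with a small sketch. The
helper ``cut_suffix``, the wrapper functions ``current`` and ``improved``,
and the sample class names are all hypothetical, chosen only to show
where the two versions agree and where they differ.

```python
def cut_suffix(s: str, suf: str) -> str:
    """Hypothetical stand-in for the proposed s.cutsuffix(suf)."""
    return s[:-len(suf)] if suf and s.endswith(suf) else s


def current(name: str) -> str:
    if name.endswith(('Mixin', 'Tests')):
        return name[:-5]
    elif name.endswith('Test'):
        return name[:-4]
    else:
        return name


def improved(name: str) -> str:
    return cut_suffix(cut_suffix(cut_suffix(name, 'Mixin'), 'Tests'), 'Test')


# Same result when at most one of the suffixes applies.
print(current('ExecutorMixin'), improved('ExecutorMixin'))  # Executor Executor

# Different result when the suffixes stack: the chain keeps cutting.
print(current('PoolTestMixin'), improved('PoolTestMixin'))  # PoolTest Pool
```
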
>
>
> msvc9compiler.py
> ----------------
>
> - Current::
>
>     if value.endswith(os.pathsep):
>         value = value[:-1]
>
> - Improved::
>
>     value = value.cutsuffix(os.pathsep)
>
>
> test_pathlib.py
> ---------------
>
> - Current::
>
>     self.assertTrue(r.startswith(clsname + '('), r)
>     self.assertTrue(r.endswith(')'), r)
>     inner = r[len(clsname) + 1 : -1]
>
> - Improved::
>
>     self.assertTrue(r.startswith(clsname + '('), r)
>     self.assertTrue(r.endswith(')'), r)
>     inner = r.cutprefix(clsname + '(').cutsuffix(')')
>
>
>
> Rejected Ideas
> ==============
>
> Expand the lstrip and rstrip APIs
> ---------------------------------
>
> Because ``lstrip`` takes a string as its argument, it could be viewed
> as taking an iterable of length-1 strings.  The API could therefore be
> generalized to accept any iterable of strings, which would be
> successively removed as prefixes.  While this behavior would be
> consistent, it would not be obvious for users to have to call
> ``'foobar'.cutprefix(('foo',))`` for the common use case of a
> single prefix.
>
> Allow multiple prefixes
> -----------------------
>
> Some users discussed the desire to be able to remove multiple
> prefixes, calling, for example, ``s.cutprefix('From: ', 'CC: ')``.
> However, this adds ambiguity about the order in which the prefixes are
> removed, especially in cases like ``s.cutprefix('Foo', 'FooBar')``.
> After this proposal, this can be spelled explicitly as
> ``s.cutprefix('Foo').cutprefix('FooBar')``.
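
A short sketch of why the order matters, using a hypothetical
``cut_prefix`` helper in place of the proposed method and an
illustrative input string:

```python
def cut_prefix(s: str, pre: str) -> str:
    """Hypothetical stand-in for the proposed s.cutprefix(pre)."""
    return s[len(pre):] if s.startswith(pre) else s


s = 'FooBarBaz'

# Shorter prefix first: after 'Foo' is gone, 'FooBar' no longer matches.
short_first = cut_prefix(cut_prefix(s, 'Foo'), 'FooBar')

# Longer prefix first: 'FooBar' is removed, then 'Foo' has nothing to cut.
long_first = cut_prefix(cut_prefix(s, 'FooBar'), 'Foo')

print(short_first)  # BarBaz
print(long_first)   # Baz
```

With explicit chaining the caller picks one of these orders; a
multi-argument call would have to pick one implicitly.
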
>
> Remove multiple copies of a prefix
> ----------------------------------
>
> This is the behavior that would be consistent with the aforementioned
> expansion of the ``lstrip/rstrip`` API -- repeatedly applying the
> function until the argument is unchanged.  This behavior is attainable
> from the proposed behavior via the following::
>
>     >>> s = 'foo' * 100 + 'bar'
>     >>> while s != (s := s.cutprefix("foo")): pass
>     >>> s
>     'bar'
>
> The above can be modified by chaining multiple ``cutprefix`` calls
> together to achieve the full behavior of the ``lstrip``/``rstrip``
> generalization, while being explicit in the order of removal.
>
> While the proposed API could later be extended to include some of
> these use cases, to do so before any observation of how these methods
> are used in practice would be premature and may lead to choosing the
> wrong behavior.
>
>
> Raising an exception when not found
> -----------------------------------
>
> There was a suggestion that ``s.cutprefix(pre)`` should raise an
> exception if ``not s.startswith(pre)``.  However, this does not match
> with the behavior and feel of other string methods.  A
> ``required=False`` keyword could be added, but this violates the KISS
> principle.
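
For comparison, a sketch of what the rejected keyword variant might look
like. The function name, the ``required`` keyword spelling as a
keyword-only argument, and the choice of ``ValueError`` are all
assumptions made for illustration; the PEP text does not specify them.

```python
def cut_prefix(s: str, pre: str, *, required: bool = False) -> str:
    """Hypothetical variant with the rejected `required` keyword."""
    if s.startswith(pre):
        return s[len(pre):]
    if required:
        # Assumed exception type; the proposal does not name one.
        raise ValueError(f"{s!r} does not start with {pre!r}")
    return s


lenient = cut_prefix("spam", "eggs")  # returned unchanged

try:
    cut_prefix("spam", "eggs", required=True)
except ValueError as exc:
    message = str(exc)

print(lenient)  # spam
print(message)  # 'spam' does not start with 'eggs'
```

A caller who wants the strict behavior can already get it with an
explicit ``startswith`` check before the call, which keeps the method
itself simple.
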
>
>
> Alternative Method Names
> ------------------------
>
> Several alternative method names have been proposed.  Some are listed
> below, along with commentary for why they should be rejected in favor
> of ``cutprefix`` (the same arguments hold for ``cutsuffix``).
>
>     ``ltrim``
>         "Trim" does in other languages (e.g. JavaScript, Java, Go,
>         PHP) what ``strip`` methods do in Python.
>     ``lstrip(string=...)``
>         This would avoid adding a new method, but for different
>         behavior, it's better to have two different methods than one
>         method with a keyword argument that selects the behavior.
>     ``cut_prefix``
>         All of the other methods of the string API, e.g.
>         ``str.startswith()``, use ``lowercase`` rather than
>         ``lower_case_with_underscores``.
>     ``cutleft``, ``leftcut``, or ``lcut``
>         The explicitness of "prefix" is preferred.
>     ``removeprefix``, ``deleteprefix``, ``withoutprefix``, etc.
>         All of these might have been acceptable, but they have more
>         characters than ``cut``.  Some suggested that the verb "cut"
>         implies mutability, but the string API already contains verbs
>         like "replace", "strip", "split", and "swapcase".
>     ``stripprefix``
>         Users may benefit from the mnemonic that "strip" means working
>         with sets of characters, while other methods work with
>         substrings, so re-using "strip" here should be avoided.
>
>
> Reference Implementation
> ========================
>
> See the pull request on GitHub [#pr]_.
>
>
> References
> ==========
>
> .. [#pr] GitHub pull request with implementation
>    (https://github.com/python/cpython/pull/18939)
> .. [#pyid] Discussion on Python-Ideas
>    (https://mail.python.org/archives/list/python-id...@python.org/thread/RJARZSUKCXRJIP42Z2YBBAEN5XA7KEC3/)
> .. [#confusion] Comment listing Bug Tracker and StackOverflow issues
>    (https://mail.python.org/archives/list/python-id...@python.org/message/GRGAFIII3AX22K3N3KT7RB4DPBY3LPVG/)
>
>
> Copyright
> =========
>
> This document is placed in the public domain or under the
> CC0-1.0-Universal license, whichever is more permissive.
>
>
>
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/WFEWPAOVXOGY7UDXSKMYLVWBCFSHI3VT/
> Code of Conduct: http://python.org/psf/codeofconduct/
>

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/IC352XZBRIQA6EHQKKK2BTFANPRUEJIV/
Code of Conduct: http://python.org/psf/codeofconduct/
