My 2c on the naming: 'start' and 'end' in 'startswith' and 'endswith' are verbs, whereas we're looking for a noun if we want to cut/strip/trim a string. You can use 'start' and 'end' as nouns for this case but 'prefix' and 'suffix' seems a more obvious choice in English to me.
Pathlib has `with_suffix()` and `with_name()`, which would give us something like `without_prefix()` or `without_suffix()` in this case. I think the name "strip", and the default (no-argument) behaviour of stripping whitespace implies that the method is used to strip something down to its bare essentials, like stripping a bed of its covers. Usually you use strip() to remove whitespace and get to the real important data. I don't think such an implication holds for removing a *specific* prefix/suffix. I also don't much like "strip" as the semantics are quite different - if i'm understanding correctly, we're removing a *single* instance of a *single* *multi-character* string. A verb like "trim" or "cut" seems appropriate to highlight that difference. Barney On Fri, 20 Mar 2020 at 18:59, Dennis Sweeney <sweeney.dennis...@gmail.com> wrote: > Browser Link: https://www.python.org/dev/peps/pep-0616/ > > PEP: 616 > Title: String methods to remove prefixes and suffixes > Author: Dennis Sweeney <sweeney.dennis...@gmail.com> > Sponsor: Eric V. Smith <e...@trueblade.com> > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 19-Mar-2020 > Python-Version: 3.9 > Post-History: 30-Aug-2002 > > > Abstract > ======== > > This is a proposal to add two new methods, ``cutprefix`` and > ``cutsuffix``, to the APIs of Python's various string objects. In > particular, the methods would be added to Unicode ``str`` objects, > binary ``bytes`` and ``bytearray`` objects, and > ``collections.UserString``. > > If ``s`` is one these objects, and ``s`` has ``pre`` as a prefix, then > ``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has > been removed. If ``s`` does not have ``pre`` as a prefix, an > unchanged copy of ``s`` is returned. In summary, ``s.cutprefix(pre)`` > is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``. > > The behavior of ``cutsuffix`` is analogous: ``s.cutsuffix(suf)`` is > roughly equivalent to > ``s[:-len(suf)] if suf and s.endswith(suf) else s``. > > > Rationale > ========= > > There have been repeated issues [#confusion]_ on the Bug Tracker > and StackOverflow related to user confusion about the existing > ``str.lstrip`` and ``str.rstrip`` methods. These users are typically > expecting the behavior of ``cutprefix`` and ``cutsuffix``, but they > are surprised that the parameter for ``lstrip`` is interpreted as a > set of characters, not a substring. This repeated issue is evidence > that these methods are useful, and the new methods allow a cleaner > redirection of users to the desired behavior. > > As another testimonial for the usefulness of these methods, several > users on Python-Ideas [#pyid]_ reported frequently including similar > functions in their own code for productivity. The implementation > often contained subtle mistakes regarding the handling of the empty > string (see `Specification`_). > > > Specification > ============= > > The builtin ``str`` class will gain two new methods with roughly the > following behavior:: > > def cutprefix(self: str, pre: str, /) -> str: > if self.startswith(pre): > return self[len(pre):] > return self[:] > > def cutsuffix(self: str, suf: str, /) -> str: > if suf and self.endswith(suf): > return self[:-len(suf)] > return self[:] > > The only difference between the real implementation and the above is > that, as with other string methods like ``replace``, the > methods will raise a ``TypeError`` if any of ``self``, ``pre`` or > ``suf`` is not an instace of ``str``, and will cast subclasses of > ``str`` to builtin ``str`` objects. > > Note that without the check for the truthyness of ``suf``, > ``s.cutsuffix('')`` would be mishandled and always return the empty > string due to the unintended evaluation of ``self[:-0]``. > > Methods with the corresponding semantics will be added to the builtin > ``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes`` > or ``bytearray`` object, then ``b.cutsuffix()`` and ``b.cutprefix()`` > will accept any bytes-like object as an argument. > > Note that the ``bytearray`` methods return a copy of ``self``; they do > not operate in place. > > The following behavior is considered a CPython implementation detail, > but is not guaranteed by this specification:: > > >>> x = 'foobar' * 10**6 > >>> x.cutprefix('baz') is x is x.cutsuffix('baz') > True > >>> x.cutprefix('') is x is x.cutsuffix('') > True > > That is, for CPython's immutable ``str`` and ``bytes`` objects, the > methods return the original object when the affix is not found or if > the affix is empty. Because these types test for equality using > shortcuts for identity and length, the following equivalent > expressions are evaluated at approximately the same speed, for any > ``str`` objects (or ``bytes`` objects) ``x`` and ``y``:: > > >>> (True, x[len(y):]) if x.startswith(y) else (False, x) > >>> (True, z) if x != (z := x.cutprefix(y)) else (False, x) > > > The two methods will also be added to ``collections.UserString``, > where they rely on the implementation of the new ``str`` methods. > > > Motivating examples from the Python standard library > ==================================================== > > The examples below demonstrate how the proposed methods can make code > one or more of the following: > > Less fragile: > The code will not depend on the user to count the length of a > literal. > More performant: > The code does not require a call to the Python built-in > ``len`` function. > More descriptive: > The methods give a higher-level API for code readability, as > opposed to the traditional method of string slicing. > > > refactor.py > ----------- > > - Current:: > > if fix_name.startswith(self.FILE_PREFIX): > fix_name = fix_name[len(self.FILE_PREFIX):] > > - Improved:: > > fix_name = fix_name.cutprefix(self.FILE_PREFIX) > > > c_annotations.py: > ----------------- > > - Current:: > > if name.startswith("c."): > name = name[2:] > > - Improved:: > > name = name.cutprefix("c.") > > > find_recursionlimit.py > ---------------------- > > - Current:: > > if test_func_name.startswith("test_"): > print(test_func_name[5:]) > else: > print(test_func_name) > > - Improved:: > > print(test_finc_name.cutprefix("test_")) > > deccheck.py > ----------- > > This is an interesting case because the author chose to use the > ``str.replace`` method in a situation where only a prefix was > intended to be removed. > > - Current:: > > if funcname.startswith("context."): > self.funcname = funcname.replace("context.", "") > self.contextfunc = True > else: > self.funcname = funcname > self.contextfunc = False > > - Improved:: > > if funcname.startswith("context."): > self.funcname = funcname.cutprefix("context.") > self.contextfunc = True > else: > self.funcname = funcname > self.contextfunc = False > > - Arguably further improved:: > > self.contextfunc = funcname.startswith("context.") > self.funcname = funcname.cutprefix("context.") > > > test_i18n.py > ------------ > > - Current:: > > if test_func_name.startswith("test_"): > print(test_func_name[5:]) > else: > print(test_func_name) > > - Improved:: > > print(test_finc_name.cutprefix("test_")) > > - Current:: > > if creationDate.endswith('\\n'): > creationDate = creationDate[:-len('\\n')] > > - Improved:: > > creationDate = creationDate.cutsuffix('\\n') > > > shared_memory.py > ---------------- > > - Current:: > > reported_name = self._name > if _USE_POSIX and self._prepend_leading_slash: > if self._name.startswith("/"): > reported_name = self._name[1:] > return reported_name > > - Improved:: > > if _USE_POSIX and self._prepend_leading_slash: > return self._name.cutprefix("/") > return self._name > > > build-installer.py > ------------------ > > - Current:: > > if archiveName.endswith('.tar.gz'): > retval = os.path.basename(archiveName[:-7]) > if ((retval.startswith('tcl') or retval.startswith('tk')) > and retval.endswith('-src')): > retval = retval[:-4] > > - Improved:: > > if archiveName.endswith('.tar.gz'): > retval = os.path.basename(archiveName[:-7]) > if retval.startswith(('tcl', 'tk')): > retval = retval.cutsuffix('-src') > > Depending on personal style, ``archiveName[:-7]`` could also be > changed to ``archiveName.cutsuffix('.tar.gz')``. > > > test_core.py > ------------ > > - Current:: > > if output.endswith("\n"): > output = output[:-1] > > - Improved:: > > output = output.cutsuffix("\n") > > > cookiejar.py > ------------ > > - Current:: > > def strip_quotes(text): > if text.startswith('"'): > text = text[1:] > if text.endswith('"'): > text = text[:-1] > return text > > - Improved:: > > def strip_quotes(text): > return text.cutprefix('"').cutsuffix('"') > > - Current:: > > if line.endswith("\n"): line = line[:-1] > > - Improved:: > > line = line.cutsuffix("\n") > > > fixdiv.py > --------- > > - Current:: > > def chop(line): > if line.endswith("\n"): > return line[:-1] > else: > return line > > - Improved:: > > def chop(line): > return line.cutsuffix("\n") > > > test_concurrent_futures.py > -------------------------- > > In the following example, the meaning of the code changes slightly, > but in context, it behaves the same. > > - Current:: > > if name.endswith(('Mixin', 'Tests')): > return name[:-5] > elif name.endswith('Test'): > return name[:-4] > else: > return name > > - Improved:: > > return name.cutsuffix('Mixin').cutsuffix('Tests').cutsuffix('Test') > > > msvc9compiler.py > ---------------- > > - Current:: > > if value.endswith(os.pathsep): > value = value[:-1] > > - Improved:: > > value = value.cutsuffix(os.pathsep) > > > test_pathlib.py > --------------- > > - Current:: > > self.assertTrue(r.startswith(clsname + '('), r) > self.assertTrue(r.endswith(')'), r) > inner = r[len(clsname) + 1 : -1] > > - Improved:: > > self.assertTrue(r.startswith(clsname + '('), r) > self.assertTrue(r.endswith(')'), r) > inner = r.cutprefix(clsname + '(').cutsuffix(')') > > > > Rejected Ideas > ============== > > Expand the lstrip and rstrip APIs > --------------------------------- > > Because ``lstrip`` takes a string as its argument, it could be viewed > as taking an iterable of length-1 strings. The API could therefore be > generalized to accept any iterable of strings, which would be > successively removed as prefixes. While this behavior would be > consistent, it would not be obvious for users to have to call > ``'foobar'.cutprefix(('foo,))`` for the common use case of a > single prefix. > > Allow multiple prefixes > ----------------------- > > Some users discussed the desire to be able to remove multiple > prefixes, calling, for example, ``s.cutprefix('From: ', 'CC: ')``. > However, this adds ambiguity about the order in which the prefixes are > removed, especially in cases like ``s.cutprefix('Foo', 'FooBar')``. > After this proposal, this can be spelled explicitly as > ``s.cutprefix('Foo').cutprefix('FooBar')``. > > Remove multiple copies of a prefix > ---------------------------------- > > This is the behavior that would be consistent with the aforementioned > expansion of the ``lstrip/rstrip`` API -- repeatedly applying the > function until the argument is unchanged. This behavior is attainable > from the proposed behavior via the following:: > > >>> s = 'foo' * 100 + 'bar' > >>> while s != (s := s.cutprefix("foo")): pass > >>> s > 'bar' > > The above can be modififed by chaining multiple ``cutprefix`` calls > together to achieve the full behavior of the ``lstrip``/``rstrip`` > generalization, while being explicit in the order of removal. > > While the proposed API could later be extended to include some of > these use cases, to do so before any observation of how these methods > are used in practice would be premature and may lead to choosing the > wrong behavior. > > > Raising an exception when not found > ----------------------------------- > > There was a suggestion that ``s.cutprefix(pre)`` should raise an > exception if ``not s.startswith(pre)``. However, this does not match > with the behavior and feel of other string methods. There could be > ``required=False`` keyword added, but this violates the KISS > principle. > > > Alternative Method Names > ------------------------ > > Several alternatives method names have been proposed. Some are listed > below, along with commentary for why they should be rejected in favor > of ``cutprefix`` (the same arguments hold for ``cutsuffix``) > > ``ltrim`` > "Trim" does in other languages (e.g. JavaScript, Java, Go, > PHP) what ``strip`` methods do in Python. > ``lstrip(string=...)`` > This would avoid adding a new method, but for different > behavior, it's better to have two different methods than one > method with a keyword argument that select the behavior. > ``cut_prefix`` > All of the other methods of the string API, e.g. > ``str.startswith()``, use ``lowercase`` rather than > ``lower_case_with_underscores``. > ``cutleft``, ``leftcut``, or ``lcut`` > The explicitness of "prefix" is preferred. > ``removeprefix``, ``deleteprefix``, ``withoutprefix``, etc. > All of these might have been acceptable, but they have more > characters than ``cut``. Some suggested that the verb "cut" > implies mutability, but the string API already contains verbs > like "replace", "strip", "split", and "swapcase". > ``stripprefix`` > Users may benefit from the mnemonic that "strip" means working > with sets of characters, while other methods work with > substrings, so re-using "strip" here should be avoided. > > > Reference Implementation > ======================== > > See the pull request on GitHub [#pr]_. > > > References > ========== > > .. [#pr] GitHub pull request with implementation > (https://github.com/python/cpython/pull/18939) > .. [#pyid] Discussion on Python-Ideas > ( > https://mail.python.org/archives/list/python-id...@python.org/thread/RJARZSUKCXRJIP42Z2YBBAEN5XA7KEC3/ > ) > .. [#confusion] Comment listing Bug Tracker and StackOverflow issues > ( > https://mail.python.org/archives/list/python-id...@python.org/message/GRGAFIII3AX22K3N3KT7RB4DPBY3LPVG/ > ) > > > Copyright > ========= > > This document is placed in the public domain or under the > CC0-1.0-Universal license, whichever is more permissive. > > > > .. > Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > _______________________________________________ > Python-Dev mailing list -- python-dev@python.org > To unsubscribe send an email to python-dev-le...@python.org > https://mail.python.org/mailman3/lists/python-dev.python.org/ > Message archived at > https://mail.python.org/archives/list/python-dev@python.org/message/WFEWPAOVXOGY7UDXSKMYLVWBCFSHI3VT/ > Code of Conduct: http://python.org/psf/codeofconduct/ >
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/IC352XZBRIQA6EHQKKK2BTFANPRUEJIV/ Code of Conduct: http://python.org/psf/codeofconduct/