Re: [Python-ideas] Complicate str methods

Terry Reedy Sat, 03 Feb 2018 15:44:23 -0800

On 2/3/2018 5:04 PM, Franklin? Lee wrote:

Let s be a str. I propose to allow these existing str methods to take params in new forms.

Thanks for the honest title. As you sort of indicate, these can all be done with the re module. However, you imply loops are needed besides, which is mostly not true. Your complications mostly translate to existing calls and hence are not needed.

Perhaps 'Regular Expression HOWTO' could use more examples, or even a section on generalizing string methods. Perhaps the string method doc needs a suggestion to use re for multiple string args and references to the re howto. Please consider taking a look at both.


>>> import re

s.replace(old, new):
     Allow passing in a collection of olds.


>>> re.sub('Franklin|Lee', 'user', 'Franklin? Lee')
'user? user'

Remembering the name change is a nuisance.

     Allow passing in a single argument, a mapping of olds to news.

This needs to be a separate function, say 'dictsub', that joins the keys with '|' and calls re.sub with a function that does the lookup as the 2nd parameter. This would be a nice example for the howto.

As you noted, this is a generalization of str.translate, and might be proposed as a new re module function.
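
A minimal sketch of such a function, assuming the name 'dictsub' from above and an argument order (mapping, s) that is purely illustrative. The keys are escaped so they match literally, and longer keys are tried first so that one key being a prefix of another does not hide the longer one. Neither choice comes from the proposal; both are assumptions.

```python
import re

def dictsub(mapping, s):
    """Replace each key of mapping found in s with its value, in one pass."""
    if not mapping:
        return s  # an empty pattern would match everywhere, so bail out early
    # Longest keys first: regex alternation takes the first alternative that matches.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape makes '?', '|', '.' and friends match as plain characters.
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    # The replacement function looks the matched text up in the mapping.
    return pattern.sub(lambda m: mapping[m.group(0)], s)

print(dictsub({'Franklin': 'user', 'Lee': 'name', '?': '!'}, 'Franklin? Lee'))
# user! name
```

Because the text is scanned once, a replacement is never itself rescanned for further matches.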

     Allow the olds in the mapping to be tuples of strings.


A minor addition to dictsub.
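
One way that addition might look, as a sketch only: tuple keys are expanded into one entry per old string before the pattern is built. The names 'flatten_olds' and 'dictsub_tuples' are made up for this illustration.

```python
import re

def flatten_olds(mapping):
    """Expand tuple keys so every old string maps directly to its new string."""
    flat = {}
    for olds, new in mapping.items():
        if isinstance(olds, str):
            olds = (olds,)  # treat a lone string as a 1-tuple
        for old in olds:
            flat[old] = new
    return flat

def dictsub_tuples(mapping, s):
    """Like a one-pass dict-based re.sub, but keys may be tuples of strings."""
    flat = flatten_olds(mapping)
    if not flat:
        return s
    keys = sorted(flat, key=len, reverse=True)  # longest alternative first
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: flat[m.group(0)], s)

print(dictsub_tuples({('Franklin', 'Lee'): 'user', '?': '!'}, 'Franklin? Lee'))
# user! user
```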

s.split(sep), s.rsplit, s.partition:
     Allow sep to be a collection of separators.

re.split is already more flexible than non-whitespace str.split and str.partition combined.


>>> re.split('a|e|i|o|u', 'Franklin? Lee')
['Fr', 'nkl', 'n? L', '', '']
>>> re.split('(a|e|i|o|u)', 'Franklin? Lee')  # multiple partition
['Fr', 'a', 'nkl', 'i', 'n? L', 'e', '', 'e', '']
>>> re.split('(a|e|i|o|u)', 'Franklin? Lee', 1) # single partition
['Fr', 'a', 'nklin? Lee']

re.split, and hence str.rsplit(collection), are very sensible.
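
For separators that are arbitrary strings rather than single letters, the pattern has to be built with escaping. A sketch, with 'split_any' and its 'keep' flag being hypothetical names for this example:

```python
import re

def split_any(s, seps, maxsplit=0, keep=False):
    """Split s on any of the literal separators in seps."""
    seps = sorted(seps, key=len, reverse=True)  # prefer the longest separator
    if not seps or '' in seps:
        raise ValueError('need at least one non-empty separator')
    alternatives = '|'.join(re.escape(sep) for sep in seps)
    # A capturing group makes re.split return the separators too (partition-like).
    pattern = '(' + alternatives + ')' if keep else alternatives
    return re.split(pattern, s, maxsplit=maxsplit)

print(split_any('a, b; c', [', ', '; ']))                       # ['a', 'b', 'c']
print(split_any('a, b; c', [', ', '; '], maxsplit=1, keep=True))  # ['a', ', ', 'b; c']
```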

s.startswith, s.endswith:
     Allow argument to be a collection of strings.

bool(re.match('|'.join(strings), s)) does exactly the proposed s.startswith, with the advantage that the actual match is available, and I think that one would nearly always want to know that match.


>>> re.match('a|e|i|o|u', 'Franklin? Lee')
>>> re.match('f|F', 'Franklin? Lee')
<re.Match object; span=(0, 1), match='F'>

re.search with '^' at the beginning or '$' at the end covers both proposals, with the added flexibility of using MULTILINE mode to match at the beginning or end of lines within the string.
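
A sketch of both points, with hypothetical helper names. The candidates are escaped on the assumption that they are meant as literal strings. Note that str.startswith and str.endswith already accept a tuple of strings; the regex version adds the information of which candidate matched.

```python
import re

def starts_with_any(s, prefixes):
    """Return the prefix of s that matched, or None if none did."""
    alternatives = '|'.join(re.escape(p) for p in prefixes)
    if not alternatives:
        return None
    m = re.match(alternatives, s)
    return m.group(0) if m else None

def ends_with_any(s, suffixes):
    """Return the suffix of s that matched, or None if none did."""
    alternatives = '|'.join(re.escape(x) for x in suffixes)
    if not alternatives:
        return None
    # \Z anchors at the very end of the string.
    m = re.search('(?:' + alternatives + r')\Z', s)
    return m.group(0) if m else None

print(starts_with_any('Franklin? Lee', ['f', 'F']))   # F
print(ends_with_any('Franklin? Lee', ['ee', 'in']))   # ee

# MULTILINE: '^' and '$' also match at the start and end of each line.
text = 'spam.py\neggs.txt\nham.py'
py_lines = re.findall(r'^.*(?:\.pyw|\.py)$', text, flags=re.MULTILINE)
print(py_lines)                                       # ['spam.py', 'ham.py']
```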

s.find, s.index, s.count, x in s:
     Similar.
These methods are also in `list`, which can't distinguish between items, subsequences, and subsets. However, `str` is already inconsistent with `list` here: list.M looks for an item, while str.M looks for a subsequence.

Comments above apply. re.search tells you which string matched as well as where. bool(re.search) is 'x in s'. re.findall and re.finditer give much more info than merely a count ('sum(1 for m in re.finditer(...))').
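
Spelled out as a sketch (the helper names are illustrative, and the candidates are assumed to be non-empty literal strings):

```python
import re

def _any_of(subs):
    """Build a pattern matching any of the literal strings in subs."""
    return '|'.join(re.escape(x) for x in sorted(subs, key=len, reverse=True))

def find_any(s, subs):
    """Like str.find for several candidates: (index, matched string) or (-1, None)."""
    m = re.search(_any_of(subs), s)
    return (m.start(), m.group(0)) if m else (-1, None)

def count_any(s, subs):
    """Count non-overlapping occurrences of any candidate, like str.count."""
    return sum(1 for _ in re.finditer(_any_of(subs), s))

print(find_any('Franklin? Lee', ['Lee', 'klin']))            # (4, 'klin')
print(count_any('Franklin? Lee', ['a', 'e', 'i', 'o', 'u']))  # 4
print(bool(re.search(_any_of(['xyz', 'Lee']), 'Franklin? Lee')))  # True, the 'x in s' case
```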

s.[r|l]strip:
Sadly, these functions already interpret their str arguments as collections of characters.


To avoid this, use re.sub with ^ or $ anchor and '' replacement.

>>> re.sub('(Frank|Lee)$', '', 'Franklin? Lee')
'Franklin? '
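
The same idea wrapped up for arbitrary literal strings, as a sketch with made-up helper names. It uses \A and \Z instead of ^ and $ because '$' also matches just before a trailing newline. Unlike str.strip, each helper removes at most one occurrence.

```python
import re

def strip_suffixes(s, suffixes):
    """Remove one trailing occurrence of any of the given literal suffixes."""
    alternatives = '|'.join(re.escape(x) for x in suffixes)
    if not alternatives:
        return s
    return re.sub('(?:' + alternatives + r')\Z', '', s)

def strip_prefixes(s, prefixes):
    """Remove one leading occurrence of any of the given literal prefixes."""
    alternatives = '|'.join(re.escape(x) for x in prefixes)
    if not alternatives:
        return s
    return re.sub(r'\A(?:' + alternatives + ')', '', s)

print(repr(strip_suffixes('Franklin? Lee', ['Frank', 'Lee'])))  # 'Franklin? '
print(repr(strip_prefixes('Franklin? Lee', ['Frank', 'Lee'])))  # 'lin? Lee'
print(repr('Franklin? Lee'.rstrip('Lee')))  # 'Franklin? ' here, but only by character set
```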

These new forms can be optimized internally, as a search for multiple candidate substrings can be more efficient than searching for one at a time.


This is what re does with 's1|s2|...|sn' patterns.

https://stackoverflow.com/questions/3260962/algorithm-to-find-multiple-string-matches
The most significant change is on .replace. The others are simple enough to simulate with a loop or something.


No loops needed.

It is harder to make multiple simultaneous replacements using one .replace at a time, because previous replacements can form new things that look like replaceables.

This problem exists for single string replacement also. The standard solution is to not backtrack and not do overlapping replacements.
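
A small illustration of the difference, using a made-up swap of 'a' and 'b':

```python
import re

s = 'abba'

# Chained .replace calls: the second call also rewrites the output of the first.
chained = s.replace('a', 'b').replace('b', 'a')   # 'abba' -> 'bbbb' -> 'aaaa'

# One left-to-right pass: replaced text is never rescanned.
swap = {'a': 'b', 'b': 'a'}
one_pass = re.sub('a|b', lambda m: swap[m.group(0)], s)

print(chained)    # aaaa
print(one_pass)   # baab
```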

The easiest Python solution is to use regex

My claim above is that this is sufficient for all but one case, which should be a new function anyway.

or install some package, which uses (if you're lucky) regex or (if unlucky) doesn't simulate simultaneous replacements. (If possible, just use str.translate.)
I suppose .split on multiple separators is also annoying to simulate. The two-argument form of .split may be even more of a burden, though I don't know when a limited multiple-separator split is useful. The current best solution is, like before, to use regex, or install a package and hope for the best.


--
Terry Jan Reedy


_______________________________________________
Python-ideas mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
