On 2/3/2018 5:04 PM, Franklin? Lee wrote:
Let s be a str. I propose to allow these existing str methods to take params in new forms.

Thanks for the honest title. As you sort of indicate, these can all be done with the re module. However, you imply that loops are needed besides, which is mostly not true. Your complications mostly translate to existing calls and hence are not needed.

Perhaps the 'Regular Expression HOWTO' could use more examples, or even a section on generalizing string methods. Perhaps the string method doc needs a suggestion to use re for multiple string args, and references to the re howto. Please consider taking a look at both.

>>> import re

s.replace(old, new):
     Allow passing in a collection of olds.

>>> re.sub('Franklin|Lee', 'user', 'Franklin? Lee')
'user? user'

Remembering the name change (str.replace versus re.sub) is a nuisance.

     Allow passing in a single argument, a mapping of olds to news.

This needs to be a separate function, say 'dictsub', that joins the keys with '|' and calls re.sub with a function that does the lookup as the 2nd parameter. This would be a nice example for the howto.

As you noted, this is a generalization of str.translate, and might be proposed as a new re module function.
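
A minimal sketch of what such a 'dictsub' might look like. The name comes from the paragraph above, but the signature, the re.escape call, and the longest-key-first ordering are my assumptions, not an existing re function.

```python
import re

def dictsub(mapping, s):
    """Replace each key of mapping found in s with its value, in one pass.

    Hypothetical helper: not part of the re module.
    """
    if not mapping:
        return s  # an empty pattern would match everywhere
    # Try longer keys first so 'Franklin' wins over 'Frank'.
    keys = sorted(mapping, key=len, reverse=True)
    # Escape the keys so that '?' or '.' are matched literally.
    pattern = '|'.join(re.escape(k) for k in keys)
    # The replacement function looks up whichever key actually matched.
    return re.sub(pattern, lambda m: mapping[m.group(0)], s)

print(dictsub({'Franklin': 'user', 'Lee': 'name'}, 'Franklin? Lee'))
```

Because re.sub scans the string once and never rescans replaced text, swapping two strings works as expected.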

     Allow the olds in the mapping to be tuples of strings.

A minor addition to dictsub.
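
One way that addition might go, as a sketch: flatten tuple keys into single-string keys before building the pattern. The function name and signature here are hypothetical.

```python
import re

def sub_with_tuple_keys(mapping, s):
    """Like a mapping-based re.sub, but a key may be a tuple of strings."""
    flat = {}
    for olds, new in mapping.items():
        if isinstance(olds, str):
            olds = (olds,)  # treat a lone string as a one-item tuple
        for old in olds:
            flat[old] = new
    if not flat:
        return s
    # Longest alternatives first, each escaped to match literally.
    keys = sorted(flat, key=len, reverse=True)
    pattern = '|'.join(re.escape(k) for k in keys)
    return re.sub(pattern, lambda m: flat[m.group(0)], s)

print(sub_with_tuple_keys({('Franklin', 'Lee'): 'user', '?': '!'},
                          'Franklin? Lee'))
```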


s.split(sep), s.rsplit, s.partition:
     Allow sep to be a collection of separators.

re.split is already more flexible than non-whitespace str.split and str.partition combined.

>>> re.split('a|e|i|o|u', 'Franklin? Lee')
['Fr', 'nkl', 'n? L', '', '']
>>> re.split('(a|e|i|o|u)', 'Franklin? Lee')  # multiple partition
['Fr', 'a', 'nkl', 'i', 'n? L', 'e', '', 'e', '']
>>> re.split('(a|e|i|o|u)', 'Franklin? Lee', 1) # single partition
['Fr', 'a', 'nklin? Lee']

re.split, and hence str.rsplit(collection) are very sensible.
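
When the separators arrive as a collection rather than a hand-written pattern, the pattern can be built from it. This is only a sketch; 'split_any' is a made-up name, and escaping plus longest-first ordering are my assumptions about what one would want.

```python
import re

def split_any(s, seps, maxsplit=0):
    """Split s on any of the literal strings in seps (hypothetical helper)."""
    # Longest separators first, so ', ' is preferred over ','.
    ordered = sorted(seps, key=len, reverse=True)
    pattern = '|'.join(re.escape(sep) for sep in ordered)
    # maxsplit=0 means "no limit", as in re.split itself.
    return re.split(pattern, s, maxsplit=maxsplit)

print(split_any('a, b;c', [', ', ';']))
print(split_any('a, b;c', [', ', ';'], maxsplit=1))
```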

s.startswith, s.endswith:
     Allow argument to be a collection of strings.

bool(re.match('|'.join(strings), s)) does exactly the proposed s.startswith, with the advantage that the actual match is available, and I think that one would nearly always want to know that match.

>>> re.match('a|e|i|o|u', 'Franklin? Lee')
>>> re.match('f|F', 'Franklin? Lee')
<re.Match object; span=(0, 1), match='F'>

re.search with '^' at the beginning or '$' at the end covers both proposals, with the added flexibility of using MULTILINE mode to match at the beginning or end of lines within the string.
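
A sketch of both checks for a collection of literal strings. The helper names are hypothetical, and I assume the collection is non-empty (an empty pattern matches everything). I use \Z instead of '$' for the end test because '$' also matches just before a trailing newline.

```python
import re

def starts_with_any(s, strings):
    """Return the match object for a prefix of s, or None."""
    pattern = '|'.join(re.escape(x) for x in strings)
    return re.match(pattern, s)  # re.match anchors at the start

def ends_with_any(s, strings):
    """Return the match object for a suffix of s, or None."""
    alternatives = '|'.join(re.escape(x) for x in strings)
    # Group the alternatives so \Z applies to all of them.
    return re.search('(?:' + alternatives + r')\Z', s)

m = starts_with_any('Franklin? Lee', ['f', 'F'])
print(bool(m), m.group())
print(ends_with_any('Franklin? Lee', ['Frank', 'Lee']).group())
```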

s.find, s.index, s.count, x in s:
     Similar.
    These methods are also in `list`, which can't distinguish between items, subsequences, and subsets. However, `str` is already inconsistent with `list` here: list.M looks for an item, while str.M looks for a subsequence.

Comments above apply. re.search tells you which string matched as well as where. bool(re.search) is 'x in s'. re.findall and re.finditer give much more info than merely a count ('sum(bool(re.finditer))').
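
Spelled out as a sketch, with example strings of my own choosing: one compiled pattern answers the find, index, 'in', and count questions, and also says which string matched.

```python
import re

s = 'Franklin? Lee'
# Literal candidates, escaped and joined into one alternation.
pattern = re.compile('|'.join(re.escape(x) for x in ['an', 'Lee']))

m = pattern.search(s)
found = m is not None                    # like 'x in s'
first_index = m.start() if m else -1     # like s.find
which = m.group() if m else None         # which candidate matched
# Non-overlapping matches, the same convention as str.count.
count = sum(1 for _ in pattern.finditer(s))

print(found, first_index, which, count)
```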

s.[r|l]strip:
    Sadly, these functions already interpret their str arguments as collections of characters.

To avoid this, use re.sub with ^ or $ anchor and '' replacement.

>>> re.sub('(Frank|Lee)$', '', 'Franklin? Lee')
'Franklin? '
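
Generalized to a collection, as a sketch with hypothetical helper names. Unlike the one-shot example above, I add '+' so that repeated prefixes or suffixes are all removed, which is closer to how str.strip keeps stripping; drop the '+' to remove at most one.

```python
import re

def _alternation(strings):
    # Non-capturing group of escaped literals, longest first.
    ordered = sorted(strings, key=len, reverse=True)
    return '(?:' + '|'.join(re.escape(x) for x in ordered) + ')'

def rstrip_strings(s, suffixes):
    """Remove every trailing occurrence of any of the suffixes."""
    return re.sub(_alternation(suffixes) + r'+\Z', '', s)

def lstrip_strings(s, prefixes):
    """Remove every leading occurrence of any of the prefixes."""
    return re.sub(r'\A' + _alternation(prefixes) + '+', '', s)

print(repr(rstrip_strings('Franklin? Lee', ['Frank', 'Lee'])))
print(repr(lstrip_strings('FrankFranklin', ['Frank'])))
```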

These new forms can be optimized internally, as a search for multiple candidate substrings can be more efficient than searching for one at a time.

This is what re does with 's1|s2|...|sn' patterns.

https://stackoverflow.com/questions/3260962/algorithm-to-find-multiple-string-matches

The most significant change is on .replace. The others are simple enough to simulate with a loop or something.

No loops needed.

It is harder to make multiple simultaneous replacements using one .replace at a time, because previous replacements can form new things that look like replaceables.

This problem exists for single string replacement also. The standard solution is to not backtrack and not do overlapping replacements.
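
A small illustration of the difference, using a two-way swap as my own example: chained .replace calls rescan text that was already replaced, while a single re.sub pass does not.

```python
import re

s = 'ab'

# Chained replaces: the 'b' produced by the first call is replaced again.
chained = s.replace('a', 'b').replace('b', 'a')

# One pass: each position is matched once and replaced text is not rescanned.
swap = {'a': 'b', 'b': 'a'}
single_pass = re.sub('a|b', lambda m: swap[m.group(0)], s)

print(chained, single_pass)
```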

The easiest Python solution is to use regex

My claim above is that this is sufficient for all but one case, which should be a new function anyway.

or install some package, which uses (if you're lucky) regex or (if unlucky) doesn't simulate simultaneous replacements. (If possible, just use str.translate.)

I suppose .split on multiple separators is also annoying to simulate. The two-argument form of .split may be even more of a burden, though I don't know when a limited multiple-separator split is useful. The current best solution is, like before, to use regex, or install a package and hope for the best.

--
Terry Jan Reedy


_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
