On 2/3/2018 5:04 PM, Franklin? Lee wrote:
> Let s be a str. I propose to allow these existing str methods to take
> params in new forms.
Thanks for the honest title. As you sort of indicate, these can all be
done with re module. However, you imply loops are needed besides, which
is mostly not true. Your complications mostly translate to existing
calls and hence are not needed.
Perhaps the 'Regular Expression HOWTO' could use more examples, or even a
section on generalizing string methods. Perhaps the string method doc
needs a suggestion to use re for multiple string args and references to
the re howto. Please consider taking a look at both.
>>> import re
> s.replace(old, new):
> Allow passing in a collection of olds.
>>> re.sub('Franklin|Lee', 'user', 'Franklin? Lee')
'user? user'
Remembering the name change from replace to sub is a nuisance.
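A minimal sketch of the collection-of-olds case, assuming the olds are plain strings rather than regex patterns; replace_any is a hypothetical name, not an existing API. Each old goes through re.escape (so the '?' in 'Franklin?' is literal), longer olds are tried first, and a function is used as the replacement so that backslashes in new are not interpreted.

```python
import re
from collections.abc import Iterable


def replace_any(s: str, olds: Iterable[str], new: str) -> str:
    """Replace every occurrence of any string in olds with new (hypothetical helper)."""
    # Longest candidates first: re alternation takes the first alternative
    # that matches, so 'ab' must be tried before 'a'.
    alts = sorted(olds, key=len, reverse=True)
    if not alts:
        return s
    # re.escape makes '?', '.', '|' and friends match literally.
    pattern = '|'.join(map(re.escape, alts))
    # A function as the replacement keeps `new` literal (no \1 or \g<> processing).
    return re.sub(pattern, lambda m: new, s)


print(replace_any('Franklin? Lee', ['Franklin?', 'Lee'], 'user'))  # user user
```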
> Allow passing in a single argument, a mapping of olds to news.
This needs to be a separate function, say 'dictsub', that joins the
re.escape()d keys with '|' and calls re.sub with, as the 2nd parameter,
a function that does the lookup. This would be a nice example for the howto.
As you noted, this is a generalization of str.translate, and might be
proposed as a new re module function.
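A sketch of the dictsub described above, assuming the keys are literal strings and that a dictsub(mapping, s) signature is acceptable; neither the name nor the signature exists in re today. Because every match is replaced exactly once in a single pass, a swap such as {'a': 'b', 'b': 'a'} works.

```python
import re
from collections.abc import Mapping


def dictsub(mapping: Mapping[str, str], s: str) -> str:
    """Replace each key of mapping that occurs in s by its value, in one pass.

    Hypothetical helper; name and argument order are assumptions.
    """
    if not mapping:
        return s
    # Escape the keys and put longer keys first so 'ab' beats 'a'.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = '|'.join(map(re.escape, keys))
    # The lookup function is re.sub's 2nd (repl) parameter.
    return re.sub(pattern, lambda m: mapping[m.group()], s)


print(dictsub({'Franklin': 'Frank', 'Lee': 'Li'}, 'Franklin? Lee'))  # Frank? Li
```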
> Allow the olds in the mapping to be tuples of strings.
A minor addition to dictsub.
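One way to make that addition, again as a hypothetical helper: flatten the mapping first so that every string in a tuple key maps to that key's replacement, then build the alternation as before. This assumes tuple members are literal strings.

```python
import re
from collections.abc import Mapping


def dictsub(mapping: Mapping, s: str) -> str:
    """Like dictsub, but a key may be a str or a tuple of strs sharing one value.

    Hypothetical helper; name and argument order are assumptions.
    """
    # Flatten {('a', 'b'): 'x', 'c': 'y'} into {'a': 'x', 'b': 'x', 'c': 'y'}.
    flat = {}
    for key, new in mapping.items():
        for old in (key if isinstance(key, tuple) else (key,)):
            flat[old] = new
    if not flat:
        return s
    keys = sorted(flat, key=len, reverse=True)  # longer olds win
    pattern = '|'.join(map(re.escape, keys))
    return re.sub(pattern, lambda m: flat[m.group()], s)


print(dictsub({('Franklin', 'Frank'): 'F', 'Lee': 'L'}, 'Franklin? Lee'))  # F? L
```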
> s.split(sep), s.rsplit, s.partition:
> Allow sep to be a collection of separators.
re.split is already more flexible than non-whitespace str.split and
str.partition combined.
>>> re.split('a|e|i|o|u', 'Franklin? Lee')
['Fr', 'nkl', 'n? L', '', '']
>>> re.split('(a|e|i|o|u)', 'Franklin? Lee') # multiple partition
['Fr', 'a', 'nkl', 'i', 'n? L', 'e', '', 'e', '']
>>> re.split('(a|e|i|o|u)', 'Franklin? Lee', maxsplit=1) # single partition
['Fr', 'a', 'nklin? Lee']
re.split, and hence str.rsplit(collection), are very sensible.
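A sketch of the split/partition family on top of re, assuming literal separators; split_any, partition_any and rpartition_any are hypothetical names. Note that re.split uses maxsplit=0, not -1, for "no limit". rpartition_any relies on a greedy (.*) to reach the last separator, which covers the single-split rsplit case.

```python
import re
from collections.abc import Iterable


def _alternation(seps: Iterable[str]) -> str:
    """Build 'sep1|sep2|...' from literal separators, longest first."""
    alts = sorted(seps, key=len, reverse=True)
    if not alts or '' in alts:
        raise ValueError('empty separator')  # as str.split('') does
    return '|'.join(map(re.escape, alts))


def split_any(s: str, seps: Iterable[str], maxsplit: int = 0) -> list[str]:
    """str.split with several separators; maxsplit=0 means no limit."""
    return re.split(_alternation(seps), s, maxsplit=maxsplit)


def partition_any(s: str, seps: Iterable[str]) -> tuple[str, str, str]:
    """str.partition at the first occurrence of any separator."""
    m = re.search(_alternation(seps), s)
    if m is None:
        return (s, '', '')
    return (s[:m.start()], m.group(), s[m.end():])


def rpartition_any(s: str, seps: Iterable[str]) -> tuple[str, str, str]:
    """str.rpartition at the last occurrence of any separator."""
    # The greedy first group backtracks only as far as needed, so the
    # separator captured in group 2 is the last one; DOTALL spans newlines.
    m = re.match(f'(.*)({_alternation(seps)})(.*)', s, re.DOTALL)
    if m is None:
        return ('', '', s)
    return m.groups()


print(split_any('a, b; c', [', ', '; ']))            # ['a', 'b', 'c']
print(rpartition_any('Franklin? Lee', ['a', 'e']))   # ('Franklin? Le', 'e', '')
```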
> s.startswith, s.endswith:
> Allow argument to be a collection of strings.
bool(re.match('|'.join(map(re.escape, strings)), s)) does exactly what the
proposed s.startswith(strings) would do, with the advantage that the actual
match is available, and I think that one would nearly always want to know
that match.
>>> re.match('a|e|i|o|u', 'Franklin? Lee')
>>> re.match('f|F', 'Franklin? Lee')
<re.Match object; span=(0, 1), match='F'>
re.search with '^' at the beginning or '$' at the end covers both
proposals, with the added flexibility of using MULTILINE mode to match
at the beginning or end of lines within the string.
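A sketch of that approach, assuming literal prefixes and suffixes; startswith_any and endswith_any are hypothetical names. They return the match object (or None) rather than a bool. For comparison, str.startswith and str.endswith already accept a tuple of strings but report only True or False. The suffix version uses \Z rather than $, because $ also matches just before a trailing newline.

```python
import re
from collections.abc import Iterable
from typing import Optional


def _alternation(strings: Iterable[str]) -> Optional[str]:
    """'s1|s2|...' from literal strings, longest first; None if there are none."""
    alts = sorted(strings, key=len, reverse=True)
    return '|'.join(map(re.escape, alts)) if alts else None


def startswith_any(s: str, prefixes: Iterable[str]) -> Optional[re.Match]:
    """Match for the longest prefix of s among prefixes, else None."""
    pattern = _alternation(prefixes)
    return re.match(pattern, s) if pattern is not None else None


def endswith_any(s: str, suffixes: Iterable[str]) -> Optional[re.Match]:
    """Match for the longest suffix of s among suffixes, else None."""
    pattern = _alternation(suffixes)
    # \Z anchors at the true end of the string, unlike $.
    return re.search(rf'(?:{pattern})\Z', s) if pattern is not None else None


# With MULTILINE, '^' matches at the start of every line, not only of s.
line_starts = re.findall(r'^(?:Frank|Lee)', 'Franklin\nLee', re.MULTILINE)  # ['Frank', 'Lee']
```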
> s.find, s.index, s.count, x in s:
> Similar.
> These methods are also in `list`, which can't distinguish between
> items, subsequences, and subsets. However, `str` is already inconsistent
> with `list` here: list.M looks for an item, while str.M looks for a
> subsequence.
Comments above apply. re.search tells you which string matched as well
as where. bool(re.search(pattern, s)) is 'x in s'. re.findall and
re.finditer give much more information than a mere count, which is just
len(re.findall(pattern, s)).
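A sketch for the find/count/in family, assuming literal needles; find_any, count_any and contains_any are hypothetical names. Like str.count, count_any counts non-overlapping occurrences. An empty collection compiles to a pattern that can never match, so it finds nothing.

```python
import re
from collections.abc import Iterable


def _compile(needles: Iterable[str]) -> re.Pattern:
    """One pattern for several literal needles, longest first."""
    alts = sorted(needles, key=len, reverse=True)
    # '(?!x)x' can never match: it requires an 'x' that is not an 'x'.
    return re.compile('|'.join(map(re.escape, alts)) if alts else '(?!x)x')


def find_any(s: str, needles: Iterable[str]) -> tuple:
    """Like str.find, but also report which needle matched."""
    m = _compile(needles).search(s)
    return (m.start(), m.group()) if m else (-1, None)


def count_any(s: str, needles: Iterable[str]) -> int:
    """Number of non-overlapping occurrences of any needle."""
    return sum(1 for _ in _compile(needles).finditer(s))


def contains_any(s: str, needles: Iterable[str]) -> bool:
    """'x in s' for any x in needles."""
    return _compile(needles).search(s) is not None


print(find_any('Franklin? Lee', ['Lee', 'kl']))  # (4, 'kl')
```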
> s.[r|l]strip:
> Sadly, these functions already interpret their str arguments as
> collections of characters.
To avoid this, use re.sub with ^ or $ anchor and '' replacement.
>>> re.sub('(Frank|Lee)$', '', 'Franklin? Lee')
'Franklin? '
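A sketch that extends the anchored re.sub to both ends and to repeated removal, the way str.strip repeats over characters; strip_strings and its left/right flags are hypothetical. For removing at most one known string, Python 3.9+ also has str.removeprefix and str.removesuffix.

```python
import re
from collections.abc import Iterable


def strip_strings(s: str, strings: Iterable[str], *,
                  left: bool = True, right: bool = True) -> str:
    """Repeatedly remove any of the given whole strings from the ends of s."""
    # Drop '' (stripping it is a no-op), escape the rest, longest first.
    alts = sorted((x for x in strings if x), key=len, reverse=True)
    if not alts:
        return s
    alt = '|'.join(map(re.escape, alts))
    if left:
        # \A anchors at the very start; (?:...)+ removes repeated occurrences.
        s = re.sub(rf'\A(?:{alt})+', '', s)
    if right:
        # \Z, unlike $, never matches before a trailing newline.
        s = re.sub(rf'(?:{alt})+\Z', '', s)
    return s


print(strip_strings('abab-x-abab', ['ab']))  # -x-
```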
> These new forms can be optimized internally, as a search for multiple
> candidate substrings can be more efficient than searching for one at a
> time.
This is what re does with 's1|s2|...|sn' patterns.
https://stackoverflow.com/questions/3260962/algorithm-to-find-multiple-string-matches
> The most significant change is on .replace. The others are simple enough
> to simulate with a loop or something.
No loops needed.
> It is harder to make multiple
> simultaneous replacements using one .replace at a time, because previous
> replacements can form new things that look like replaceables.
This problem exists for single string replacement also. The standard
solution is to not backtrack and not do overlapping replacements.
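A small comparison that makes the point concrete (the strings are made up for illustration): chained .replace calls rescan earlier output, while a single re.sub pass with a lookup function never revisits replaced text. For single characters, str.translate already behaves the same way.

```python
import re

s = 'cat dog'
swap = {'cat': 'dog', 'dog': 'cat'}

# Sequential str.replace: the second call also rewrites the first call's output.
sequential = s.replace('cat', 'dog').replace('dog', 'cat')  # 'cat cat'

# One re.sub pass: scanning resumes after each match, so replaced text is never rescanned.
pattern = '|'.join(map(re.escape, swap))
simultaneous = re.sub(pattern, lambda m: swap[m.group()], s)  # 'dog cat'

# Single-character swaps via str.translate are simultaneous too.
chars = 'ab'.translate(str.maketrans({'a': 'b', 'b': 'a'}))  # 'ba'
```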
> The easiest Python solution is to use regex
My claim above is that this is sufficient for all but one case, which
should be a new function anyway.
> or install some package, which
> uses (if you're lucky) regex or (if unlucky) doesn't simulate
> simultaneous replacements. (If possible, just use str.translate.)
> I suppose .split on multiple separators is also annoying to simulate.
> The two-argument form of .split may be even more of a burden, though I
> don't know when a limited multiple-separator split is useful. The
> current best solution is, like before, to use regex, or install a
> package and hope for the best.
--
Terry Jan Reedy
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/