Tim Chase wrote:
On 08/10/10 19:37, candide wrote:
Suppose you have a sequence s, say a string, for instance this
one:
spppammmmegggssss
We want to split s into the following parts:
['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
i.e. each part is a run of a single repeated character.
While I'm not sure it's idiomatic, the overabuse of regexps in Python
certainly seems prevalent enough to be idiomatic ;-)
As such, you can use:
import re
r = re.compile(r'((.)\1*)')
#r = re.compile(r'((\w)\1*)')
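That should be \2, not \1: group 1 is the outer group, which is still
open at that point, while the repeated character is captured by the
inner group 2.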
Alternatively:
r = re.compile(r'(.)\1*')
#r = re.compile(r'(\w)\1*')
s = 'spppammmmegggssss'
results = [m.group(0) for m in r.finditer(s)]
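For reference, here is a minimal self-contained sketch of the two patterns with the \2 correction applied. The names nested, simple, runs_nested and runs_simple are only illustrative and are not from the original post.

```python
import re

# Nested form with the correction applied: group 2 is the single
# character, group 1 is the whole run.
nested = re.compile(r'((.)\2*)')
# Simpler form: group 1 is the single character, group 0 (the whole
# match) is the run.
simple = re.compile(r'(.)\1*')

s = 'spppammmmegggssss'

runs_nested = [m.group(1) for m in nested.finditer(s)]
runs_simple = [m.group(0) for m in simple.finditer(s)]

print(runs_nested)  # ['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
print(runs_simple)  # same list
```

Note that '.' does not match a newline unless re.DOTALL is given, so a string containing newlines would simply have them skipped.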
Additionally, you have all the properties of the match object
(including the start/end positions) available too, should you need them.
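As a rough illustration of the start/end point, the sketch below collects each run together with its position in the string. The tuple layout and the name spans are just assumptions made for the example.

```python
import re

r = re.compile(r'(.)\1*')
s = 'spppammmmegggssss'

# One (run, start, end) tuple per match; end is exclusive, as with slicing.
spans = [(m.group(0), m.start(), m.end()) for m in r.finditer(s)]

for run, start, end in spans:
    # s[start:end] gives back the run itself.
    print(run, start, end)
```

The first tuple is ('s', 0, 1) and the last is ('ssss', 13, 17).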
You don't specify what you want to happen with non-letters
(whitespace, punctuation, etc.). The above just treats them like any
other character, finding repeats. If you just want "word" characters
(letters, digits and underscore), you can use the second ("\w")
version, or adjust the character class accordingly.
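To make the difference concrete, here is a small sketch comparing the two versions on a string that contains spaces and punctuation. The sample string and the variable names are made up for the illustration.

```python
import re

any_char = re.compile(r'(.)\1*')
word_char = re.compile(r'(\w)\1*')

s = 'aa  bb!!c'

# '.' treats spaces and punctuation like any other character.
with_any = [m.group(0) for m in any_char.finditer(s)]
# '\w' only starts a run on a word character, so the rest is skipped.
with_word = [m.group(0) for m in word_char.finditer(s)]

print(with_any)   # ['aa', '  ', 'bb', '!!', 'c']
print(with_word)  # ['aa', 'bb', 'c']
```

If only letters should count, a class such as [a-zA-Z] in place of \w would be one way to adjust it.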