On 08/10/10 19:37, candide wrote:
Suppose you have a sequence s, a string say, for instance this one:
spppammmmegggssss
We want to split s into the following parts:
['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
i.e. each part is a run of a single repeated character.
While I'm not sure it's idiomatic, the overabuse of regexps in
Python certainly seems prevalent enough to be idiomatic ;-)
As such, you can use:
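import re
# (.) captures one character; \1* matches any further copies of it,
# so each match is one maximal run of a repeated character.
r = re.compile(r'((.)\1*)')
# Alternative: only match runs of "word" characters (see below):
#r = re.compile(r'((\w)\1*)')
s = 'spppammmmegggssss'
results = [m.group(0) for m in r.finditer(s)]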
Additionally, you have all the properties of the match object
(including the start/end positions) available too, if you need them.
You don't specify what you want to have happen with non-letters
(whitespace, punctuation, etc.). The above just treats them like
any other character, finding repeats. If you just want "word"
characters, you can use the 2nd ("\w") version, or adjust the
pattern accordingly.
-tkc
--
http://mail.python.org/mailman/listinfo/python-list