On 08/10/10 19:37, candide wrote:
Suppose you have a sequence s, say a string, for instance this one:

spppammmmegggssss

We want to split s into the following parts:

['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']

i.e., each part is a run of a single repeated character.

While I'm not sure it's idiomatic, the overabuse of regexps in Python certainly seems prevalent enough to be idiomatic ;-)

As such, you can use:

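  import re
  # Group 1 is the whole run, group 2 its first character; \2* then
  # matches any further copies of that same character.
  r = re.compile(r'((.)\2*)')
  #r = re.compile(r'((\w)\2*)')  # word characters only
  s = 'spppammmmegggssss'
  results = [m.group(0) for m in r.finditer(s)]
  # results == ['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']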
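For comparison, here is a sketch without regexps. It is not part of the reply above: it swaps in the standard library's itertools.groupby, and the function name split_runs is hypothetical.

```python
from itertools import groupby

def split_runs(s):
    # Hypothetical helper: groupby yields (key, iterator) pairs, one per run
    # of consecutive equal items; joining each iterator rebuilds the run.
    return [''.join(group) for _, group in groupby(s)]

print(split_runs('spppammmmegggssss'))
# → ['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
```

groupby only groups adjacent equal items, which is exactly the run behaviour wanted here. Unlike a `.` pattern, it also treats newlines like any other character.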

Additionally, all the properties of the match object (including its start/end positions) are available if you need them.
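A minimal sketch of pulling the offsets out alongside each run, using a pattern that matches runs of one repeated character. The backreference is \2 because the repeated character is captured by the second (inner) group; the variable name spans is just for illustration.

```python
import re

# Group 1 is the whole run; group 2 is the single repeated character.
r = re.compile(r'((.)\2*)')
s = 'spppammmmegggssss'

# Each match object carries the matched text plus its start/end offsets.
spans = [(m.group(0), m.start(), m.end()) for m in r.finditer(s)]
for text, start, end in spans:
    print(text, start, end)
```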

You don't specify what you want to have happen with non-letters (whitespace, punctuation, etc). The above just treats them like any other character, finding repeats. If you just want "word" characters, you can use the 2nd ("\w") version, or adjust accordingly.
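To see how the two patterns differ, here is a small sketch on a made-up input containing spaces and punctuation. finditer simply skips characters the pattern cannot match, and `.` does not match a newline unless re.DOTALL is given.

```python
import re

any_char = re.compile(r'((.)\2*)')    # any character except newline
word_char = re.compile(r'((\w)\2*)')  # letters, digits and underscore only

s = 'aa  bb!!c'  # illustrative input, not from the original question
runs_any = [m.group(0) for m in any_char.finditer(s)]
runs_word = [m.group(0) for m in word_char.finditer(s)]
print(runs_any)   # → ['aa', '  ', 'bb', '!!', 'c']
print(runs_word)  # → ['aa', 'bb', 'c']
```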

-tkc





--
http://mail.python.org/mailman/listinfo/python-list
