Re: Splitting a sequence into pieces with identical elements

Tim Chase Tue, 10 Aug 2010 18:21:56 -0700

On 08/10/10 19:37, candide wrote:

Suppose you have a sequence s, say a string, for instance this one:


spppammmmegggssss

We want to split s into the following parts:

['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']

i.e., each part is a word made of a single repeated character.

While I'm not sure it's idiomatic, the overabuse of regexps in Python certainly seems prevalent enough to be idiomatic ;-)


As such, you can use:

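  import re
  # Group 1 captures one character; \1* then matches its repeats.
  r = re.compile(r'(.)\1*')
  #r = re.compile(r'(\w)\1*')  # "word" characters only
  s = 'spppammmmegggssss'
  # group(0) is the whole match, i.e. one complete run
  results = [m.group(0) for m in r.finditer(s)]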

Additionally, you have all the properties of the match object (including the start/end positions) available too, if you need them.
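
For instance, here is a minimal sketch of pulling the positions out alongside the text. It assumes the same sample string and a pattern with a single capturing group; the name "runs" is only illustrative:

```python
import re

# Assumption: same sample string as in the question; `runs` is an illustrative name.
r = re.compile(r'(.)\1*')
s = 'spppammmmegggssss'

# Each match object exposes the text of the run and where it sits in s.
runs = [(m.group(0), m.start(), m.end()) for m in r.finditer(s)]

for text, start, end in runs:
    print(text, start, end)
```

Each tuple holds the run itself plus the half-open slice bounds, so s[start:end] gives back the same text.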

You don't specify what you want to have happen with non-letters (whitespace, punctuation, etc.). The above just treats them like any other character, finding repeats (except newlines, which "." does not match unless you pass re.DOTALL). If you just want "word" characters, you can use the 2nd ("\w") version, or adjust accordingly.
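
To make the difference concrete, here is a rough sketch comparing the two patterns on a made-up string containing spaces and punctuation (the sample text and the variable names are assumptions for illustration only):

```python
import re

# Assumption: the sample text is invented purely for illustration.
any_char = re.compile(r'(.)\1*')    # any character except a newline
word_char = re.compile(r'(\w)\1*')  # letters, digits and underscore only

s = 'aa  bb!!c'

all_runs = [m.group(0) for m in any_char.finditer(s)]
word_runs = [m.group(0) for m in word_char.finditer(s)]

print(all_runs)
print(word_runs)
```

With the "." version the runs of spaces and of "!" show up as pieces of their own; with the "\w" version they are simply skipped over.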


-tkc





--
http://mail.python.org/mailman/listinfo/python-list
