Re: Splitting a sequence into pieces with identical elements

MRAB Tue, 10 Aug 2010 18:33:50 -0700

Tim Chase wrote:

On 08/10/10 19:37, candide wrote:
Suppose you have a sequence s, a string say, for instance this one:
spppammmmegggssss

We want to split s into the following parts :

['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']

i.e. each part is a word made of a single repeated character.
While I'm not sure it's idiomatic, the overabuse of regexps in Python certainly seems prevalent enough to be idiomatic ;-)
As such, you can use:

  import re
  r = re.compile(r'((.)\1*)')
  #r = re.compile(r'((\w)\1*)')


That should be \2, not \1: the outer parentheses form group 1, so the (.) you want to repeat is group 2.

Alternatively:

    r = re.compile(r'(.)\1*')
    #r = re.compile(r'(\w)\1*')

  s = 'spppammmmegggssss'
  results = [m.group(0) for m in r.finditer(s)]
Additionally, you have all the properties of the match object (which includes the start/end) available too, if you need them.
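For reference, here is a minimal runnable sketch that puts the pieces of this thread together. It uses the nested pattern with the \2 backreference and the single-group alternative, and it pulls the start/end positions from each match object. The variable names (r_nested, r_simple, runs_nested, runs_simple, spans) are only illustrative and are not from the original posts.

```python
import re

# Nested form: group 1 is the whole run and group 2 is the single
# character, so the backreference has to be \2.
r_nested = re.compile(r'((.)\2*)')

# Single-group form: the only group is the character, so \1 is right here.
r_simple = re.compile(r'(.)\1*')

s = 'spppammmmegggssss'

# group(0) is the whole match, i.e. one run of identical characters.
runs_nested = [m.group(0) for m in r_nested.finditer(s)]
runs_simple = [m.group(0) for m in r_simple.finditer(s)]

# Each match object also knows where the run sits in the original string.
spans = [(m.group(0), m.start(), m.end()) for m in r_simple.finditer(s)]

print(runs_simple)
print(spans)
```

Both patterns should give the same list of runs for this input. The spans follow the usual slicing convention, so s[start:end] gives back the run.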
You don't specify what you want to have happen with non-letters (whitespace, punctuation, etc.). The above just treats them like any other character, finding repeats. If you just want "word" characters, you can use the 2nd ("\w") version, or adjust accordingly.
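To make the difference between the two versions concrete, here is a small sketch on a made-up input containing spaces and punctuation. The sample string and the names any_char, word_char, all_runs and word_runs are only illustrative. Note that \w also matches digits and the underscore, not only letters, and that "." does not match a newline unless re.DOTALL is given.

```python
import re

any_char = re.compile(r'(.)\1*')     # runs of any character except newline
word_char = re.compile(r'(\w)\1*')   # runs of "word" characters only

s = 'aa  bb!!c'  # hypothetical sample input

# With ".", whitespace and punctuation form runs like anything else.
all_runs = [m.group(0) for m in any_char.finditer(s)]

# With "\w", non-word characters are skipped rather than reported.
word_runs = [m.group(0) for m in word_char.finditer(s)]

print(all_runs)
print(word_runs)
```

With the "." version the two spaces and the "!!" come back as runs of their own. With the "\w" version they simply drop out of the result.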

--
http://mail.python.org/mailman/listinfo/python-list
