John Clark wrote:
> Hi,
>
> I have a file that is a long list of records (roughly) in the format
>
> [EMAIL PROTECTED]
>
> So, for example:
>
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> ....
>
> What I would like to do is run a regular expression against this and
> wind up with:
>
> [EMAIL PROTECTED]@[EMAIL PROTECTED]@data4
> [EMAIL PROTECTED]
Regular expressions aren't well suited to grouping repeated data like this. On the other hand, itertools.groupby() is perfect for it:

from itertools import groupby
from operator import itemgetter

# This represents your original data
data = '''[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]'''.splitlines()

# Convert to a list of (id, data) pairs, splitting only at the first '@'
data = [ line.split('@', 1) for line in data ]

# groupby() will group them according to whatever key we specify;
# itemgetter(0) pulls out just the first item (the id) to use as the key.
# groupby() yields (key, group) pairs, where group iterates over the
# *consecutive* items that share that key, so the data must already be
# ordered by id.
for key, items in groupby(data, itemgetter(0)):
    print('%s@%s' % (key, '@'.join(item[1] for item in items)))

I have a longer explanation of groupby() and itemgetter() here:
http://www.pycs.net/users/0000323/weblog/2005/12/06.html

> So, my questions are:
> (1) Is there any way to get a single regular expression to handle
> overlapping matches so that I get what I want in one call?

I doubt it, though I'd be happy to be proven wrong ;)

> (2) Is there any way (without comparing the before and after strings) to
> know if a re.sub(...) call did anything?

Use re.subn() instead; it returns a tuple of the new string and the number of substitutions made, so a count of 0 means the call changed nothing.

Kent

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
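For reference, here is a minimal Python 3 sketch that wraps the groupby() approach in a function and also shows re.subn() reporting how many substitutions it made. The sample records (id1@data1 and so on) and the function name merge_records() are hypothetical, since the list archive scrubbed the original examples.

```python
import re
from itertools import groupby
from operator import itemgetter

# Hypothetical sample records in "id@data" form (the real examples
# were obscured by the archive's address scrubbing).
lines = """id1@data1
id1@data2
id1@data3
id1@data4
id2@data5
id2@data6""".splitlines()


def merge_records(lines):
    """Join consecutive records sharing an id into one 'id@d1@d2...' line."""
    # split('@', 1) splits only at the first '@', giving [id, data] pairs
    pairs = [line.split('@', 1) for line in lines]
    merged = []
    # groupby() only groups *consecutive* items with the same key,
    # so the input must already be ordered by id
    for key, items in groupby(pairs, itemgetter(0)):
        merged.append('%s@%s' % (key, '@'.join(item[1] for item in items)))
    return merged


for record in merge_records(lines):
    print(record)

# re.subn() returns (new_string, number_of_substitutions); a count of
# zero means the substitution changed nothing.
new_text, count = re.subn(r'data', 'value', 'id1@data1')
print(new_text, count)
```

Because groupby() works on runs of equal keys, records for the same id that are not adjacent come out as separate lines; sort the pairs by id first if the file is not already ordered.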