John Clark wrote:
> Hi,
>
> I have a file that is a long list of records (roughly) in the format
>
> [EMAIL PROTECTED]
>
> So, for example:
>
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> ....
>
> What I would like to do is run a regular expression against this and
> wind up with:
>
> [EMAIL PROTECTED]@[EMAIL PROTECTED]@data4
> [EMAIL PROTECTED]
Regular expressions aren't well suited to merging repeated records like
this. On the other hand, itertools.groupby() is perfect for it:
# This represents your original data
data = '''[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]'''.splitlines()
# Convert to a list of pairs of (id, data); maxsplit=1 keeps any '@' in the data
data = [line.split('@', 1) for line in data]
from itertools import groupby
from operator import itemgetter
# groupby() will group them according to whatever key we specify
# itemgetter(0) will pull out just the first item
# groupby() yields (key, group) pairs, where group is an iterator over
# the *consecutive* items that share that key
for key, items in groupby(data, itemgetter(0)):
    print('%s@%s' % (key, '@'.join(item[1] for item in items)))
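The list archive obfuscated the sample records, so here is a self-contained sketch of the same groupby() approach run on hypothetical `id@value` lines. The names `lines` and `merge_records` are made up for illustration, and it assumes every line contains at least one '@' and that lines for the same id are already adjacent.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample lines; the archived post obfuscated the real ones.
lines = ["ID1@data1", "ID1@data2", "ID1@data3", "ID1@data4", "ID2@data5"]

def merge_records(lines):
    """Join adjacent 'id@value' lines sharing an id into 'id@v1@v2@...'."""
    # maxsplit=1 keeps any later '@' as part of the value
    pairs = [line.split('@', 1) for line in lines]
    merged = []
    # groupby() merges only *adjacent* pairs whose first element is equal
    for key, items in groupby(pairs, itemgetter(0)):
        merged.append(key + '@' + '@'.join(value for _, value in items))
    return merged

for record in merge_records(lines):
    print(record)
# prints:
# ID1@data1@data2@data3@data4
# ID2@data5
```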
I have a longer explanation of groupby() and itemgetter() here:
http://www.pycs.net/users/0000323/weblog/2005/12/06.html
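A quick illustration of the two helpers, using hypothetical pairs. itemgetter(0) is just a callable that returns item 0 of its argument. groupby() starts a new group every time the key changes, so records for one id that are not next to each other come out as separate groups unless you sort by the same key first.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical pairs in which 'ID1' appears in two separate runs.
pairs = [["ID1", "a"], ["ID2", "b"], ["ID1", "c"]]

get_id = itemgetter(0)   # get_id(["ID1", "a"]) returns "ID1"

# Without sorting, the two ID1 runs stay separate groups.
unsorted = [(key, [v for _, v in grp]) for key, grp in groupby(pairs, get_id)]
print(unsorted)  # [('ID1', ['a']), ('ID2', ['b']), ('ID1', ['c'])]

# Sorting by the same key brings all records for an id together;
# sorted() is stable, so values keep their original relative order.
grouped = [(key, [v for _, v in grp])
           for key, grp in groupby(sorted(pairs, key=get_id), get_id)]
print(grouped)   # [('ID1', ['a', 'c']), ('ID2', ['b'])]
```

For the data in the question the ids already appear in runs, so the sort isn't needed there.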
> So, my questions are:
> (1) Is there any way to get a single regular expression to handle
> overlapping matches so that I get what I want in one call?
I doubt it, though I'd be happy to be proven wrong ;)
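One workaround, not from the original thread, sidesteps overlapping matches rather than supporting them. Each match covers a whole run of consecutive lines with the same id, and a replacement function passed to re.sub() joins the run. This is a sketch under the assumption that every record is `id@value` on its own line. The sample text and the names `RUN` and `merge_run` are hypothetical.

```python
import re

# Hypothetical records standing in for the obfuscated originals.
text = "ID1@data1\nID1@data2\nID1@data3\nID1@data4\nID2@data5"

# One match spans a whole run of consecutive lines with the same id:
# group 1 captures the id, and the backreference \1@ requires every
# following line in the run to start with that same id.
RUN = re.compile(r'^([^@\n]+)@[^\n]*(?:\n\1@[^\n]*)*', re.MULTILINE)

def merge_run(match):
    """Collapse one run of 'id@value' lines into 'id@v1@v2@...'."""
    key = match.group(1)
    values = [line.split('@', 1)[1] for line in match.group(0).split('\n')]
    return key + '@' + '@'.join(values)

merged = RUN.sub(merge_run, text)
print(merged)
# prints:
# ID1@data1@data2@data3@data4
# ID2@data5
```

This is still one regex plus a Python function, not a pure replacement string, so it doesn't really contradict the answer above.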
> (2) Is there any way (without comparing the before and after strings) to
> know if a re.sub(...) call did anything?
Use re.subn() instead; it returns a tuple of the new string and the number
of substitutions made, so a count of 0 means nothing changed.
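As a sketch of how the count helps, here is one way to drive a repeated-substitution approach like the one the question hints at. A pattern that merges a line into the previous line when both share an id only catches non-overlapping pairs on each pass, so you keep calling subn() until it reports zero replacements. The pattern and the names `PAIR` and `merge_until_stable` are hypothetical, and the data assumes `id@value` lines.

```python
import re

# Merge a line into the one before it when both start with the same id.
# Matches can't overlap, so a single pass only merges some of the pairs.
PAIR = re.compile(r'^([^@\n]+)@([^\n]*)\n\1@', re.MULTILINE)

def merge_until_stable(text):
    """Apply PAIR repeatedly; return (final_text, passes_that_changed_it)."""
    passes = 0
    while True:
        text, count = PAIR.subn(r'\1@\2@', text)
        if count == 0:   # subn() made no replacements, so we're done
            return text, passes
        passes += 1

text = "ID1@data1\nID1@data2\nID1@data3\nID1@data4\nID2@data5"
result, passes = merge_until_stable(text)
print(result)
# prints:
# ID1@data1@data2@data3@data4
# ID2@data5
```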
Kent
_______________________________________________
Tutor maillist - [email protected]
http://mail.python.org/mailman/listinfo/tutor