Hi,
I have a file that is a long list of records (roughly) in the format
[EMAIL PROTECTED]
So, for example:
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
....
What I would like to do is run a regular expression against this and
wind up with:
[EMAIL PROTECTED]@[EMAIL PROTECTED]@data4
[EMAIL PROTECTED]
So I ran the following regex against the string:
re.compile(r'([EMAIL PROTECTED])@(.*)\n\1@(.*)').sub(r'\1\2\3', string)
and I wound up with:
[EMAIL PROTECTED]@data2
[EMAIL PROTECTED]@data4
[EMAIL PROTECTED]
So, my questions are:
(1) Is there any way to get a single regular expression to handle
overlapping matches so that I get what I want in one call?
(2) Is there any way (without comparing the before and after strings) to
know if a re.sub(...) call did anything?
I suppose I could do something like:
pattern = re.compile(r'([EMAIL PROTECTED])@(.*)\n\1@(.*)')
while pattern.search(string):
    string = pattern.sub(r'\1\2\3', string)
but I would like to avoid the explicit loop if possible...
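On question (2), re.subn() returns both the new string and the number of substitutions it made, so the loop can stop on a zero count without comparing strings. Here is a minimal sketch of that idea. The "key@data" record layout, the sample data, and the name merge_pairs are assumptions made up for illustration, not the real file or pattern:

```python
import re

# Hypothetical sample data: one "key@data" record per line (assumed layout).
text = "a@data1\na@data2\na@data3\na@data4\nb@data5\n"

# Two consecutive lines with the same key. [^@\n]+ keeps the key on one line,
# and re.MULTILINE lets ^ and $ anchor at every line boundary.
pattern = re.compile(r'^([^@\n]+)@(.*)\n\1@(.*)$', re.MULTILINE)

def merge_pairs(s):
    """Repeat the substitution until subn() reports that nothing changed."""
    while True:
        # subn() returns (new_string, number_of_substitutions_made)
        s, count = pattern.subn(r'\1@\2@\3', s)
        if count == 0:
            return s

merged = merge_pairs(text)
print(merged)
```

The replacement here writes the "@" separators back explicitly. Each pass only joins non-overlapping pairs, which is why more than one pass is still needed.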
Actually, should I be able to do something like that? If I execute it
in my debugger, my string gets really funky... like the re is losing
track of what the groups are... and I end up with a single really long
string rather than what I expect..
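On question (1), one way that might avoid the loop is to match a whole run of same-key lines at once and build the replacement in a function passed to re.sub(). This is only a sketch under the same assumed "key@data" layout, with made-up sample data and a hypothetical helper name join_run:

```python
import re

# Hypothetical sample data: one "key@data" record per line (assumed layout).
text = "a@data1\na@data2\na@data3\na@data4\nb@data5\n"

# A line "key@...", followed by one or more further lines starting with the same key.
run_pattern = re.compile(r'^([^@\n]+)@.*(?:\n\1@.*)+', re.MULTILINE)

def join_run(match):
    """Collapse a run of same-key lines into key@data1@data2@..."""
    key = match.group(1)
    lines = match.group(0).split('\n')
    # Drop the leading "key@" from every line and keep only the data part.
    parts = [line[len(key) + 1:] for line in lines]
    return key + '@' + '@'.join(parts)

merged_once = run_pattern.sub(join_run, text)
print(merged_once)
```

Since each run is consumed by a single match, no two matches need to overlap, and one call to sub() is enough for this sample.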
Any help on this would be appreciated.
-jdc