Re: [Tutor] Newbie - regex question

Hugo Arts Mon, 30 Aug 2010 14:59:17 -0700

On Mon, Aug 30, 2010 at 10:52 PM, Sam M <[email protected]> wrote:
> Hi Guys,
>
> I'd like remove contents between tags <email> that matches pattern "WORD1"
> as follows:
>
> Change
> "stuff <email>[email protected]</email> more stuff
> <email>[email protected]</email> still more stuff
> <email>[email protected]</email> stuff after WORD2
> <email>[email protected]</email>"
>
> To
> "stuff  more stuff  still more stuff <email>[email protected]</email>
> stuff after WORD2 "
>
> The following did not work
> newl = re.sub (r'<email>WORD1-.*</email>',"",line)
>


This precise problem is actually described in the Regular Expression HOWTO on
python.org:

http://docs.python.org/howto/regex.html#greedy-versus-non-greedy

In short: .* is greedy and gobbles up as much as it can. That means the
</email> in your pattern will match the last </email> tag in the line, and
all the earlier ones are simply eaten by .*.
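To see the greedy behaviour concretely, here is a small sketch. The sample
line is made up (the addresses in your message got obscured), so the
WORD1-/WORD2- addresses below are hypothetical:

```python
import re

# Hypothetical sample line modelled on the one in the question.
line = ("stuff <email>WORD1-a@example.com</email> more stuff "
        "<email>WORD1-b@example.com</email> still more stuff "
        "<email>WORD2-c@example.com</email> stuff after WORD2 "
        "<email>WORD1-d@example.com</email>")

# Greedy: .* runs on to the LAST </email> in the line, so a single match
# swallows everything from the first WORD1 tag to the end, WORD2 tag included.
greedy = re.sub(r'<email>WORD1-.*</email>', "", line)
print(repr(greedy))  # 'stuff '
```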

To solve this, we have the non-greedy quantifiers. They match not as much as
possible, but as little as possible. To make a quantifier non-greedy,
simply add a question mark after it:

r'<email>WORD1-.*?</email>'
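Applied to the same kind of hypothetical sample line, the non-greedy
version removes only the WORD1 tags and leaves the WORD2 one alone:

```python
import re

# Hypothetical sample line modelled on the one in the question.
line = ("stuff <email>WORD1-a@example.com</email> more stuff "
        "<email>WORD1-b@example.com</email> still more stuff "
        "<email>WORD2-c@example.com</email> stuff after WORD2 "
        "<email>WORD1-d@example.com</email>")

# Non-greedy: .*? stops at the FIRST </email> after each WORD1 opening tag,
# so every WORD1 tag is removed separately.
newl = re.sub(r'<email>WORD1-.*?</email>', "", line)
print(repr(newl))
# 'stuff  more stuff  still more stuff <email>WORD2-c@example.com</email> stuff after WORD2 '
```

Note that . does not match a newline by default, which is fine when you
process the text line by line; if a tag could span several lines you would
need the re.DOTALL flag.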

Hugo
_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
