On Tue, Jul 5, 2011 at 2:37 PM, Xah Lee <xah...@gmail.com> wrote: > but in anycase, i can't see how this part would work > <p class="cpt">((?:[^<]|<(?!/p>))+)</p>
It's not that different from the pattern 「alt="[^"]+"」 earlier in the regex. The capture group accepts one or more characters that either aren't '<', or that are '<' but are not immediately followed by '/p>'. Thus it stops capturing when it sees exactly '</p>' without consuming the '<'. Using my regex with the example that you posted earlier demonstrates that it works: >>> import re >>> s = '''<div class="img"> ... <img src="jamie_cat.jpg" alt="jamie's cat" width="167" height="106"> ... <p class="cpt">jamie's cat! Her blog is <a href="http://example.com/ ... jamie/">http://example.com/jamie/</a></p> ... </div>''' >>> print re.sub(pattern, replace, s) <figure> <img src="jamie_cat.jpg" alt="jamie's cat" width="167" height="106"> <figcaption>jamie's cat! Her blog is <a href="http://example.com/ jamie/">http://example.com/jamie/</a></figcaption> </figure> Cheers, Ian -- http://mail.python.org/mailman/listinfo/python-list