Re: Regexp

Ant Mon, 19 Jan 2009 07:05:43 -0800

A 0-width positive lookahead is probably what you want here:

>>> import re
>>> s = """
... hdhd <a href="http://mysite.com/blah.html">Test <i>String</i> OK</a>
...
... """
>>> p = r'href="(http://mysite.com/[^"]+)">(.*)(?=</a>)'
>>> m = re.search(p, s)
>>> m.group(1)
'http://mysite.com/blah.html'
>>> m.group(2)
'Test <i>String</i> OK'


The (?=...) bit is the lookahead: it requires </a> to follow, but won't
consume any of the string you are searching. I've dropped the named
groups for clarity.
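
To see what "won't consume" means in practice, here is a minimal sketch that compares the lookahead against a plain consuming match. The shortened patterns and the variable names are mine, just for illustration:

```python
import re

s = '<a href="http://mysite.com/blah.html">Test <i>String</i> OK</a>'

# Lookahead: </a> must follow, but it is not part of the match.
lookahead = re.search(r'">(.*)(?=</a>)', s)

# Consuming: </a> is matched and becomes part of the overall match.
consuming = re.search(r'">(.*)</a>', s)

print(lookahead.group(0))    # '">Test <i>String</i> OK'
print(consuming.group(0))    # '">Test <i>String</i> OK</a>'

# Whatever follows the match is still available for further matching.
print(s[lookahead.end():])   # '</a>'
print(s[consuming.end():])   # ''
```

Group 1 is the same in both cases. The difference only shows up in group(0) and in where the match ends, which matters if you carry on searching from that position.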

The Beautiful Soup answers are a better bet though: they've already
done the hard work, and after all, you are trying to roll your own
partial HTML parser here, which will struggle with badly formed HTML.
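
For a rough idea of the parser-based route without installing anything, here is a sketch using the standard library's html.parser as a stand-in. It is not Beautiful Soup, and the LinkCollector class and its prefix argument are names I made up for the example:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, inner text) pairs for <a> tags under a URL prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> we are currently inside
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith(self.prefix):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


collector = LinkCollector("http://mysite.com/")
collector.feed('hdhd <a href="http://mysite.com/blah.html">Test <i>String</i> OK</a>')
collector.close()
print(collector.links)  # [('http://mysite.com/blah.html', 'Test String OK')]
```

Note that this gives you the link text with the inner <i> markup stripped, whereas the regex keeps it. If you need the raw inner HTML you would have to rebuild it from the tag callbacks.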
--
http://mail.python.org/mailman/listinfo/python-list
