Re: Regexp

gervaz Mon, 19 Jan 2009 07:41:28 -0800

On Jan 19, 4:01 pm, Ant wrote:
> A 0-width positive lookahead is probably what you want here:
>
> >>> import re
> >>> s = """
> ... hdhd <a href="http://mysite.com/blah.html">Test <i>String</i> OK</a>
> ...
> ... """
> >>> p = r'href="(http://mysite.com/[^"]+)">(.*)(?=</a>)'
> >>> m = re.search(p, s)
> >>> m.group(1)
> 'http://mysite.com/blah.html'
> >>> m.group(2)
> 'Test <i>String</i> OK'
>
> The (?=...) bit is the lookahead, and won't consume any of the string
> you are searching. I've binned the named groups for clarity.
>
> The Beautiful Soup answers are a better bet, though: they've already
> done the hard work, and after all, you are trying to roll your own
> partial HTML parser here, which will struggle with badly formed HTML...
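
Here is a small check of what "won't consume" means in practice. It uses the sample string from the session above, assuming the link sits on a single line (the archived copy is wrapped). The same pattern runs once with a plain `</a>` and once with the `(?=</a>)` lookahead. Both capture the same link text, but only the consuming version includes `</a>` in the overall match. The lookahead version stops with the match end sitting just before `</a>`.

```python
import re

# Sample markup from the session, assumed to be on one line.
s = '\nhdhd <a href="http://mysite.com/blah.html">Test <i>String</i> OK</a>\n\n'

# Consuming version: "</a>" becomes part of the overall match.
consuming = re.search(r'href="(http://mysite.com/[^"]+)">(.*)</a>', s)

# Lookahead version: "(?=</a>)" only asserts that "</a>" comes next;
# it does not consume it.
lookahead = re.search(r'href="(http://mysite.com/[^"]+)">(.*)(?=</a>)', s)

print(consuming.group(2))                   # Test <i>String</i> OK
print(lookahead.group(2))                   # Test <i>String</i> OK
print(consuming.group(0).endswith('</a>'))  # True
print(lookahead.group(0).endswith('</a>'))  # False
print(s[lookahead.end():lookahead.end() + 4])  # </a>  (left unconsumed)
```

With more than one link on a line, the greedy `(.*)` runs on to the last `</a>`. The non-greedy `(.*?)` stops at the first one.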


OK, thank you all. I'll take a look at Beautiful Soup, although the
lookahead solution fits better for the little I have to do.
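
To compare with the parser route, here is a minimal sketch that uses the standard library's `html.parser.HTMLParser` instead of Beautiful Soup. This is a swapped-in technique, not what the thread's answers used. `LinkTextParser` and its `prefix` argument are hypothetical names for illustration. Unlike the regex, it returns the link text with inner tags such as `<i>` stripped. It also does not handle nested `<a>` tags.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Hypothetical helper: collect (href, text) for <a> tags whose href
    starts with `prefix`. Inner tags are dropped and only their text is kept.
    Nested <a> tags are not handled."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.links = []    # collected (href, text) tuples
        self._href = None  # href of the currently open matching <a>, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # attrs is a list of (name, value) pairs; value may be None.
            href = dict(attrs).get('href') or ''
            if href.startswith(self.prefix):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((self._href, ''.join(self._text)))
            self._href = None


s = '\nhdhd <a href="http://mysite.com/blah.html">Test <i>String</i> OK</a>\n\n'
parser = LinkTextParser('http://mysite.com/')
parser.feed(s)
parser.close()
print(parser.links)  # [('http://mysite.com/blah.html', 'Test String OK')]
```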
--
http://mail.python.org/mailman/listinfo/python-list
