"Frank Potter" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> pyparsing is cool.
> but using only re is also OK
> # -*- coding: UTF-8 -*-
> import urllib2
> html=urllib2.urlopen(ur"http://www.yahoo.com/").read()
>
> import re
> r=re.compile(r'<img\s+src="(?P<image>[^"]+)"[^>]*>',re.IGNORECASE)
> for m in r.finditer(html):
>     print m.group('image')
>
Ouch - this fails to match any <img> tag that has some other attribute, such
as "height" or "width", before the "src" attribute. www.yahoo.com has
several such tags.
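
To make the failure concrete, here is a minimal sketch that runs the same
pattern over a made-up two-tag snippet (the HTML string is my own
illustration, not copied from www.yahoo.com). Only the tag whose first
attribute is "src" is found:

```python
import re

# Same pattern as in the quoted post, written as a raw string.
pattern = re.compile(r'<img\s+src="(?P<image>[^"]+)"[^>]*>', re.IGNORECASE)

# Hypothetical sample input: the second tag puts width/height before src.
html = '<img src="a.gif" alt="first"><img width="16" height="16" src="b.gif">'

# Collect every image URL the pattern manages to match.
found = [m.group('image') for m in pattern.finditer(html)]
print(found)
```

The second tag is skipped because the pattern requires "src" to follow the
whitespace after "<img" directly.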
On the other hand, pyparsing's makeHTMLTags defines a starting tag
expression that looks for (conceptually):
    < tagname ZeroOrMore(attrname '=' value) Optional('/') >
and does not assume that the first attribute is "src", or anything else for that
matter.
The returned results make the tag attributes accessible as object attributes
or dictionary keys.
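
A rough sketch of how that looks in practice, assuming pyparsing is
installed and using a made-up HTML snippet rather than the live Yahoo page
(the variable names are just for illustration):

```python
from pyparsing import makeHTMLTags

# Hypothetical sample input: attributes in varying order and case,
# one tag closed with "/>".
html = '''
<img width="10" height="20" src="logo.gif">
<IMG SRC="photo.png" alt="A photo" />
'''

# makeHTMLTags returns a pair of expressions: (start tag, end tag).
img_start, img_end = makeHTMLTags("img")

images = []
for tokens, start, end in img_start.scanString(html):
    # Attribute access and dictionary-style access give the same value.
    assert tokens.src == tokens["src"]
    images.append(tokens.src)

print(images)
```

Both tags are picked up here, regardless of where "src" appears among the
attributes.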
-- Paul