Re: webspider, regexp not working, why?

alex23 Fri, 23 May 2008 19:41:20 -0700

On May 24, 3:26 am, "Reedick, Andrew" <[EMAIL PROTECTED]> wrote:
> c)  If you're going to parse html/xml then bite the bullet and learn one
> of the libraries specifically designed to parse html/xml.  Many other
> regex gurus have learned this lesson.  Myself included.  =)


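Agreed. The BeautifulSoup approach is particularly nice (although it's
not part of the stdlib). Calling the soup with a tag name, as in
soup('link'), is shorthand for soup.findAll('link'):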

>>> import urllib
>>> from BeautifulSoup import BeautifulSoup
>>> html = urllib.urlopen('http://www.python.org/').read()
>>> soup = BeautifulSoup(html)
>>> links = [link['href'] for link in soup('link')]
>>> links[0]
u'http://www.python.org/channews.rdf'
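
For anyone on Python 3, where urllib.urlopen has moved to urllib.request
and BeautifulSoup is imported from the bs4 package, here is a rough sketch
of the same idea using only the stdlib html.parser module. The class name,
the function name and the sample HTML are made up for illustration, and it
parses an inline string instead of fetching python.org:

```python
from html.parser import HTMLParser


class LinkHrefCollector(HTMLParser):
    """Collect the href of every <link> tag (hypothetical helper, not from the thread)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "link":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def extract_link_hrefs(html):
    """Return the href values of all <link> tags in the given HTML string."""
    parser = LinkHrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Illustrative input only; a real spider would download the page first.
SAMPLE_HTML = """<html><head>
<link rel="alternate" type="application/rss+xml" href="http://www.python.org/channews.rdf">
<link rel="stylesheet" href="/styles/screen.css">
<link rel="icon">
</head><body><a href="/about/">About</a></body></html>"""

links = extract_link_hrefs(SAMPLE_HTML)
print(links[0])
```

Like the BeautifulSoup version, this only looks at <link> tags, so the
<a href="/about/"> anchor is ignored, and a <link> without an href is
skipped rather than raising a KeyError.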

- alex23

--
http://mail.python.org/mailman/listinfo/python-list
