Re: Help extracting info from HTML source ..

Nikita the Spider Fri, 26 Jan 2007 11:06:04 -0800

In article <[EMAIL PROTECTED]>,
 "Miki" <[EMAIL PROTECTED]> wrote:


> Hello Shelton,
> 
> >   I am learning Python, and have never worked with HTML.  However, I would
> > like to write a simple script to audit my 100+ Netware servers via their web
> > portal.
> Always use the right tool. BeautifulSoup
> (http://www.crummy.com/software/BeautifulSoup/) is best for web
> scraping (IMO).
> 
> from urllib import urlopen
> from BeautifulSoup import BeautifulSoup
> 
> html = urlopen("http://www.python.org").read()
> soup = BeautifulSoup(html)
> for link in soup("a"):
>       print link["href"], "-->", link.contents

Agreed. HTML scraping gets complicated quickly once you get into it. It
might be interesting to write such a library just for your own
satisfaction, but if you want to get something done, use a module
that's already written, like BeautifulSoup. Another module that does
the same job but works differently (and more simply, IMO) is HTMLData by
Connelly Barnes:
http://oregonstate.edu/~barnesc/htmldata/
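
To give a rough idea of what these modules take care of, here is a minimal
sketch of link extraction using only the standard library's html.parser
(Python 3). It is not how BeautifulSoup or HTMLData work internally; the
names LinkCollector and extract_links are made up for this illustration,
and it skips anchors without an href and does not handle nested anchors
or badly broken markup, which is exactly where a dedicated module helps.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; href may be absent.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) pairs found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling extract_links() on the text of a downloaded page returns the
pairs in document order, so it can be looped over much like soup("a")
in the quoted example.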

-- 
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more
