nephish wrote:
> hey there gents,
> i am looking for a good place to start learning how to read a web page
> with python and pull out bits of information for an app i am doing for
> work. i have googled and looked at the docs. i looked at urllib and
> httplib so i think this is a place to kinda start. Does anyone know of a
> good site with some examples or tutorials for this kind of thing ?
Using urllib to fetch a web page can be as simple as
>>> import urllib
>>> data = urllib.urlopen('http://www.google.com').read()
>>> data[:100]
'<html><head><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"><title>Google</t'
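
The session above is Python 2. On Python 3, urlopen lives in urllib.request and read() returns bytes, so the same fetch might look roughly like the sketch below. The helper name fetch and the utf-8 fallback are my own assumptions for illustration, not something urllib provides.

```python
import urllib.request


def fetch(url, default_encoding="utf-8"):
    """Return the body of `url` decoded to text (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes on Python 3
        # Use the charset from the Content-Type header when the server
        # sends one; otherwise fall back to the assumed default.
        charset = response.headers.get_content_charset() or default_encoding
    # Replace undecodable bytes instead of raising on a sloppy page.
    return raw.decode(charset, errors="replace")
```

Calling fetch('http://www.google.com')[:100] should then give you the start of the page as text, much like data[:100] above.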
To parse the HTML and pull out the data you want, try Beautiful Soup:
http://www.crummy.com/software/BeautifulSoup/
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup(data)
>>> soup('title')
[<title>Google</title>]
>>> for a in soup('a'):
...     print a.get('href'), a.string
...
/imghp?hl=en&tab=wi&ie=UTF-8 Images
http://groups.google.com/grphp?hl=en&tab=wg&ie=UTF-8 Groups
http://news.google.com/nwshp?hl=en&tab=wn&ie=UTF-8 News
http://froogle.google.com/frghp?hl=en&tab=wf&ie=UTF-8 Froogle
/lochp?hl=en&tab=wl&ie=UTF-8 Null
None Null
/intl/en/options/ more »
/advanced_search?hl=en Advanced Search
/preferences?hl=en Preferences
/language_tools?hl=en Language Tools
/ads/ Advertising Programs
/services Business Solutions
/intl/en/about.html About Google
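
If you cannot install Beautiful Soup, the link loop above can be approximated with the standard library alone. This is only a sketch for Python 3 using html.parser; the names LinkCollector and extract_links are made up for this example, and it is much less forgiving of broken HTML than Beautiful Soup is.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._in_a = False   # True while inside an open <a> element
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_a = True
            self._href = dict(attrs).get("href")  # None when href is absent
            self._text = []

    def handle_data(self, data):
        if self._in_a:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_a:
            self.links.append((self._href, "".join(self._text).strip()))
            self._in_a = False


def extract_links(html):
    """Return a list of (href, text) tuples found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Feeding it the page text, for example extract_links(data) once data is a str, gives back tuples you can loop over the same way as soup('a').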
Kent