nephish wrote:
> hey there gents,
>       i am looking for a good place to start learning how to read a web page
> with python and pull out bits of information for an app i am doing for
> work. i have googled and looked at the docs. i looked at urllib and
> httplib so i think this is a place to kinda start. Does anyone know of a
> good site with some examples or tutorials for this kind of thing ? 

Using urllib to fetch a web page can be as simple as:
 >>> import urllib
 >>> data = urllib.urlopen('http://www.google.com').read()
 >>> data[:100]
'<html><head><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"><title>Google</t'
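
The session above is Python 2. If you happen to be on Python 3 (an assumption on my part), urlopen lives in urllib.request and read() returns bytes, so you need to decode them. Here is a rough sketch of the same fetch; fetch_page and its timeout and default_encoding parameters are names made up for illustration, not part of any library:

```python
import urllib.request


def fetch_page(url, timeout=10, default_encoding="utf-8"):
    """Return the body found at `url` as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes in Python 3
        # Use the charset from the Content-Type header when the server
        # sends one; otherwise fall back to the assumed default.
        charset = response.headers.get_content_charset() or default_encoding
    # errors="replace" keeps going if a page lies about its encoding.
    return raw.decode(charset, errors="replace")
```

Calling fetch_page('http://www.google.com')[:100] should then play the role of data[:100] above.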

To parse the HTML and get the data you want from it, try Beautiful Soup:
http://www.crummy.com/software/BeautifulSoup/

 >>> from BeautifulSoup import BeautifulSoup
 >>> soup = BeautifulSoup(data)
 >>> soup('title')
[<title>Google</title>]
 >>> for a in soup('a'):
 ...   print a.get('href'), a.string
 ...
/imghp?hl=en&tab=wi&ie=UTF-8 Images
http://groups.google.com/grphp?hl=en&tab=wg&ie=UTF-8 Groups
http://news.google.com/nwshp?hl=en&tab=wn&ie=UTF-8 News
http://froogle.google.com/frghp?hl=en&tab=wf&ie=UTF-8 Froogle
/lochp?hl=en&tab=wl&ie=UTF-8 Null
None Null
/intl/en/options/ more&nbsp;&raquo;
/advanced_search?hl=en Advanced Search
/preferences?hl=en Preferences
/language_tools?hl=en Language Tools
/ads/ Advertising&nbsp;Programs
/services Business Solutions
/intl/en/about.html About Google
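
If you cannot install Beautiful Soup, the loop above can be approximated with the standard library alone. This is only a sketch, assuming Python 3 and its html.parser module; LinkCollector, extract_links and the sample string are made up for illustration, and this parser is less forgiving of broken HTML than Beautiful Soup:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect an (href, text) pair for every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_anchor = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            # attrs is a list of (name, value) pairs; href may be missing.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears between <a> and </a>.
        if self._in_anchor:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            self.links.append((self._href, "".join(self._text).strip()))
            self._in_anchor = False


def extract_links(html):
    """Return a list of (href, text) tuples found in `html`."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample markup, loosely modelled on the output above.
sample = (
    '<html><body>'
    '<a href="/imghp?hl=en">Images</a>'
    '<a href="/preferences?hl=en">Preferences</a>'
    '<a name="top">no href</a>'
    '</body></html>'
)

for href, text in extract_links(sample):
    print(href, text)
```

An anchor without an href comes back with None in the first slot, much like the a.get('href') call above.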

Kent

_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor
