Re: HTML Parsing

Ayaz Ahmed Khan Sat, 10 Feb 2007 23:11:02 -0800

"mtuller" typed:

> I have also tried Beautiful Soup, but had trouble understanding the
> documentation


As Gabriel has suggested, spend a little more time going through the
documentation of BeautifulSoup. It is pretty easy to grasp.

I'll give you an example: I want to extract the text between the
following span tags in a large HTML source file.

<span class="title">Linux Kernel Bluetooth CAPI Packet Remote Buffer Overflow 
Vulnerability</span>

>>> import re
>>> from BeautifulSoup import BeautifulSoup
>>> from urllib2 import urlopen
>>> soup = BeautifulSoup(urlopen('http://www.someurl.tld/')) 
>>> title = soup.find(name='span', attrs={'class':'title'},
...                   text=re.compile(r'^Linux \w+'))
>>> title
u'Linux Kernel Bluetooth CAPI Packet Remote Buffer Overflow Vulnerability'
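
The session above needs a live URL and the Python 2-era BeautifulSoup
package, so it is hard to try as-is. As a rough sketch of the same
extraction with no third-party dependencies, here is one way to do it with
Python 3's standard html.parser on an inline copy of the snippet. The
class name TitleSpanExtractor, the sample string, and the extra
"other" span are made up for illustration; this is not how BeautifulSoup
works internally, just the same idea done by hand.

```python
import re
from html.parser import HTMLParser


class TitleSpanExtractor(HTMLParser):
    """Hypothetical helper: collect the text of each <span class="title">."""

    def __init__(self):
        super().__init__()
        self.titles = []   # finished titles, in document order
        self._depth = 0    # > 0 while inside a matching span
        self._chunks = []  # text pieces of the span being read

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            # A span nested inside a title span: track it so the
            # matching end tag does not close the title early.
            self._depth += 1
        elif "title" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse the line break and extra spaces inside the span.
                self.titles.append(" ".join("".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


# Inline sample (assumption): the span from the post plus one unrelated span.
html = """<html><body>
<span class="title">Linux Kernel Bluetooth CAPI Packet Remote Buffer Overflow 
Vulnerability</span>
<span class="other">not a title</span>
</body></html>"""

parser = TitleSpanExtractor()
parser.feed(html)
parser.close()

# Same filter as the regex in the session above, applied to the cleaned text.
linux_titles = [t for t in parser.titles if re.match(r"Linux \w+", t)]
print(linux_titles[0])
```

For anything beyond a one-off like this, BeautifulSoup is still the less
painful route, since it copes with broken markup and gives you find() and
friends instead of hand-written state tracking.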

-- 
Ayaz Ahmed Khan

A witty saying proves nothing, but saying something pointless gets
people's attention.
