"mtuller" typed:
> I have also tried Beautiful Soup, but had trouble understanding the
> documentation
As Gabriel has suggested, spend a little more time going through the
documentation of BeautifulSoup. It is pretty easy to grasp.

Here is an example. I want to extract the text between the following
span tags in a large HTML source file:

<span class="title">Linux Kernel Bluetooth CAPI Packet Remote Buffer Overflow Vulnerability</span>

>>> import re
>>> from BeautifulSoup import BeautifulSoup
>>> from urllib2 import urlopen
>>> soup = BeautifulSoup(urlopen('http://www.someurl.tld/'))
>>> title = soup.find(name='span', attrs={'class':'title'},
...                   text=re.compile(r'^Linux \w+'))
>>> title
u'Linux Kernel Bluetooth CAPI Packet Remote Buffer Overflow Vulnerability'

--
Ayaz Ahmed Khan

A witty saying proves nothing, but saying something pointless gets
people's attention.

--
http://mail.python.org/mailman/listinfo/python-list
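The session above uses Python 2 (`urllib2`) and the old BeautifulSoup 3 package, so it will not run as written on a current Python 3 install. As a point of comparison only, here is a minimal sketch of the same extraction using nothing but the standard library's `html.parser.HTMLParser` in place of BeautifulSoup. The names `SpanTextExtractor`, `find_span_text` and `page` are hypothetical. The sketch parses a literal string instead of fetching a URL and assumes the `class` attribute may hold several space-separated class names.

```python
import re
from html.parser import HTMLParser


class SpanTextExtractor(HTMLParser):
    """Hypothetical helper: collect the text of every <span> carrying a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0        # > 0 while inside a matching <span>
        self.current = []     # text chunks of the span being read
        self.results = []     # finished span texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            # Nested <span> inside a matching one: count it so end tags pair up.
            self.depth += 1
            return
        # attrs is a list of (name, value); value is None for bare attributes.
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def find_span_text(html, css_class, pattern=None):
    """Return the first matching span's text (optionally regex-filtered), else None."""
    parser = SpanTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    for text in parser.results:
        if pattern is None or re.match(pattern, text):
            return text
    return None


# Hypothetical page standing in for the downloaded HTML source.
page = """<html><body>
<span class="nav">Home</span>
<span class="title">Linux Kernel Bluetooth CAPI Packet Remote Buffer Overflow Vulnerability</span>
</body></html>"""

print(find_span_text(page, "title", r"^Linux \w+"))
# prints: Linux Kernel Bluetooth CAPI Packet Remote Buffer Overflow Vulnerability
```

This walks the markup by hand, which is exactly the bookkeeping BeautifulSoup's `find()` saves you. For anything beyond a one-off scrape, the library remains the more comfortable choice.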