Bruno Desthuilliers wrote:
>> However, what makes it really useful is that it does a good job of
>> handling the "broken" html that is so commonly found on the web.
> 
> BeautifulSoup ?
> http://pypi.python.org/pypi/BeautifulSoup/3.0.7a
> 
> possibly with ElementSoup ?
> http://pypi.python.org/pypi/ElementSoup/rev452

It's actually debatable whether BeautifulSoup is any better than
lxml/libxml2 at parsing broken HTML, since lxml tends to tidy things up
pretty well. The only major difference is encoding detection, and for that
you can use a separate tool like chardet:

http://chardet.feedparser.org/
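
As a rough sketch of how the two could be combined (not a definitive
recipe): guess the encoding with chardet, decode, and hand the text to
lxml.html, which repairs the markup while parsing. The helper name
parse_broken_html, the sample markup, and the UTF-8 fallback are made up
for illustration; chardet.detect() and lxml.html.fromstring() are the
libraries' own functions.

```python
import chardet
import lxml.html


def parse_broken_html(raw):
    """Guess the encoding of raw bytes, then let lxml repair the markup."""
    guess = chardet.detect(raw)  # dict with 'encoding' and 'confidence' keys
    # Assumption: fall back to UTF-8 when chardet cannot make a guess.
    encoding = guess["encoding"] or "utf-8"
    text = raw.decode(encoding, errors="replace")
    return lxml.html.fromstring(text)


# Hypothetical sample: unclosed <li> and <p> tags, no closing </body> or </html>.
raw = (
    "<html><body><ul><li>Crème brûlée<li>Café au lait</ul>"
    "<p>déjà vu, naïve"
).encode("utf-8")

doc = parse_broken_html(raw)
items = doc.findall(".//li")         # the list items lxml recovered
paragraphs = doc.findall(".//p")     # the paragraph lxml closed for us
cleaned = lxml.html.tostring(doc)    # serialized, tidied-up markup (bytes)
```

Decoding before parsing keeps the encoding decision in one place. Passing
the raw bytes straight to lxml also works, in which case libxml2 applies
its own encoding handling instead.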

Stefan
--
http://mail.python.org/mailman/listinfo/python-list
