Bruno Desthuilliers wrote:
>> However, what makes it really useful is that it does a good job of
>> handling the "broken" html that is so commonly found on the web.
>
> BeautifulSoup ?
> http://pypi.python.org/pypi/BeautifulSoup/3.0.7a
>
> possibly with ElementSoup ?
> http://pypi.python.org/pypi/ElementSoup/rev452

It's actually debatable whether BeautifulSoup is any better than
lxml/libxml2 at parsing broken HTML, since lxml tends to tidy things up
pretty well. The only major difference is in encoding detection, and for
that you can use a separate tool such as chardet:

http://chardet.feedparser.org/

Stefan
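
To illustrate the point above, here is a minimal sketch of the combination: chardet guesses the encoding and lxml.html parses the markup leniently. The helper name parse_broken_html, the fallback_encoding parameter, and the sample markup are made up for this example, and chardet is treated as optional, so take it as a starting point, not a recipe.

```python
import lxml.html

try:
    import chardet  # optional third-party encoding detector
except ImportError:
    chardet = None


def parse_broken_html(raw_bytes, fallback_encoding="utf-8"):
    """Decode raw bytes (guessing the encoding) and parse them leniently.

    Hypothetical helper for illustration only.
    """
    encoding = None
    if chardet is not None:
        # chardet.detect() returns a dict with an "encoding" key,
        # which may be None when no guess could be made.
        encoding = chardet.detect(raw_bytes)["encoding"]
    text = raw_bytes.decode(encoding or fallback_encoding, errors="replace")
    # lxml.html relies on libxml2's recovering HTML parser, so missing
    # end tags do not raise an error here.
    return lxml.html.fromstring(text)


# Sample input: neither <p> is closed, and <b> is never closed either.
broken = b"<html><body><p>First paragraph<p>Second <b>bold text</body>"

doc = parse_broken_html(broken)
paragraphs = [p.text_content() for p in doc.iter("p")]
cleaned = lxml.html.tostring(doc, encoding="unicode")

print(paragraphs)
print(cleaned)
```

Serializing the tree again with lxml.html.tostring() shows how the parser tidied the markup, which makes it easy to compare against what BeautifulSoup produces for the same input.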