On Dec 28, 6:22 pm, Kenneth McDonald <kenneth.m.mcdon...@sbcglobal.net> wrote:
> Ruby has a package called 'hpricot' which can perform limited xpath
> queries, and CSS selector queries. However, what makes it really
> useful is that it does a good job of handling the "broken" html that
> is so commonly found on the web. Does Python have anything similar,
> i.e. something that will not only do XPath queries, but will do so on
> imperfect HTML?

Hpricot is a fine package, but I prefer Nokogiri (see http://www.rubyinside.com/nokogiri-ruby-html-parser-and-xml-parser-1288.html) because it is based on libxml2 and is therefore faster, conforms to the full XPath 1.0 spec, works on imperfect HTML, and exposes the Hpricot API.

In Python, the equivalent is lxml (http://codespeak.net/lxml/), which is likewise based on libxml2, very fast, XPath 1.0 conformant, and exposes the now-standard ElementTree API. The main difference is that lxml's CSS selector support lives in a separate lxml.cssselect module rather than in the core API, but IMHO CSS selectors are a gimmick when you have a full XPath 1.0 engine at your disposal.

--
Mark.
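
P.S. For what it's worth, here is a minimal sketch of running XPath queries over imperfect HTML with lxml. It assumes lxml is installed; the markup and the variable names (broken, doc, paragraphs, links) are made up for illustration and are not from the original question.

```python
from lxml import html

# Deliberately sloppy markup: unclosed <p> and <b> tags, and no closing
# </body> or </html>. This string is a made-up example.
broken = (
    "<html><body><div class='post'>"
    "<p>First<p>Second <a href='/a'>link</a><br><b>bold</div>"
)

# lxml.html uses libxml2's forgiving HTML parser, which repairs the tree
# instead of raising an error on malformed input.
doc = html.fromstring(broken)

# Ordinary XPath 1.0 expressions work on the repaired tree.
paragraphs = doc.xpath("//div[@class='post']/p")
links = doc.xpath("//a/@href")

print(len(paragraphs), paragraphs[0].text, links)
```

The point of going through lxml.html rather than the plain XML parser is that the XML parser would reject this input outright, while the HTML parser closes the dangling tags for you.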