The ElementTidy library is an add-on to ElementTree that provides an alternative tree builder, which can read (almost) arbitrary HTML and turn it into well-formed XHTML element trees.
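To illustrate the general idea of an alternative tree builder, here is a minimal sketch that uses only Python's standard library. It is not ElementTidy and does not use HTML Tidy. It feeds `html.parser.HTMLParser` events into `xml.etree.ElementTree.TreeBuilder`, supplies a single root, closes unclosed elements, drops stray end tags, and puts every element in the XHTML namespace. The names `LooseHTMLTreeBuilder` and `parse_loose_html` are hypothetical, and the repair rules are far cruder than Tidy's.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# This sketch puts every element in the XHTML namespace (Clark notation).
XHTML = "{http://www.w3.org/1999/xhtml}"

# Elements that never have content or an end tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}

# Crude repair rule (an assumption of this sketch): starting one of these while
# the same element is the innermost open element closes the old one (<p>a<p>b).
SELF_CLOSING_SIBLINGS = {"p", "li", "dt", "dd", "tr", "td", "th", "option"}


class LooseHTMLTreeBuilder(HTMLParser):
    """Hypothetical builder: turns loosely written HTML into an Element tree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._builder = ET.TreeBuilder()
        self._stack = []  # tag names of currently open elements
        # Always supply a single html root, so the tree is well-formed even
        # when the input has no <html> element or several top-level elements.
        self._builder.start(XHTML + "html", {})

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            return  # the root comes from __init__; input <html> attributes are dropped
        if tag in SELF_CLOSING_SIBLINGS and self._stack and self._stack[-1] == tag:
            self._builder.end(XHTML + self._stack.pop())
        # Bare attributes such as <input checked> get their own name as the value.
        attrib = {name: (name if value is None else value) for name, value in attrs}
        self._builder.start(XHTML + tag, attrib)
        if tag in VOID:
            self._builder.end(XHTML + tag)  # void elements are closed at once
        else:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self._stack:
            return  # stray end tag (including </br> and </html>): ignore it
        # Close any unclosed elements nested inside the matching one.
        while self._stack:
            name = self._stack.pop()
            self._builder.end(XHTML + name)
            if name == tag:
                break

    def handle_data(self, data):
        self._builder.data(data)

    def close(self):
        super().close()  # flush any text HTMLParser is still buffering
        while self._stack:  # close elements left open at end of input
            self._builder.end(XHTML + self._stack.pop())
        self._builder.end(XHTML + "html")
        return self._builder.close()


def parse_loose_html(text):
    """Parse an HTML string and return an xml.etree.ElementTree.ElementTree."""
    parser = LooseHTMLTreeBuilder()
    parser.feed(text)
    return ET.ElementTree(parser.close())


if __name__ == "__main__":
    tree = parse_loose_html("<title>Demo</title><p>One<p>Two<br>three</b>")
    # The missing </p> tags are supplied and the stray </b> is dropped.
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

Because the result is an ordinary Element tree, the usual ElementTree methods (`find`, `findall`, `tostring`) work on it; element names just need the `{http://www.w3.org/1999/xhtml}` prefix.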
The ElementTidy library uses a library version of Dave Raggett's HTML Tidy utility to do the cleanup (the source code is included), and does not rely on external utilities.

The beta 1 release adds improved support for source document encodings and more aggressive tidying, so that it produces output even for seriously malformed HTML.

For downloads and more information, see:

    http://effbot.org/downloads#elementtidy
    http://effbot.org/zone/element-tidylib.htm

enjoy /F

_______________________________________________
XML-SIG maillist  -  XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig