Has anyone tried to construct an HTML janitor script using BeautifulSoup? My situation:
I'm trying to convert a series of web pages from HTML to PalmDoc format using Plucker, which is written in Python. The Plucker project suggests passing the HTML through "tidy" first, to get well-formed HTML for Plucker to work with. However, some of the pages I want to convert are so bad that even tidy pukes on them. I was thinking that BeautifulSoup might be more tolerant of really bad HTML... which led me to the question this article started out with. :) Thanks!
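
To make the idea concrete, a minimal sketch of such a janitor script might look like the following. It assumes the current bs4 package with the standard-library "html.parser" backend, and `clean_html` is a hypothetical helper name, not something BeautifulSoup or Plucker provides. It only illustrates parsing broken markup and writing it back out with the open tags closed; whether the output is good enough for Plucker is not something this sketch establishes.

```python
from bs4 import BeautifulSoup  # third-party package: beautifulsoup4


def clean_html(raw_html: str) -> str:
    """Parse possibly broken HTML and re-serialize it with tags closed.

    `clean_html` is a hypothetical helper name, not part of BeautifulSoup.
    """
    # "html.parser" is the standard-library backend, picked here as an
    # assumption; other backends (lxml, html5lib) may repair markup differently.
    soup = BeautifulSoup(raw_html, "html.parser")
    # Serializing the parsed tree writes a closing tag for every element
    # that was left open in the input.
    return str(soup)


if __name__ == "__main__":
    # Deliberately broken input: neither <p> nor <b> is ever closed.
    print(clean_html("<p>one<b>two"))
```

The cleaned string could then be written to a file and handed to Plucker in place of the tidy output.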
