Has anyone tried to construct an HTML janitor script using BeautifulSoup?

My situation:

I'm trying to convert a series of web pages from HTML to PalmDoc format
using Plucker, which is written in Python.  The Plucker project suggests
passing the HTML through "tidy" first, to get well-formed HTML for
Plucker to work with.

However, some of the pages I want to convert are so bad that even tidy
pukes on them.

I was thinking that BeautifulSoup might be more tolerant of really bad
HTML...  Which led me to the question this article started out with.  :)
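
To make the idea concrete, here is a rough sketch of the kind of janitor
script I have in mind. It is untested against my really bad pages and
rests on assumptions: it uses the bs4 package with Python's built-in
"html.parser" backend, and clean_html is just a name made up for
illustration. The idea is simply to let BeautifulSoup parse whatever it
is given and write the tree back out, which closes any tags left open.

```python
from bs4 import BeautifulSoup  # assumption: the bs4 package is installed


def clean_html(raw_html):
    """Parse possibly broken HTML and return it re-serialized.

    clean_html is a hypothetical helper name, not part of any library.
    """
    # "html.parser" is the backend built on Python's standard library;
    # it is lenient and does not reject markup with unclosed tags.
    soup = BeautifulSoup(raw_html, "html.parser")
    # Serializing the parsed tree emits a closing tag for every element.
    return str(soup)


if __name__ == "__main__":
    broken = "<p>Unclosed paragraph <b>bold text"
    print(clean_html(broken))
```

The cleaned string could then be written to a file and handed to Plucker
in place of tidy's output. Whether the result is good enough for Plucker
on badly mangled pages is exactly what I don't know yet.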

Thanks!


-- 
http://mail.python.org/mailman/listinfo/python-list
