HTML purifier using BeautifulSoup?

Dan Stromberg Tue, 21 Dec 2004 10:26:17 -0800

Has anyone tried to construct an HTML janitor script using BeautifulSoup?

My situation:


I'm trying to convert a series of web pages from .html to PalmDoc format
using Plucker, which is written in Python.  The Plucker project suggests
passing HTML through "tidy" first, to get well-formed HTML for Plucker to
work with.

However, some of the pages I want to convert are so bad that even tidy
pukes on them.

I was thinking that BeautifulSoup might be more tolerant of really bad
HTML, which led me to the question this post started out with.  :)
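
For concreteness, here is a minimal sketch of the kind of janitor script I
have in mind. It assumes the modern bs4 package and its "html.parser"
backend (neither is named above), and purify_html and the sample markup are
hypothetical names made up for illustration. The idea is only to parse the
tag soup leniently and write it back out so that every tag that was opened
also gets closed; whether the result is clean enough for Plucker is
untested.

```python
from bs4 import BeautifulSoup  # assumption: the modern "bs4" package is installed


def purify_html(raw_html: str) -> str:
    """Parse possibly malformed HTML and re-serialize the resulting tree.

    Hypothetical helper: BeautifulSoup builds a tree even from broken
    markup, and serializing that tree emits a closing tag for every
    element it opened.
    """
    soup = BeautifulSoup(raw_html, "html.parser")
    return str(soup)


# Sample input with unclosed <p>, <b>, <ul>, and <li> tags (made up for illustration).
messy = "<html><body><p>Unclosed paragraph<b>bold text<ul><li>item one<li>item two</body>"
clean = purify_html(messy)
print(clean)
```

Note that this only balances the tags. It does not repair bad nesting the
way tidy tries to, so the output may still need a pass through tidy
afterwards.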

Thanks!


-- 
http://mail.python.org/mailman/listinfo/python-list
