An alternative to tidy is my fix-html program:

http://dcs.nac.uci.edu/~strombrg/fix-html.html

I had a page that tidy couldn't handle (I think it was the Palm Addicts
website) that fix-html sailed through.  fix-html is based on the
BeautifulSoup Python module, which appears to be pretty good at making
sense of bad HTML.

On Thu, 2005-03-03 at 10:06 +0100, Justus Piater wrote:
> Hi,
> 
> The issue of Web pages whose HTML is fouled up to the point of
> impluckability (add this to Merriam-Webster!) comes up over and over
> again.
> 
> The standard solution would be to use wget with the right options to
> download all that's needed, then run tidy on the file(s) in question,
> and then pluck the local files.  This is quite cumbersome, and one
> loses the original URL in the plucked PDB.
> 
> How about adding an option to plucker-build for filtering each
> downloaded file through tidy?
> 
> This should only be a minor hack, the tidying occurs in the right
> place in the pipeline, and it increases plucker-build's practical
> usability without placing additional burden on the user.
> 
> Justus
> 
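
The filtering step proposed in the quoted message could look roughly like
the sketch below.  It is not plucker-build code: the filter_html name and
the idea of calling it on each downloaded file are assumptions made for
illustration, and it assumes a tidy binary on the PATH.  The command is a
parameter, so a different cleaner could be swapped in, provided it reads
HTML on stdin and writes the result to stdout.

```python
import subprocess
import sys

# Assumed tidy invocation: quiet, emit XHTML, and still produce output
# when tidy reports errors in the input.
TIDY_COMMAND = ["tidy", "-q", "-asxhtml", "--force-output", "yes"]


def filter_html(html, command=TIDY_COMMAND):
    """Pipe html (bytes) through command and return the cleaned bytes.

    Falls back to the unmodified input if the command cannot be run or
    produces no output, so a missing filter never loses a page.
    """
    try:
        result = subprocess.run(
            command,
            input=html,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    except OSError as exc:
        print("filter not run: %s" % exc, file=sys.stderr)
        return html
    # tidy exits non-zero for mere warnings, so the exit status is not
    # treated as failure here; only empty output is.
    return result.stdout if result.stdout else html
```

Because the page is cleaned in memory right after it is fetched, it is
still associated with its original URL, which is the information the
wget-then-tidy route loses.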
