"Stewart C. Russell" said:

> Justin Mason wrote:
> > 
> > The OpenBSD Journal site looks OK to me -- in terms of its output anyway
> > :).  Unfortunately I think Slash-type news sites are always going to
> > require some PreProcessing or PostProcessing, due to all those "more
> > comments" and "less comments" links, so I would not be too concerned about
> > that.
> 
> further to this, has anyone had any success using Dave Raggett's "tidy"
> HTML cleaner (http://www.w3.org/People/Raggett/tidy/) as a preprocessor?
> Can it be wedged into the SiteScooper pipeline?
> 
> It only gives up on files that are seriously broken, otherwise produces
> clean, valid HTML files. Sadly, broken HTML is probably a decent %age of
> the sites out there [a text markup old fart writes].

It should be possible to pipe the text out to it in an HTMLPreProcess:
block.  I haven't tried it, as I'm a bit tidy-phobic after it thoroughly
mangled a page on me once :(
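
A rough sketch of the piping idea, written in Python purely for illustration and untested against sitescooper itself. The helper name run_html_filter and its fall-back behaviour are assumptions, not part of sitescooper or tidy. The only tidy details relied on are the -q (quiet) flag, the show-warnings option, and its exit status (0 clean, 1 warnings, 2 errors). If tidy is missing, fails, or returns nothing, the original text is passed through untouched, which limits the damage when it gives up on a page:

```python
import subprocess


def run_html_filter(html, cmd=("tidy", "-q", "--show-warnings", "no")):
    """Pipe `html` through an external filter and return the cleaned text.

    Hypothetical helper: returns the input unchanged if the filter is
    missing, exits with a hard error, or produces no output.
    """
    try:
        proc = subprocess.run(
            list(cmd),
            input=html,           # written to the filter's stdin
            capture_output=True,  # collect stdout and stderr
            text=True,
            check=False,          # exit status is inspected by hand below
        )
    except OSError:
        # Filter not installed or not executable: keep the page as it was.
        return html

    # tidy exits 0 when clean, 1 on warnings, 2 on errors.
    if proc.returncode > 1 or not proc.stdout.strip():
        return html
    return proc.stdout
```

The command is a parameter so that a different cleaner, or different tidy options, can be swapped in without touching the fall-back logic.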

Another possibility would be to produce HTML output from sitescooper, and
then tidy that -- might be handy to avoid problems caused by invalid HTML
that is irrelevant anyway (for example, outside the StoryStart and
StoryEnd range).

--j.

_______________________________________________
Sitescooper-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/sitescooper-talk
