I suspect that the HTML should be fixed the way a *browser* would fix
it (there are many millions of those in use!).

In particular, IE5 exposes its DOM, so it should be possible to run
large amounts of HTML through the browser and through the HTML parser,
and then compare the results.  In cases where the input is ambiguous, I
think browsers would be a good "reference" implementation for this
purpose...

Mike

Tom Palmer wrote:
> 
> > My impression was that JTidy had to make a complete pass over the document
> > in order to tidy it.  This would preclude using it for a SAX
> > (stream-based) parser.
> >
> If so, too bad.  (Of course, this wouldn't _preclude_ it, just make it
> extremely inefficient.)
> 
> I think it may get more complicated than Assaf listed in his algorithms, but
> I still think a knowledge of the stack of what tags are currently open is
> sufficient to fix the HTML.
> 
> - Tom Palmer
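
To make the "stack of currently open tags" idea concrete, here is a
minimal sketch in Python.  It is not JTidy's code and not the algorithm
Assaf listed; the class name StackBalancer, the helper balance(), and the
two small rule tables (VOID, NO_SELF_NEST) are all hypothetical and far
from complete.  It only illustrates that the repair can happen in a
single streaming pass, using nothing but the open-tag stack.

```python
from html.parser import HTMLParser

# Assumed, deliberately incomplete rule tables (illustration only).
VOID = {"br", "hr", "img", "input", "meta", "link"}  # never have an end tag
NO_SELF_NEST = {"p", "li"}  # a new <p>/<li> closes one that is directly open


class StackBalancer(HTMLParser):
    """Emit balanced markup in one pass, tracking only the open-tag stack."""

    def __init__(self):
        # Keep entity references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.stack = []  # names of the elements that are currently open
        self.out = []    # output fragments, in document order

    def _close_through(self, tag):
        # Pop and close every open element up to and including `tag`.
        while self.stack:
            top = self.stack.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_starttag(self, tag, attrs):
        # Implicitly close a directly enclosing element of the same kind.
        if tag in NO_SELF_NEST and self.stack and self.stack[-1] == tag:
            self.out.append("</%s>" % self.stack.pop())
        self.out.append(self.get_starttag_text())
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # <x/> opens and closes at once, so the stack is untouched.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.stack:
            self._close_through(tag)
        # Otherwise it is a stray end tag: drop it.

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def finish(self):
        # Flush buffered input, then close whatever is still open.
        self.close()
        while self.stack:
            self.out.append("</%s>" % self.stack.pop())
        return "".join(self.out)


def balance(chunks):
    """Feed an iterable of text chunks through a StackBalancer."""
    balancer = StackBalancer()
    for chunk in chunks:
        balancer.feed(chunk)
    return balancer.finish()


if __name__ == "__main__":
    print(balance(["<b><i>bold italic</b> text"]))
    print(balance(["<ul><li>one<li>two</ul>"]))
```

The input can arrive in arbitrary chunks, for example
balance(["<b><i>x</", "b>y"]), which is what a SAX-style front end would
need.  Whether a stack alone can reproduce what a browser does with
harder cases (tables, misnested formatting tags) is exactly the open
question in this thread; the sketch does not attempt those.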
