Uwe Schmidt u...@fh-wedel.de writes:

Hi Ivan,

The HTML parser in HXT is based on tagsoup. It's a lazy parser (it does not use Parsec), and it tries to parse everything as HTML. But garbage in, garbage out: there is no attempt to repair illegal HTML as, e.g., the Tidy parsers do.
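To make that concrete, here is a small sketch (mine, not from the thread; it assumes the hxt package, and the HTML snippet is made up): HXT's lenient, tagsoup-based HTML parser is selected with the withParseHTML option, so even markup with unclosed tags yields a usable document tree to run arrows over.

```haskell
import Text.XML.HXT.Core

-- A minimal sketch (assumes the hxt package): parse sloppy HTML with the
-- lenient tagsoup-based parser and collect every href attribute.
main :: IO ()
main = do
  let html = "<html><body><a href=foo.html>one<a href=bar.html>two"  -- illegal HTML
  hrefs <- runX $ readString [withParseHTML yes, withWarnings no] html
                  >>> deep (isElem >>> hasName "a")
                  >>> getAttrValue "href"
  print hrefs  -- the parser does not abort on the unclosed tags
```

With withParseHTML no (the default XML mode) the same input would be rejected as not well-formed, which is exactly the strict/lenient distinction discussed in this thread.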
Subject: Is XHT a good tool for parsing web pages?

I looked a little bit at XHT, and it seems very elegant for writing concise definitions of parsers by forms. But I read that it fails if the XML isn't strict, and I know a lot of web pages don't use strict XHTML. Therefore I wonder if it is an appropriate tool for parsing real-world web pages.
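For comparison, a sketch of the forgiving end of the spectrum (mine, assuming the tagsoup package that the replies mention; the input string is made up): tagsoup never rejects its input, it just turns whatever it is given into a flat stream of tags.

```haskell
import Text.HTML.TagSoup

-- A sketch (assumes the tagsoup package): malformed, non-strict markup is
-- still parsed into a flat list of tags instead of being rejected.
main :: IO ()
main = do
  let sloppy = "<p>first paragraph<p>second <b>bold"  -- unclosed tags everywhere
  mapM_ print (parseTags sloppy :: [Tag String])
```

There is no tree and no validation here; matching opens to closes is left entirely to the consumer, which is why it copes with pages that are nowhere near strict XHTML.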
On 27 April 2010 16:22, John Creighton johns2...@gmail.com wrote:
Do you mean HXT rather than XHT?

I know that the HaXml library has a separate error-correcting HTML parser that works around most of the common errors found in real-world HTML.
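A sketch of what using that looks like (mine, assuming the HaXml package; the file name and input string are made up): HaXml's HTML parser lives in Text.XML.HaXml.Html.Parse, takes a name to use in error messages plus the input text, and applies its error correction while building the document.

```haskell
import Text.XML.HaXml.Html.Parse (htmlParse)
import Text.XML.HaXml.Pretty (document)

-- A sketch (assumes the HaXml package): htmlParse error-corrects as it
-- parses, so the missing close tags below do not make it fail.
main :: IO ()
main = do
  let sloppy = "<html><body><p>no closing tags here"
  print (document (htmlParse "example.html" sloppy))
```

Unlike tagsoup's flat tag stream, this gives back a proper Document tree, at the cost of HaXml deciding for you how the broken markup should be repaired.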