Re: [Haskell-cafe] Is XHT a good tool for parsing web pages?

2010-04-28 Thread Ivan Lazar Miljenovic
Uwe Schmidt <u...@fh-wedel.de> writes: The HTML parser in HXT is based on tagsoup. It's a lazy parser (it does not use parsec) and it tries to parse everything as HTML. But garbage in, garbage out: there is no attempt to repair illegal HTML, as e.g. the Tidy parsers do. The parser uses tagsoup
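The behaviour described here is a lazy tag-soup parser that never rejects input but also never repairs it. A small, self-contained sketch can make that concrete. It is a toy written for illustration, not tagsoup's API. The `Tok` type and the `tokenize`, `lower` and `trim` functions are hypothetical, and attributes, comments and doctypes are not handled. The real library's entry point is `parseTags` from `Text.HTML.TagSoup`.

```haskell
import Data.Char (isAlphaNum, isSpace, toLower)

-- Hypothetical token type, loosely modelled on tagsoup's Tag but much simpler:
-- attributes are discarded, only tag names and text are kept.
data Tok = Open String | Close String | Text String
  deriving (Show, Eq)

-- Lenient, lazy tokenizer: it never fails. Input it cannot read as a tag is
-- passed through as text, so malformed HTML still yields a token stream,
-- but nothing is repaired (garbage in, garbage out).
tokenize :: String -> [Tok]
tokenize [] = []
tokenize ('<' : '/' : rest) =
  let (name, rest') = break (== '>') rest
   in Close (lower (trim name)) : tokenize (drop 1 rest')
tokenize ('<' : rest) =
  case takeWhile isAlphaNum rest of
    "" -> Text "<" : tokenize rest -- a stray '<' is kept as text
    name -> Open (lower name) : tokenize (drop 1 (dropWhile (/= '>') rest)) -- skip attributes
tokenize s =
  let (txt, rest) = break (== '<') s
   in Text txt : tokenize rest

-- Tag names are case-insensitive in HTML, so normalise them.
lower :: String -> String
lower = map toLower

-- Strip leading and trailing whitespace.
trim :: String -> String
trim = f . f
  where
    f = reverse . dropWhile isSpace
```

Loaded into GHCi, `tokenize "<P>Hello <b>world</p>"` gives `[Open "p",Text "Hello ",Open "b",Text "world",Close "p"]`: the unclosed `<b>` is passed through untouched. Because the list is produced lazily, `take 2 (tokenize (cycle "<p>x"))` terminates even on infinite input.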

Re: [Haskell-cafe] Is XHT a good tool for parsing web pages?

2010-04-28 Thread Uwe Schmidt
Hi Ivan, Uwe Schmidt <u...@fh-wedel.de> writes: The HTML parser in HXT is based on tagsoup. It's a lazy parser (it does not use parsec) and it tries to parse everything as HTML. But garbage in, garbage out: there is no attempt to repair illegal HTML, as e.g. the Tidy parsers do. The parser

[Haskell-cafe] Is XHT a good tool for parsing web pages?

2010-04-27 Thread John Creighton
Subject: Is XHT a good tool for parsing web pages? I looked a little bit at XHT and it seems very elegant for writing concise definitions of parsers by forms but I read that it fails if the XML isn't strict and I know a lot of web pages don't use strict XHTML. Therefore I wonder if it is an
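For contrast, an XML-style parser insists that every element is explicitly closed and properly nested, which is why it rejects much real-world HTML. The sketch below is a hypothetical well-formedness check over a simplified token list. The `Tok` type and `checkStrict` are illustrative names, not part of HXT. It reports the first violation instead of building a tree.

```haskell
-- Hypothetical, simplified token type (tag names and text only).
data Tok = Open String | Close String | Text String
  deriving (Show, Eq)

-- Strict, XML-style check: every Open must be matched by a Close of the same
-- name, properly nested. Returns Left with the first problem found.
checkStrict :: [Tok] -> Either String ()
checkStrict = go []
  where
    -- The stack holds the names of currently open elements, innermost first.
    go :: [String] -> [Tok] -> Either String ()
    go [] [] = Right ()
    go stack [] = Left ("unclosed: " ++ unwords (reverse stack))
    go stack (Open n : ts) = go (n : stack) ts
    go (s : stack) (Close n : ts)
      | s == n = go stack ts
      | otherwise = Left ("expected </" ++ s ++ "> but found </" ++ n ++ ">")
    go [] (Close n : _) = Left ("unexpected </" ++ n ++ ">")
    go stack (Text _ : ts) = go stack ts
```

Here `checkStrict [Open "p", Open "b", Text "x", Close "p"]` returns `Left "expected </b> but found </p>"`. Even a legal HTML void element such as `[Open "br"]` is rejected as unclosed, since XML has no notion of elements that need no end tag.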

Re: [Haskell-cafe] Is XHT a good tool for parsing web pages?

2010-04-27 Thread Peter Robinson
On 27 April 2010 16:22, John Creighton johns2...@gmail.com wrote: Subject: Is XHT a good tool for parsing web pages? I looked a little bit at XHT and it seems very elegant for writing concise definitions of parsers by forms but I read that it fails if the XML isn't strict and I know a lot of

Re: [Haskell-cafe] Is XHT a good tool for parsing web pages?

2010-04-27 Thread Malcolm Wallace
Is XHT a good tool for parsing web pages? I read that it fails if the XML isn't strict and I know a lot of web pages don't use strict XHTML. Do you mean HXT rather than XHT? I know that the HaXml library has a separate error-correcting HTML parser that works around most of the common
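The message is cut off before it says how HaXml's HTML parser corrects errors, so the following is not HaXml's algorithm. It is a hypothetical repair pass over a simplified token list that assumes one simple policy. A close tag implicitly closes any elements opened inside it, stray close tags are dropped, and anything still open at the end is closed. The `Tok` type and `repair` are illustrative names.

```haskell
-- Hypothetical, simplified token type (tag names and text only).
data Tok = Open String | Close String | Text String
  deriving (Show, Eq)

-- Toy repair pass: always produces properly nested output instead of failing.
repair :: [Tok] -> [Tok]
repair = go []
  where
    -- The stack holds the names of currently open elements, innermost first.
    go :: [String] -> [Tok] -> [Tok]
    -- End of input: close everything still open, innermost first.
    go stack [] = map Close stack
    go stack (t@(Open n) : ts) = t : go (n : stack) ts
    go stack (Close n : ts) =
      case break (== n) stack of
        -- Matching element found: first close everything opened inside it.
        (inner, _ : outer) -> map Close inner ++ Close n : go outer ts
        -- No matching open element: drop the stray close tag.
        (_, []) -> go stack ts
    go stack (t@(Text _) : ts) = t : go stack ts
```

With this policy, `repair [Open "p", Text "Hello ", Open "b", Text "world", Close "p"]` yields `[Open "p",Text "Hello ",Open "b",Text "world",Close "b",Close "p"]`. Real HTML error correction also needs element-specific rules (for example, an open `<p>` being closed implicitly by a following block element), which this toy ignores.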