Re: [Haskell-cafe] Lazy HTML parsing with HXT, HaXML/polyparse, what else?

2007-05-21 Thread Henning Thielemann
On Mon, 14 May 2007, Malcolm Wallace wrote: Henning Thielemann [EMAIL PROTECTED] wrote: *Text.ParserCombinators.PolyLazy> runParser (exactly 4 (satisfy Char.isAlpha)) ("abc104"++undefined) ("*** Exception: Parse.satisfy: failed How can I rewrite the above example

Re: [Haskell-cafe] Lazy HTML parsing with HXT, HaXML/polyparse, what else?

2007-05-14 Thread Henning Thielemann
On Fri, 11 May 2007, Malcolm Wallace wrote: *Text.ParserCombinators.PolyLazy> runParser (exactly 4 (satisfy Char.isAlpha)) ("abc104"++undefined) ("*** Exception: Parse.satisfy: failed This output is exactly correct. You asked for the first four characters provided that they were

Re: [Haskell-cafe] Lazy HTML parsing with HXT, HaXML/polyparse, what else?

2007-05-14 Thread Malcolm Wallace
Henning Thielemann [EMAIL PROTECTED] wrote: *Text.ParserCombinators.PolyLazy> runParser (exactly 4 (satisfy Char.isAlpha)) ("abc104"++undefined) ("*** Exception: Parse.satisfy: failed How can I rewrite the above example so that it returns ("abc*** Exception: Parse.satisfy:

Re: [Haskell-cafe] Lazy HTML parsing with HXT, HaXML/polyparse, what else?

2007-05-14 Thread Henning Thielemann
On Mon, 14 May 2007, Malcolm Wallace wrote: Perhaps I should just rewrite the 'exactly' combinator to have the behaviour you desire? Its current definition is: exactly 0 p = return []; exactly n p = do x <- p; xs <- exactly (n-1) p; return

Re: [Haskell-cafe] Lazy HTML parsing with HXT, HaXML/polyparse, what else?

2007-05-14 Thread Henning Thielemann
On Mon, 14 May 2007, Malcolm Wallace wrote: Essentially, you need to return a constructor as soon as you know that the initial portion of parsed data is correct. Often the only sensible way to do that is to use the 'apply' combinator (as shown in the examples above), returning a constructor
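The idea of returning a constructor early can be illustrated with a minimal sketch. It does not use polyparse's API: the `Parser` type, `satisfy`, and `exactlyLazy` below are hypothetical stand-ins, and failure is modelled as an exception hidden inside the lazily produced result, which is only an assumption about how such a parser could behave.

```haskell
import Data.Char (isAlpha)

-- Toy parser type (hypothetical, NOT polyparse's): a parse always hands back
-- a pair; failure is an exception raised when the result is forced.
newtype Parser a = Parser { runParser :: String -> (a, String) }

-- Accept one character satisfying the predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser step
  where
    step (c:cs) | p c = (c, cs)
    step _            = error "satisfy: failed"

-- Lazy variant of 'exactly': the (:) constructor is returned before the
-- head or the tail has been parsed, because let-bound patterns are lazy.
exactlyLazy :: Int -> Parser a -> Parser [a]
exactlyLazy 0 _ = Parser (\s -> ([], s))
exactlyLazy n p = Parser $ \s ->
  let (x, s')   = runParser p s
      (xs, s'') = runParser (exactlyLazy (n - 1) p) s'
  in  (x : xs, s'')
```

With this sketch, `take 3 (fst (runParser (exactlyLazy 4 (satisfy isAlpha)) ("abc104" ++ undefined)))` can be evaluated without touching the failing fourth element; forcing that element raises the exception.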

Re: [Haskell-cafe] Lazy HTML parsing with HXT, HaXML/polyparse, what else?

2007-05-14 Thread Malcolm Wallace
Henning Thielemann [EMAIL PROTECTED] wrote: exactly 0 p = return []; exactly n p = do x <- p; xs <- exactly (n-1) p; return (x:xs) Is there a difference between 'exactly' and 'replicateM'? With this definition, clearly not. But when
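The claimed equivalence can be checked in any monad. The sketch below generalises the quoted definition to an arbitrary `Monad` (an assumption made for illustration; the original is specific to the parser type) and compares it with `replicateM` from `Control.Monad`. The helper name `agreesWithReplicateM` is hypothetical.

```haskell
import Control.Monad (replicateM)

-- The quoted definition, generalised from the parser monad to any Monad.
-- Note: for a negative count this version never terminates, whereas
-- replicateM returns an empty list, so compare only for n >= 0.
exactly :: Monad m => Int -> m a -> m [a]
exactly 0 _ = return []
exactly n p = do
  x  <- p
  xs <- exactly (n - 1) p
  return (x : xs)

-- Compare both combinators in the Maybe monad and the list monad.
agreesWithReplicateM :: Bool
agreesWithReplicateM =
     exactly 3 (Just 'x') == replicateM 3 (Just 'x')
  && exactly 2 "ab"       == replicateM 2 "ab"
  && exactly 2 (Nothing :: Maybe Int) == replicateM 2 Nothing
```

Agreement in two monads is evidence rather than proof, but unfolding both definitions shows that they sequence the same action the same number of times and collect the results in order.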

[Haskell-cafe] Lazy HTML parsing with HXT, HaXML/polyparse, what else?

2007-05-11 Thread Henning Thielemann
I want to parse and process HTML lazily. I use HXT because the HTML parser is very liberal. However, it uses Parsec and is thus strict. HaXML has a so-called lazy parser, but it is not what I consider lazy: *Text.XML.HaXml.Html.ParseLazy> Text.XML.HaXml.Pretty.document $ htmlParse "text" $

Re: [Haskell-cafe] Lazy HTML parsing with HXT, HaXML/polyparse, what else?

2007-05-11 Thread Henning Thielemann
On Fri, 11 May 2007, Neil Mitchell wrote: Depending on exactly what you want, TagSoup may be of interest to you. It is lazy, but it doesn't return a tree. It is very tolerant of errors, and will simply never fail to parse something. http://www-users.cs.york.ac.uk/~ndm/tagsoup/ That's an

Re: [Haskell-cafe] Lazy HTML parsing with HXT, HaXML/polyparse, what else?

2007-05-11 Thread Neil Mitchell
Hi That's an interesting option. It could be used as a lexer for a full-blown HTML parser. Sometimes I need the tree structure. But why does this simple piece of code need -fglasgow-exts? It doesn't. The released version 0.1 doesn't require extensions, and the next 0.2 won't either. In the

Re: [Haskell-cafe] Lazy HTML parsing with HXT, HaXML/polyparse, what else?

2007-05-11 Thread Jules Bean
Henning Thielemann wrote: I want to parse and process HTML lazily. I use HXT because the HTML parser is very liberal. However, it uses Parsec and is thus strict. HaXML has a so-called lazy parser, but it is not what I consider lazy: *Text.XML.HaXml.Html.ParseLazy> Text.XML.HaXml.Pretty.document $

Re: [Haskell-cafe] Lazy HTML parsing with HXT, HaXML/polyparse, what else?

2007-05-11 Thread Henning Thielemann
On Fri, 11 May 2007, Jules Bean wrote: Henning Thielemann wrote: I want to parse and process HTML lazily. I use HXT because the HTML parser is very liberal. However, it uses Parsec and is thus strict. HaXML has a so-called lazy parser, but it is not what I consider lazy:

Re: [Haskell-cafe] Lazy HTML parsing with HXT, HaXML/polyparse, what else?

2007-05-11 Thread Malcolm Wallace
Henning Thielemann [EMAIL PROTECTED] wrote: HaXml has a so-called lazy parser, but it is not what I consider lazy: Lazy parsing is rather subtle, and it is easy to write a too-strict parser when one intended it to be lazier. Equally, it can be easy to imagine that the parser is too strict,