Re: [Haskell-cafe] Lazy Parsing (ANN: vcd-0.1.4)

2010-04-28 Thread S. Doaitse Swierstra

On 27 apr 2010, at 22:12, Jason Dusek wrote:

  So UU parsers can construct input?

The perform an editing action on the input so it becomes a sentence of the 
language recognised. 

 The presence of an
  empty list in the 2nd slot of the tuple is the only
  indicator of errors?

The parser wants to see a natural number, whch is a non-empty list of digits. 
So it inserts a single digit, which is any character from the range '0'-'9'. 
Since no default value is given here, it takes the first one from the range: 
'0'. Furthermore you get a list of errors, which tell you which correcting 
steps were taken. There is a special combinator with which you can ask for the 
errors produced since the last time you asked, and which you can use to control 
further parsing.

 
  For parsing datatypes without a sensible default value,
  what happens?

If you do nothing you get a less sensible default value; 
you may however provide (lower costs) extra alternatives which will be taken by 
the correcting process. There is a cost model which can be used to control the 
correction process. Tokens have a specific insertion cost and a specific 
deletion cost with which you can play. Usually this is not necessary. The 
typical process is that at first you do not pay attention to the correction 
process, and once you see things you really do not want, you provide an extra 
alternative, or rule out some alternatives by increasuig costs. 

In the UHC token like if have a high cost, since we think there is very 
little chance that people will forget to write them. A ')' can have a lower 
insertion and deletion cost, since people are more likely to have too many or 
not enough of them.



 Doaitse




 
 --
 Jason Dusek

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Lazy Parsing (ANN: vcd-0.1.4)

2010-04-27 Thread Tom Hawkins
I had been using Parsec to parse VCD files, but needed to lazily parse
streaming data.  After stumbling on this thread below, I switch to
polyparse.

What a great library!  I was able to migrate from a strict to a
semi-lazy parser and many of my parse reductions didn't even need to
change.  Thanks Malcolm!

In addition to lazy VCD parsing, this version of vcd [1] also includes
step', which forces a step regardless if variables have changed or not
-- helpful for realtime simulation.

(BTW, parsec is a great library too.)

-Tom

[1] http://hackage.haskell.org/package/vcd-0.1.4



On Sun, May 31, 2009 at 6:41 AM, Malcolm Wallace
malcolm.wall...@cs.york.ac.uk wrote:

 I don't know whether you will be willing to change over to polyparse
 library, but here are some hints about how you might use it.

 Given that you want the input to be a simple character stream, rather than
 use a more elaborate lexer, the first thing to do is to specialise the
 parser type for your purposes:

 type TextParser a = Parser Char a

 Now, to recognise a mere digit,

 digit :: TextParser Char
 digit = satisfy Char.isDigit

 and for a sequence of digits forming an unsigned integer:

 integer :: TextParser Integer
 integer = do ds - many1 digit
              return (foldl1 (\n d- n*10+d)
                             (map (fromIntegral.digitToInt) ds))
           `adjustErr` (++(expected one or more digits))

 I mean I'd like to be able to turn 12.05.2009 into something like (12,
 5, 2009) and got no clue what the code would have to look like. I do know
 almost every variation what the code must not look like :).

 date = do a - integer
           satisfy (=='.')
           b - integer
           satisfy (=='.')
           c - integer
           return (a,b,c)

 Of course, that is just the standard (strict) monadic interface used by many
 combinator libraries.  Your original desire was for lazy parsing, and to
 achieve that, you must move over to the applicative interface.  The key
 difference is that you cannot name intermediate values, but must construct
 larger values directly from smaller ones by something like function
 application.

 lazydate = return (,,) `apply` integer `discard` dot
                        `apply` integer `discard` dot
                        `apply` integer
    where dot = satisfy (=='.')

 The (,,) is the constructor function for triples.  The `discard` combinator
 ensures that its second argument parses OK, but throws away its result,
 keeping only the result of its first argument.

 Apart from lazy space behaviour, the main observable difference between
 date and lazydate is when errors are reported on incorrect input.  For
 instance:

   fst $ runParser date 12.05..2009
  *** Exception: In a sequence:
  Parse.satisfy: failed
  expected one or more digits

   fst $ runParser lazydate 12.05..2009
  (12,5,*** Exception: In a sequence:
  Parse.satisfy: failed
  expected one or more digits

 Notice how the lazy parser managed to build the first two elements of the
 triple, whilst the strict parser gave no value at all.

 I know that the error messages shown here are not entirely satisfactory, but
 they can be improved significantly just by making greater use of the
 `adjustErr` combinator in lots more places (it is rather like Parsec's ?).
  Errors containing positional information about the input can be constructed
 by introducing a separate lexical tokenizer, which is also not difficult.

 Regards,
    Malcolm

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Lazy Parsing (ANN: vcd-0.1.4)

2010-04-27 Thread S. Doaitse Swierstra
How about:

import Text.ParserCombinators.UU.Parsing
import Text.ParserCombinators.UU.Examples


pDate :: Pars (Int,Int,Int)
pDate = (,,) $ pNatural * pDot * pNatural * pDot * pNatural
where pDot = pSym '.'

and then:

*Main test pDate 3.4.5
Loading package syb-0.1.0.2 ... linking ... done.
Loading package base-3.0.3.2 ... linking ... done.
Loading package array-0.3.0.0 ... linking ... done.
Loading package filepath-1.1.0.3 ... linking ... done.
Loading package old-locale-1.0.0.2 ... linking ... done.
Loading package old-time-1.0.0.3 ... linking ... done.
Loading package unix-2.4.0.0 ... linking ... done.
Loading package directory-1.0.1.0 ... linking ... done.
Loading package process-1.0.1.2 ... linking ... done.
Loading package time-1.1.4 ... linking ... done.
Loading package random-1.0.0.2 ... linking ... done.
Loading package haskell98 ... linking ... done.
Loading package uu-parsinglib-2.3.1 ... linking ... done.
((3,4,5),[])
*Main test pDate 3..7
((3,0,7),[
Inserted '0' at position 2 expecting '0'..'9'])
*Main test pDate 
((0,0,0),[
Inserted '0' at position 0 expecting '0'..'9',
Inserted '.' at position 0 expecting one of ['0'..'9', '.'],
Inserted '0' at position 0 expecting '0'..'9',
Inserted '.' at position 0 expecting one of ['0'..'9', '.'],
Inserted '0' at position 0 expecting '0'..'9'])
*Main test pDate 3.4.2010
((3,4,2010),[])
*Main

Doaitse


On 27 apr 2010, at 13:23, Tom Hawkins wrote:

 I had been using Parsec to parse VCD files, but needed to lazily parse
 streaming data.  After stumbling on this thread below, I switch to
 polyparse.
 
 What a great library!  I was able to migrate from a strict to a
 semi-lazy parser and many of my parse reductions didn't even need to
 change.  Thanks Malcolm!
 
 In addition to lazy VCD parsing, this version of vcd [1] also includes
 step', which forces a step regardless if variables have changed or not
 -- helpful for realtime simulation.
 
 (BTW, parsec is a great library too.)
 
 -Tom
 
 [1] http://hackage.haskell.org/package/vcd-0.1.4
 
 
 
 On Sun, May 31, 2009 at 6:41 AM, Malcolm Wallace
 malcolm.wall...@cs.york.ac.uk wrote:
 
 I don't know whether you will be willing to change over to polyparse
 library, but here are some hints about how you might use it.
 
 Given that you want the input to be a simple character stream, rather than
 use a more elaborate lexer, the first thing to do is to specialise the
 parser type for your purposes:
 
 type TextParser a = Parser Char a
 
 Now, to recognise a mere digit,
 
 digit :: TextParser Char
 digit = satisfy Char.isDigit
 
 and for a sequence of digits forming an unsigned integer:
 
 integer :: TextParser Integer
 integer = do ds - many1 digit
  return (foldl1 (\n d- n*10+d)
 (map (fromIntegral.digitToInt) ds))
   `adjustErr` (++(expected one or more digits))
 
 I mean I'd like to be able to turn 12.05.2009 into something like (12,
 5, 2009) and got no clue what the code would have to look like. I do know
 almost every variation what the code must not look like :).
 
 date = do a - integer
   satisfy (=='.')
   b - integer
   satisfy (=='.')
   c - integer
   return (a,b,c)
 
 Of course, that is just the standard (strict) monadic interface used by many
 combinator libraries.  Your original desire was for lazy parsing, and to
 achieve that, you must move over to the applicative interface.  The key
 difference is that you cannot name intermediate values, but must construct
 larger values directly from smaller ones by something like function
 application.
 
 lazydate = return (,,) `apply` integer `discard` dot
`apply` integer `discard` dot
`apply` integer
where dot = satisfy (=='.')
 
 The (,,) is the constructor function for triples.  The `discard` combinator
 ensures that its second argument parses OK, but throws away its result,
 keeping only the result of its first argument.
 
 Apart from lazy space behaviour, the main observable difference between
 date and lazydate is when errors are reported on incorrect input.  For
 instance:
 
   fst $ runParser date 12.05..2009
  *** Exception: In a sequence:
  Parse.satisfy: failed
  expected one or more digits
 
   fst $ runParser lazydate 12.05..2009
  (12,5,*** Exception: In a sequence:
  Parse.satisfy: failed
  expected one or more digits
 
 Notice how the lazy parser managed to build the first two elements of the
 triple, whilst the strict parser gave no value at all.
 
 I know that the error messages shown here are not entirely satisfactory, but
 they can be improved significantly just by making greater use of the
 `adjustErr` combinator in lots more places (it is rather like Parsec's ?).
  Errors containing positional information about the input can be constructed
 by introducing a separate lexical tokenizer, which is also not difficult.
 
 Regards,
Malcolm

Re: [Haskell-cafe] Lazy Parsing (ANN: vcd-0.1.4)

2010-04-27 Thread Jason Dusek
  So UU parsers can construct input? The presence of an
  empty list in the 2nd slot of the tuple is the only
  indicator of errors?

  For parsing datatypes without a sensible default value,
  what happens?

--
Jason Dusek
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Lazy Parsing

2009-05-31 Thread Stephen Tetley
Hi Günther

The code below should work for your simple example, provided it hasn't
lost formatting when I pasted it in to the email.

I was a bit surprised that there is no pSatisfy in this library, but
there are parsers for digits, lower case, upper case letters etc. in
the Examples module that would otherwise be achieved with pSatisfy.

Best wishes

Stephen



{-# LANGUAGE FlexibleContexts   #-}

module Demo1 where

import Text.ParserCombinators.UU.Examples
import Text.ParserCombinators.UU.Parsing


-- here's a simple character '@' parser
pAtSym :: Symbol p Char Char = p Char
pAtSym = pSym '@'

test_simple_char  = test pAtSym @
test_simple_char2 = test pAtSym @


-- pDigit is supplied in Text.ParserCombinators.UU.Examples
test_any_digit= test pDigit 6

-- pNatural is supplied in Text.ParserCombinators.UU.Examples
-- It looks like the most likely candidate to parse a
-- sequence of digits...

test_natural   = test pNatural 1234

--  ... and it is!

-- parse a date 12.05.2009 as a triple (Int,Int,Int)
pDateTriple :: (Symbol p (Char,Char) Char, Applicative p, ExtApplicative p st,
Provides st Char Char)
= p (Int,Int,Int)
pDateTriple = (,,) $ pNatural * pDot * pNatural * pDot * pNatural

pDot :: (Symbol p Char Char, Applicative p) = p [Char]
pDot = lift $ pSym '.'

test_date = test pDateTriple 12.05.2009
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Lazy Parsing

2009-05-31 Thread Malcolm Wallace
It is my pleasure to announce that after 5 days of experimenting  
with uu-parsinglib I have absolutely no clue, whatsoever, on how to  
use it.


I do not even manage to write a parser for even a mere digit or a  
simple character.


I don't know whether you will be willing to change over to polyparse  
library, but here are some hints about how you might use it.


Given that you want the input to be a simple character stream, rather  
than use a more elaborate lexer, the first thing to do is to  
specialise the parser type for your purposes:


 type TextParser a = Parser Char a

Now, to recognise a mere digit,

 digit :: TextParser Char
 digit = satisfy Char.isDigit

and for a sequence of digits forming an unsigned integer:

 integer :: TextParser Integer
 integer = do ds - many1 digit
  return (foldl1 (\n d- n*10+d)
 (map (fromIntegral.digitToInt) ds))
   `adjustErr` (++(expected one or more digits))

I mean I'd like to be able to turn 12.05.2009 into something like  
(12, 5, 2009) and got no clue what the code would have to look like.  
I do know almost every variation what the code must not look like :).


 date = do a - integer
   satisfy (=='.')
   b - integer
   satisfy (=='.')
   c - integer
   return (a,b,c)

Of course, that is just the standard (strict) monadic interface used  
by many combinator libraries.  Your original desire was for lazy  
parsing, and to achieve that, you must move over to the applicative  
interface.  The key difference is that you cannot name intermediate  
values, but must construct larger values directly from smaller ones by  
something like function application.


 lazydate = return (,,) `apply` integer `discard` dot
`apply` integer `discard` dot
`apply` integer
where dot = satisfy (=='.')

The (,,) is the constructor function for triples.  The `discard`  
combinator ensures that its second argument parses OK, but throws away  
its result, keeping only the result of its first argument.


Apart from lazy space behaviour, the main observable difference  
between date and lazydate is when errors are reported on incorrect  
input.  For instance:


   fst $ runParser date 12.05..2009
  *** Exception: In a sequence:
  Parse.satisfy: failed
  expected one or more digits

   fst $ runParser lazydate 12.05..2009
  (12,5,*** Exception: In a sequence:
  Parse.satisfy: failed
  expected one or more digits

Notice how the lazy parser managed to build the first two elements of  
the triple, whilst the strict parser gave no value at all.


I know that the error messages shown here are not entirely  
satisfactory, but they can be improved significantly just by making  
greater use of the `adjustErr` combinator in lots more places (it is  
rather like Parsec's ?).  Errors containing positional information  
about the input can be constructed by introducing a separate lexical  
tokenizer, which is also not difficult.


Regards,
Malcolm

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Lazy Parsing

2009-05-31 Thread Guenther Schmidt

Dear Doaitse,

thank you very much for your help.



I am curious to know what made you go wrong with the tutorial, and 
caused that you could not find the solution below?


Well let's first agree that I'm not very bright. I hate to admit it, but 
it's a simple fact ;-).


Second let's agree that the uu-parsinglib is a *very* sophisticated 
beast, I have not seen anything else like it out there, my sincere 
congratulations for it. Thirdly the tutorial is also a very 
sophisticated beast, and forthly, well just see point 1 :-).


And I just figured out why I was unable to write even that simple parser.

The code you sent me works just fine, I copied and pasted it, no problems.

But, as soon as I comment out the main function the type checker 
complains, because now the ghci's type checker can no longer infer the 
types of pDate or pDot. And this is exactly what happened. I kept 
getting error messages from ghci, eventhough I had defined my parsers 
possible correctly, but, *minus* the type signatures *and* minus any 
main function that called it.


In hindsight I realize that this is a trap I have walked into many times 
before, I guess I still have not acquired a Haskellers intuition.


I promise to do better next time :)

Günther


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Lazy Parsing

2009-05-31 Thread Gü?nther Schmidt

Dear Malcom,

thanks for helping.

I had actually come to Haskell originally because of a parsing problem. 
I had been using Smalltalk until I started a project which required 
parsing files. Until then I had not done any RW parsing.


Well the route was more a Parsec - Haskell, wtf is Haskell? Anyway 
eventually I dropped Smalltalk and got addicted to Haskell. And managed 
 familiarize myself with Haskell and Parsec, the latter as it turned 
out I didn't even need to solve my original problem.


Anyway polyparse certainly is an option, but there are a few things that 
despite my list of failures to use it give uu-parsinglib a special 
appeal, the breadth-first approach with choice, I find that terrible 
elegant. Due to some kicks in my behind it seems that I might be able to 
use Doaitse's combinators now, some more details on that are in another 
post.



Günther


Malcolm Wallace schrieb:
It is my pleasure to announce that after 5 days of experimenting with 
uu-parsinglib I have absolutely no clue, whatsoever, on how to use it.


I do not even manage to write a parser for even a mere digit or a 
simple character.


I don't know whether you will be willing to change over to polyparse 
library, but here are some hints about how you might use it.


Given that you want the input to be a simple character stream, rather 
than use a more elaborate lexer, the first thing to do is to specialise 
the parser type for your purposes:


  type TextParser a = Parser Char a

Now, to recognise a mere digit,

  digit :: TextParser Char
  digit = satisfy Char.isDigit

and for a sequence of digits forming an unsigned integer:

  integer :: TextParser Integer
  integer = do ds - many1 digit
   return (foldl1 (\n d- n*10+d)
  (map (fromIntegral.digitToInt) ds))
`adjustErr` (++(expected one or more digits))

I mean I'd like to be able to turn 12.05.2009 into something like 
(12, 5, 2009) and got no clue what the code would have to look like. I 
do know almost every variation what the code must not look like :).


  date = do a - integer
satisfy (=='.')
b - integer
satisfy (=='.')
c - integer
return (a,b,c)

Of course, that is just the standard (strict) monadic interface used by 
many combinator libraries.  Your original desire was for lazy parsing, 
and to achieve that, you must move over to the applicative interface.  
The key difference is that you cannot name intermediate values, but must 
construct larger values directly from smaller ones by something like 
function application.


  lazydate = return (,,) `apply` integer `discard` dot
 `apply` integer `discard` dot
 `apply` integer
 where dot = satisfy (=='.')

The (,,) is the constructor function for triples.  The `discard` 
combinator ensures that its second argument parses OK, but throws away 
its result, keeping only the result of its first argument.


Apart from lazy space behaviour, the main observable difference between 
date and lazydate is when errors are reported on incorrect input.  
For instance:


   fst $ runParser date 12.05..2009
  *** Exception: In a sequence:
  Parse.satisfy: failed
  expected one or more digits

   fst $ runParser lazydate 12.05..2009
  (12,5,*** Exception: In a sequence:
  Parse.satisfy: failed
  expected one or more digits

Notice how the lazy parser managed to build the first two elements of 
the triple, whilst the strict parser gave no value at all.


I know that the error messages shown here are not entirely satisfactory, 
but they can be improved significantly just by making greater use of the 
`adjustErr` combinator in lots more places (it is rather like Parsec's 
?).  Errors containing positional information about the input can be 
constructed by introducing a separate lexical tokenizer, which is also 
not difficult.


Regards,
Malcolm



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Lazy Parsing

2009-05-31 Thread Stephen Tetley
Hi Günther


I suspect the problem you were having is that there are various
'parsers' (more correctly 'parser types') defined in
Text.ParserCombinators.UU.Parsing and the code you had in your running
example didn't always have enough information to allow GHC to pick a
particular one.

The /test/ function in Examples demands the parser to be of type 'P_m
state a' [1], so if you were running your parsers with /test/ in your
main function this would be give the parsers a concrete, inferable
type (if you hadn't given then a type signature). Once you comment out
main, the parsers have a more general type than 'P_m state a', which
can't be inferred due to class constraints.

Maybe the 'haskeller's inituition' in this instance is to define the
type signatures and the functions at the same time, admittedly this
can be difficult for functions with heavy use of type classes.

Best wishes

Stephen


[1] I'm afraid I don't know the intricacies of the particular types in
the new UU parsing library, until this morning I'd only used the
previous version in uulib.


2009/5/31 Guenther Schmidt gue.schm...@web.de:
 Dear Doaitse,

 thank you very much for your help.


 I am curious to know what made you go wrong with the tutorial, and caused
 that you could not find the solution below?

 Well let's first agree that I'm not very bright. I hate to admit it, but
 it's a simple fact ;-).

 Second let's agree that the uu-parsinglib is a *very* sophisticated beast, I
 have not seen anything else like it out there, my sincere congratulations
 for it. Thirdly the tutorial is also a very sophisticated beast, and
 forthly, well just see point 1 :-).

 And I just figured out why I was unable to write even that simple parser.

 The code you sent me works just fine, I copied and pasted it, no problems.

 But, as soon as I comment out the main function the type checker
 complains, because now the ghci's type checker can no longer infer the types
 of pDate or pDot. And this is exactly what happened. I kept getting error
 messages from ghci, eventhough I had defined my parsers possible correctly,
 but, *minus* the type signatures *and* minus any main function that called
 it.

 In hindsight I realize that this is a trap I have walked into many times
 before, I guess I still have not acquired a Haskellers intuition.

 I promise to do better next time :)

 Günther


 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Lazy Parsing

2009-05-31 Thread S . Doaitse Swierstra

Dear Gunther,


I am providing my solution, on which one can of course specialise in  
making sure that a valid date is parsed, which would be a bit more  
cumbersome; how should e.g. error correction be done. I prefer to test  
afterwards in such situations.


Best,
Doaitse



module Guenther where
import Text.ParserCombinators.UU.Parsing
import Text.ParserCombinators.UU.BasicInstances
import Text.ParserCombinators.UU.Examples hiding (main)
import Control.Applicative hiding ((*), (*), ($))

{- The first decision we have to make is what kind of input we are  
providing. The simplest case is just to assume simple characters,  
hence for our input type we will use the standard provided stream of  
Characters: Str Char, so we use the type of our parsers to be the type  
used  in the Examples module; since we do not know whether we wil be  
using the parsers in a monadic mode too we stay on the safe side ans  
use the type P_m -}


type GP a = P_m (Str Char) a  -- GP stands for GuenterParser

{- Once we know that our input contains characters, but that in our  
output we what to have integer values, we start out by building a  
parser for a single integer , for which we use the function pNatural  
form the examples-}


pDate = (,,) $ pNatural * pDot * pNatural * pDot *  
(pNatural ::GP Int)

pDot  = pSym '.'
{-
main = do print (test pDate 3.4.1900)
  print (test pDate 3 4 1900)
  print (test pDate ..1900)-}

-- end of Module Guenther

By playing with insertion and deletion costs (e.g. by building a more  
picky pNatural) one can control the error recovery. Another option to  
get better error recovery would be to define a specialised instance of  
Provides which removes spaces. You might even temporarily pSwitch to  
the use of this state





Period.

I do not even manage to write a parser for even a mere digit or a  
simple character. I have read the tutorial from a to a to z and from  
z to a and there were a few words I recognized.


I mean I'd like to be able to turn 12.05.2009 into something like  
(12, 5, 2009) and got no clue what the code would have to look like.  
I do know almost every variation what the code must not look like :).


I am guessing here that when one does define a parsing function,  
since all the parser combinators aren't function but methods, one  
*must* also provide a type signature so that the compiler knows the  
actual *instance* method?



Günther


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Lazy Parsing

2009-05-30 Thread GüŸnther Schmidt

Dear Doaitse,

It is my pleasure to announce that after 5 days of experimenting with 
uu-parsinglib I have absolutely no clue, whatsoever, on how to use it.


Period.

I do not even manage to write a parser for even a mere digit or a simple 
character. I have read the tutorial from a to a to z and from z to a and 
there were a few words I recognized.


I mean I'd like to be able to turn 12.05.2009 into something like (12, 
5, 2009) and got no clue what the code would have to look like. I do 
know almost every variation what the code must not look like :).


I am guessing here that when one does define a parsing function, since 
all the parser combinators aren't function but methods, one *must* also 
provide a type signature so that the compiler knows the actual 
*instance* method?



Günther


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Lazy Parsing

2009-05-29 Thread S. Doaitse Swierstra
Lazy parsing has been the default for the last ten years in uulib, and  
is now available in the simple uu-parsinglib (http://hackage.haskell.org/cgi-bin/hackage-scripts/package/uu-parsinglib 
). The whole design of the latter in described in a technical report  
to which references are given on the web page. It provides also error  
correction, the ability to use several different kinds of input  
tokens, and (with some help) ambiguities. If speed is an issue you can  
insert extra hints which locally change the breadth-first parsing  
process locally into a somewhat more depth-first form. When compared  
with Parsec the good news is that usually you do not have to put  
annotations to get nice results.


The older uulib version also performs an abstract interpretation which  
basically changes the search for which alternative to take from a  
linear to a logarithmic complexity, but does not provide a monadic  
structure, in which you use results recognised thus far to construct  
new parsers.


Both the old uulib version and the new version have always had an  
applicative interface.


In the near future elements of the abstract interpretation of the old  
uulib version will migrate into the new version. It is the advent of  
GADT's which made this new version feasable.


An example of the error correction at work at the following example  
code:


pa, pb, paz :: P_m (Str  Char) [Char]
pa = lift $ pSym 'a'
pb = lift $ pSym 'b'
p ++ q = (++) $ p * q
pa2 =   pa ++ pa
pa3 =   pa ++ pa2

pCount p = (\ a b - b+1) $ p * pCount p | pReturn 0
pExact 0 p = pReturn []
pExact n p = (:) $ p * pExact (n-1) p

paz = pMany (pSym ('a', 'z'))

paz' = pSym (\t - 'a' = t  t = 'z', a .. z, 'k')

main :: IO ()
main = do print (test pa a)
  print (test pa b)
  print (test pa2 bbab)
  print (test pa ba)
  print (test pa aa)
  print (test  (do  l - pCount pa
pExact l pb) aaacabbb)
  print (test (amb ( (++) $ pa2 * pa3 | (++) $ pa3  
* pa2))  aaabaa)

  print (test paz ab1z7)
  print (test paz' m)
  print (test paz' )


is

loeki:~ doaitse$ ghci -package uu-parsinglib
GHCi, version 6.10.1: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer ... linking ... done.
Loading package base ... linking ... done.
Loading package syb ... linking ... done.
Loading package array-0.2.0.0 ... linking ... done.
Loading package filepath-1.1.0.1 ... linking ... done.
Loading package old-locale-1.0.0.1 ... linking ... done.
Loading package old-time-1.0.0.1 ... linking ... done.
Loading package unix-2.3.1.0 ... linking ... done.
Loading package directory-1.0.0.2 ... linking ... done.
Loading package process-1.0.1.1 ... linking ... done.
Loading package random-1.0.0.1 ... linking ... done.
Loading package haskell98 ... linking ... done.
Loading package uu-parsinglib-2.0.0 ... linking ... done.
Prelude :m Text.ParserCombinators.UU.Examples
Prelude Text.ParserCombinators.UU.Examples main
(a,[])
(a,[
Deleted  'b' at position 0 expecting one of ['a'],
Inserted 'a' at position 1 expecting one of ['a']])
(aa,[
Deleted  'b' at position 0 expecting one of ['a'],
Deleted  'b' at position 1 expecting one of ['a'],
Deleted  'b' at position 3 expecting one of ['a'],
Inserted 'a' at position 4 expecting one of ['a']])
(a,[
Deleted  'b' at position 0 expecting one of ['a']])
(a,[
The token 'a'was not consumed by the parsing process.])
([b,b,b,b],[
Deleted  'c' at position 3 expecting one of ['a','b'],
Inserted 'b' at position 8 expecting one of ['b']])
([a],[
Deleted  'b' at position 3 expecting one of ['a','a']])
(abz,[
Deleted  '1' at position 2 expecting one of ['a'..'z'],
The token '7'was not consumed by the parsing process.])
('m',[])
('k',[
Inserted 'k' at position 0 expecting one of [a .. z]])
Prelude Text.ParserCombinators.UU.Examples

Doaitse Swierstra





On 27 mei 2009, at 01:52, GüŸnther Schmidt wrote:


Hi all,

is it possible to do lazy parsing with Parsec? I understand that one  
can do that with polyparse, don't know about uulib, but I happen to  
be already somewhat familiar with Parsec, so before I do switch to  
polyparse I rather make sure I actually have to.


The files it has to parse is anywhere from 500 MB to 5 GB.


Günther

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Lazy Parsing

2009-05-29 Thread S. Doaitse Swierstra
In the uu-parsinglib we actually have two versions of parsers: lazy  
ones and strict ones, which have different types. So by giving a type  
annotation you can select the one you want. Notice that in the left- 
hand side of a monadic construct it does not make sense to use a lazy  
parser, since its result will be used as a parameter to the right-hand  
side operator, so in case of a monad our library system automagically  
selects the strict version for the left hand side. For the right hand  
side it depends on the type of the overall expression. Unfortunately  
in Haskell both the left and right hand side of a bind need the to be  
elements of the same monad, whereas in the case of a lazy oevrall  
parser this is not the case.  We solve this problem by tupling the two  
parsers (NOT the parsing results), so still the do-notation can be used.


The use of the library is free of any trickery!

Doaitse Swierstra


On 28 mei 2009, at 11:41, Malcolm Wallace wrote:


Henning Thielemann schlepp...@henning-thielemann.de wrote:


I don't think that it is in general possible to use the same parser
for lazy and strict parsing, just because of the handling of parser
failure.


Polyparse demonstrates that you can mix-and-match lazy parsers with
strict parsers in the different parts of a grammar (by choosing  
whether
to use applicative or monadic style).  You can also switch between  
lazy

or strict interpretations of the applicative parts of your grammar (by
changing the import that decides which version of the parser  
primitives

is in scope).

I also used polyparse for lazy parsing, but I found it unintuitive  
how

to make a parser lazy.


It can certainly be tricky, and requires a certain amount of
experimentation.  I think the difficulties are mainly due to the mix  
of

lazy (applicative) and strict (monadic) styles in different
non-terminals.  A parser that you intend to be lazy, may turn out to  
be
stricter than you hope, because of the strictness of another parser  
that

it depends upon.

Regards,
   Malcolm
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Lazy Parsing

2009-05-29 Thread Guenther Schmidt

Dear Doaitse,

In the days since my original post I had already come to favor the
uu-parsing package. I have printed the report and read it every day to
figure out how to use it. I cannot follow everything yet, and also hope
that won't be necessary in order to use it. :-)

My progress is a bit slow, but I'm not giving up. What I do like most,
over the other combinatory packages, is the approach of using
breadth-first when it comes to choice, the idea is certainly
enlightening. The packages capability to do online- / partial parsing
is essential for me.

I am a bit surprised about it's raw state. The basic combinators and
primitives are there but combinators like pChain, pDigit etc. are not
predefined and merely present in the examples package.

I had gotten quite comfortable with parsec and need to find the right
way to translate my parsec code to your package.

Anyway let me thank you for your work, I really appreciate it very much.


Günther




S. Doaitse Swierstra schrieb:
Lazy parsing has been the default for the last ten years in uulib, and 
is now available in the simple uu-parsinglib 
(http://hackage.haskell.org/cgi-bin/hackage-scripts/package/uu-parsinglib). 
The whole design of the latter in described in a technical report to 
which references are given on the web page. It provides also error 
correction, the ability to use several different kinds of input 
tokens, and (with some help) ambiguities. If speed is an issue you can 
insert extra hints which locally change the breadth-first parsing 
process locally into a somewhat more depth-first form. When compared 
with Parsec the good news is that usually you do not have to put 
annotations to get nice results.


The older uulib version also performs an abstract interpretation which 
basically changes the search for which alternative to take from a 
linear to a logarithmic complexity, but does not provide a monadic 
structure, in which you use results recognised thus far to construct 
new parsers.


Both the old uulib version and the new version have always had an 
applicative interface.


In the near future elements of the abstract interpretation of the old 
uulib version will migrate into the new version. It is the advent of 
GADT's which made this new version feasable.


An example of the error correction at work at the following example code:

pa, pb, paz :: P_m (Str Char) [Char]
pa = lift $ pSym 'a'
pb = lift $ pSym 'b'
p ++ q = (++) $ p * q
pa2 = pa ++ pa
pa3 = pa ++ pa2

pCount p = (\ a b - b+1) $ p * pCount p | pReturn 0
pExact 0 p = pReturn []
pExact n p = (:) $ p * pExact (n-1) p

paz = pMany (pSym ('a', 'z'))

paz' = pSym (\t - 'a' = t  t = 'z', a .. z, 'k')

main :: IO ()
main = do print (test pa a)
print (test pa b)
print (test pa2 bbab)
print (test pa ba)
print (test pa aa)
print (test (do l - pCount pa
pExact l pb) aaacabbb)
print (test (amb ( (++) $ pa2 * pa3 | (++) $ pa3 * pa2)) 
aaabaa)

print (test paz ab1z7)
print (test paz' m)
print (test paz' )


is

loeki:~ doaitse$ ghci -package uu-parsinglib
GHCi, version 6.10.1: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer ... linking ... done.
Loading package base ... linking ... done.
Loading package syb ... linking ... done.
Loading package array-0.2.0.0 ... linking ... done.
Loading package filepath-1.1.0.1 ... linking ... done.
Loading package old-locale-1.0.0.1 ... linking ... done.
Loading package old-time-1.0.0.1 ... linking ... done.
Loading package unix-2.3.1.0 ... linking ... done.
Loading package directory-1.0.0.2 ... linking ... done.
Loading package process-1.0.1.1 ... linking ... done.
Loading package random-1.0.0.1 ... linking ... done.
Loading package haskell98 ... linking ... done.
Loading package uu-parsinglib-2.0.0 ... linking ... done.
Prelude :m Text.ParserCombinators.UU.Examples
Prelude Text.ParserCombinators.UU.Examples main
(a,[])
(a,[
Deleted 'b' at position 0 expecting one of ['a'],
Inserted 'a' at position 1 expecting one of ['a']])
(aa,[
Deleted 'b' at position 0 expecting one of ['a'],
Deleted 'b' at position 1 expecting one of ['a'],
Deleted 'b' at position 3 expecting one of ['a'],
Inserted 'a' at position 4 expecting one of ['a']])
(a,[
Deleted 'b' at position 0 expecting one of ['a']])
(a,[
The token 'a'was not consumed by the parsing process.])
([b,b,b,b],[
Deleted 'c' at position 3 expecting one of ['a','b'],
Inserted 'b' at position 8 expecting one of ['b']])
([a],[
Deleted 'b' at position 3 expecting one of ['a','a']])
(abz,[
Deleted '1' at position 2 expecting one of ['a'..'z'],
The token '7'was not consumed by the parsing process.])
('m',[])
('k',[
Inserted 'k' at position 0 expecting one of [a .. z]])
Prelude Text.ParserCombinators.UU.Examples

Doaitse Swierstra





On 27 mei 2009, at 01:52, GüŸnther Schmidt wrote:


Hi all,

is it possible to do lazy parsing with Parsec? I understand that one 
can do that with polyparse, don't know about uulib, but I happen to 
be already

Re: [Haskell-cafe] Lazy Parsing

2009-05-28 Thread Malcolm Wallace
Henning Thielemann schlepp...@henning-thielemann.de wrote:

 I don't think that it is in general possible to use the same parser
 for lazy and strict parsing, just because of the handling of parser
 failure.

Polyparse demonstrates that you can mix-and-match lazy parsers with
strict parsers in the different parts of a grammar (by choosing whether
to use applicative or monadic style).  You can also switch between lazy
or strict interpretations of the applicative parts of your grammar (by
changing the import that decides which version of the parser primitives
is in scope).

 I also used polyparse for lazy parsing, but I found it unintuitive how
 to make a parser lazy.

It can certainly be tricky, and requires a certain amount of
experimentation.  I think the difficulties are mainly due to the mix of
lazy (applicative) and strict (monadic) styles in different
non-terminals.  A parser that you intend to be lazy, may turn out to be
stricter than you hope, because of the strictness of another parser that
it depends upon.

Regards,
Malcolm
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Lazy Parsing

2009-05-27 Thread Henning Thielemann
GüŸnther Schmidt schrieb:
 Hi all,
 
 is it possible to do lazy parsing with Parsec? I understand that one can
 do that with polyparse, don't know about uulib, but I happen to be
 already somewhat familiar with Parsec, so before I do switch to
 polyparse I rather make sure I actually have to.
 
 The files it has to parse is anywhere from 500 MB to 5 GB.

I don't think that it is in general possible to use the same parser for
lazy and strict parsing, just because of the handling of parser failure.
If parser failure is denoted by a Left constructor in (Either Reason
Result) then the whole parsing process must be finished, before the
parser knows whether the answer is Left or Right.
I also used polyparse for lazy parsing, but I found it unintuitive how
to make a parser lazy. I tried to do better in tagchup, where I make
explicit in the type, whether a parser can fail or not. In the first
case in cannot be lazy, in the second case it can. I also did lazy
parsing in 'midi' package and in 'spreadsheet'. I liked to factor out a
lazy parser library from them, but I failed to unify all these
applications. At least I have factored out handling of lazy failure (aka
asnychronous exceptions) in explicit-exception package.

Btw. a good place to discuss such issues is our local Haskell meeting
that takes place on 2009-06-12:
   http://iba-cg.de/hal4.html

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Lazy Parsing

2009-05-27 Thread Henning Thielemann


On Wed, 27 May 2009, Gü?nther Schmidt wrote:

is it possible to do lazy parsing with Parsec? I understand that one can do 
that with polyparse, don't know about uulib, but I happen to be already 
somewhat familiar with Parsec, so before I do switch to polyparse I rather 
make sure I actually have to.


Also see
  http://www.haskell.org/haskellwiki/Maintaining_laziness___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Lazy Parsing

2009-05-26 Thread GüŸnther Schmidt

Hi all,

is it possible to do lazy parsing with Parsec? I understand that one can 
do that with polyparse, don't know about uulib, but I happen to be 
already somewhat familiar with Parsec, so before I do switch to 
polyparse I rather make sure I actually have to.


The files it has to parse is anywhere from 500 MB to 5 GB.


Günther

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: Lazy Parsing

2002-02-28 Thread Joe English


Brandon Michael Moore wrote:

 I'm wondering if there are any libraries out there for creating parsers
 that lazily build up their result. I know I could thread the remaining
 input through a parser by hand, but it seems like someone should have
 already done it.

This turns out to be rather difficult to do in the general case
(but see below -- XML is a special case).

If you have

 type Parser sym result = [sym] - Maybe (result, [sym])

a Parser can't decide whether to return 'Just (result,rest)'
or 'Nothing' until it has successfully parsed the complete result.
So pattern matching on the parser's return value will force
the entire production.  Variations on the theme -- Either instead
of Maybe, list-of-successes, continuation-passing combinators, etc --
all face a similar problem.

However, if your top-level grammar is of the form:

things :: empty | thing things {- == thing* -}

then instead of:

case runParser (pMany pThing) input of Just (result,[]) - ...

you can use something like

unfoldr (runParser pThing) input

to build the result list incrementally.  This will be less eager;
instead of parsing and returning an entire list of Things, it
parses one Thing at a time.

Another thing to watch out for is heap drag.  The list-of-successes
approach tends to retain the entire input, just in case the parser
needs to backtrack.  Parsec [1] and UU_Parsing [?] solve this
by severely restricting the amount of required lookahead.

 I'd like to be able to turn a stream of XML into a lazy tree of tags
 (probably Maybe tags, or Either errors tags), but I don't think HaXml and
 the like do that sort of thing.

That's exactly how HXML [2] works.  The  parser returns a lazy
list of tokens (analogous to SAX events), which are folded up
into a tree by a separate function.  In addition it uses a CPS
parser library so (as with Parsec), there is minimal heap drag.

[1] Parsec: URL: http://www.cs.ruu.nl/~daan/parsec.html 
[1] HXML:   URL: http://www.flightlab.com/~joe/hxml 

(Note: HXML release 0.2 will be ready Real Soon Now, and there have been
many incompatible changes since 0.1.  The main thing left to be finished
is the documentation, if you can live without that let me know and I'll
put a snapshot up.)

--Joe English

  [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe



Lazy Parsing

2002-02-27 Thread Brandon Michael Moore

I'm wondering if there are any libraries out there for creating parsers
that lazily build up their result. I know I could thread the remaining
input through a parser by hand, but it seems like someone should have
already done it.

I'd like to be able to turn a stream of XML into a lazy tree of tags
(probably Maybe tags, or Either errors tags), but I don't think HaXml and
the like do that sort of thing.

Branodn Moore

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe



Re: Lazy Parsing

2002-02-27 Thread John Hughes

There's a combinator which Phil Wadler called guarantee which makes a
parser lazy -- guarantee p succeeds at once, with a result which will
be produced, when demanded, by p. Many parsing libraries include it under
one name or another...

John
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe