Re: Parser.y rewrite with parser combinators

2018-10-16 Thread Simon Marlow
I personally love to hack things up with parser combinators, but for
anything longer term where I want a degree of confidence that changes
aren't going to introduce new problems I'd still use Happy. Yes it's a
total pain sometimes, and LALR(1) is very restrictive, but I wouldn't want
to lose the guarantees of unambiguity and performance. We have *always* had
to shoehorn the Haskell grammar into LALR(1) - patterns and expressions had
to be parsed using the same grammar fragment from the start due to the list
comprehension syntax. And some post-processing is inevitable - it's
technically not possible to parse Haskell without rearranging infix
expressions later, because you don't know the fixities of imported
operators.  And layout is truly horrible to deal with - Happy's error token
is designed purely to handle the layout rule, and it differs in semantics
from yacc's error token for this reason (that is, if yacc's error token has
a semantics, I could never figure out what it was supposed to do). Dealing
with layout using parser combinators would probably require at least one
layer of backtracking in addition to whatever other backtracking you needed
to handle the other parts of the grammar.
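
Roughly, the layout-driven error production in Parser.y looks like the
following simplified sketch (not the exact production): when the parser
gets stuck at a point where a layout block may close, the error
alternative fires and pops the layout context, in effect inserting the
virtual close brace.

    close :: { () }
            : vccurly            { () }  -- layout context popped in the lexer
            | error              {% popContext }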

Cheers
Simon


On Tue, 9 Oct 2018 at 15:18, Sven Panne  wrote:

> On Tue, 9 Oct 2018 at 15:45, Richard Eisenberg <
> r...@cs.brynmawr.edu> wrote:
>
>> [...] What I'm trying to say here is that tracking the backtracking level
>> in types doesn't seem like it will fly (tempting though it may be).
>>
>
> ... and even if it did fly, parser combinators with backtracking have a
> strong tendency to introduce space leaks: To backtrack, you have to keep
> previous input somehow, at least up to some point. So to keep the memory
> requirements sane, you have to explicitly commit to one parse or another at
> some point. Different combinator libraries have different ways to do that,
> but you have to do that by hand somehow, and that's where the beauty and
> maintainability of the combinator approach really suffer.
>
> Note that I'm not against parser combinators, far from it, but I don't
> think they are necessarily the right tool for the problem at hand. The
> basic problem is: Haskell's syntax, especially with all those extensions,
> is quite tricky, and this will be reflected in any parser for it. IMHO a
> parser generator is the lesser evil here, at least it points you to the
> ugly places of your language (on a syntactic level). If Haskell had a few
> more syntactic hints, reading code would be easier, not only for a
> compiler, but (more importantly) for humans, too. Richard's code snippet is
> a good example where some hint would be very useful for the casual reader;
> in some sense humans have to "backtrack", too, when reading such code.


Re: Parser.y rewrite with parser combinators

2018-10-09 Thread Spiwack, Arnaud
On Tue, Oct 9, 2018 at 1:09 PM Vladislav Zavialov 
wrote:

> In fact, we do have a fair share of boilerplate in our current grammar
> due to lack of parametrisation. That's another issue that would be
> solved by parser combinators (or by a fancier parser generator, but
> I'm not aware of one).
>

There is the Menhir [1] parser generator. It provides a decent abstraction
mechanism. It also generates LR parsers, hence is more flexible than Happy.
And it generates incremental parsers.

Of course, one would have to make it output a Haskell parser first.
But considering the effort it seems to have taken to add a Coq backend, I'd
assume it would be less of an effort than porting the entire GHC grammar to
parser combinators. (One hard thing would be to decide how to replace the
ML functor mechanism that Menhir uses to generate functions from parsers to
parsers.)

[1]: http://gallium.inria.fr/~fpottier/menhir/


Re: Parser.y rewrite with parser combinators

2018-10-09 Thread Sven Panne
On Tue, 9 Oct 2018 at 15:45, Richard Eisenberg <
r...@cs.brynmawr.edu> wrote:

> [...] What I'm trying to say here is that tracking the backtracking level
> in types doesn't seem like it will fly (tempting though it may be).
>

... and even if it did fly, parser combinators with backtracking have a
strong tendency to introduce space leaks: To backtrack, you have to keep
previous input somehow, at least up to some point. So to keep the memory
requirements sane, you have to explicitly commit to one parse or another at
some point. Different combinator libraries have different ways to do that,
but you have to do that by hand somehow, and that's where the beauty and
maintainability of the combinator approach really suffer.
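
To make the commit point concrete, here is a toy sketch in megaparsec
(grammar and names made up; other combinator libraries have analogous
mechanisms):

    import Data.Void (Void)
    import Text.Megaparsec
    import Text.Megaparsec.Char

    type Parser = Parsec Void String

    -- Backtracking: the pattern-like reading is only confirmed by a
    -- "<-" far to the right, so everything consumed inside 'try' must
    -- be retained in case we rewind (the space leak described above).
    stmt :: Parser String
    stmt = try (many letterChar <* string "<-")
       <|> many letterChar

    -- Committing by hand: parse the common prefix once, then decide
    -- on the next token; nothing needs to be retained.
    stmt' :: Parser (String, Bool)
    stmt' = do
      prefix <- many letterChar
      bound  <- (True <$ string "<-") <|> pure False
      pure (prefix, bound)

The second version is the kind of manual grammar factoring meant above,
and it is exactly what costs the combinator style its beauty.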

Note that I'm not against parser combinators, far from it, but I don't
think they are necessarily the right tool for the problem at hand. The
basic problem is: Haskell's syntax, especially with all those extensions,
is quite tricky, and this will be reflected in any parser for it. IMHO a
parser generator is the lesser evil here, at least it points you to the
ugly places of your language (on a syntactic level). If Haskell had a few
more syntactic hints, reading code would be easier, not only for a
compiler, but (more importantly) for humans, too. Richard's code snippet is
a good example where some hint would be very useful for the casual reader;
in some sense humans have to "backtrack", too, when reading such code.


Re: Parser.y rewrite with parser combinators

2018-10-09 Thread Vladislav Zavialov
I agree with you. This example puts a nail in the coffin of the
backtracking approach.

I will have to think of something else, and at this point a full
rewrite to parser combinators does not seem as appealing.

Thanks!

On Tue, Oct 9, 2018 at 4:45 PM Richard Eisenberg  wrote:
>
> I think one problem is that we don't even have bounded levels of 
> backtracking, because (with view patterns) you can put expressions into 
> patterns.
>
> Consider
>
> > f = do K x (z -> ...
>
> Do we have a constructor pattern with a view pattern inside it? Or do we have 
> an expression with a required visible type application and a function type? 
> (This last bit will be possible only with visible dependent quantification in 
> terms, but I'm confident that Vlad will appreciate the example.) We'll need 
> nested backtracking to sort this disaster out -- especially if we have 
> another `do` in the ...
>
> What I'm trying to say here is that tracking the backtracking level in types 
> doesn't seem like it will fly (tempting though it may be).
>
> Richard
>
> > On Oct 9, 2018, at 7:08 AM, Vladislav Zavialov  
> > wrote:
> >
> > It's a nice way to look at the problem, and we're facing the same
> > issues as with insufficiently powerful type systems. LALR is the Go of
> > parsing in this case :)
> >
> > I'd rather write Python and have a larger test suite than deal with
> > lack of generics in Go, if you allow me to take the analogy that far.
> >
> > In fact, we do have a fair share of boilerplate in our current grammar
> > due to lack of parametrisation. That's another issue that would be
> > solved by parser combinators (or by a fancier parser generator, but
> > I'm not aware of one).
> >
> > On Tue, Oct 9, 2018 at 1:52 PM Simon Peyton Jones  
> > wrote:
> >>
> >> We all love strong guarantees offered by type checking, but somehow most 
> >> people shy away from "syntactic type checking" offered by parser 
> >> generators. Parser combinators are the Python of parsing: Easy to use 
> >> initially, but a maintenance hell in the long run for larger projects...
> >>
> >> I’d never thought of it that way before – interesting.
> >>
> >>
> >>
> >> Simon
> >>
> >>
> >>
> >> From: ghc-devs  On Behalf Of Sven Panne
> >> Sent: 09 October 2018 08:23
> >> To: vlad.z.4...@gmail.com
> >> Cc: GHC developers 
> >> Subject: Re: Parser.y rewrite with parser combinators
> >>
> >>
> >>
> >> On Tue, 9 Oct 2018 at 00:25, Vladislav Zavialov
> >> wrote:
> >>
> >> [...] That's true regardless of implementation technique, parsers are 
> >> rather
> >> delicate.
> >>
> >>
> >>
> >> I think it's not the parsers themselves which are delicate, it is the 
> >> language that they should recognize.
> >>
> >>
> >>
> >> A LALR-based parser generator does provide more information
> >> when it detects shift/reduce and reduce/reduce conflicts, but I never
> >> found this information useful. It was always quite the opposite of
> >> being helpful - an indication that a LALR parser could not handle my
> >> change and I had to look for workarounds. [...]
> >>
> >>
> >>
> >> Not that this would help at this point, but: The conflicts reported by 
> >> parser generators like Happy are *extremely* valuable, they hint at 
> >> tricky/ambiguous points in the grammar, which in turn is a strong hint 
> >> that the language you're trying to parse has dark corners. IMHO every 
> >> language designer and e.g. everybody proposing a syntactic extension to 
> >> GHC should try to fit this into a grammar for Happy *before* proposing 
> >> that extension. If you get conflicts, it is a very strong hint that the 
> >> language is hard to parse by *humans*, too, which is the most important 
> >> thing to consider. Haskell already has tons of syntactic warts which can 
> >> only be parsed by infinite lookahead, which is only a minor technical 
> >> problem, but a major usability problem. "Programs are meant to be read by
> >> humans and only incidentally for computers to execute." (D.E.K.)  
> >> ;-)
> >>
> >>
> >>
> >> The situation is a bit strange: We all love strong guarantees offered by 
> >> type checking, but somehow most people shy away from "syntactic type 
> >> checking" offered by parser generators. Parser combinators are the Python 
> >> of parsing: Easy to use initially, but a maintenance hell in the long run 
> >> for larger projects...
> >>
> >>
> >>
> >> Cheers,
> >>
> >>   S.


Re: Parser.y rewrite with parser combinators

2018-10-09 Thread Richard Eisenberg
I think one problem is that we don't even have bounded levels of backtracking, 
because (with view patterns) you can put expressions into patterns.

Consider

> f = do K x (z -> ...

Do we have a constructor pattern with a view pattern inside it? Or do we have 
an expression with a required visible type application and a function type? 
(This last bit will be possible only with visible dependent quantification in 
terms, but I'm confident that Vlad will appreciate the example.) We'll need 
nested backtracking to sort this disaster out -- especially if we have another 
`do` in the ...
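
To spell out the two readings with made-up types (only the first is
valid Haskell today; the second would need visible dependent
quantification in terms):

    {-# LANGUAGE ViewPatterns #-}

    data T = K Int Int

    z :: Int -> Bool
    z = even

    -- Pattern reading: (z -> p) is a view pattern, i.e. z is applied
    -- to K's second field and the result is matched against p.
    g :: T -> (Int, Bool)
    g (K x (z -> p)) = (x, p)   -- g (K 2 4) == (2, True)

    -- Expression reading (hypothetical): K x (z -> r), where (z -> r)
    -- is a function TYPE passed to K as a required type argument.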

What I'm trying to say here is that tracking the backtracking level in types 
doesn't seem like it will fly (tempting though it may be).

Richard

> On Oct 9, 2018, at 7:08 AM, Vladislav Zavialov  wrote:
> 
> It's a nice way to look at the problem, and we're facing the same
> issues as with insufficiently powerful type systems. LALR is the Go of
> parsing in this case :)
> 
> I'd rather write Python and have a larger test suite than deal with
> lack of generics in Go, if you allow me to take the analogy that far.
> 
> In fact, we do have a fair share of boilerplate in our current grammar
> due to lack of parametrisation. That's another issue that would be
> solved by parser combinators (or by a fancier parser generator, but
> I'm not aware of one).
> 
> On Tue, Oct 9, 2018 at 1:52 PM Simon Peyton Jones  
> wrote:
>> 
>> We all love strong guarantees offered by type checking, but somehow most 
>> people shy away from "syntactic type checking" offered by parser generators. 
>> Parser combinators are the Python of parsing: Easy to use initially, but a 
>> maintenance hell in the long run for larger projects...
>> 
>> I’d never thought of it that way before – interesting.
>> 
>> 
>> 
>> Simon
>> 
>> 
>> 
>> From: ghc-devs  On Behalf Of Sven Panne
>> Sent: 09 October 2018 08:23
>> To: vlad.z.4...@gmail.com
>> Cc: GHC developers 
>> Subject: Re: Parser.y rewrite with parser combinators
>> 
>> 
>> 
>> On Tue, 9 Oct 2018 at 00:25, Vladislav Zavialov
>> wrote:
>> 
>> [...] That's true regardless of implementation technique, parsers are rather
>> delicate.
>> 
>> 
>> 
>> I think it's not the parsers themselves which are delicate, it is the 
>> language that they should recognize.
>> 
>> 
>> 
>> A LALR-based parser generator does provide more information
>> when it detects shift/reduce and reduce/reduce conflicts, but I never
>> found this information useful. It was always quite the opposite of
>> being helpful - an indication that a LALR parser could not handle my
>> change and I had to look for workarounds. [...]
>> 
>> 
>> 
>> Not that this would help at this point, but: The conflicts reported by 
>> parser generators like Happy are *extremely* valuable, they hint at 
>> tricky/ambiguous points in the grammar, which in turn is a strong hint that 
>> the language you're trying to parse has dark corners. IMHO every language 
>> designer and e.g. everybody proposing a syntactic extension to GHC should 
>> try to fit this into a grammar for Happy *before* proposing that extension. 
>> If you get conflicts, it is a very strong hint that the language is hard to 
>> parse by *humans*, too, which is the most important thing to consider. 
>> Haskell already has tons of syntactic warts which can only be parsed by 
>> infinite lookahead, which is only a minor technical problem, but a major 
>> usability problem. "Programs are meant to be read by humans and only
>> incidentally for computers to execute." (D.E.K.)  ;-)
>> 
>> 
>> 
>> The situation is a bit strange: We all love strong guarantees offered by 
>> type checking, but somehow most people shy away from "syntactic type 
>> checking" offered by parser generators. Parser combinators are the Python of 
>> parsing: Easy to use initially, but a maintenance hell in the long run for 
>> larger projects...
>> 
>> 
>> 
>> Cheers,
>> 
>>   S.


Re: Parser.y rewrite with parser combinators

2018-10-09 Thread Vladislav Zavialov
It's a nice way to look at the problem, and we're facing the same
issues as with insufficiently powerful type systems. LALR is the Go of
parsing in this case :)

I'd rather write Python and have a larger test suite than deal with
lack of generics in Go, if you allow me to take the analogy that far.

In fact, we do have a fair share of boilerplate in our current grammar
due to lack of parametrisation. That's another issue that would be
solved by parser combinators (or by a fancier parser generator, but
> I'm not aware of one).
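
(To illustrate the parametrisation point, a toy megaparsec sketch: a
production becomes an ordinary function, so a shape like
"comma-separated list of X" is written once for any X.)

    import Data.Void (Void)
    import Text.Megaparsec
    import Text.Megaparsec.Char

    type Parser = Parsec Void String

    -- Written once, for any element type...
    commaList :: Parser a -> Parser [a]
    commaList p = (:) <$> p <*> many (char ',' *> p)

    -- ...and instantiated freely, where a plain BNF grammar needs one
    -- near-identical production per element type:
    --   comma_exprs : expr | expr ',' comma_exprs
    --   comma_tys   : ty   | ty   ',' comma_tys
    names :: Parser [String]
    names = commaList (some letterChar)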

On Tue, Oct 9, 2018 at 1:52 PM Simon Peyton Jones  wrote:
>
> We all love strong guarantees offered by type checking, but somehow most 
> people shy away from "syntactic type checking" offered by parser generators. 
> Parser combinators are the Python of parsing: Easy to use initially, but a 
> maintenance hell in the long run for larger projects...
>
> I’d never thought of it that way before – interesting.
>
>
>
> Simon
>
>
>
> From: ghc-devs  On Behalf Of Sven Panne
> Sent: 09 October 2018 08:23
> To: vlad.z.4...@gmail.com
> Cc: GHC developers 
> Subject: Re: Parser.y rewrite with parser combinators
>
>
>
> On Tue, 9 Oct 2018 at 00:25, Vladislav Zavialov
> wrote:
>
> [...] That's true regardless of implementation technique, parsers are rather
> delicate.
>
>
>
> I think it's not the parsers themselves which are delicate, it is the 
> language that they should recognize.
>
>
>
> A LALR-based parser generator does provide more information
> when it detects shift/reduce and reduce/reduce conflicts, but I never
> found this information useful. It was always quite the opposite of
> being helpful - an indication that a LALR parser could not handle my
> change and I had to look for workarounds. [...]
>
>
>
> Not that this would help at this point, but: The conflicts reported by parser 
> generators like Happy are *extremely* valuable, they hint at tricky/ambiguous 
> points in the grammar, which in turn is a strong hint that the language 
> you're trying to parse has dark corners. IMHO every language designer and 
> e.g. everybody proposing a syntactic extension to GHC should try to fit this 
> into a grammar for Happy *before* proposing that extension. If you get 
> conflicts, it is a very strong hint that the language is hard to parse by 
> *humans*, too, which is the most important thing to consider. Haskell already 
> has tons of syntactic warts which can only be parsed by infinite lookahead, 
> which is only a minor technical problem, but a major usability problem.
> "Programs are meant to be read by humans and only incidentally for computers 
> to execute." (D.E.K.)  ;-)
>
>
>
> The situation is a bit strange: We all love strong guarantees offered by type 
> checking, but somehow most people shy away from "syntactic type checking" 
> offered by parser generators. Parser combinators are the Python of parsing: 
> Easy to use initially, but a maintenance hell in the long run for larger 
> projects...
>
>
>
> Cheers,
>
>S.


RE: Parser.y rewrite with parser combinators

2018-10-09 Thread Simon Peyton Jones via ghc-devs
We all love strong guarantees offered by type checking, but somehow most people 
shy away from "syntactic type checking" offered by parser generators. Parser 
combinators are the Python of parsing: Easy to use initially, but a maintenance 
hell in the long run for larger projects...
I’d never thought of it that way before – interesting.

Simon

From: ghc-devs  On Behalf Of Sven Panne
Sent: 09 October 2018 08:23
To: vlad.z.4...@gmail.com
Cc: GHC developers 
Subject: Re: Parser.y rewrite with parser combinators

On Tue, 9 Oct 2018 at 00:25, Vladislav Zavialov
<vlad.z.4...@gmail.com> wrote:
[...] That's true regardless of implementation technique, parsers are rather
delicate.

I think it's not the parsers themselves which are delicate, it is the language 
that they should recognize.

A LALR-based parser generator does provide more information
when it detects shift/reduce and reduce/reduce conflicts, but I never
found this information useful. It was always quite the opposite of
being helpful - an indication that a LALR parser could not handle my
change and I had to look for workarounds. [...]

Not that this would help at this point, but: The conflicts reported by parser 
generators like Happy are *extremely* valuable, they hint at tricky/ambiguous 
points in the grammar, which in turn is a strong hint that the language you're 
trying to parse has dark corners. IMHO every language designer and e.g. 
everybody proposing a syntactic extension to GHC should try to fit this into a 
grammar for Happy *before* proposing that extension. If you get conflicts, it 
is a very strong hint that the language is hard to parse by *humans*, too, 
which is the most important thing to consider. Haskell already has tons of 
syntactic warts which can only be parsed by infinite lookahead, which is only a 
minor technical problem, but a major usability problem. "Programs are meant to
be read by humans and only incidentally for computers to execute." (D.E.K.) 
 ;-)

The situation is a bit strange: We all love strong guarantees offered by type 
checking, but somehow most people shy away from "syntactic type checking" 
offered by parser generators. Parser combinators are the Python of parsing: 
Easy to use initially, but a maintenance hell in the long run for larger 
projects...

Cheers,
   S.


Re: Parser.y rewrite with parser combinators

2018-10-09 Thread Vladislav Zavialov
>  backtrack while backtracking <...> I can almost guarantee that this will 
> happen unless you use formal methods

That is a great idea, I can track backtracking depth in a type-level
natural number and make sure it doesn't go over 1 (or add
justification with performance analysis when it does). Formal methods
for the win :-)
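
(A minimal sketch of what that could look like, assuming a hand-rolled
parser type; all names here are hypothetical:)

    {-# LANGUAGE DataKinds, KindSignatures, TypeOperators #-}

    import GHC.TypeLits

    -- The phantom index counts how many backtracking points ('try's)
    -- enclose the parser.
    newtype P (depth :: Nat) a = P (String -> Maybe (a, String))

    try' :: P d a -> P (d + 1) a
    try' (P p) = P p   -- same behaviour, one level deeper in the type

    -- Only parsers of depth <= 1 may be run without justification.
    runShallow :: (d <= 1) => P d a -> String -> Maybe (a, String)
    runShallow (P p) = p
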
On Tue, Oct 9, 2018 at 10:27 AM Sven Panne  wrote:
>
> On Tue, 9 Oct 2018 at 09:18, Vladislav Zavialov
> wrote:
>>
>> [...] With parser combinators
>>
>> 1. Parse into an expression (linear in the amount of tokens)
>> 2. If it turns out we needed a pattern, backtrack and parse into a
>> pattern (linear in the amount of tokens) [...]
>
>
> In a larger grammar implemented by parser combinators it is quite hard to 
> guarantee that you don't backtrack while backtracking, which would easily 
> result in exponential runtime. And given the size of the language GHC 
> recognizes, I can almost guarantee that this will happen unless you use 
> formal methods. :-)
>
> Cheers,
>S.


Re: Parser.y rewrite with parser combinators

2018-10-09 Thread Vladislav Zavialov
> which in turn is a strong hint that the language you're trying to parse has 
> dark corners. IMHO every language designer and e.g. everybody proposing a 
> syntactic extension to GHC should try to fit this into a grammar for Happy 
> *before* proposing that extension

I do agree here! Having a language that has a context-free grammar
would be superb. The issue is that Haskell with GHC extensions is
already far from this point, and it doesn't help to first pretend that
it is and then do half of the parsing in a post-processing pass that is
bound by no such grammar constraints.
On Tue, Oct 9, 2018 at 10:23 AM Sven Panne  wrote:
>
> On Tue, 9 Oct 2018 at 00:25, Vladislav Zavialov
> wrote:
>>
>> [...] That's true regardless of implementation technique, parsers are rather
>> delicate.
>
>
> I think it's not the parsers themselves which are delicate, it is the 
> language that they should recognize.
>
>>
>> A LALR-based parser generator does provide more information
>> when it detects shift/reduce and reduce/reduce conflicts, but I never
>> found this information useful. It was always quite the opposite of
>> being helpful - an indication that a LALR parser could not handle my
>> change and I had to look for workarounds. [...]
>
>
> Not that this would help at this point, but: The conflicts reported by parser 
> generators like Happy are *extremely* valuable, they hint at tricky/ambiguous 
> points in the grammar, which in turn is a strong hint that the language 
> you're trying to parse has dark corners. IMHO every language designer and 
> e.g. everybody proposing a syntactic extension to GHC should try to fit this 
> into a grammar for Happy *before* proposing that extension. If you get 
> conflicts, it is a very strong hint that the language is hard to parse by 
> *humans*, too, which is the most important thing to consider. Haskell already 
> has tons of syntactic warts which can only be parsed by infinite lookahead, 
> which is only a minor technical problem, but a major usability problem.
> "Programs are meant to be read by humans and only incidentally for computers 
> to execute." (D.E.K.)  ;-)
>
> The situation is a bit strange: We all love strong guarantees offered by type 
> checking, but somehow most people shy away from "syntactic type checking" 
> offered by parser generators. Parser combinators are the Python of parsing: 
> Easy to use initially, but a maintenance hell in the long run for larger 
> projects...
>
> Cheers,
>S.


Re: Parser.y rewrite with parser combinators

2018-10-09 Thread Sven Panne
On Tue, 9 Oct 2018 at 09:18, Vladislav Zavialov <
vlad.z.4...@gmail.com> wrote:

> [...] With parser combinators
>
> 1. Parse into an expression (linear in the amount of tokens)
> 2. If it turns out we needed a pattern, backtrack and parse into a
> pattern (linear in the amount of tokens) [...]
>

In a larger grammar implemented by parser combinators it is quite hard to
guarantee that you don't backtrack while backtracking, which would easily
result in exponential runtime. And given the size of the language GHC
recognizes, I can almost guarantee that this will happen unless you use
formal methods. :-)
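
To see how easily that happens, a deliberately artificial megaparsec toy
where backtracking while backtracking turns parsing exponential:

    import Data.Void (Void)
    import Text.Megaparsec
    import Text.Megaparsec.Char

    type Parser = Parsec Void String

    -- Both alternatives consume a token and recurse, so on input that
    -- ultimately fails (e.g. "aaa...ab") every position is a choice
    -- point explored twice: T(n) = 2 T(n-1), i.e. exponential time.
    s :: Parser ()
    s = eof <|> try (char 'a' *> s) <|> (char 'a' *> s)

Real grammars hide the same shape less obviously, e.g. two statement
forms that share a long prefix and each contain further backtracking
choices inside.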

Cheers,
   S.


Re: Parser.y rewrite with parser combinators

2018-10-09 Thread Sven Panne
On Tue, 9 Oct 2018 at 00:25, Vladislav Zavialov <
vlad.z.4...@gmail.com> wrote:

> [...] That's true regardless of implementation technique, parsers are
> rather
> delicate.


I think it's not the parsers themselves which are delicate, it is the
language that they should recognize.


> A LALR-based parser generator does provide more information
> when it detects shift/reduce and reduce/reduce conflicts, but I never
> found this information useful. It was always quite the opposite of
> being helpful - an indication that a LALR parser could not handle my
> change and I had to look for workarounds. [...]
>

Not that this would help at this point, but: The conflicts reported by
parser generators like Happy are *extremely* valuable, they hint at
tricky/ambiguous points in the grammar, which in turn is a strong hint that
the language you're trying to parse has dark corners. IMHO every language
designer and e.g. everybody proposing a syntactic extension to GHC should
try to fit this into a grammar for Happy *before* proposing that extension.
If you get conflicts, it is a very strong hint that the language is hard to
parse by *humans*, too, which is the most important thing to consider.
Haskell already has tons of syntactic warts which can only be parsed by
infinite lookahead, which is only a minor technical problem, but a major
usability problem. "Programs are meant to be read by humans and only
incidentally for computers to execute." (D.E.K.)  ;-)

The situation is a bit strange: We all love strong guarantees offered by
type checking, but somehow most people shy away from "syntactic type
checking" offered by parser generators. Parser combinators are the Python
of parsing: Easy to use initially, but a maintenance hell in the long run
for larger projects...

Cheers,
   S.


Re: Parser.y rewrite with parser combinators

2018-10-09 Thread Vladislav Zavialov
> For example, if we see `do K x y z ...`, we don't know whether we're parsing 
> an expression or a pattern before we can see what's in the ..., which is 
> arbitrarily later than the ambiguity starts. Of course, while we can write a 
> backtracking parser with combinators, doing so doesn't seem like a 
> particularly swell idea.

Backtracking is exactly what I wanted to do here. Perhaps it is lack
of theoretical background on my part showing, but I do not see
downsides to it. It supposedly robs us of the linear-time guarantee,
but consider this.

With 'happy' and post-processing we

1. Parse into an expression (linear in the amount of tokens)
2. If it turns out we needed a pattern, rejig (linear in the size of expression)

With parser combinators

1. Parse into an expression (linear in the amount of tokens)
2. If it turns out we needed a pattern, backtrack and parse into a
pattern (linear in the amount of tokens)

Doesn't the post-processing that we do today mean that we don't
actually take advantage of the linearity guarantee?
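
For concreteness, a toy megaparsec sketch of strategy 2 (made-up types;
the real statement grammar is far richer):

    import Data.Void (Void)
    import Text.Megaparsec
    import Text.Megaparsec.Char

    type Parser = Parsec Void String

    data Stmt = Bind String String | Body String deriving Show

    -- Toy stand-ins: patterns and expressions read the same tokens.
    pat, expr :: Parser String
    pat  = some letterChar <* space
    expr = some letterChar <* space

    -- One level of backtracking: attempt the pattern reading; if no
    -- "<-" follows, rewind and parse the same tokens as an expression.
    -- Each attempt is linear in the tokens, so the total is ~2n, the
    -- same ballpark as parse-then-rejig rather than worse.
    stmt :: Parser Stmt
    stmt = try (Bind <$> pat <* string "<- " <*> expr)
       <|> (Body <$> expr)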

On Tue, Oct 9, 2018 at 3:31 AM Richard Eisenberg  wrote:
>
> I, too, have wondered about this.
>
> A pair of students this summer were working on merging the type-level and 
> term-level parsers, in preparation for, e.g., visible dependent 
> quantification in terms (not to mention dependent types). If successful, this 
> would have been an entirely internal refactor. In any case, it seemed 
> impossible to do in an LALR parser, so the students instead parsed into a new 
> datatype Term, which then got converted either to an HsExpr, an HsPat, or an 
> HsType. The students never finished. But the experience suggests that moving 
> away from LALR might be a good move.
>
> All that said, I'm not sure how going to parser combinators stops us from 
> needing an intermediate datatype to parse expressions/patterns into before we 
> can tell whether they are expressions or patterns. For example, if we see `do 
> K x y z ...`, we don't know whether we're parsing an expression or a pattern 
> before we can see what's in the ..., which is arbitrarily later than the 
> ambiguity starts. Of course, while we can write a backtracking parser with 
> combinators, doing so doesn't seem like a particularly swell idea. This isn't 
> an argument against using parser combinators, but fixing the 
> pattern/expression ambiguity was a "pro" listed for them -- except I don't 
> think this is correct.
>
> Come to think of it, the problem with parsing expressions vs. types would 
> persist just as much in the combinator style as it does in the LALR style, so 
> perhaps I've talked myself into a corner. Nevertheless, it seems awkward to 
> do half the parsing in one language (happy) and half in another.
>
> Richard
>
> > On Oct 8, 2018, at 6:38 PM, Vladislav Zavialov  
> > wrote:
> >
> > That is a very good point, thank you! I have not thought about
> > incremental parsing. That's something I need to research before I
> > start the rewrite.
> > On Tue, Oct 9, 2018 at 1:06 AM Alan & Kim Zimmerman  
> > wrote:
> >>
> >> I am not against this proposal,  but want to raise a possible future 
> >> concern.
> >>
> >> As part of improving the haskell tooling environment I am keen on making 
> >> GHC incremental, and have started a proof of concept based in the same 
> >> techniques as used in the tree-sitter library.
> >>
> >> This is achieved by modifying happy, and requires minimal changes to the 
> >> existing Parser.y.
> >>
> >> It would be unfortunate if this possibility was prevented by this rewrite.
> >>
> >> Alan


Re: Parser.y rewrite with parser combinators

2018-10-09 Thread Vladislav Zavialov
> I'm also not sure what exactly parser combinators provide over Happy.

Parser combinators offer backtracking. With 'happy' we get the
guarantee that we parse in linear time, but we lose it because of
post-processing that is not guaranteed to be linear. I think it'd be
easier to backtrack in the parser itself rather than in a later pass.


On Tue, Oct 9, 2018 at 6:47 AM Vanessa McHale  wrote:
>
> I actually have some experience in this department, having authored both 
> madlang and language-ats. Parsers using combinators alone are more brittle 
> than parsers using Happy, at least for human-facing languages.
>
> I'm also not sure what exactly parser combinators provide over Happy. It has 
> macros that can emulate e.g. between, many. Drawing up a minimal example 
> might be a good idea.
>
>
> On 10/08/2018 05:24 PM, Vladislav Zavialov wrote:
>
>>> complex parsers written using parsing combinators is that they tend to be
>>> quite difficult to modify and have any kind of assurance that now you haven't
>>> broken something else
>>
>> That's true regardless of implementation technique, parsers are rather
>> delicate. A LALR-based parser generator does provide more information
>> when it detects shift/reduce and reduce/reduce conflicts, but I never
>> found this information useful. It was always quite the opposite of
>> being helpful - an indication that a LALR parser could not handle my
>> change and I had to look for workarounds.
>>
>>> With a combinator based parser, you basically have to do program
>>> verification, or more pragmatically, have a large test suite and hope that
>>> you tested everything.
>>
>> Even when doing modifications to Parser.y, I relied mainly on the test
>> suite to determine whether my change was right (and the test suite
>> always caught many issues). A large test suite is the best approach
>> both for 'happy'-based parsers and for combinator-based parsers.
>>
>>> and then have a separate pass that validates and fixes up the results
>>
>> That's where my concern lies. This separate pass is confusing (at
>> least for me - it's not the most straightforward thing to parse
>> something incorrectly and then restructure it), it is hard to modify,
>> it does not handle corner cases (e.g. #1087).
>>
>> Since we have all this Haskell code that does a significant portion of
>> processing, why even bother with having a LALR pass before it?
>>
>>> namely we can report better parse errors
>>
>> I don't think that's true, we can achieve better error messages with
>> recursive descent.
>>
>>> Also, with the new rewrite of HsSyn, we should be able to mark such
>>> constructors as only usable in the parsing pass, so later passes wouldn't
>>> need to worry about them.
>>
>> Not completely true, GhcPs-parametrized structures are the final
>> output of parsing, so at least the renamer will face these
>> constructors.
>>
>> On Tue, Oct 9, 2018 at 1:00 AM Iavor Diatchki  wrote:
>>
>>> Hello,
>>>
>>> my experience with complex parsers written using parsing combinators is that
>>> they tend to be quite difficult to modify and have any kind of assurance that
>>> now you haven't broken something else.   While reduce-reduce errors are
>>> indeed annoying, you at least know that there is some sort of issue you need
>>> to address.   With a combinator based parser, you basically have to do
>>> program verification, or more pragmatically, have a large test suite and hope
>>> that you tested everything.
>>>
>>> I think the current approach is actually quite reasonable:  use the Happy
>>> grammar to parse out the basic structure of the program, without trying to be
>>> completely precise, and then have a separate pass that validates and fixes up
>>> the results.   While this has the draw-back of some constructors being in the
>>> "wrong place", there are also benefits---namely we can report better parse
>>> errors.  Also, with the new rewrite of HsSyn, we should be able to mark such
>>> constructors as only usable in the parsing pass, so later passes wouldn't
>>> need to worry about them.
>>>
>>> -Iavor
>>>
>>> On Mon, Oct 8, 2018 at 2:26 PM Simon Peyton Jones via ghc-devs
>>>  wrote:
>>>
>>>> I'm no parser expert, but a parser that was easier to understand and modify,
>>>> and was as fast as the current one, sounds good to me.
>>>>
>>>> It's a tricky area though; e.g. the layout rule.
>>>>
>>>> Worth talking to Simon Marlow.
>>>>
>>>> Simon
>>>>
>>>> | -Original Message-
>>>> | From: ghc-devs  On Behalf Of Vladislav
>>>> | Zavialov
>>>> | Sent: 08 October 2018 21:44
>>>> | To: ghc-devs 
>>>> | Subject: Parser.y rewrite with parser combinators
>>>> |
>>>> | Hello devs,
>>>> |
>>>> | Recently I've been working on a couple of parsing-related issues in
>>>> | GHC

Re: Parser.y rewrite with parser combinators

2018-10-08 Thread Vanessa McHale
I actually have some experience in this department, having authored both
madlang <http://hackage.haskell.org/package/madlang> and language-ats
<http://hackage.haskell.org/package/language-ats>. Parsers using
combinators alone are more brittle than parsers using Happy, at least
for human-facing languages.

I'm also not sure what exactly parser combinators provide over Happy. It
has macros that can emulate e.g. between, many. Drawing up a minimal
example might be a good idea.


On 10/08/2018 05:24 PM, Vladislav Zavialov wrote:
>> complex parsers written using parsing combinators is that they tend to be 
>> quite difficult to modify and have any kind of assurance that now you 
>> haven't broken something else
> That's true regardless of implementation technique, parsers are rather
> delicate. A LALR-based parser generator does provide more information
> when it detects shift/reduce and reduce/reduce conflicts, but I never
> found this information useful. It was always quite the opposite of
> being helpful - an indication that a LALR parser could not handle my
> change and I had to look for workarounds.
>
>> With a combinator based parser, you basically have to do program 
>> verification, or more pragmatically, have a large test suite and hope that 
>> you tested everything.
> Even when doing modifications to Parser.y, I relied mainly on the test
> suite to determine whether my change was right (and the test suite
> always caught many issues). A large test suite is the best approach
> both for 'happy'-based parsers and for combinator-based parsers.
>
>> and then have a separate pass that validates and fixes up the results
> That's where my concern lies. This separate pass is confusing (at
> least for me - it's not the most straightforward thing to parse
> something incorrectly and then restructure it), it is hard to modify,
> it does not handle corner cases (e.g. #1087).
>
> Since we have all this Haskell code that does a significant portion of
> processing, why even bother with having a LALR pass before it?
>
>> namely we can report better parse errors
> I don't think that's true, we can achieve better error messages with
> recursive descent.
>
>> Also, with the new rewrite of HsSyn, we should be able to mark such 
>> constructors as only usable in the parsing pass, so later passes wouldn't 
>> need to worry about them.
> Not completely true, GhcPs-parametrized structures are the final
> output of parsing, so at least the renamer will face these
> constructors.
>
> On Tue, Oct 9, 2018 at 1:00 AM Iavor Diatchki  
> wrote:
>> Hello,
>>
>> my experience with complex parsers written using parsing combinators is that 
>> they tend to be quite difficult to modify and have any kind of assurance 
>> that now you haven't broken something else.   While reduce-reduce errors are 
>> indeed annoying, you at least know that there is some sort of issue you need 
>> to address.   With a combinator based parser, you basically have to do 
>> program verification, or more pragmatically, have a large test suite and 
>> hope that you tested everything.
>>
>> I think the current approach is actually quite reasonable:  use the Happy 
>> grammar to parse out the basic structure of the program, without trying to 
>> be completely precise, and then have a separate pass that validates and 
>> fixes up the results.   While this has the draw-back of some constructors 
>> being in the "wrong place", there are also benefits---namely we can report 
>> better parse errors.  Also, with the new rewrite of HsSyn, we should be able 
>> to mark such constructors as only usable in the parsing pass, so later 
>> passes wouldn't need to worry about them.
>>
>> -Iavor
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Oct 8, 2018 at 2:26 PM Simon Peyton Jones via ghc-devs 
>>  wrote:
>>> I'm no parser expert, but a parser that was easier to understand and 
>>> modify, and was as fast as the current one, sounds good to me.
>>>
>>> It's a tricky area though; e.g. the layout rule.
>>>
>>> Worth talking to Simon Marlow.
>>>
>>> Simon
>>>
>>>
>>>
>>> | -Original Message-
>>> | From: ghc-devs  On Behalf Of Vladislav
>>> | Zavialov
>>> | Sent: 08 October 2018 21:44
>>> | To: ghc-devs 
>>> | Subject: Parser.y rewrite with parser combinators
>>> |
>>> | Hello devs,
>>> |
>>> | Recently I've been working on a couple of parsing-related issues in
>>> | GHC. I implemented support for the -XStarIsTy

Re: Parser.y rewrite with parser combinators

2018-10-08 Thread Niklas Hambüchen
Another thing that may be of interest is that parser generators can guarantee 
you complexity bounds of parsing time (as usual, the goal is linear).

Some of the conflicts that annoy us about parser generators are often hints on 
this topic; if the parser generator succeeds, you are guaranteed to have a 
linear parser.

If backtracking is allowed in parser combinators, it is comparatively easy to 
get that wrong.

Niklas


Re: Parser.y rewrite with parser combinators

2018-10-08 Thread Richard Eisenberg
I, too, have wondered about this.

A pair of students this summer were working on merging the type-level and 
term-level parsers, in preparation for, e.g., visible dependent quantification 
in terms (not to mention dependent types). If successful, this would have been 
an entirely internal refactor. In any case, it seemed impossible to do in an 
LALR parser, so the students instead parsed into a new datatype Term, which 
then got converted either to an HsExpr, an HsPat, or an HsType. The students 
never finished. But the experience suggests that moving away from LALR might be 
a good move.

All that said, I'm not sure how going to parser combinators stops us from 
needing an intermediate datatype to parse expressions/patterns into before we 
can tell whether they are expressions or patterns. For example, if we see `do K 
x y z ...`, we don't know whether we're parsing an expression or a pattern 
before we can see what's in the ..., which is arbitrarily later than the 
ambiguity starts. Of course, while we can write a backtracking parser with 
combinators, doing so doesn't seem like a particularly swell idea. This isn't 
an argument against using parser combinators, but fixing the pattern/expression 
ambiguity was a "pro" listed for them -- except I don't think this is correct.

Come to think of it, the problem with parsing expressions vs. types would 
persist just as much in the combinator style as it does in the LALR style, so 
perhaps I've talked myself into a corner. Nevertheless, it seems awkward to do 
half the parsing in one language (happy) and half in another.

Richard

> On Oct 8, 2018, at 6:38 PM, Vladislav Zavialov  wrote:
> 
> That is a very good point, thank you! I have not thought about
> incremental parsing. That's something I need to research before I
> start the rewrite.
> On Tue, Oct 9, 2018 at 1:06 AM Alan & Kim Zimmerman  
> wrote:
>> 
>> I am not against this proposal,  but want to raise a possible future concern.
>> 
>> As part of improving the haskell tooling environment I am keen on making GHC 
>> incremental, and have started a proof of concept based in the same 
>> techniques as used in the tree-sitter library.
>> 
>> This is achieved by modifying happy, and requires minimal changes to the 
>> existing Parser.y.
>> 
>> It would be unfortunate if this possibility was prevented by this rewrite.
>> 
>> Alan


Re: Parser.y rewrite with parser combinators

2018-10-08 Thread Vladislav Zavialov
That is a very good point, thank you! I have not thought about
incremental parsing. That's something I need to research before I
start the rewrite.
On Tue, Oct 9, 2018 at 1:06 AM Alan & Kim Zimmerman  wrote:
>
> I am not against this proposal,  but want to raise a possible future concern.
>
> As part of improving the haskell tooling environment I am keen on making GHC 
> incremental, and have started a proof of concept based in the same techniques 
> as used in the tree-sitter library.
>
> This is achieved by modifying happy, and requires minimal changes to the 
> existing Parser.y.
>
> It would be unfortunate if this possibility was prevented by this rewrite.
>
> Alan


Re: Parser.y rewrite with parser combinators

2018-10-08 Thread Vladislav Zavialov
> complex parsers written using parsing combinators is that they tend to be 
> quite difficult to modify and have any kind of assurance that now you haven't 
> broken something else

That's true regardless of implementation technique, parsers are rather
delicate. A LALR-based parser generator does provide more information
when it detects shift/reduce and reduce/reduce conflicts, but I never
found this information useful. It was always quite the opposite of
being helpful - an indication that a LALR parser could not handle my
change and I had to look for workarounds.

> With a combinator based parser, you basically have to do program 
> verification, or more pragmatically, have a large test suite and hope that 
> you tested everything.

Even when doing modifications to Parser.y, I relied mainly on the test
suite to determine whether my change was right (and the test suite
always caught many issues). A large test suite is the best approach
both for 'happy'-based parsers and for combinator-based parsers.

> and then have a separate pass that validates and fixes up the results

That's where my concern lies. This separate pass is confusing (at
least for me - it's not the most straightforward thing to parse
something incorrectly and then restructure it), it is hard to modify,
it does not handle corner cases (e.g. #1087).

Since we have all this Haskell code that does a significant portion of
processing, why even bother with having a LALR pass before it?

> namely we can report better parse errors

I don't think that's true, we can achieve better error messages with
recursive descent.

> Also, with the new rewrite of HsSyn, we should be able to mark such 
> constructors as only usable in the parsing pass, so later passes wouldn't 
> need to worry about them.

Not completely true, GhcPs-parametrized structures are the final
output of parsing, so at least the renamer will face these
constructors.

On Tue, Oct 9, 2018 at 1:00 AM Iavor Diatchki  wrote:
>
> Hello,
>
> my experience with complex parsers written using parsing combinators is that 
> they tend to be quite difficult to modify and have any kind of assurance that 
> now you haven't broken something else.   While reduce-reduce errors are 
> indeed annoying, you at least know that there is some sort of issue you need 
> to address.   With a combinator based parser, you basically have to do 
> program verification, or more pragmatically, have a large test suite and hope 
> that you tested everything.
>
> I think the current approach is actually quite reasonable:  use the Happy 
> grammar to parse out the basic structure of the program, without trying to be 
> completely precise, and then have a separate pass that validates and fixes up 
> the results.   While this has the draw-back of some constructors being in the 
> "wrong place", there are also benefits---namely we can report better parse 
> errors.  Also, with the new rewrite of HsSyn, we should be able to mark such 
> constructors as only usable in the parsing pass, so later passes wouldn't 
> need to worry about them.
>
> -Iavor
>
>
>
>
>
>
>
>
>
>
>
> On Mon, Oct 8, 2018 at 2:26 PM Simon Peyton Jones via ghc-devs 
>  wrote:
>>
>> I'm no parser expert, but a parser that was easier to understand and modify, 
>> and was as fast as the current one, sounds good to me.
>>
>> It's a tricky area though; e.g. the layout rule.
>>
>> Worth talking to Simon Marlow.
>>
>> Simon
>>
>>
>>
>> | -Original Message-
>> | From: ghc-devs  On Behalf Of Vladislav
>> | Zavialov
>> | Sent: 08 October 2018 21:44
>> | To: ghc-devs 
>> | Subject: Parser.y rewrite with parser combinators
>> |
>> | Hello devs,
>> |
>> | Recently I've been working on a couple of parsing-related issues in
>> | GHC. I implemented support for the -XStarIsType extension, fixed
>> | parsing of the (!) type operator (Trac #15457), allowed using type
>> | operators in existential contexts (Trac #15675).
>> |
>> | Doing these tasks required way more engineering effort than I expected
>> | from my prior experience working with parsers due to complexities of
>> | GHC's grammar.
>> |
>> | In the last couple of days, I've been working on Trac #1087 - a
>> | 12-year old parsing bug. After trying out a couple of approaches, to
>> | my dismay I realised that fixing it properly (including support for
>> | bang patterns inside infix constructors, etc) would require a complete
>> | rewrite of expression and pattern parsing logic.
>> |
>> | Worse yet, most of the work would be done outside Parser.y in Haskell
>> | code instead, in RdrHsSyn helpers. When I try to keep the logic inside
>> | 

Re: Parser.y rewrite with parser combinators

2018-10-08 Thread Alan & Kim Zimmerman
I am not against this proposal,  but want to raise a possible future
concern.

As part of improving the haskell tooling environment I am keen on making
GHC incremental, and have started a proof of concept based in the same
techniques as used in the tree-sitter library.

This is achieved by modifying happy, and requires minimal changes to the
existing Parser.y.

It would be unfortunate if this possibility was prevented by this rewrite.

Alan


RE: Parser.y rewrite with parser combinators

2018-10-08 Thread Simon Peyton Jones via ghc-devs
use the Happy grammar to parse out the basic structure of the program, without 
trying to be completely precise, and then have a separate pass that validates 
and fixes up the results.

Incidentally, we use this for operator fixity and precedence, where the fixup 
is done in the renamer, and for that purpose it works really well.
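
A hedged sketch (toy types, not GHC's real code; the actual pass is
mkOpAppRn in the renamer) of that fixup: the parser builds infix chains
with a dummy association, and once the real fixities are known the tree
is rotated.

    data Expr = Lit Int | Op String Expr Expr deriving Show

    -- One rotation step: the parser produced (a `inner` b) `outer` c;
    -- if inner binds looser than outer, rebuild as a `inner` (b `outer` c).
    -- The real pass applies this recursively over the whole tree.
    reassoc :: (String -> Int) -> Expr -> Expr
    reassoc prec (Op outer (Op inner a b) c)
      | prec inner < prec outer = Op inner a (reassoc prec (Op outer b c))
    reassoc _ e = e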



From: Iavor Diatchki 
Sent: 08 October 2018 23:00
To: Simon Peyton Jones 
Cc: vlad.z.4...@gmail.com; ghc-devs 
Subject: Re: Parser.y rewrite with parser combinators

Hello,

my experience with complex parsers written using parsing combinators is that 
they tend to be quite difficult to modify and have any kind of assurance that 
now you haven't broken something else.   While reduce-reduce errors are indeed 
annoying, you at least know that there is some sort of issue you need to 
address.   With a combinator based parser, you basically have to do program 
verification, or more pragmatically, have a large test suite and hope that you 
tested everything.

I think the current approach is actually quite reasonable:  use the Happy 
grammar to parse out the basic structure of the program, without trying to be 
completely precise, and then have a separate pass that validates and fixes up 
the results.   While this has the draw-back of some constructors being in the 
"wrong place", there are also benefits---namely we can report better parse 
errors.  Also, with the new rewrite of HsSyn, we should be able to mark such 
constructors as only usable in the parsing pass, so later passes wouldn't need 
to worry about them.

-Iavor











On Mon, Oct 8, 2018 at 2:26 PM Simon Peyton Jones via ghc-devs
<ghc-devs@haskell.org> wrote:
I'm no parser expert, but a parser that was easier to understand and modify, 
and was as fast as the current one, sounds good to me.

It's a tricky area though; e.g. the layout rule.

Worth talking to Simon Marlow.

Simon



| -Original Message-
| From: ghc-devs <ghc-devs-boun...@haskell.org> On Behalf Of Vladislav
| Zavialov
| Sent: 08 October 2018 21:44
| To: ghc-devs <ghc-devs@haskell.org>
| Subject: Parser.y rewrite with parser combinators
|
| Hello devs,
|
| Recently I've been working on a couple of parsing-related issues in
| GHC. I implemented support for the -XStarIsType extension, fixed
| parsing of the (!) type operator (Trac #15457), allowed using type
| operators in existential contexts (Trac #15675).
|
| Doing these tasks required way more engineering effort than I expected
| from my prior experience working with parsers due to complexities of
| GHC's grammar.
|
| In the last couple of days, I've been working on Trac #1087 - a
| 12-year old parsing bug. After trying out a couple of approaches, to
| my dismay I realised that fixing it properly (including support for
| bang patterns inside infix constructors, etc) would require a complete
| rewrite of expression and pattern parsing logic.
|
| Worse yet, most of the work would be done outside Parser.y in Haskell
| code instead, in RdrHsSyn helpers. When I try to keep the logic inside
| Parser.y, in every design direction I face reduce/reduce conflicts.
|
| The reduce/reduce conflicts are the worst.
|
| Perhaps it is finally time to admit that Haskell syntax with all of
| the GHC extensions cannot fit into a LALR grammar?
|
| The extent of hacks that we have right now just to make parsing
| possible is astonishing. For instance, we have dedicated constructors
| in HsExpr to make parsing patterns possible (EWildPat, EAsPat,
| EViewPat, ELazyPat). That is, one of the fundamental types (that the
| type checker operates on) has four additional constructors that exist
| due to a reduce/reduce conflict between patterns and expressions.
|
| I propose a complete rewrite of GHC's parser to use recursive descent
| parsing with monadic parser combinators.
|
| 1. We could significantly simplify parsing logic by doing things in a
| more direct manner. For instance, instead of parsing patterns as
| expressions and then post-processing them, we could have separate
| parsing logic for patterns and expressions.
|
| 2. We could fix long-standing parsing bugs like Trac #1087 because
| recursive descent offers more expressive power than LALR (at the cost
| of support for left recursion, which is not much of a loss in
| practice).
|
| 3. New extensions to the grammar would require less engineering effort.
|
| Of course, this rewrite is a huge chunk of work, so before I start, I
| would like to know that this work would be accepted if done well.
| Here's what I want to achieve:
|
| * Comparable performance. The new parser could turn out to be faster
| because it would do less post-processing, but it could be slower
| because 'happy' does all the sorts of low-level optimisations. I will
| consider this project a success only if comparable performance is
| achieved.
|
| * Correctness. The new parser should handle 100% of the syntactic
| constructs that the current parser can 

Re: Parser.y rewrite with parser combinators

2018-10-08 Thread Iavor Diatchki
Hello,

my experience with complex parsers written using parsing combinators is
that they tend to be quite difficult to modify and have any kind of
assurance that now you haven't broken something else.   While reduce-reduce
errors are indeed annoying, you at least know that there is some sort of
issue you need to address.   With a combinator based parser, you basically
have to do program verification, or more pragmatically, have a large test
suite and hope that you tested everything.

I think the current approach is actually quite reasonable:  use the Happy
grammar to parse out the basic structure of the program, without trying to
be completely precise, and then have a separate pass that validates and
fixes up the results.   While this has the draw-back of some constructors
being in the "wrong place", there are also benefits---namely we can report
better parse errors.  Also, with the new rewrite of HsSyn, we should be
able to mark such constructors as only usable in the parsing pass, so later
passes wouldn't need to worry about them.
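
(A hedged sketch, with toy types, of such a validate-and-fix-up pass;
GHC's real counterpart is checkPattern and friends in RdrHsSyn:)

    data Expr = Var String | App Expr Expr | Lam String Expr

    data Pat = PVar String | PApp Pat Pat

    -- Reject expression forms that cannot be patterns; this is where
    -- the "better parse errors" can be produced.
    fixupPat :: Expr -> Either String Pat
    fixupPat (Var v)   = Right (PVar v)
    fixupPat (App f x) = PApp <$> fixupPat f <*> fixupPat x
    fixupPat (Lam _ _) = Left "lambda-expressions are not valid patterns"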

-Iavor











On Mon, Oct 8, 2018 at 2:26 PM Simon Peyton Jones via ghc-devs <
ghc-devs@haskell.org> wrote:

> I'm no parser expert, but a parser that was easier to understand and
> modify, and was as fast as the current one, sounds good to me.
>
> It's a tricky area though; e.g. the layout rule.
>
> Worth talking to Simon Marlow.
>
> Simon
>
>
>
> | -Original Message-
> | From: ghc-devs  On Behalf Of Vladislav
> | Zavialov
> | Sent: 08 October 2018 21:44
> | To: ghc-devs 
> | Subject: Parser.y rewrite with parser combinators
> |
> | Hello devs,
> |
> | Recently I've been working on a couple of parsing-related issues in
> | GHC. I implemented support for the -XStarIsType extension, fixed
> | parsing of the (!) type operator (Trac #15457), allowed using type
> | operators in existential contexts (Trac #15675).
> |
> | Doing these tasks required way more engineering effort than I expected
> | from my prior experience working with parsers due to complexities of
> | GHC's grammar.
> |
> | In the last couple of days, I've been working on Trac #1087 - a
> | 12-year old parsing bug. After trying out a couple of approaches, to
> | my dismay I realised that fixing it properly (including support for
> | bang patterns inside infix constructors, etc) would require a complete
> | rewrite of expression and pattern parsing logic.
> |
> | Worse yet, most of the work would be done outside Parser.y in Haskell
> | code instead, in RdrHsSyn helpers. When I try to keep the logic inside
> | Parser.y, in every design direction I face reduce/reduce conflicts.
> |
> | The reduce/reduce conflicts are the worst.
> |
> | Perhaps it is finally time to admit that Haskell syntax with all of
> | the GHC extensions cannot fit into a LALR grammar?
> |
> | The extent of hacks that we have right now just to make parsing
> | possible is astonishing. For instance, we have dedicated constructors
> | in HsExpr to make parsing patterns possible (EWildPat, EAsPat,
> | EViewPat, ELazyPat). That is, one of the fundamental types (that the
> | type checker operates on) has four additional constructors that exist
> | due to a reduce/reduce conflict between patterns and expressions.
> |
> | I propose a complete rewrite of GHC's parser to use recursive descent
> | parsing with monadic parser combinators.
> |
> | 1. We could significantly simplify parsing logic by doing things in a
> | more direct manner. For instance, instead of parsing patterns as
> | expressions and then post-processing them, we could have separate
> | parsing logic for patterns and expressions.
> |
> | 2. We could fix long-standing parsing bugs like Trac #1087 because
> | recursive descent offers more expressive power than LALR (at the cost
> | of support for left recursion, which is not much of a loss in
> | practice).
> |
> | 3. New extensions to the grammar would require less engineering effort.
> |
> | Of course, this rewrite is a huge chunk of work, so before I start, I
> | would like to know that this work would be accepted if done well.
> | Here's what I want to achieve:
> |
> | * Comparable performance. The new parser could turn out to be faster
> | because it would do less post-processing, but it could be slower
> | because 'happy' does all the sorts of low-level optimisations. I will
> | consider this project a success only if comparable performance is
> | achieved.
> |
> | * Correctness. The new parser should handle 100% of the syntactic
> | constructs that the current parser can handle.
> |
> | * Error messages. The new error messages should be of equal or better
> | quality than existing ones.
> |
> | * Elegance. The new parser should bring simplification to other parts
> | of

Parser.y rewrite with parser combinators

2018-10-08 Thread Vladislav Zavialov
Hello devs,

Recently I've been working on a couple of parsing-related issues in
GHC. I implemented support for the -XStarIsType extension, fixed
parsing of the (!) type operator (Trac #15457), allowed using type
operators in existential contexts (Trac #15675).

Doing these tasks required way more engineering effort than I expected
from my prior experience working with parsers due to complexities of
GHC's grammar.

In the last couple of days, I've been working on Trac #1087 - a
12-year old parsing bug. After trying out a couple of approaches, to
my dismay I realised that fixing it properly (including support for
bang patterns inside infix constructors, etc) would require a complete
rewrite of expression and pattern parsing logic.

Worse yet, most of the work would be done outside Parser.y in Haskell
code instead, in RdrHsSyn helpers. When I try to keep the logic inside
Parser.y, in every design direction I face reduce/reduce conflicts.

The reduce/reduce conflicts are the worst.

Perhaps it is finally time to admit that Haskell syntax with all of
the GHC extensions cannot fit into a LALR grammar?

The extent of hacks that we have right now just to make parsing
possible is astonishing. For instance, we have dedicated constructors
in HsExpr to make parsing patterns possible (EWildPat, EAsPat,
EViewPat, ELazyPat). That is, one of the fundamental types (that the
type checker operates on) has four additional constructors that exist
due to a reduce/reduce conflict between patterns and expressions.
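
(For readers following along, a simplified sketch, with toy fields
rather than GHC's actual definition, of what this looks like:)

    data HsExpr
      = HsVar    String          -- ordinary expression forms ...
      | HsApp    HsExpr HsExpr
      -- ... plus pattern-only forms, kept solely so that patterns can
      -- be parsed as expressions and rejigged later in RdrHsSyn:
      | EWildPat                 -- "_"      in expression position
      | EAsPat   String HsExpr   -- "x@p"    in expression position
      | EViewPat HsExpr HsExpr   -- "e -> p" in expression position
      | ELazyPat HsExpr          -- "~p"     in expression position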

I propose a complete rewrite of GHC's parser to use recursive descent
parsing with monadic parser combinators.

1. We could significantly simplify parsing logic by doing things in a
more direct manner. For instance, instead of parsing patterns as
expressions and then post-processing them, we could have separate
parsing logic for patterns and expressions.

2. We could fix long-standing parsing bugs like Trac #1087 because
recursive descent offers more expressive power than LALR (at the cost
of support for left recursion, which is not much of a loss in
practice).

3. New extensions to the grammar would require less engineering effort.

Of course, this rewrite is a huge chunk of work, so before I start, I
would like to know that this work would be accepted if done well.
Here's what I want to achieve:

* Comparable performance. The new parser could turn out to be faster
because it would do less post-processing, but it could be slower
because 'happy' does all sorts of low-level optimisations. I will
consider this project a success only if comparable performance is
achieved.

* Correctness. The new parser should handle 100% of the syntactic
constructs that the current parser can handle.

* Error messages. The new error messages should be of equal or better
quality than existing ones.

* Elegance. The new parser should bring simplification to other parts
of the compiler (e.g. removal of pattern constructors from HsExpr).
And one of the design principles is to represent things by dedicated
data structures, in contrast to the current state of affairs where we
represent patterns as expressions, data constructor declarations as
types (before D5180), etc.

Let me know if this is a good/acceptable direction of travel. That's
definitely something that I personally would like to see happen.

All the best,
- Vladislav