Re: [PEG] Left Recursion in PEG

2016-11-25 Thread Juancarlo Añez
On Sat, Oct 29, 2016 at 5:10 PM, Nicolas Laurent <
nicolas.laur...@uclouvain.be> wrote:

> Interesting, why do you think the furthest-position heuristic may report
> incorrect information?
>
> My perspective is that it returns one possible error among the set of all
> possible errors given an erroneous input. In all generality, finding out
> the actual error requires knowing the programmer's intent.
>
> Maybe you think about errors that say, for instance, "expecting operator
> X", when in fact there are a number of other operators than X that may have
> worked in that position?
>

Interesting question...

My best answer is that I've found that it is best if the parser reports an
error at the position in the input in which, after having forced to commit
by *cut* operators, there are no options in a choice that match the
remaining input. Without *cut* that choice will often be the start rule, or
other *high level* rules, or rules applied early in the parse.

Generally, for a PEG grammar to parse the intended language, the options
with the expected largest parse must come first, but there are other
options because the first one is not always the right one.


> I see cut operators it as a way to generator better error messages rather
> than unearthing different errors.
>

Yes.


> Inserting a cut operator can change the semantics of a grammar.
>

It can, if the *cut* commits the parser to an option that will eventually
fail for the input, when others could have suceeded.


> But with automatic insertion, the point is precisely to retain the
> existing semantics.
>

Exactly.


> In this case, the furthest error will always occur on the branch where the
> cut operator occurs.
>
> What do you think?
>

I think I need an example of furthest error being wrong. I can't think of
one off the top of my head, but the issues arise in ambiguous languages,
like AG Natural,

The problem is that we must assume that the grammar is correct, and that it
describes the intended language, so when an error is reported it is against
the input string. Talking about furthest-error or furthest-with-cut error
makes sense because just rejecting the input string is not enough in
practical applications. The expected error message is of the kind of: "the
string appeared to be part of the language up to _here_, and...".


-- 
Juancarlo *Añez*
___
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg


Re: [PEG] Left Recursion in PEG

2016-10-29 Thread Nicolas Laurent
>
> A quick note on a quick scan of the first pages: There's no mention of the
> role of *cut* operators in controlling memoization, when Mizushima and
> others have been work on automated insertion of *cut. *
>
> The *cut* operator is essential in my professional work with PEG. Because
> of the backtracking, parse errors may end up being reported against the
> start rule if there are no *cuts* to commit the parser to the current
> parse tree. Heuristics based on the furthest position reached on the input
> may report incorrect information about the input, or the grammar.


Interesting, why do you think the furthest-position heuristic may report
incorrect information?

My perspective is that it returns one possible error among the set of all
possible errors given an erroneous input. In all generality, finding out
the actual error requires knowing the programmer's intent.

Maybe you think about errors that say, for instance, "expecting operator
X", when in fact there are a number of other operators than X that may have
worked in that position?

I see cut operators it as a way to generator better error messages rather
than unearthing different errors.

Inserting a cut operator can change the semantics of a grammar. But with
automatic insertion, the point is precisely to retain the existing
semantics. In this case, the furthest error will always occur on the branch
where the cut operator occurs.

What do you think?

Cheers,

Nicolas LAURENT

On 28 October 2016 at 01:36, Juancarlo Añez  wrote:

>
> On Thu, Oct 27, 2016 at 10:19 AM, Nicolas Laurent <
> nicolas.laur...@uclouvain.be> wrote:
>
>> I have a paper (http://norswap.com/pubs/sle2015.pdf) that discusses
>> left-recursion in PEG, as well as a matching implementation (
>> https://github.com/norswap/autumn) (which is now however completely
>> different from what the paper describes).
>>
>
> I will study your work.
>
> A quick note on a quick scan of the first pages: There's no mention of the
> role of *cut* operators in controlling memoization, when Mizushima and
> others have been work on automated insertion of *cut. *
>
> The *cut* operator is essential in my professional work with PEG. Because
> of the backtracking, parse errors may end up being reported against the
> start rule if there are no *cuts* to commit the parser to the current
> parse tree. Heuristics based on the furthest position reached on the input
> may report incorrect information about the input, or the grammar.
>
> Cheers,
> --
> Juancarlo *Añez*
>
___
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg


Re: [PEG] Left Recursion in PEG

2016-10-28 Thread Juancarlo Añez
On Fri, Oct 28, 2016 at 9:21 AM, Sérgio Medeiros 
wrote:

> This would be approach B, and my point is that for some LR(k) CFGs
> there are no equivalent LL(k) CFGs.
>

Grammar transformations beyond the simple refactoring of direct left
recursion are out of the question in my case.

More complex transformations would lose the relation between the original
grammar and the parsing process from the user's perspective. Parse traces
would become meaningless to the user, and the traces are indispensable when
debugging a grammar with, f.i., >400 rules.

A grammar G is a program that (hopefully) recognizes language L. In
practical applications, G must be debuggable.

Cheers,
-- 
Juancarlo *Añez*
___
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg


Re: [PEG] Left Recursion in PEG

2016-10-28 Thread Sérgio Medeiros
On Fri, Oct 28, 2016 at 9:44 AM, Juancarlo Añez  wrote:
>
> On Fri, Oct 28, 2016 at 8:05 AM, Sérgio Medeiros 
> wrote:
>>
>> This works for some grammars, but not for all, because
>> LL(k) is a proper subset of LR(k):
>> https://en.wikipedia.org/wiki/LR_parser
>>
>> Anyway, this approach may fit your users' needs.
>
>
> The target grammar language would be PEG.
>
> Isn't PEG > CFG an open question?

Yes, it is an open question.

Sorry, I was not clear, let me try again.
In case your approach is:
  1. Given an LR(k) CFG G, first eliminate direct left recursion and
  obtain an LR(k) CFG G'
  2. Then adapt the approach described in https://arxiv.org/abs/1304.3177
  to convert G' to an equivalent PEG P

This would be approach A, and your main work would be to adapt
the conversion from strong-LL(k) grammars into equivalent PEGs.

In case your approach is:
  1. Given an LR(k) CFG G, first eliminate direct left recursion and
  obtain an LR(k) CFG G'
  2. Convert G' into an equivalent LL(k) CFG G''
  3. Then use the approach described in https://arxiv.org/abs/1304.3177
  to convert G'' to an equivalent PEG P

This would be approach B, and my point is that for some LR(k) CFGs
there are no equivalent LL(k) CFGs.

However, maybe your users write LR CFGs that you can almost
always convert to equivalent LL CFGs.


Cheers,
-- 
Sérgio

___
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg


Re: [PEG] Left Recursion in PEG

2016-10-28 Thread Juancarlo Añez
On Fri, Oct 28, 2016 at 8:05 AM, Sérgio Medeiros 
wrote:

> This works for some grammars, but not for all, because
> LL(k) is a proper subset of LR(k):
> https://en.wikipedia.org/wiki/LR_parser
>
> Anyway, this approach may fit your users' needs.
>

The target grammar language would be PEG.

Isn't PEG > CFG an open question?

As to failure, an LR or LL parser generator would report when a grammar
doesn't meet the criteria for generating a parser with the expected
semantics. It is reasonable to expect the same from PEG: that the semantics
of the produced parser are known at grammar analysis time.

That the parser will not run endlessly  is not good enough when the input
is finite and the worst-case algorithms known are of exponential order.
Nothing less than rejecting the grammar or producing a parser that parses
in O(g[]) time is good enough.

Cheers,

-- 
Juancarlo *Añez*
___
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg


Re: [PEG] Left Recursion in PEG

2016-10-27 Thread Juancarlo Añez
On Thu, Oct 27, 2016 at 10:19 AM, Nicolas Laurent <
nicolas.laur...@uclouvain.be> wrote:

> I have a paper (http://norswap.com/pubs/sle2015.pdf) that discusses
> left-recursion in PEG, as well as a matching implementation (
> https://github.com/norswap/autumn) (which is now however completely
> different from what the paper describes).
>

I will study your work.

A quick note on a quick scan of the first pages: There's no mention of the
role of *cut* operators in controlling memoization, when Mizushima and
others have been work on automated insertion of *cut. *

The *cut* operator is essential in my professional work with PEG. Because
of the backtracking, parse errors may end up being reported against the
start rule if there are no *cuts* to commit the parser to the current parse
tree. Heuristics based on the furthest position reached on the input may
report incorrect information about the input, or the grammar.

Cheers,
-- 
Juancarlo *Añez*
___
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg


Re: [PEG] Left Recursion in PEG

2016-10-27 Thread Juancarlo Añez
On Thu, Oct 27, 2016 at 11:48 AM, Sérgio Medeiros 
wrote:

>
> This approach may be easier than try to get a PEG without left-recursive
> rules directly from a left-recursive CFG.
>
> Either way, I think you have a tough task :-)
>

Actually, I see a light at the end of the tunnel...

This is the algorithm for elimination in direct left recursion in a CFG:

A = A alfa | beta
_

A = beta A'

A' = alfa A' | eps


When the target grammar is PEG, the above translates to:

A = beta (alfa)*


Somehow adding the above to your algorithm would require some sort of rule
rewriting, but it would remove the left recursion being bound by the number
of choices, and would probably result in implementations much simpler than
those of Warth's.


> What exactly do you mean by "fail"?
> As I said before, the semantics should give a result.
>

I explained that.

Users expect the left-recursive grammar to parse input as if the underlying
algorithm was stack-based CFG, so a user reports the parser as failing if
the input string is in the CF language and the generated parser doesn't
parse it.


> In case the semantics gives a result, but the user was expecting
> a different parsing tree, then I think it would be preferable to say
> that the semantics does not give the expected result.
>
> Nonetheless, the semantics allows the user to change the precedence
> of the operators, and then get the desired associativity.
>

I understand (there's no misunderstanding). Do you understand my case with
my tool's users?


> > Personally, because I publish a tool that can be used for
> > professional/commercial work, I'm inclined to remove support for left
> > recursive grammars until I can provide rules to construct "valid"
> > left-recursive grammars.
>
> Sure, it is a valid option.
> You may also disable the support for left-recursion and set
> this as the default mode.


Yes. It is currently disabled by default, it is labeled as "experimental",
and it uses Warth's, because that handles more cases of left-recursive
grammars over more inputs.

The rewrite I mention above is simple to implement. It would only cover
direct left recursion, but the semantics would be defined and (mostly) as
expected (because of associativity) for all grammars accepted by the tool.

Cheers,

-- 
Juancarlo *Añez*
___
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg


Re: [PEG] Left Recursion in PEG

2016-10-27 Thread Sérgio Medeiros
On Wed, Oct 26, 2016 at 10:09 PM, Juancarlo Añez  wrote:
>> I have worked on this subject (https://arxiv.org/abs/1304.3177), but we
>> did not establish a relation between an LR grammar interpreted as a CFG
>> and what would be an equivalent PEG.
>
>
> Thanks for the reference. I'll study it.
>
> My work is in parsing applied to large collections of text in legacy
> programming languages. Finding an algorithm that finds the equivalent PEG
> grammar for an LR one, or the likes, is more interesting than PEG+Left.

Hi, Juancarlo.

As LR Grammars are usually written using left-recursion, in case
you did not want to use PEG + left I think you will need first to
eliminate left recusion from a LR CFG grammar G, this will give you G',
and then try to obtain a PEG equivalent to G'.

This approach may be easier than try to get a PEG without left-recursive
rules directly from a left-recursive CFG.

Either way, I think you have a tough task :-)


>> In case you have examples of left-recursive grammars which do not give
>> the expected result when interpreted as a PEG, you can post them here.
>
>
> The implementation of Warth's algorithm in Grako drops information about
> associativity when it enters the iterative step. I hadn't noticed until I
> implemented your algorithm, which does recover associativity, because I
> don't use left recursion in my work.
>
> Left or right associativity doesn't matter because it is easy to convert one
> to the other when the parsing semantics are none. If there's no
> associativity (the AST is flat), then another parsing step would be required
> to impose the associativity.
>
> If you care about the examples, I can recover some from the unit tests for
> left-associativity in Grako. Your algorithm passes the cases included in bug
> reports against Grako, but fails to parse cases that the current Warth
> implementation passes.

What exactly do you mean by "fail"?
As I said before, the semantics should give a result.
I would say that it "fails" when it does not give a result
(e.g., it keeps calling the same rule multiple times
without stopping the recursion).

In case the semantics gives a result, but the user was expecting
a different parsing tree, then I think it would be preferable to say
that the semantics does not give the expected result.

Nonetheless, the semantics allows the user to change the precedence
of the operators, and then get the desired associativity.


> Personally, because I publish a tool that can be used for
> professional/commercial work, I'm inclined to remove support for left
> recursive grammars until I can provide rules to construct "valid"
> left-recursive grammars.

Sure, it is a valid option.
You may also disable the support for left-recursion and set
this as the default mode.

-- 
Sérgio

___
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg


Re: [PEG] Left Recursion in PEG

2016-10-27 Thread Nicolas Laurent
Hello, Juancarlo.

I have a paper (http://norswap.com/pubs/sle2015.pdf) that discusses
left-recursion in PEG, as well as a matching implementation (
https://github.com/norswap/autumn) (which is now however completely
different from what the paper describes).

So maybe I can shed light on some aspects of left-recursion within PEG for
you.

The solution follows, in spirit, Warth's idea of "growing a seed".

The algorithm can actually be described quite succinctly. It requires all
left-recursive rules to be marked as such. I suspect it may sound familiar,
but I'll include the description anyway to make sure we are on the same
page.

The idea is to iteratively invoke the parser for the left-recursive rule
(let's call it R) until as much input as possible has been matched. To do
so, you start by forbidding left-recursion. This is how you obtain your
"seed" (let's call it R0). You then try R again but this time, upon
left-recursion, the parser will pretend as though the seed (R0) has just
been matched. You call the new result R1. Then you iterate R again, but
this time you use R1 upon left-recursion. Etc, until the result (the amount
of matched input) stops growing.

This works well, but there is a well-noted pitfall: rules that are both
left- and right-recursive will be parsed right-recursively.

This is not by itself a problem. A rule like "E = E + E" is naturally
ambiguous. I strongly object to people who say that one associativity is
more correct than the other.

However, if your grammar is annotated with the required associativity, then
you can use a modified version of the algorithm to support
right-associativity.

The trick is to forbid all recursion (instead of only left-recursion) in
the algorithm described above. This forces the parser to fall back on the
non-recursive case for the right side, and hence to parse left-recursively.

There is however a problem (note that this is the *only* pitfall in the
final scheme): it precludes well-defined rules with "inner" recursion to
work, e.g. "E = E (E) E". The middle "E" is not ambiguous and should be
allowed to recurse. The usual fix is to add an "escape hatch operator" that
allows recursion to resume within a recursive rule. This is only required
for rules where (1) right-associativity is required, (2) there is "inner"
recursion.

This whole scheme is due to Seaton, who used it in his Katahdin language (
http://chrisseaton.com/katahdin/), although my own paper is probably a
better starting point on this particular issue.

After contorting this algorithm in many directions, a have yet to find a
case where it does not work reliably. It supports indirect  and hidden
left-recursion just fine, as well as nested left-recursive rules. In my
recent work (http://norswap.com/pubs/sle2016.pdf), I even use it with
context-sensitive parsing.

I hope this helps.

Cheers,

Nicolas LAURENT

On 27 October 2016 at 03:09, Juancarlo Añez  wrote:

>
> On Mon, Oct 10, 2016 at 8:54 AM, Sérgio Medeiros 
> wrote:
>
>> We need to take some care here. It is difficult to establish a general
>> relation between CFGs and PEGs in order to guarantee that a grammar
>> G will give us the same result when interpreted as a CFG and when
>> interpreted as a PEG.
>>
>
> Indeed.
>
> I've mean to refer to the "set of languages expressible with grammars of
> type X". It's likely I wasn't precise in what I said.
>
>
>> I have worked on this subject (https://arxiv.org/abs/1304.3177), but we
>> did not establish a relation between an LR grammar interpreted as a CFG
>> and what would be an equivalent PEG.
>>
>
> Thanks for the reference. I'll study it.
>
> My work is in parsing applied to large collections of text in legacy
> programming languages. Finding an algorithm that finds the equivalent PEG
> grammar for an LR one, or the likes, is more interesting than PEG+Left.
>
> I've kept left recursion in Grako because I reckon the tool is used in
> teaching, where allowing some simple forms of left recursion may be useful,
> but just mentioning the experimental support for left recursive grammars
> sets the wrong expectations. For example, when limits are found in the
> current implementation of Warth's algorithm are found, they are reported as
> bugs against the parser generator.
>
>
>> Well, I think you can say the same about parsers based on CFGs, you can
>> build an AST with left-associativity operators without a left-recursive
>> CFG.
>>
>> Usually, it is easier to build a left-associative with a left-recursive
>> grammar
>> than with a right-recursive one, but you can choose which strategy to use.
>>
>
> "The 'correct' associativity" is just one of the expectations of users new
> to parsing.
>
>
>> In case you have examples of left-recursive grammars which do not give
>> the expected result when interpreted as a PEG, you can post them here.
>>
>
> The implementation of Warth's algorithm in Grako drops information about
> associativity when it enters the 

Re: [PEG] Left Recursion in PEG

2016-10-26 Thread Juancarlo Añez
On Thu, Oct 13, 2016 at 10:13 PM, Peter Cashin 
wrote:

> Yes, I seem to remember that Bryan Ford pointed out the “middle finder”
> grammar.
>
> It is CF but not PEG:
>
> s = x s x / x
>
> i.e an odd number of x’s. There is a similar one for an even number.
>

I'm a bit lost Are we distinguishing between the grammars and the languages
expressible by the grammar systems?

A PEG grammar that recognizes the language "an odd number of x" is:

  s = (xx)* x

Another version:

 s = xx s / x

Is there an algorithm that can deduce the PEG from the CF?

Does every CF have a PEG equivalent?

-- 
Juancarlo *Añez*
___
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg


Re: [PEG] Left Recursion in PEG

2016-10-13 Thread Peter Cashin
Hi

Yes, I seem to remember that Bryan Ford pointed out the “middle finder” grammar.

It is CF but not PEG:

s = x s x / x

i.e an odd number of x’s. There is a similar one for an even number.

I was interested to discover that this can be done with an extended PEG grammar.

Cheers,
Peter.


> On Oct 13, 2016, at 7:43 PM, Aaron Moss  wrote:
> 
> We know this:
> ·  LL < PEG < LR < CF (yes?_
> This isn’t strictly true; the lookahead capabilities of PEGs allow some 
> languages that are not context free to be matched (for instance, a^n b^n c^n, 
> using the grammar below). Conversely, there is a conjecture (to my knowledge 
> as yet unproven) that there exist some context free languages for which no 
> PEG can be devised (given that packrat parsing can match any PEG in linear 
> time, but there is no known linear-time algorithm for general-purpose CFG 
> parsing).
>  
> a^n b^n c^n grammar:
> S := &(A ‘c’) ‘a’+ B !.
> A := ‘a’ A? ‘b’
> B := ‘b’ B? ‘c’
>  
> Regards,
> Aaron Moss
>  
> ___
> PEG mailing list
> PEG@lists.csail.mit.edu 
> https://lists.csail.mit.edu/mailman/listinfo/peg 
> 
___
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg


Re: [PEG] Left Recursion in PEG

2016-10-13 Thread Aaron Moss
We know this:

*  LL < PEG < LR < CF (yes?_

This isn’t strictly true; the lookahead capabilities of PEGs allow some 
languages that are not context free to be matched (for instance, a^n b^n c^n, 
using the grammar below). Conversely, there is a conjecture (to my knowledge as 
yet unproven) that there exist some context free languages for which no PEG can 
be devised (given that packrat parsing can match any PEG in linear time, but 
there is no known linear-time algorithm for general-purpose CFG parsing).

 

a^n b^n c^n grammar:

S := &(A ‘c’) ‘a’+ B !.

A := ‘a’ A? ‘b’

B := ‘b’ B? ‘c’

 

Regards,

Aaron Moss

 

___
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg


Re: [PEG] Left Recursion in PEG

2016-10-10 Thread Sérgio Medeiros
On Sat, Oct 8, 2016 at 11:41 PM, Juancarlo Añez  wrote:
> I implemented a version  Medeiro et al's[3] algorithm in Grako. I don't know
> what I did that for, because it should be easy to prove that an algorithm
> that sets a bound to recursion will fail for certain grammars and inputs.

Hi, Juancarlo.

Lemma 2 (or Lemma 3.2 in https://arxiv.org/abs/1207.0443) states
that the semantics always gives a result (a matching succeeds or fails).

For some grammars the semantics may give a result which does not
seem correct for a given application, or it is not similar to the result
we would get by using a CFG instead of a PEG.


> What disappoints me about the papers is that they don't characterize the
> grammars for which the algorithms do work.
>
> Not all left-recursive grammars are LR, and only LR grammars will parse with
> LR algorithms.

We need to take some care here. It is difficult to establish a general
relation between CFGs and PEGs in order to guarantee that a grammar
G will give us the same result when interpreted as a CFG and when
interpreted as a PEG.

I have worked on this subject (https://arxiv.org/abs/1304.3177), but we
did not establish a relation between an LR grammar interpreted as a CFG
and what would be an equivalent PEG.

The paper about left-recursion has examples where the semantics gives an
useful result for common left-recursive idioms from Context-Free Grammars,
but there is no general guarantee.


> Which set of grammars are PEG+Left_Recursion_Alg_X?
>
> The need for left-associativity in expressions can be handled in several
> different ways in a PEG parser, without trying to make a PEG parser parse
> with left-recursive grammars.

Well, I think you can say the same about parsers based on CFGs, you can
build an AST with left-associativity operators without a left-recursive CFG.

Usually, it is easier to build a left-associative with a left-recursive grammar
than with a right-recursive one, but you can choose which strategy to use.

In case you have examples of left-recursive grammars which do not give
the expected result when interpreted as a PEG, you can post them here.



-- 
Sérgio

___
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg