Re: Unwanted failure and FAILGOAL

2016-05-11 Thread Moritz Lenz

Hi,

On 05/11/2016 07:45 AM, Richard Hainsworth wrote:

I have the following in a grammar

 rule TOP        { ^ <statement>+ $ };

 rule statement  { <id> '=' <endvalue>
                 | { { self.panic($/, "Declaration syntax incorrect") } }
 };

 rule endvalue   { <keyword> '(' ~ ')' <pairlist>
                 | { self.panic($/, "Invalid declaration.") }
 }

The grammar parses a correct input file up until the end of the file. At
that point even if there is no un-consumed input, there is an attempt to
match <statement>, which fails. The failure causes the panic with 'Declaration
syntax'.

Am I missing something simple here?

I would have thought (though this is only a very newbie assumption)
that if the end of the input being sent to the grammar has been reached
after the last <statement> has been matched, then there should be no
reason for the parse method to try to match <statement> again, and that
if it fails, it should test for the end of input.


This is not how regexes or grammars work.

The + quantifier tries as many times as possible to match the regex. It 
doesn't look ahead to see if more characters are available, and it 
doesn't know about the end-of-string anchor that comes next in the grammar.


In fact, it doesn't know if the rule it quantifies might have a way to
match zero characters. In this case, it would be wrong behavior not to
attempt a zero-width match at the end of the string.
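
A minimal sketch of this behavior, assuming a toy grammar. The names Demo
and $attempts and the statement syntax are made up for illustration and are
not taken from Richard's code. The counter is bumped every time <statement>
is tried, so it should end up higher than the number of statements actually
matched, because the quantifier makes one last, failing attempt at the end
of the input:

```raku
# Hypothetical toy grammar: counts how often <statement> is attempted.
my $attempts = 0;

grammar Demo {
    rule TOP       { ^ <statement>+ $ }
    # The leading code block runs on every attempt, successful or not.
    rule statement { { $attempts++ } \w+ '=' \d+ ';' }
}

my $m = Demo.parse("a = 1; b = 2;");

say "statements matched: ", $m<statement>.elems;
say "attempts made:      ", $attempts;
```

If the failing branch of <statement> raised an exception instead of simply
failing, that final attempt is where it would fire.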


As for improving the error reporting from within a grammar, there are
lots of ways to get creative, and I'd urge you to read Perl 6's own
grammar, which is a good inspiration for that.

See https://github.com/rakudo/rakudo/blob/nom/src/Perl6/Grammar.nqp

One thing you could do is structure the statement rule differently:

rule statement {
   [  <id> '=' <endvalue>
   || { self.panic($/, "Invalid declaration.") }
   ]
}

And maybe also TOP:

rule TOP { ^ [ <statement> || . { self.panic($/, "Expected a statement") } ] $ };


That extra dot before the panic ensures it's not called at the end of 
the string. If you don't want that, you could also do


[ <statement> || $ || { self.panic(...) } ]
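
A runnable sketch of the dot guard, under some assumptions: the grammar name
Guarded, the simplified statement syntax and the panic method (which here
just dies with the message) are invented for illustration, and a + has been
put on the group so that it repeats. The panic should only be reached when
there is still a character to consume, so a clean end of input parses
without complaint:

```raku
# Hypothetical grammar; panic is a stand-in that dies with the message.
grammar Guarded {
    rule TOP       { ^ [ <statement> || . { self.panic($/, "Expected a statement") } ]+ $ }
    rule statement { \w+ '=' \d+ ';' }

    method panic($match, $msg) { die $msg }
}

# Well-formed input: the last attempt fails quietly at end of string,
# because the dot has nothing left to match.
say so Guarded.parse("a = 1; b = 2;");

# Trailing garbage: the dot matches a character, so the panic fires.
try Guarded.parse("a = 1; ???");
say $!.message if $!;
```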

Cheers,
Moritz


Re: Unwanted failure and FAILGOAL

2016-05-11 Thread Damian Conway
Hi Richard,

Not a complete answer to your question;
just an observation about your grammar:

>  rule TOP        { ^ <statement>+ $ };
>
>  rule statement  { <id> '=' <endvalue>
>                  | { { self.panic($/, "Declaration syntax incorrect") } }
>  };
>
>  rule endvalue   { <keyword> '(' ~ ')' <pairlist>
>                  | { self.panic($/, "Invalid declaration.") }
>  }

That's more or less the equivalent of:

sub TOP {
    die if !at_start_of_input();
    loop { last unless try statement() };
    die if !at_end_of_input();
}

sub statement {
    try { id(); match_literal('='); endvalue() }
      or
    die "Declaration syntax incorrect";
}

sub endvalue {
    try { keyword(); match_literal('('); pairlist(); match_literal(')') }
      or
    die "Invalid declaration."
}

In which case, would you really have expected a call to TOP()
NOT to throw an exception from statement(), the first time statement()
couldn't match (as it inevitably won't if we're at the end of the input)???

If these were subroutines, I suspect you'd have written something
more like:

sub statement {
    try { id(); match_literal('='); endvalue() }
      or
    !at_end_of_input() && die "Declaration syntax incorrect";
}

sub endvalue {
    try { keyword(); match_literal('('); pairlist(); match_literal(')') }
      or
    !at_end_of_input() && die "Invalid declaration.";
}

which, in a regex, would be something like:

   rule TOP        { ^ <statement>+ $ };

   rule statement  {
          <id> '=' <endvalue>
       |
          \S  # ...means we found something else, so...
          { self.panic($/, "Declaration syntax incorrect") }
   };

   rule endvalue   {
          <keyword> '(' ~ ')' <pairlist>
       |
          \S  # ...means we found something else, so...
          { self.panic($/, "Invalid declaration.") }
   }

Though, personally, I'd have been inclined to write it like this:

rule TOP        { ^ <statement>+ [ $ | <unexpected> ] }

rule statement  { 'ID' '=' <endvalue> }

rule endvalue   { 'keyword' '(' ~ ')' 'pairlist' }



# The capture name did not survive in this copy; <found> is a stand-in.
rule unexpected { $<found> = (\N+) {
    self.panic($/, "Expected statement but found '$<found>'")
}}

In other words: after the statements, we're either at the end of the input,
or else we found something unexpected, so capture it and then report it.
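
For anyone who wants to try that layout, here is a hedged, self-contained
sketch. Everything not shown above is an assumption made for illustration:
the grammar name Decl, the id, keyword and pairlist tokens, the capture name
<found>, and a panic method that simply dies with the message.

```raku
# Hypothetical completion of the layout above, for experimentation only.
grammar Decl {
    rule TOP        { ^ <statement>+ [ $ | <unexpected> ] }
    rule statement  { <id> '=' <endvalue> }
    rule endvalue   { <keyword> '(' ~ ')' <pairlist> }

    # Capture the rest of the offending line, then report it.
    rule unexpected { $<found> = (\N+) { self.panic($/, "Expected statement but found '$<found>'") } }

    # Assumed token definitions; the real ones are not shown in the thread.
    token id       { \w+ }
    token keyword  { \w+ }
    token pairlist { \w+ [ \s* ',' \s* \w+ ]* }

    # Stand-in for the poster's panic method.
    method panic($match, $msg) { die $msg }
}

# Well-formed input reaches the end of the string, so no panic is raised.
say so Decl.parse("size = dims(a, b)\ncolor = rgb(x)\n");

# Leftover text after the statements is captured and reported.
try Decl.parse("size = dims(a, b)\n!!! oops\n");
say $!.message if $!;
```

The statement rules stay free of error handling here; the only place that
reports a problem is the one spot in TOP where leftover input is detected.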

HTH,

Damian