Hi,

On 05/11/2016 07:45 AM, Richard Hainsworth wrote:
I have the following in a grammar

     rule TOP        { ^ <statement>+ $ };

     rule statement  { <id> '=' <endvalue>
                      | { { self.panic($/, "Declaration syntax incorrect") } }
                     };

     rule endvalue   { <keyword> '(' ~ ')' <pairlist>
                      | { self.panic($/, "Invalid declaration.") }
                     }

The grammar parses a correct input file up until the end of the file. At
that point even if there is no un-consumed input, there is an attempt to
match <id>, which fails. The failure causes the panic with 'Declaration
syntax'.

Am I missing something simple here?

I would have thought  (though this is only a very newbie assumption)
that if the end of the input being sent to the grammar has been reached
after the last <statement> has been matched, then there should be no
reason for the parse method to try to match <statement> again, and if it
fails to test for the end of input.

This is not how regexes or grammars work.

The + quantifier tries as many times as possible to match the regex. It doesn't look ahead to see if more characters are available, and it doesn't know about the end-of-string anchor that comes next in the grammar.

In fact, it doesn't know whether the rule it quantifies has a way to match zero characters. If the rule can, then not attempting a zero-width match at the end of the string would be the wrong behavior.
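
To see that extra attempt for yourself, here is a minimal sketch. The grammar name Demo, the token item and the message text are made up for illustration; they are not from your grammar.

    grammar Demo {
        token TOP  { ^ <item>+ $ }
        token item {
            # first branch: a word plus optional trailing whitespace
            || \w+ \s*
            # fallback: only reached when the first branch fails;
            # it reports the attempt and then fails via <!>
            || { say "item attempted, but nothing left to match" } <!>
        }
    }

    say so Demo.parse("a b c");

The parse as a whole should succeed, but I'd expect the message to be printed once first: after the last word has been consumed, the + quantifier still tries <item> one more time, and only when that attempt fails does the regex engine move on to the $ anchor.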

As for improving the error reporting from within a grammar, there are lots of ways to get creative, and I'd urge you to read Perl 6's own grammar, which is a good source of inspiration for that.
See https://github.com/rakudo/rakudo/blob/nom/src/Perl6/Grammar.nqp

One thing you could do is structure the statement rule differently:

rule statement {
    <id>
    [  '=' <endvalue>
    || { self.panic($/, "Invalid declaration.") }
    ]
}

And maybe also TOP:

rule TOP { ^ [ <statement> || . { self.panic($/, "Expected a statement") } ]+ $ };

That extra dot before the panic ensures it's not called at the end of the string. If you don't want that, you could also do

[ <statement> || $ || { self.panic(...) } ]
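
Putting these pieces together, a rough, self-contained sketch could look like the one below. Note the assumptions: I don't know your real <id>, <keyword> or <pairlist> rules, so id and endvalue here are simplified stand-ins, the grammar name Decl is made up, and panic is a hypothetical method that simply dies with the message and the current position.

    grammar Decl {
        # the dot keeps the panic from firing at the end of the input
        rule TOP {
            ^ [ <statement> || . { self.panic($/, "Expected a statement") } ]+ $
        }
        # once an <id> has matched, anything other than '=' <endvalue> is an error
        rule statement {
            <id>
            [  '=' <endvalue>
            || { self.panic($/, "Invalid declaration.") }
            ]
        }
        token id       { \w+ }    # stand-in for the real <id>
        token endvalue { \d+ }    # stand-in for the real <endvalue>

        # hypothetical error reporter: abort the parse with an exception
        method panic($/, $msg) {
            die "$msg at position { $/.pos }";
        }
    }

    say so Decl.parse("x = 1\ny = 2\n");

    {
        Decl.parse("x 1");
        CATCH { default { say "parse error: ", .message } }
    }

The first call should parse cleanly, including the failed attempt at one more <statement> at the end of the input. The second input has an <id> that is not followed by '=', so the panic in the statement rule is triggered.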

Cheers,
Moritz
