Re: Unwanted failure and FAILGOAL
Hi,

On 05/11/2016 07:45 AM, Richard Hainsworth wrote:
> I have the following in a grammar
>
>     rule TOP { ^ <statement>+ $ };
>
>     rule statement { <id> '=' <endvalue>
>                    | { { self.panic($/, "Declaration syntax incorrect") } }
>                    };
>
>     rule endvalue { <keyword> '(' ~ ')' <pairlist>
>                   | { self.panic($/, "Invalid declaration.") }
>                   }
>
> The grammar parses a correct input file up until the end of the file. At that point, even if there is no un-consumed input, there is an attempt to match <statement>, which fails. The failure causes the panic with 'Declaration syntax'.
>
> Am I missing something simple here?
>
> I would have thought (though this is only a very newbie assumption) that if the end of the input being sent to the grammar has been reached after the last <statement> has been matched, then there should be no reason for the parse method to try to match <statement> again, and if it fails to test for the end of input.

This is not how regexes or grammars work.

The + quantifier tries to match the regex as many times as possible. It doesn't look ahead to see if more characters are available, and it doesn't know about the end-of-string anchor that comes next in the grammar.

In fact, it doesn't know whether the rule it quantifies might have a way to match zero characters. In that case, it would be wrong behavior not to attempt a zero-width match at the end of the string.

As for improving the error reporting from within a grammar, there are lots of ways to get creative, and I'd urge you to read Perl 6's own grammar, which is a good inspiration for that. See https://github.com/rakudo/rakudo/blob/nom/src/Perl6/Grammar.nqp

One thing you could do is structure the statement rule differently:

    rule statement {
        <id> [ '=' <endvalue>
               || { self.panic($/, "Invalid declaration.") }
             ]
    }

And maybe also TOP:

    rule TOP { ^ [ <statement> || . { self.panic($/, "Expected a statement") } ]+ $ };

That extra dot before the panic ensures it's not called at the end of the string. If you don't want that, you could also do

    [ <statement> || $ || { self.panic(...) } ]

Cheers,
Moritz
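
To make the behaviour described above concrete, here is a minimal, self-contained sketch that runs both shapes of grammar over the same valid input. Everything not shown in the thread is an assumption made for illustration only: the `id`, `keyword` and `pairlist` tokens are hypothetical stand-ins, `panic` is a hypothetical helper that simply dies with the message, and `panic-message` is a small helper for collecting the result.

```raku
# Hypothetical shared pieces: the thread never shows these definitions.
grammar Base {
    rule  endvalue { <keyword> '(' ~ ')' <pairlist> }
    token id       { <[a..z]>+ }
    token keyword  { <[a..z]>+ }
    token pairlist { [ \d+ ]+ % ',' }
    method panic($match, $message) { die $message }   # assumed: just dies
}

# The shape from the original question: the panic is a plain alternative
# of <statement>, so it fires whenever <statement> is attempted and fails.
grammar Naive is Base {
    rule TOP       { ^ <statement>+ $ }
    rule statement { <id> '=' <endvalue>
                   | { self.panic($/, "Declaration syntax incorrect") } }
}

# The restructured shape: panic only after an <id> has been seen, or when
# there really is a character left that cannot start a statement.
grammar Guarded is Base {
    rule TOP       { ^ [ <statement> || . { self.panic($/, "Expected a statement") } ]+ $ }
    rule statement { <id> [ '=' <endvalue> || { self.panic($/, "Invalid declaration.") } ] }
}

# Returns the panic message, or '(no panic)' if parsing raised nothing.
sub panic-message($grammar, Str $input) {
    try $grammar.parse($input);
    $! ?? $!.message !! '(no panic)'
}

my $valid           = "a = foo(1,2)\nb = bar(3)\n";
my $naive-message   = panic-message(Naive,   $valid);
my $guarded-message = panic-message(Guarded, $valid);
my $broken-message  = panic-message(Guarded, "a = (1)\n");

say "Naive on valid input:    $naive-message";
say "Guarded on valid input:  $guarded-message";
say "Guarded on broken input: $broken-message";
```

With the first grammar, the `+` quantifier attempts `<statement>` once more after the last real statement, and that attempt falls through to the panic branch. With the second, the extra attempt fails quietly at end of input because neither `<id>` nor `.` can match there.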
Re: Unwanted failure and FAILGOAL
Hi Richard,

Not a complete answer to your question; just an observation about your grammar:

>     rule TOP { ^ <statement>+ $ };
>
>     rule statement { <id> '=' <endvalue>
>                    | { { self.panic($/, "Declaration syntax incorrect") } }
>                    };
>
>     rule endvalue { <keyword> '(' ~ ')' <pairlist>
>                   | { self.panic($/, "Invalid declaration.") }
>                   }

That's more or less the equivalent of:

    sub TOP {
        die if !at_start_of_input();
        loop { last unless try statement() };
        die if !at_end_of_input();
    }

    sub statement {
        try { id(); match_literal('='); endvalue() }
            or die "Declaration syntax incorrect";
    }

    sub endvalue {
        try { keyword(); match_literal('('); pairlist(); match_literal(')') }
            or die "Invalid declaration."
    }

In which case, would you really have expected a call to TOP() NOT to throw an exception from statement(), the first time statement() couldn't match (as it inevitably won't if we're at the end of the input)?

If these were subroutines, I suspect you'd have written something more like:

    sub statement {
        try { id(); match_literal('='); endvalue() }
            or !at_end_of_input() && die "Declaration syntax incorrect"
    }

    sub endvalue {
        try { keyword(); match_literal('('); pairlist(); match_literal(')') }
            or !at_end_of_input() && die "Invalid declaration.";
    }

which, in a regex, would be something like:

    rule TOP { ^ <statement>+ $ };

    rule statement { <id> '=' <endvalue>
                   | \S   # ...means we found something else, so...
                     { self.panic($/, "Declaration syntax incorrect") }
                   };

    rule endvalue { <keyword> '(' ~ ')' <pairlist>
                  | \S   # ...means we found something else, so...
                    { self.panic($/, "Invalid declaration.") }
                  }

Though, personally, I'd have been inclined to write it like this:

    rule TOP        { ^ <statement>+ [ $ | <unexpected> ] }

    rule statement  { 'ID' '=' <endvalue> }

    rule endvalue   { 'keyword' '(' ~ ')' 'pairlist' }

    rule unexpected {
        $<unexpected> = (\N+)
        { self.panic($/, "Expected statement but found '$<unexpected>'") }
    }

In other words: after the statements, we're either at the end of the input, or else we found something unexpected, so capture it and then report it.

HTH,
Damian
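
As a rough illustration of that last layout, the following sketch wraps it in a complete grammar and feeds it one good and one bad input. The `id`, `keyword` and `pairlist` tokens, the capture name `$<unexpected>`, the `panic` method (assumed here to simply die with its message) and the `panic-message` helper are all hypothetical and not taken from the thread.

```raku
grammar Decl {
    # After the statements we must be at end of input, or else whatever
    # is left on the current line gets captured and reported.
    rule TOP        { ^ <statement>+ [ $ | <unexpected> ] }
    rule statement  { <id> '=' <endvalue> }
    rule endvalue   { <keyword> '(' ~ ')' <pairlist> }
    rule unexpected {
        $<unexpected> = (\N+)
        { self.panic($/, "Expected statement but found '$<unexpected>'") }
    }

    # Hypothetical stand-ins for the rules the thread leaves undefined.
    token id       { <[a..z]>+ }
    token keyword  { <[a..z]>+ }
    token pairlist { [ \d+ ]+ % ',' }
    method panic($match, $message) { die $message }   # assumed: just dies
}

# Returns the panic message, or '(no panic)' if parsing raised nothing.
sub panic-message(Str $input) {
    try Decl.parse($input);
    $! ?? $!.message !! '(no panic)'
}

my $good-message = panic-message("a = foo(1,2)\nb = bar(3)\n");
my $bad-message  = panic-message("a = foo(1)\n?? oops\n");

say "valid input:   $good-message";
say "invalid input: $bad-message";
```

Note that in this sketch the `<unexpected>` branch is only reached after at least one statement has matched; input whose very first line is malformed makes `TOP` fail quietly rather than panic, because `<statement>+` needs one successful match before the bracketed alternation is tried.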