On Tue, 31 Mar 2015 13:17:41 -0700, [email protected] wrote:
> OS: Ubuntu 14.04 LTS on VirtualBox
> Host OS: Windows 8
> Rakudo version: Current as of 25/03/2015
>
> This is a simple parser for function argument syntax.
>
> This grammar shows two surprising behaviors for the price of one. The
> first is in token TOP: as the script stands, the test passes.
>
> The <term> token expands to either a <compound-term> or an <integer>,
> and of course the first alternative matches, as this trace shows:
>
> --cut here--
> 「foo(1)」
>  term => 「foo(1)」
>   compound-term => 「foo(1)」
>    atom => 「foo」
>    argument-list => 「1」
>     integer => 「1」
> --cut here--
>
> The commented-out TOP line should simply skip expanding <term> into
> <compound-term>, but with it the parse fails instead. Note that both
> versions use the same quantifier and separator (* % \n).
>
> The other waterbed-style issue is in the second commented-out line, in
> token compound-term.
>
> Just as above, the uncommented line works correctly and produces the
> match tree shown above. However, if you write <integer>* % ',' inline
> in compound-term, the match fails. Since actions don't run on a failed
> parse (a good thing from the point of view of side effects), I don't
> have much of a way to debug the situation, but I'll look at the parser
> internals later. Something like a regex debugger and/or REPL would be
> an excellent idea, and I've already started writing Perl 6 bindings to
> libreadline on Linux.
>
> Anyway, thoughts for consideration. I'm not certain why the behavior
> manifests itself, but I'm going to spend some time poking around.
>
> --cut here--
> use v6;
> grammar Bug {
>
>     #token TOP { <compound-term>* % \n }
>     token TOP { <term>* % \n }
>
>     token term {
>         <compound-term>
>       | <integer>
>     }
>
>     token atom { <[a..z]>+ }
>     token integer { <[0..9]>+ }
>
>     token argument-list { <integer>* % ',' }
>
>     token compound-term {
>         #<atom> '(' <integer>* % ',' ')' # This term should be the expanded form of
>         <atom> '(' <argument-list> ')'   # this term here, yet the above generates an error.
>     }
> }
>
> use Test;
> ok Bug.parse('foo(1)');
> --cut here--
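
To compare the two spellings of compound-term in a single run, the report's grammar can be duplicated inside one Test script. This is a sketch, not part of the original report: the grammar names Factored and Inlined are hypothetical, and the rules are otherwise copied from Bug above. Going by the Update below, Inlined is the case that failed on the Rakudo from the report; on a Rakudo without the bug, both tests should pass.

```raku
use v6;
use Test;

# Hypothetical name. Same rules as Bug above, using the factored-out
# <argument-list> (the combination the report says works).
grammar Factored {
    token TOP           { <term>* % \n }
    token term          { <compound-term> | <integer> }
    token atom          { <[a..z]>+ }
    token integer       { <[0..9]>+ }
    token argument-list { <integer>* % ',' }
    token compound-term { <atom> '(' <argument-list> ')' }
}

# Hypothetical name. Identical except that compound-term spells out
# <integer>* % ',' inline instead of calling <argument-list>.
grammar Inlined {
    token TOP           { <term>* % \n }
    token term          { <compound-term> | <integer> }
    token atom          { <[a..z]>+ }
    token integer       { <[0..9]>+ }
    token compound-term { <atom> '(' <integer>* % ',' ')' }
}

ok Factored.parse('foo(1)'), 'factored <argument-list> parses foo(1)';
ok Inlined.parse('foo(1)'),  'inline <integer>* % "," parses foo(1)';
done-testing;
```

Keeping two full copies, rather than having Inlined inherit from Factored, keeps each grammar identical to one of the commented/uncommented variants in the report.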
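Update:

I was only able to replicate this when keeping the first comment (TOP goes
through <term>) and uncommenting the second one (compound-term spells out
<integer>* % ',' inline instead of calling <argument-list>). No other
combination failed the parse.

The failing case golfs down to:
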
$ perl6 -e 'grammar Bug { token TOP { f "(" 1* % "," ")" | 1 } }; Bug.parse("f(1)"); $/.say;'
Nil

$ perl6 -e 'grammar NoBug { token TOP { f "(" <b> ")" | 1 }; token b { 1* % "," } }; NoBug.parse("f(1)"); $/.say;'
「f(1)」
 b => 「1」

A sequence point will "work around" the bug:

$ perl6 -e 'grammar Bug { token TOP { f "(" 1* % "," {} ")" | 1 } }; Bug.parse("f(1)"); $/.say;'
「f(1)」

...so will using 1+ instead of 1*. That makes me suspect this is merely a
backtracking problem; ISTR there was some sort of issue with patterns that
can match zero characters.

However, I tried changing <b> to a regex instead of a token, to remove the
ratchet, and that failed to reproduce the problem. So -^o^-
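
The golfed one-liners can likewise be gathered into one Test script, which makes it easy to rerun them against newer Rakudo builds. A minimal sketch: Bug and NoBug repeat the grammars from the one-liners above, while SeqPoint and OnePlus are hypothetical names for the {} and 1+ workarounds. On the Rakudo from the report only Bug would be expected to fail.

```raku
use v6;
use Test;

# Bug and NoBug are the golfed grammars from the one-liners above.
grammar Bug   { token TOP { f "(" 1* % "," ")" | 1 } }
grammar NoBug { token TOP { f "(" <b> ")" | 1 }; token b { 1* % "," } }

# Hypothetical names for the two reported workarounds.
grammar SeqPoint { token TOP { f "(" 1* % "," {} ")" | 1 } }  # {} sequence point
grammar OnePlus  { token TOP { f "(" 1+ % "," ")" | 1 } }     # 1+ instead of 1*

# .parse returns Nil on failure, so each ok reports one variant.
for Bug, NoBug, SeqPoint, OnePlus -> $grammar {
    ok $grammar.parse("f(1)"), "{$grammar.^name} parses f(1)";
}
done-testing;
```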