On Mon Feb 18 14:00:49 2008, particle wrote:
> in rakudo's perl6doc parser
> (languages/perl6/src/utils/perl6doc/grammar.pg), i have the following:
>
> token pod_delimited_block {
> ^^ '=' <.unsp>? 'begin' <.ws> <block_type> <pod_option>* \n
> .*?
> ^^ '=' <.unsp>? 'end' <.ws> $<block_type> \N*
> {*}
> }
>
> i'd like to capture '.*?' either via an alias or better, via a
> subrule. however, modifying the grammar to something that will
> capture, like
> (.*?)
> or
> $<body>=[.*?]
> or
> <some_subrule>
>
> causes the match to fail. smells like a pge bug to me.

Turns out that this isn't a bug, although it is a somewhat unexpected
artifact of :ratchet. When :ratchet is active within a regex (as would
be the case for 'token' or 'rule'), then placing a grouping construct
around .*? effectively makes it non-backtracking. Or, to be more
precise, the grouping construct doesn't have an explicit quantifier
on it (even though the thing it contains does have one), and thus
once the group matches something then :ratchet prevents us from
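backtracking into it.

So, in this specific instance of a token (i.e., :ratchet is in
effect), the bare expression C<< .*? >> still backtracks and
will eagerly (i.e., minimally) match any sequence, growing as needed
until the rest of the pattern matches. But C<< (.*?) >> and C<< [.*?] >>
always match exactly the null string, because there is an
assumed "cut" operation after the parens or brackets: the null
string is the first thing C<< .*? >> tries, and the cut keeps us from
going back to lengthen it. The '=end' line that follows then has to
match right where the body started, so the token fails unless the
block body is empty.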
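
To make the difference concrete, here is a rough sketch in Python of
the two cases. This is only a toy model of the backtracking behavior,
not PGE code: the helper names (lazy_any, literal, cut, sequence,
first_match) are made up for illustration, and cut() stands in for the
assumed cut that :ratchet places after an unquantified group.

```python
def lazy_any(text, pos):
    """Model of `.*?`: offer the shortest match first, then longer ones."""
    for end in range(pos, len(text) + 1):
        yield end


def literal(lit):
    """Model of a quoted literal: zero or one way to match."""
    def match(text, pos):
        if text.startswith(lit, pos):
            yield pos + len(lit)
    return match


def cut(matcher):
    """Model of the assumed cut: keep the first result, drop the alternatives."""
    def match(text, pos):
        for end in matcher(text, pos):
            yield end
            return  # no backtracking into the group after it has matched
    return match


def sequence(*matchers):
    """Match each part in order, backtracking into earlier parts on failure."""
    def match(text, pos):
        if not matchers:
            yield pos
            return
        first, rest = matchers[0], sequence(*matchers[1:])
        for mid in first(text, pos):
            yield from rest(text, mid)
    return match


def first_match(matcher, text):
    """Return the end offset of the first successful match, or None."""
    return next(matcher(text, 0), None)


text = "=begin\nbody\n=end"

# Bare `.*?`: free to grow until '=end' can match.
bare = sequence(literal("=begin\n"), lazy_any, literal("=end"))

# Grouped `[.*?]`: stuck with the empty match, so '=end' is tried too early.
grouped = sequence(literal("=begin\n"), cut(lazy_any), literal("=end"))

print(first_match(bare, text))              # 16
print(first_match(grouped, text))           # None
print(first_match(grouped, "=begin\n=end")) # 11 (empty body still matches)
```

The only difference between the two patterns is the cut() wrapper,
which mirrors how adding parens or brackets around C<< .*? >> is the
only change needed to make the token stop matching.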
There was a short discussion on IRC about possibly changing this
to be somewhat less surprising, but I think we concluded that
the current behavior is the "least bad" one for now.

Closing ticket.

Pm