I don't have an example handy, but I can categorically say that
Perl 6 grammars are designed to support exactly this form of parsing.
It's almost exactly what I did in "Pynie" -- a Python implementation
on top of Perl 6. The parsing was done using a Perl 6 grammar.
If I remember correctly, Pynie had <indent>, <indent_same>, and <dedent>
grammar rules. The grammar kept a stack of known indentation levels.
The <indent> rule was a zero-width match that would succeed when it
found leading whitespace greater than the current indentation level
(and push the new level onto the stack). The <indent_same> rule
was a zero-width match that would succeed when the leading whitespace
exactly matched the current indentation level. And the <dedent>
rule would be called when <indent> and <indent_same> no longer
matched, popping the top level off the stack.
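To make the stack discipline above concrete, here is a rough sketch in Python rather than Perl 6. It is not Pynie's code. The function name and the event tuples are made up for illustration, and it assumes indentation is spaces only. It classifies each line the way the three rules would.

```python
def classify_indentation(lines):
    """Label each non-blank line as 'indent' or 'same', emitting a
    'dedent' event for every level popped off the stack.
    Hypothetical helper: spaces-only indentation is assumed."""
    stack = [0]      # known indentation levels, outermost first
    events = []
    for line in lines:
        if not line.strip():
            continue                      # blank lines tell us nothing
        width = len(line) - len(line.lstrip(" "))
        stmt = line.strip()
        if width > stack[-1]:
            stack.append(width)           # deeper than current level
            events.append(("indent", stmt))
        elif width == stack[-1]:
            events.append(("same", stmt))
        else:
            while width < stack[-1]:      # pop back toward an outer level
                stack.pop()
                events.append(("dedent", None))
            if width != stack[-1]:
                raise IndentationError(
                    "unindent does not match any outer indentation level")
            events.append(("same", stmt))
    return events
```

For example, classify_indentation(["if x:", "    a", "    b", "c"]) gives
same, indent, same, dedent, same, in that order.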
So the grammar rule to match an indented block ended up looking
something like this (I've shortened the example here):

    token suite {
        <indent> <statement>
        [ <indent_same> <statement> ]*
        [ <dedent> | <die("IndentationError: unindent does not match any outer indentation level")> ]
    }

A Python "if statement" then looked like:

    rule if_stmt {
        'if' <expression> ':' <suite>
        [ 'elif' <expression> ':' <suite> ]*
        [ 'else' ':' <suite> ]?
    }

where the <suite> subrules would match the statements or block
of statements indented within the "if" statement.
However, all of <indent>, <indent_same>, and <dedent> were written using
"normal" (non-regex) code. Perl 6 makes this easy: since grammar rules
are just methods in a class (written in a different syntax), you can
write ordinary methods that behave like grammar rules. Such methods
simply need to follow the Cursor protocol; that is, return Match
objects indicating success/failure/length of whatever has been parsed
at that point.
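As a loose analogy only, here is how "a rule is just a method that reports a match" might look in Python. This is not the Perl 6 Cursor API: the Match and IndentMatcher classes are invented for the sketch, indentation is assumed to be spaces only, and the exact condition under which dedent fails is my guess at the behaviour described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Match:
    start: int
    end: int          # end == start means a zero-width match


class IndentMatcher:
    """Hypothetical hand-written 'rules': each takes a position and
    returns a Match on success or None on failure."""

    def __init__(self, text: str):
        self.text = text
        self.levels = [0]             # stack of known indentation levels

    def _width(self, pos: int) -> Optional[int]:
        # Only meaningful at the start of a line.
        if pos != 0 and self.text[pos - 1] != "\n":
            return None
        end = pos
        while end < len(self.text) and self.text[end] == " ":
            end += 1
        return end - pos

    def indent(self, pos: int) -> Optional[Match]:
        width = self._width(pos)
        if width is not None and width > self.levels[-1]:
            self.levels.append(width)
            return Match(pos, pos)    # consumes nothing
        return None

    def indent_same(self, pos: int) -> Optional[Match]:
        width = self._width(pos)
        if width is not None and width == self.levels[-1]:
            return Match(pos, pos)
        return None

    def dedent(self, pos: int) -> Optional[Match]:
        width = self._width(pos)
        # Succeed only if we are returning to some known outer level.
        if width is not None and width in self.levels[:-1]:
            self.levels.pop()
            return Match(pos, pos)
        return None
```

A caller would try indent, then indent_same, then dedent at the start of
each line, much as the suite token does, and treat None as a failed match.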
I hope this is a little useful. If I can dig up or recreate a more
complete Python implementation example sometime, I'll post it.
Pm
On Tue, Sep 13, 2016 at 01:13:45PM +0200, Theo van den Heuvel wrote:
> Hi all,
>
> I am beginning to appreciate the power of grammars and the Match class. This
> is truly a major asset within Perl6.
>
> I have a question on an edge case. I was hoping to use a grammar for an
> input that has meaningful indented blocks.
> I was trying something like this:
>
> token element { <.lm> [ <linetail> | $<ind>=[ ' '+ ] <level($<ind>)> ] }
> token lm { ^^ ' '**{$cur-indent} } # skip up to current indent level
>
> My grammar has a method called within the level rule that maintains a stack
> of indentations and sets a $cur-indent.
> I can imagine that the inner workings of the parser (i.e. optimization)
> frustrate this approach.
> Is there a way to make something like this work?
>
> Thanks,
> Theo
>
> --
> Theo van den Heuvel
> Van den Heuvel HLT Consultancy