Re: grammars and indentation of input

Patrick R. Michaud Tue, 13 Sep 2016 09:55:39 -0700

I don't have an example handy, but I can categorically say that
Perl 6 grammars are designed to support exactly this form of parsing.
It's almost exactly what I did in "pynie" -- a Python implementation
on top of Perl 6.  The parsing was done using a Perl 6 grammar.

If I remember correctly, Pynie had <indent>, <indent_same>, and <dedent>
grammar rules.  The grammar kept a stack of known indentation levels.
The <indent> rule was a zero-width match that would succeed when it
found leading whitespace greater than the current indentation level
(and push the new level onto the stack).  The <indent_same> rule
was a zero-width match that succeeded when the leading whitespace
exactly matched the current indentation level.  And the <dedent>
rule would be called when <indent> and <indent_same> no longer 
matched, popping the top level off the stack.
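
As a rough illustration of the bookkeeping described above, here is a
minimal sketch in Python rather than Perl 6. It is not Pynie's code. The
class name IndentTracker, the space-only width calculation, and the exact
condition used in dedent are assumptions made for the example. Each method
only inspects a line and updates the stack; none of them consumes input,
which is the "zero-width" part.

```python
class IndentTracker:
    """Stack of known indentation widths; 0 is the top level (assumed)."""

    def __init__(self):
        self.levels = [0]

    @staticmethod
    def width(line):
        # Count leading spaces only; tabs are ignored in this sketch.
        return len(line) - len(line.lstrip(" "))

    def indent(self, line):
        # Succeeds when the line is deeper than the current level,
        # and pushes the new level onto the stack.
        w = self.width(line)
        if w > self.levels[-1]:
            self.levels.append(w)
            return True
        return False

    def indent_same(self, line):
        # Succeeds when the line sits exactly at the current level.
        return self.width(line) == self.levels[-1]

    def dedent(self, line):
        # Meant to be tried after the other two fail. Pops one level,
        # but only if the line lands on a level still on the stack
        # (an assumption about how mismatched dedents are rejected).
        w = self.width(line)
        if w < self.levels[-1] and w in self.levels:
            self.levels.pop()
            return True
        return False


tracker = IndentTracker()
results = [
    tracker.indent("    x = 1"),       # deeper than level 0: push 4
    tracker.indent_same("    y = 2"),  # same level as before
    tracker.dedent("z = 3"),           # back to level 0: pop 4
    tracker.indent_same("z = 3"),      # now matches the current level
]
```

A line indented by two spaces while the stack is [0, 4] fails all three
checks, which is the case the error alternative in a grammar would catch.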

So the grammar rule to match an indented block ended up looking
something like (I've shortened the example here):

    token suite {
        <indent> <statement>
            [ <indent_same> <statement> ]*
            [ <dedent> | <die("IndentationError: unindent does not match any outer indentation level")> ]
    }
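
To make the control flow of the suite rule concrete, here is a small
hand-written equivalent in Python (again a sketch, not the Perl 6 grammar
or Pynie's code). The function name parse_suite, the list-of-lines input,
and the nested-list result are all assumptions for illustration. One
simplification to note: any deeper line opens a nested block, whereas a
real Python grammar only allows that after a statement ending in ':'.
Blank lines and tabs are not handled.

```python
def width(line):
    # Leading spaces only; tabs are not handled in this sketch.
    return len(line) - len(line.lstrip(" "))


def parse_suite(lines, pos, levels):
    """Parse one indented block starting at lines[pos].

    Returns (statements, next_pos); nested blocks become nested lists.
    `levels` is the shared stack of known indentation widths.
    """
    w = width(lines[pos])
    if w <= levels[-1]:                       # the <indent> check failed
        raise IndentationError("expected an indented block")
    levels.append(w)
    body = []
    while pos < len(lines):
        w = width(lines[pos])
        if w == levels[-1]:                   # plays the role of <indent_same>
            body.append(lines[pos].strip())
            pos += 1
        elif w > levels[-1]:                  # a nested suite
            inner, pos = parse_suite(lines, pos, levels)
            body.append(inner)
        elif w in levels:                     # plays the role of <dedent>
            break
        else:
            raise IndentationError(
                "unindent does not match any outer indentation level")
    levels.pop()
    return body, pos


source = [
    "    a",
    "    b",
    "        c",
    "    d",
    "e",
]
body, next_pos = parse_suite(source, 0, [0])
```

Parsing stops at the line "e", which belongs to the enclosing level, so the
caller can continue from next_pos.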

A Python "if statement" then looked like:

    rule if_stmt {
        'if' <expression> ':' <suite>
        [ 'elif' <expression> ':' <suite> ]*
        [ 'else' ':' <suite> ]?
    }

where the <suite> subrules would match the statements or block
of statements indented within the "if" statement.

However, all of <indent>, <indent_same>, and <dedent> were written using
"normal" (non-regular expression) code.  Perl 6 makes this easy; since 
grammar rules are just methods in a class (that have a different code
syntax), you can create your own methods to emulate a grammar rule.  
The methods simply need to follow the Cursor protocol; that is, 
return Match objects indicating success/failure/length of whatever has 
been parsed at that point.
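
The general idea, that a rule written as ordinary code can stand in for a
regex-based rule as long as both report success, failure, and length in the
same way, can be shown with a language-neutral sketch. The Python below is
only an analogy: the Match tuple, the Parser class, and the method
signatures are invented for the example and are not the Perl 6 Cursor API.
It also assumes indent_same is only called at the start of a line.

```python
import re
from typing import NamedTuple, Optional

NAME = re.compile(r"[A-Za-z_]\w*")


class Match(NamedTuple):
    start: int
    end: int  # start == end means a zero-width match


class Parser:
    def __init__(self, text):
        self.text = text
        self.levels = [0]  # stack of known indentation widths

    def name(self, pos) -> Optional[Match]:
        # A rule backed by a regular expression.
        m = NAME.match(self.text, pos)
        return Match(pos, m.end()) if m else None

    def indent_same(self, pos) -> Optional[Match]:
        # A rule written as plain code, with the same calling convention:
        # take a position, return a Match on success or None on failure.
        end = pos
        while end < len(self.text) and self.text[end] == " ":
            end += 1
        if end - pos == self.levels[-1]:
            return Match(pos, pos)  # success, consuming nothing
        return None


p = Parser("    foo")
p.levels = [0, 4]
same = p.indent_same(0)   # four leading spaces match the current level
ident = p.name(4)         # the regex-backed rule matches "foo"
missing = p.name(0)       # a space is not the start of a name
```

Because both kinds of rule return the same shape of result, the caller does
not need to know which one was implemented with a regex.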

I hope this is a little useful.  If I can dig up or recreate a more 
complete Python implementation example sometime, I'll post it.

Pm

On Tue, Sep 13, 2016 at 01:13:45PM +0200, Theo van den Heuvel wrote:
> Hi all,
> 
> I am beginning to appreciate the power of grammars and the Match class. This
> is truly a major asset within Perl6.
> 
> I have a question on an edge case. I was hoping to use a grammar for an
> input that has meaningful indented blocks.
> I was trying something like this:
> 
>   token element { <.lm> [ <linetail> | $<ind>=[ ' '+ ] <level($<ind>)> ] }
>   token lm { ^^ ' '**{$cur-indent} } # skip up to current indent level
> 
> My grammar has a method called within the level rule that maintains a stack
> of indentations and sets a $cur-indent.
> I can imagine that the inner workings of the parser (i.e. optimization)
> frustrate this approach.
> Is there a way to make something like this work?
> 
> Thanks,
> Theo
> 
> -- 
> Theo van den Heuvel
> Van den Heuvel HLT Consultancy
