Grammars: parse tags of which only some need closing tags

Moritz Lenz Sat, 28 Aug 2010 14:28:25 -0700

Hi,

I'm currently in the process of porting a template system [1] to Perl 6
[2].


It's fun to write the parser as a Perl 6 grammar, but there's one thing
that I don't know how to solve elegantly.

The markup format allows arbitrary text, and optionally some tags
interleaved. Some of them stand on their own, like

[% setvar title Grammars: parse tags of which only some ... %]

And others have opening/closing pairs, and their proper nesting needs to
be enforced, for example

[% ifvar title %]
    <h1>[% readvar title %]</h1>
[% endifvar %]

(yes, the syntax is horrible, but when you write that stuff, 90% is
normal text, and only 10% markup or so, so that's OK, more or less).

Additionally, what goes between nested tags depends on the tags, for
example the [% verbatim %] ... [% endverbatim %] tag pair allows
unmatched [% in between (but that's the only case, so I could cheat a
bit if necessary).


I've started to parse the simple tags like this:

our $open  = '[%';
our $close = '%]';

...

token chunk { <literal> | <directive> }

token directive {
    $open ~ $close
    [<.ws> <command> <.ws> ]
}

proto token command { <...> }
token command:sym<comment> { <sym>  [ <!before $close> .]* }
token command:sym<include> { <sym> <.ws> <arg> }
rule  command:sym<setvar>  { <sym> <name> '='? <slurpy_arg> }
rule  command:sym<readvar> { <sym> <name> }


which seems to be a fairly idiomatic way, and factors out the matching
of open/closing delimiters. But that way, I don't see how I can properly
check for closing tags that follow some of the opening tags.

I could cheat, and say

rule  command:sym<ifvar>   { <sym> <name> '%]' <chunks>* '[%' 'endifvar' }

But that would be, well, cheating (and would probably mess up the
backtracking control).


Another idea is to make <directive> a proto token, and have each branch
match its own '[%'. But that's rather repetitive.

Another idea is to have separate nested_command and single_command
rules, and use them as alternatives, so that the matching of the
delimiters is duplicated only twice. (Upon more reflection, this seems
like the best approach so far).
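
A minimal sketch of how that last idea might look, written in current
syntax (proto tokens are now declared with {*}). All names in it
(Template, nested, single, literal, name, closer) are made up for
illustration and are not taken from the real code, and only one command
of each kind is shown. The closing tag is checked against the opening
command with a code assertion, which is just one possible way to do it:

```raku
# Sketch only: every rule name here is hypothetical.
grammar Template {
    token TOP   { <chunk>* }
    token chunk { <nested> | <single> | <literal> }

    # Plain text: everything up to the next opening delimiter.
    token literal { [ <!before '[%'> . ]+ }

    # Stand-alone tag: delimiters are written out once here ...
    token single {
        '[%' <.ws> <single_command> <.ws> '%]'
    }

    # ... and once more here, for tags that need a closing tag.
    # The closing tag must be named 'end' plus the opening command.
    token nested {
        '[%' <.ws> <nested_command> <.ws> '%]'
        <chunk>*
        '[%' <.ws> $<closer>=[ \w+ ] <.ws> '%]'
        <?{ $<closer> eq 'end' ~ $<nested_command><sym> }>
    }

    token name { \w+ }

    proto token single_command {*}
    rule single_command:sym<readvar> { <sym> <name> }

    proto token nested_command {*}
    rule nested_command:sym<ifvar>   { <sym> <name> }
}

# Properly nested input is meant to parse; an unclosed ifvar is not.
say so Template.parse('[% ifvar title %]<h1>[% readvar title %]</h1>[% endifvar %]');
say so Template.parse('[% ifvar title %] never closed');
```

Adding another paired tag would then only need a new nested_command
candidate, since the 'end' check is derived from <sym>. The
verbatim/endverbatim case would still need separate handling, because
its body must not be parsed as chunks.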


Can anybody think of a pattern that solves this problem even more
elegantly, hopefully without any repetition?

Cheers,
Moritz

[1] http://perlgeek.de/en/software/mowyw
[2] http://github.com/moritz/6mowyw
