I have been working on my own TPDL parser (I make the distinction from PEG 
because while it works in approximately the same way, it does not support the 
fancy Regular-Expression-like syntax, and right now I am working on a token 
level rather than a character level) in C (imagine that!) as part of my 
summer goal of building a compiler. The parser is working very well, I have 
it support left recursion directly (per a paper listen on Bryan's website), 
and I am all-around very happy with it.

I wanted to mention one of the way in which I have been building an AST from
my parse tree while the tree itself is being constructed (a more appropriate 
term might be: reduced parse tree). I use one rule within the parser and also 
two operators within the grammar. One of these operators is already well-
understood and has been mentioned on this list several times.

My two operators are:

i) A non-excludable operator (-), which requires that a non/terminal appear 
in the reduced parse tree. If no such operator is used, then terminals are 
*not* added to the tree as leafs, and non-terminals, where only one child 
exists, are replaced with their child (my automatic reduction rule). 

ii) A raise children operator (^), which puts the derivations of a particular
non-terminal in a production in place of the non-terminal. This allows me to 
collect lists of nodes.

Note: an identifier surrounded with '<' and '>' signals a token. The empty 
'<>' represents the empty token (epsilon transition). The notation I use for 
my grammars is fairly straightforward.

Here are some simple things I am able to do with my one rule and two 
operators:

Things
    : Thing ^Things
    : <>
    ;

ListOfThings
    : -Things
    ;

When applied with 'ListOfThings' as the starting production, this grammar 
will result in a tree of 'Thing' nodes, rooted at a 'Things' node. Note: 
'ListOfThings' will not appear in the tree.

One normally expects terminals / tokens to appear as the leaves of the tree; 
however, with these rules we can force non-terminals to appear as leaves. In 
my case, this has helpful as a flag. For example, if a certain token appears
in the terminal stream then I can have a non-terminal appear as a leaf node
in the parse tree; however, if the token does not appear then no such non-
terminal will appear.

Rule
    : RuleFlag -<non_terminal>
    : RuleFlag -<terminal>
    : RuleFlag -<epsilon>
    ;

RuleFlag
    : -NonExcludable
    : -Subsumable
    : <>
    ;

NonExcludable : <dash> ;
Subsumable : <up_arrow> ;

The above grammar is actually describing the syntax of the grammar itself, 
in this case, how a non/terminal looks in the right-hand-side of a 
production. Subsumable might not be the most appropriate word; however, it 
is what I thought of at the time. In the above example, if a '-' appears 
before a non_terminal token then a 'NonExcludable' non-terminal node will 
appear as a leaf in the parse tree.

I was wondering if anyone has done similar things, other than to have a 
similar operator to my non-excludable (-) one. Also, does anyone have any 
thoughts on my approach; am I possibly missing an obvious pitfalls?



_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg

Reply via email to