On 5/31/10 3:11 AM, Tyler Curtis wrote:
Part of the purpose of my GSoC project is to develop tools to make
writing PAST::Optimizations not just possible, but actually
convenient.

Excellent, it's a much needed task.


There are existing special tools for tree transformations: for
example, the TGE or Tree Grammar Engine.
TGE provides a syntax for transformations on trees. To some extent.
More specifically, it provides a syntax for dispatch based on the type
of node in the tree to a PIR sub that does the transformation work. It
also has the ability to define subs to be executed on the attributes
of nodes of a certain type whenever a transform is performed on such a
node.

Agreed, TGE doesn't have a syntax, so you can't look there for syntax. In fact, a syntax is the single biggest thing TGE needs, and you could be the person to write that syntax. It also doesn't have the tree matching syntax that was planned for it (allowing more detailed matching than "this is a PAST::Op" node, to selecting individual features).

If working on the next version of TGE inspires you, what you should know is that it's just an implementation of basic Attribute Grammars, which have been written about since the dawn of computing theory (first mentioned in an ancient paper by Don Knuth). They are a way of transforming trees by traversing up and down the tree building up "attributes" (i.e. annotations) on the tree.

In actual practice of writing compilers, I've found that you pretty much never need to traverse up the tree, so a huge simplification is possible by only implementing downward traversals. For optimizers there's a chance you might need some upward traversals.

Caveat: Even though they've got a great deal of history behind them, I'm no longer convinced that attribute grammars are the best way to write compilers. They push you into thinking about annotating the existing tree, when you really should be thinking about how to build the replacement tree (or replacement node for optimizations). That is, they force your brain into the wrong flow.


What other options do we have for syntaxes?

One that has been mentioned to me is XPath. XPath is a syntax for
matching sub-trees of an XML document.

Yes, I also considered XPath and agreed, it's not the right fit. We can do better.

http://pastie.org/985050 is an example of how a HLL compiler could
apply optimizations to its generated code. Line 8 in Compiler.pm is
commented out, but if some of the language-specific optimizations in
Optimizer.pm were particularly complex, there could be a separate
"SomeLang::Optimizer::Actions" class that would hold action methods
similar to those used for normal NQP-rx grammars.

HLL::Compiler is definitely the place to hook in your optimization stages when run as part of a larger compiler. But, HLL::Compiler accepts any arbitrary stage of any arbitrary code, so this convenience is possible no matter what the optimizations are written in.

* Not easy to programmatically construct optimizations outside of NQP.
  Is this actually useful? I'm not sure.

Yes, it's useful.

I'm curious what others think of the two syntax proposals
included(XPath and Regexes), and if anyone has any other ideas of
possible syntaxes.

If you are inspired by Perl 6 Regexes, then go ahead and do it that way. As pmichaud said, it is part of the plan for Regexes so someone will do it eventually, and that someone could be you.

There are really two parts of a tree transformation engine, one is matching (finding) the nodes of the existing tree, and the other is building the nodes of the new tree.

I have doubts at how well Perl 6 Regexes will do at matching nodes. Not from practical experience of trying it out, mind you, but from a theoretical perspective. In general, a system that tries to do two completely separate things will end up doing both of them poorly. And, from a straight complexity perspective, Perl 6 Regexes are already enormously complex from trying to be good at natural language processing and computer language processing. Adding another dimension of being good at tree processing is a combinatorial explosion of complexity, which just may push it over the edge. Also, even with Perl 6 Regexes, you'd really be inventing new syntax anyway, you'd just be trying to force it into a string matching mold. (http://pastie.org/983989 is a good example of this, where the intent of the rule is buried under the fact that it's trying to act like string matching.)

On the side of building up new nodes, I find the NQP syntax (not the Regex part, but the plain old tree transformation part, which is a crippled subset of Perl 6) hopelessly byzantine, and in my most recent compilers have gone back to writing the tree transformation rules in pure PIR. (Not even TGE+PIR, just plain old PIR classes with methods for each transformation rule.) Where I used to spend hours in frustration trying to get NQP to do what I wanted, I now spend minutes writing the PIR. Which is *not* to say that PIR is the best tool for the job, but to say that we can do so much better than what we're doing now.

------

Rather than looking to adopt a existing syntax, can you take a step back and think "What do I want to do and how do I make it easy?" A quick list of the operations I regularly perform when writing a tree transformation are:

- Decide if this is the node I want to process (matching, which could be declarative syntax).

- Access attributes/children of a node: either class attribute names, hash keys, array keys, or iterating over hashes/arrays of attributes/children. (Could be declarative, but want the option of procedural for more complex cases.)

 - Construct a new node (could be declarative syntax).

- Recursively transform children of this node (could be declarative syntax), into the children of the new node.

And, unique to the optimization case:

- Replace an existing node (and the tree of nodes under it) with a new optimized node (which may have a tree of nodes under it). (Might be declarative, certainly easy with procedural.)


I think if you had a simple declarative syntax for matching nodes, accessing attributes of nodes, and building new nodes, plus the ability to drop in a block of procedural code (in PIR/NQP/random-HLL, basically anything that compiles to a Parrot Sub), you'd have made a huge contribution to Parrot, and to compiler writing techniques all over the world. It doesn't have to be fancy (in fact, better if it isn't fancy), just something plain like:


transform as_post
PAST::Op(:pirop('add'),:childcount(2),:childreturns('PAST::Val'|'PAST::Var'))
->
POST::Op(:name(tree.name),:pos(tree.pos),:children(tree.left.as_post, tree.right.as_post))

which translates down to something like:

.sub 'as_post' :method :multi('PAST::Op')
    .param pmc tree
    .local pmc post

    # because parrot only multidispatches on type, rather than
    # value, have to handle the values inside the multimethod.
    $S0 = tree.'pirop'()
    if $S0 == 'add' goto matched_pirop
        # not a match, try other alternates of type 'PAST::Op'
        # if the tree grammar has them, otherwise, return failed match
  matched_pirop:

     # test number of children and types of returns here

     post = new ['PAST';'Op']
     $S1 = tree.'name'()
     post.'name'($S1)

     $I0 = tree.'pos'()
     post.'pos'($I0)

     $P0 = tree.'left'()
     # or $P0 = tree[0]
     $P1 = self.'as_post'($P0)
     post.'push'($P1)

     # same for second child

    .return (post)
.end


Or:

replace op
PAST::Op(:pirop('add'),:childcount(2),:childreturns('PAST::Val'|'PAST::Var'))
->
{ # code that returns an optimized PAST::Op, to replace the existing one in the tree }


I intentionally kept the syntax here close to your original NQP-based example to show that it's possible to keep the familiarity of Parrot-style argument syntax without trying to merge two very different behaviors into one codebase. You could equally well go more toward regular-expressions like:

transform as_post { <nodetype: 'PAST::Op'> <childcount: 2> }
{ # there isn't really any good syntax for the replacement part in regexes }

or:
transform as_post / match / replace /

Or more toward attribute grammars, XSLT, Prolog, etc... or something completely new and different.

Allison
_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev

Reply via email to