On 5/31/10 3:11 AM, Tyler Curtis wrote:
Part of the purpose of my GSoC project is to develop tools to make
writing PAST::Optimizations not just possible, but actually
convenient.
Excellent, it's a much needed task.
There are existing special tools for tree transformations: for
example, the TGE or Tree Grammar Engine.
TGE provides a syntax for transformations on trees. To some extent.
More specifically, it provides a syntax for dispatch based on the type
of node in the tree to a PIR sub that does the transformation work. It
also has the ability to define subs to be executed on the attributes
of nodes of a certain type whenever a transform is performed on such a
node.
Agreed, TGE doesn't have a syntax, so you can't look there for syntax.
In fact, a syntax is the single biggest thing TGE needs, and you could
be the person to write that syntax. It also doesn't have the tree
matching syntax that was planned for it (allowing more detailed matching
than "this is a PAST::Op" node, to selecting individual features).
If working on the next version of TGE inspires you, what you should know
is that it's just an implementation of basic Attribute Grammars, which
have been written about since the dawn of computing theory (first
mentioned in an ancient paper by Don Knuth). They are a way of
transforming trees by traversing up and down the tree building up
"attributes" (i.e. annotations) on the tree.
In actual practice of writing compilers, I've found that you pretty much
never need to traverse up the tree, so a huge simplification is possible
by only implementing downward traversals. For optimizers there's a
chance you might need some upward traversals.
Caveat: Even though they've got a great deal of history behind them, I'm
no longer convinced that attribute grammars are the best way to write
compilers. They push you into thinking about annotating the existing
tree, when you really should be thinking about how to build the
replacement tree (or replacement node for optimizations). That is, they
force your brain into the wrong flow.
What other options do we have for syntaxes?
One that has been mentioned to me is XPath. XPath is a syntax for
matching sub-trees of an XML document.
Yes, I also considered XPath and agreed, it's not the right fit. We can
do better.
http://pastie.org/985050 is an example of how a HLL compiler could
apply optimizations to its generated code. Line 8 in Compiler.pm is
commented out, but if some of the language-specific optimizations in
Optimizer.pm were particularly complex, there could be a separate
"SomeLang::Optimizer::Actions" class that would hold action methods
similar to those used for normal NQP-rx grammars.
HLL::Compiler is definitely the place to hook in your optimization
stages when run as part of a larger compiler. But, HLL::Compiler accepts
any arbitrary stage of any arbitrary code, so this convenience is
possible no matter what the optimizations are written in.
* Not easy to programmatically construct optimizations outside of NQP.
Is this actually useful? I'm not sure.
Yes, it's useful.
I'm curious what others think of the two syntax proposals
included(XPath and Regexes), and if anyone has any other ideas of
possible syntaxes.
If you are inspired by Perl 6 Regexes, then go ahead and do it that way.
As pmichaud said, it is part of the plan for Regexes so someone will do
it eventually, and that someone could be you.
There are really two parts of a tree transformation engine, one is
matching (finding) the nodes of the existing tree, and the other is
building the nodes of the new tree.
I have doubts at how well Perl 6 Regexes will do at matching nodes. Not
from practical experience of trying it out, mind you, but from a
theoretical perspective. In general, a system that tries to do two
completely separate things will end up doing both of them poorly. And,
from a straight complexity perspective, Perl 6 Regexes are already
enormously complex from trying to be good at natural language processing
and computer language processing. Adding another dimension of being good
at tree processing is a combinatorial explosion of complexity, which
just may push it over the edge. Also, even with Perl 6 Regexes, you'd
really be inventing new syntax anyway, you'd just be trying to force it
into a string matching mold. (http://pastie.org/983989 is a good example
of this, where the intent of the rule is buried under the fact that it's
trying to act like string matching.)
On the side of building up new nodes, I find the NQP syntax (not the
Regex part, but the plain old tree transformation part, which is a
crippled subset of Perl 6) hopelessly byzantine, and in my most recent
compilers have gone back to writing the tree transformation rules in
pure PIR. (Not even TGE+PIR, just plain old PIR classes with methods for
each transformation rule.) Where I used to spend hours in frustration
trying to get NQP to do what I wanted, I now spend minutes writing the
PIR. Which is *not* to say that PIR is the best tool for the job, but to
say that we can do so much better than what we're doing now.
------
Rather than looking to adopt a existing syntax, can you take a step back
and think "What do I want to do and how do I make it easy?" A quick list
of the operations I regularly perform when writing a tree transformation
are:
- Decide if this is the node I want to process (matching, which could
be declarative syntax).
- Access attributes/children of a node: either class attribute names,
hash keys, array keys, or iterating over hashes/arrays of
attributes/children. (Could be declarative, but want the option of
procedural for more complex cases.)
- Construct a new node (could be declarative syntax).
- Recursively transform children of this node (could be declarative
syntax), into the children of the new node.
And, unique to the optimization case:
- Replace an existing node (and the tree of nodes under it) with a new
optimized node (which may have a tree of nodes under it). (Might be
declarative, certainly easy with procedural.)
I think if you had a simple declarative syntax for matching nodes,
accessing attributes of nodes, and building new nodes, plus the ability
to drop in a block of procedural code (in PIR/NQP/random-HLL, basically
anything that compiles to a Parrot Sub), you'd have made a huge
contribution to Parrot, and to compiler writing techniques all over the
world. It doesn't have to be fancy (in fact, better if it isn't fancy),
just something plain like:
transform as_post
PAST::Op(:pirop('add'),:childcount(2),:childreturns('PAST::Val'|'PAST::Var'))
->
POST::Op(:name(tree.name),:pos(tree.pos),:children(tree.left.as_post,
tree.right.as_post))
which translates down to something like:
.sub 'as_post' :method :multi('PAST::Op')
.param pmc tree
.local pmc post
# because parrot only multidispatches on type, rather than
# value, have to handle the values inside the multimethod.
$S0 = tree.'pirop'()
if $S0 == 'add' goto matched_pirop
# not a match, try other alternates of type 'PAST::Op'
# if the tree grammar has them, otherwise, return failed match
matched_pirop:
# test number of children and types of returns here
post = new ['PAST';'Op']
$S1 = tree.'name'()
post.'name'($S1)
$I0 = tree.'pos'()
post.'pos'($I0)
$P0 = tree.'left'()
# or $P0 = tree[0]
$P1 = self.'as_post'($P0)
post.'push'($P1)
# same for second child
.return (post)
.end
Or:
replace op
PAST::Op(:pirop('add'),:childcount(2),:childreturns('PAST::Val'|'PAST::Var'))
->
{ # code that returns an optimized PAST::Op, to replace the existing one
in the tree }
I intentionally kept the syntax here close to your original NQP-based
example to show that it's possible to keep the familiarity of
Parrot-style argument syntax without trying to merge two very different
behaviors into one codebase. You could equally well go more toward
regular-expressions like:
transform as_post { <nodetype: 'PAST::Op'> <childcount: 2> }
{ # there isn't really any good syntax for the replacement part in regexes }
or:
transform as_post / match / replace /
Or more toward attribute grammars, XSLT, Prolog, etc... or something
completely new and different.
Allison
_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev