Part of the purpose of my GSoC project is to develop tools to make
writing PAST::Optimizations not just possible, but actually
convenient.

There are existing specialized tools for tree transformations: for
example, TGE, the Tree Grammar Engine. TGE provides a syntax for
transformations on trees, to some extent. More specifically, it
provides a syntax for dispatching, based on the type of a node in the
tree, to a PIR sub that does the transformation work. It also lets you
define subs to be executed on the attributes of nodes of a certain
type whenever a transform is performed on such a node.

Some good things about TGE:
* Generic: can be used for any sort of tree-like data structure.
 I like this. My PAST::Walker and PAST::Transformer can't do this
(although 'walk', 'walkChildren', and 'replaceChildren' are all
multi-methods, so nothing prevents someone from writing multi-methods
for traversing POST::Nodes or any other tree-like data structure; it
just takes more effort than with TGE).
* Simple syntax for the dispatching it supports.
 Simple syntax is very useful.

Some problems I have with TGE:
* Although the syntax theoretically supports arbitrary languages for
the transformations, PIR is the only implemented option.
 PIR is a surprisingly pleasant language for how low level it is, but
it's not something I'd want to code in more than necessary without a
very good reason.
* The dispatching syntax is too limited.
  I want to be able to conveniently say "Do this to sub-trees that
consist of a PAST::Op node with its 'pirop' attribute set to 'add' and
two PAST::Val children whose 'returns' attributes are in this list of
types," instead of saying, "Do this (where 'this' is a subroutine I
had to write just to check that the node is actually one I care about)
to all PAST::Op nodes." (See the sketch just after this list.)
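
To make the complaint concrete, here is roughly the sub-tree shape I
mean, built with the ordinary PAST constructors, plus the kind of
hand-written guard that a "do this to all PAST::Op nodes" rule ends up
needing. The guard sub is only an illustration of the problem, not
code from TGE or from my own tools.

    # The sub-tree I want to match: an 'add' op over two integer constants.
    my $tree := PAST::Op.new(
        :pirop('add'),
        PAST::Val.new( :value(2), :returns('Integer') ),
        PAST::Val.new( :value(3), :returns('Integer') )
    );

    # The guard I would rather not have to write by hand.  (A real guard
    # would also verify that both children really are PAST::Val nodes;
    # this sketch only checks the attributes.)
    sub is_integer_add($node) {
        $node.pirop eq 'add'
            && $node[0].returns eq 'Integer'
            && $node[1].returns eq 'Integer';
    }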

What other options do we have for syntaxes?

One that has been mentioned to me is XPath. XPath is a syntax for
matching sub-trees of an XML document. Of course, since XML documents
are essentially just trees, it can easily be generalized to non-XML
tree-like data structures. In fact, it's a great fit for PCT's
PAST::Node and POST::Node, because like XML elements, they have both
attributes and children.

So, how would we express my hypothetical "PAST::Op node with its
'pirop' attribute set to 'add' and two PAST::Val children whose
'returns' attributes are in this list of types" in an XPath-like
syntax? (XPath uses "::" for axes, so I've used ":" as in XML to
represent namespacing.) For simplicity's sake, I'll assume that
Integers are the only type of Val we're looking for.

/descendant-or-self::PAST:Op/PAST:Val[1]/self::*[@type='Integer']/following-sibling::PAST:Val[2]/self::*[@type='Integer']/parent::node()

Or something along those lines. I'm not absolutely certain of XPath's
syntax. One could probably collapse each
PAST:Val[1]/self::*[@type='Integer'] pair of steps into a single
predicate, I expect. But that's approximately it.
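
For instance (again hedged, since I haven't run this against a real
XPath implementation), the whole match could probably be written as a
single step with predicates, including the 'pirop' test that the path
above leaves out:

//PAST:Op[@pirop='add' and PAST:Val[1]/@type='Integer' and PAST:Val[2]/@type='Integer']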

Some advantages of XPath or an XPath-like syntax:
* Generic.
 Like TGE, it is usable on any tree-like data structure (essentially
any Parrot object with an array part and no cycles in that array part).
* Familiar.
 It's not something new and unique to Parrot. Instead it's a standard
and fairly well-known language for sub-tree matching. Quite a few
people will already know the syntax and be able to move along to
actually doing things with PASTs with very little additional effort.

I can see some disadvantages to XPath:
* Semantic differences between XPath and Parrot
 For example, different type systems, different type conversion
rules, different comparison semantics. These aren't insurmountable
obstacles, but they could potentially cause some things to not work
the way people unfamiliar with XPath semantics would expect.
* Version confusion
  If we used XPath, should we use XPath 1.0 or 2.0? Note that 2.0 is
not backwards compatible, so if someone mistakenly thinks that they
are using XPath 1.0 when actually using 2.0, some things will not work
as they expect.
* Some of the standard functions are not useful for PAST optimization.
 Using an XPath syntax might cause people to expect all the XPath
standard functions (possibly including the new ones in 2.0). But URI
functions, date handling, etc. are not relevant to optimization.
* Potentially difficult to extend with support for calling Parrot subs
as predicates, due to semantic differences.
 This would probably not be compatible with the standard, either.
* Seemingly no facility for abstraction.
 I can't find any way to define new functions in XPath. This is
probably a small issue, but user-defined functions could be useful in
some cases.


pmichaud offered an interesting idea a few nights ago: use Perl 6/NQP
regexes. As he pointed out to me, they are intended to be usable for
more than just text. They also have the advantage of familiarity; most
compiler writers using PCT (now, at least) use NQP's regex support to
define their grammar. sorear suggested a concrete way this could be
done: a new keyword, "rewrite", for declaring a rule that functions
similarly to the .tr() method, except on non-strings.

http://pastie.org/983989 is an example of an optimization for constant
folding of addition.
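
Stripped of the proposed "rewrite" surface syntax, the core of such a
transformation is tiny; in plain NQP, using the documented PAST
constructors, it amounts to roughly the following (my own
reconstruction for illustration, not the pastie's actual code):

    # Rough sketch: given a PAST::Op with :pirop('add') whose children are
    # two integer PAST::Vals, build the single folded PAST::Val that
    # replaces it.
    sub fold_add($op) {
        PAST::Val.new(
            :value( $op[0].value + $op[1].value ),
            :returns('Integer')
        );
    }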

http://pastie.org/985050 is an example of how a HLL compiler could
apply optimizations to its generated code. Line 8 in Compiler.pm is
commented out, but if some of the language-specific optimizations in
Optimizer.pm were particularly complex, there could be a separate
"SomeLang::Optimizer::Actions" class that would hold action methods
similar to those used for normal NQP-rx grammars.

Advantages:

* Familiar to compiler writers using PCT and people who use Perl 6.
* Easily interfaces with Parrot code.
* Easy to abstract common patterns.
 Just define a new rule in your grammar.
* Easy for an HLL writer to add an existing independent optimization to
their HLL (as simple as "is SomeOptimization").
* Easy to integrate with PCT.
 Just add a "parseoptimizer" method to HLLCompiler to parallel
parsegrammar and parseactions, and add a new default stage before POST
generation that applies the optimizations in the HLL's optimizer to
the PAST (see the sketch after this list).
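
As a sketch of what the language-side hookup might look like (the
parseoptimizer method is the proposed addition and doesn't exist yet;
the rest mirrors a typical nqp-rx compiler setup, and the SomeLang
names are hypothetical):

    INIT {
        my $comp := HLL::Compiler.new();
        $comp.language('SomeLang');
        $comp.parsegrammar(SomeLang::Grammar);
        $comp.parseactions(SomeLang::Grammar::Actions);
        $comp.parseoptimizer(SomeLang::Optimizer);   # proposed new method
    }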

Disadvantages:
* Not very convenient to say "Run this optimization on this tree."
 This syntax is definitely optimized for the use case of an HLL
compiler choosing a set of optimizations to be run on all code
compiled with it. However, there could be a helper sub for applying an
optimization to a tree (see the sketch after this list), which should
help with this.
* Optimizers and Optimizations will look fairly similar to HLL::Grammars.
 This could lead to confusion. However, they do have different parent
classes and use different keywords for declaring rules.
* Not easy to programmatically construct optimizations outside of NQP.
 Is this actually useful? I'm not sure.
* No obvious way to express some possible optimization strategies
(e.g., a genetic optimizer).
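
For the record, here is the kind of helper sub I have in mind; the
name and the .apply interface are invented here purely for
illustration:

    # Hypothetical convenience sub: run a list of optimizers over a tree.
    sub optimize($tree, @optimizers) {
        my $past := $tree;
        for @optimizers -> $opt {
            $past := $opt.apply($past);
        }
        $past;
    }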

I'm curious what others think of the two syntax proposals included
(XPath and regexes), and whether anyone has other ideas for possible
syntaxes.

-- 
Tyler Curtis