Part of the purpose of my GSoC project is to develop tools to make writing PAST::Optimizations not just possible, but actually convenient.
There are existing special tools for tree transformations: for example, the TGE or Tree Grammar Engine. TGE provides a syntax for transformations on trees. To some extent. More specifically, it provides a syntax for dispatch based on the type of node in the tree to a PIR sub that does the transformation work. It also has the ability to define subs to be executed on the attributes of nodes of a certain type whenever a transform is performed on such a node. Some good things about TGE: * Generic: can be used for any sort of tree-like data structure. I like this. My PAST::Walker and PAST::Transformer can't do this(although 'walk', 'walkChildren', and 'replaceChildren' are all multi-methods so nothing prevents someone from writing multi-methods for traversing POST::Nodes or any other tree-like data structure. It's just less effortless than with TGE. * Simple syntax for the dispatching it support. Simple syntax is very useful. Some problems I have with TGE: * Although the syntax theoretically supports arbitrary languages for the transformations, PIR is the only implemented option. PIR is a surprisingly pleasant language for how low level it is, but it's not something I'd want to code in more than necessary without a very good reason. * The dispatching syntax is too limited. I want to be able to conveniently say "Do this to sub-trees that consist of a PAST::Op node with its 'pirop' attribute set to 'add', two PAST::Val children whose 'returns' attributes are in this list of types." instead of saying, "Do this(where this is a subroutine I had to write to make sure that the node is actually one that I care about) to all PAST::Op nodes." What other options do we have for syntaxes? One that has been mentioned to me is XPath. XPath is a syntax for matching sub-trees of an XML document. Of course, since XML documents are essentially just trees, it can easily be generalized to non-XML tree-like data structures. In fact, it's a great fit for PCT's PAST::Node and POST::Node, because like XML elements, they have both attributes and children. So, how would we define my hypothetical "PAST::Op node with its 'pirop' attribute set to 'add', two PAST::Val children whose 'returns' attributes are in this list of types" in XPath-like syntax(XPath uses ::, so I used ":" as in XML to represent namespacing)? For simplicity's sake, I'll assume that Integers are the only type of Val we're looking for. /descendant-or-self::PAST:Op/PAST:Val[1]/self::[...@type='Integer']/following-sibling::PAST:Val[2]/self::[...@type='Integer']/parent::node() Or something along those lines. I'm not absolutely certain about XPath's syntax. One can probably collapse the PAST;Val[1]/self::[...@type='Integer'] into a single predicate somehow, I expect. But that's approximately it. Some advantages of XPath or an XPath-like syntax: * Generic Like TGE, it is usable on any tree-like data structure(essentially any Parrot object with an array part and no cycles in the array part). * Familiar. It's not something new and unique to Parrot. Instead it's a standard and fairly well-known language for sub-tree matching. Quite a few people will already know the syntax and be able to move along to actually doing things with PASTs with very little additional effort. I can see some disadvantages to XPath: * Semantic differences between XPath and Parrot For example, different type systems, different type conversion rules, different comparison semantics. These aren't insurmountable obstacles, but they could potentially cause some things to not work the way people unfamiliar with XPath semantics would expect. * Version confusion If we used XPath, should we use XPath 1.0 or 2.0? Note that 2.0 is not backwards compatible, so if someone mistakenly thinks that they are using XPath 1.0 when actually using 2.0, some things will not work as they expect. * Some of the standard functions not useful for PAST optimization. Using an XPath syntax might cause people to expect all the XPath standard functions(possibly including the new ones in 2.0). But URI functions, date handling, etc. are not relevant to optimization. * Potentially difficult to extend with support for calling Parrot subs as predicates due to semantic differences. This would probably not be compatible with the standard, as well. * Seemingly no facility for abstraction. I can't seem to find any way to define new functions in XPath. Probably a small issue, but they could be useful in some cases. pmichaud provided an interesting idea a few nights ago: use Perl 6/NQP regexes. As he pointed out to me, they are intended to be usable for more than just text. They also have the advantage of familiarity; most compiler writers using PCT (now, at least) use NQP's regex support to define their grammar. sorear presented an interesting idea as to how exactly this could be done. He suggested a new keyword "rewrite" for declaring a rule that functions similar to the .tr() method except on non-strings. http://pastie.org/983989 is an example of an optimization for constant folding of addition. http://pastie.org/985050 is an example of how a HLL compiler could apply optimizations to its generated code. Line 8 in Compiler.pm is commented out, but if some of the language-specific optimizations in Optimizer.pm were particularly complex, there could be a separate "SomeLang::Optimizer::Actions" class that would hold action methods similar to those used for normal NQP-rx grammars. Advantages: * Familiar to compiler writers using PCT and people who use Perl 6. * Easily interfaces with Parrot code. * Easy to abstract common patterns. Just define a new rule in your grammar. * Easy for a HLL writer to add an existing independent optimization to their HLL(as simple as "is SomeOptimization"). * Easy to integrate with PCT. Just add a "parseoptimizer" method to HLLCompiler to parallel parsegrammar and parseactions, and add a new default stage before POST generation that applies the optimizations in the HLL's optimizer to the PAST. Disadvantages: * Not very convenient to say "Run this optimization on this tree." This sort of syntax definitely optimizes for the HLL compiler choosing a set of optimizations to be run on all code compiled with it use case. However, there can be a helper sub for applying an optimization to a tree, which should help with this. * Optimizers and Optimizations will look fairly similar to HLL::Grammars. This could lead to confusion. However, they do have different parent classes and use different keywords for declaring rules. * Not easy to programmatically construct optimizations outside of NQP. Is this actually useful? I'm not sure. * No obvious way to express some possible optimization strategies(e.g. a genetic optimizer). I'm curious what others think of the two syntax proposals included(XPath and Regexes), and if anyone has any other ideas of possible syntaxes. -- Tyler Curtis _______________________________________________ http://lists.parrot.org/mailman/listinfo/parrot-dev
