Re: Rewriting the assembler

Steve Fink Wed, 27 Feb 2002 11:28:29 -0800

On Wed, Feb 27, 2002 at 07:07:26PM +0000, Simon Cozens wrote:
> Granted, these components will share some library code, such as that to
> parse out a line of assembly source, but I think that specialized elements
> working on text is the way to go here.
> 
> The real advantage of this method, other than making the overall design
> and process of the assembler easy to understand, is that we can slot in
> optimizations as additional filters at any stage of the assembler's operation.
> 
> I'll produce a more specific PDD about how I'd like the assembler to look
> if this idea makes any sense to anyone other than me.


The disadvantage of this approach is that you're stuck with a
lowest-common-denominator format, text, as the communication format
between the components. That means each component has to know
how to both parse and render, and it's more difficult to pass
out-of-band information between components.

If you want to preserve the text filter model, then the above problem
can be worked around by requiring nearly all filters to use the same
library code for parsing and rendering. You still have to define a
metadata format, and your metadata will be forced to be fairly simple.
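
A minimal sketch of that workaround, assuming a shared pair of
parse/render routines and an invented "#: key=value" trailing comment
as the metadata format (neither is an existing assembler convention;
all names here are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical shared library: every text filter parses and renders
# through these two routines, so the line format lives in one place.
sub parse_line {
    my ($text) = @_;
    my %meta;
    # Assumed metadata convention: trailing "#: key=value key=value".
    if ($text =~ s/\s*#:\s*(.*)$//) {
        %meta = map { split(m/=/, $_, 2) } split ' ', $1;
    }
    my ($op, $rest) = $text =~ /^\s*(\S+)\s*(.*)$/;
    my @args = defined $rest ? split(/\s*,\s*/, $rest) : ();
    return { op => $op, args => \@args, meta => \%meta };
}

sub render_line {
    my ($line) = @_;
    my $text = $line->{op};
    $text .= ' ' . join(', ', @{ $line->{args} }) if @{ $line->{args} };
    my $meta = $line->{meta};
    $text .= ' #: ' . join(' ', map { "$_=$meta->{$_}" } sort keys %$meta)
        if %$meta;
    return $text;
}

# Example filter: text in, text out. It drops "set X, X" lines and
# passes everything else through, metadata included.
sub drop_self_moves {
    my @out;
    for my $text (@_) {
        my $line = parse_line($text);
        next if $line->{op} eq 'set'
            && @{ $line->{args} } == 2
            && $line->{args}[0] eq $line->{args}[1];
        push @out, render_line($line);
    }
    return @out;
}

my @asm = ('set I0, 5 #: line=1', 'set I1, I1 #: line=2', 'print I0 #: line=3');
print "$_\n" for drop_self_moves(@asm);
```

Note how little the metadata can carry here: flat key=value pairs with
no spaces in the values. Anything richer means inventing a quoting
scheme that every filter then has to respect.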

With my regular expression compiler, I didn't use a text filter model,
but I did use a filter model. It's somewhat OO, but only within each
component (and mostly because I wanted to support multiple backends).
Here's all but the 'use' statements from the complete driver program
that takes a regular expression as a command-line argument and prints
parrot opcodes to stdout:

    my $parser = Parse->new();
    my $opt1 = PreOptimize->new();
    my $rewrite = "Rewrite::$backend"->new();
    my $opt2 = Optimize->new();
    my $cgen = "CodeGen::$backend"->new();

    my $tree = $parser->compile(shift());
    $tree = $opt1->pre_rewrite_optimize($tree);
    my @code = $rewrite->rewrite($tree);
    my @optcode = $opt2->optimize(@code);
    my @optasm = $cgen->output(@optcode);
    print join("\n", @optasm), "\n";

Ok, so I'm not great about naming things, but this certainly uses the
filter model. Both optimization steps are optional. Valid code will be
generated if you comment either one out. However, I pass more than
text between the filters. The signatures look something like

 Parse::compile     : string (the regex) -> expression tree
 PreOptimize::pre_rewrite_optimize
                    : expression tree -> expression tree
 Rewrite::rewrite   : expression tree -> array of op structs
 Optimize::optimize : array of op structs -> array of op structs
 CodeGen::output    : array of op structs -> array of parrot assembly strings

So each optimization step has to produce the same type that it
consumes, while different data structures are relevant at different
points in the processing (PreOptimize should probably be called
Regex::TreeOptimize, and Optimize Regex::PeepholeOptimize). The catch
is that using multiple internal representations means that the
parsing/rendering library has to support all of them, and I'm not sure
that it gains a lot. (And if you don't use a common library, then
adding something like source code line number metadata will be hell;
you'll be changing the API.)
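
To illustrate the struct-passing side of that tradeoff, here is a rough
sketch. The op struct layout (name, args, meta), the helper names, and
the sample ops are all assumptions made for illustration, not the
regex compiler's actual data structures:

```perl
use strict;
use warnings;

# Hypothetical op struct: opcode name, argument list, and a free-form
# metadata slot that filters can ignore.
sub make_op {
    my ($name, $args, %meta) = @_;
    return { name => $name, args => $args, meta => { %meta } };
}

# An optimizer maps a list of op structs to a list of op structs.
# This one removes a branch whose target label immediately follows it.
sub drop_branch_to_next {
    my @ops = @_;
    my @out;
    for my $i (0 .. $#ops) {
        my ($op, $next) = @ops[$i, $i + 1];
        next if $op->{name} eq 'branch'
            && $next
            && $next->{name} eq 'label'
            && $next->{args}[0] eq $op->{args}[0];
        push @out, $op;
    }
    return @out;
}

# Only the final stage renders text. It can emit source line numbers
# as comments without any earlier stage knowing about them.
sub output {
    return map {
        my $op   = $_;
        my @args = @{ $op->{args} };
        my $text = $op->{name} eq 'label' ? $args[0] . ':'
                 : @args ? "$op->{name} " . join(', ', @args)
                 :         $op->{name};
        defined $op->{meta}{line} ? "$text # line $op->{meta}{line}" : $text;
    } @_;
}

my @code = (
    make_op('branch', ['L1'], line => 1),
    make_op('label',  ['L1']),
    make_op('print',  ['I0'], line => 2),
);

# Optimizers share one type, so any subset of them can be applied.
my @optimizers = (\&drop_branch_to_next);
@code = $_->(@code) for @optimizers;
print "$_\n" for output(@code);
```

Because the line number lives in a hash slot, adding it changed neither
the optimizer nor its signature; only output() had to learn about it.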

My vote is "Filters: yes. Text: not sure."

[Simon/Dan: can I check in my regex compiler under languages/regex?]
