On Wed, Feb 27, 2002 at 07:07:26PM +0000, Simon Cozens wrote:
> Granted, these components will share some library code, such as that to
> parse out a line of assembly source, but I think that specialized elements
> working on text is the way to go here.
>
> The real advantage of this method, other than making the overall design
> and process of the assembler easy to understand, is that we can slot in
> optimizations as additional filters at any stage of the assembler's operation.
>
> I'll produce a more specific PDD about how I'd like the assembler to look
> if this idea makes any sense to anyone other than me.

The disadvantage of this approach is that you're stuck with a
lowest-common-denominator format, text, as the communication format
between the components. That means each component has to know how to
both parse and render, and it's more difficult to pass out-of-band
information between components.

If you want to preserve the text filter model, then the above problem
can be worked around by requiring nearly all filters to use the same
library code for parsing and rendering. You still have to define a
metadata format, and your metadata will be forced to be fairly simple.

With my regular expression compiler, I didn't use a text filter model,
but I did use a filter model. It's somewhat OO, but only within each
component (and mostly because I wanted to support multiple backends).
Here's all but the 'use' statements from the complete driver program
that takes a regular expression as a command-line argument and prints
parrot opcodes to stdout:

    my $parser = Parse->new();
    my $opt1 = PreOptimize->new();
    my $rewrite = "Rewrite::$backend"->new();
    my $opt2 = Optimize->new();
    my $cgen = "CodeGen::$backend"->new();

    my $tree = $parser->compile(shift());
    $tree = $opt1->pre_rewrite_optimize($tree);
    my @code = $rewrite->rewrite($tree);
    my @optcode = $opt2->optimize(@code);
    my @optasm = $cgen->output(@optcode);
    print join("\n", @optasm), "\n";

Ok, so I'm not great about naming things, but this certainly uses the
filter model. Both optimization steps are optional. Valid code will be
generated if you comment either one out. However, I pass more than
text between the filters.
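
To illustrate what passing structs instead of text buys, here is a
minimal sketch. Everything in it is a hypothetical stand-in rather than
code from the regex compiler: the op struct layout, the 'match' and
'accept' op names (not real Parrot opcodes), and the literal-only
"rewrite" step are all assumptions made for the example. The point is
only that the optimizer consumes and produces the same type, so it can
be skipped, and that out-of-band data (here a pattern position) rides
along without any text encoding.

```perl
use strict;
use warnings;

# Hypothetical op struct: opcode name, arguments, and out-of-band data
# (the position in the pattern) that text filters would have to encode.
sub op {
    my ($name, $pos, @args) = @_;
    return { op => $name, args => [@args], pos => $pos };
}

# Stand-in for the rewrite stage: literal-only pattern -> array of op structs.
sub rewrite {
    my ($pattern) = @_;
    my $pos  = 0;
    my @code = map { op('match', $pos++, $_) } split //, $pattern;
    return (@code, op('accept', $pos));
}

# Stand-in peephole pass: merge adjacent 'match' ops into one.
# Array of op structs in, array of op structs out, so it is optional.
sub optimize {
    my @out;
    for my $op (@_) {
        if ($op->{op} eq 'match' && @out && $out[-1]{op} eq 'match') {
            $out[-1] = op('match', $out[-1]{pos},
                          $out[-1]{args}[0] . $op->{args}[0]);
        }
        else {
            push @out, $op;
        }
    }
    return @out;
}

# Stand-in code generator: the only stage that renders text.
sub output {
    my @lines;
    for my $op (@_) {
        my $args = join ", ", map { "\"$_\"" } @{ $op->{args} };
        my $text = $op->{op};
        $text .= " $args" if length $args;
        push @lines, "$text  # pos $op->{pos}";
    }
    return @lines;
}

my @code    = rewrite("abc");
my @optcode = optimize(@code);    # comment this out and pass @code instead
my @optasm  = output(@optcode);
print join("\n", @optasm), "\n";
```

Calling output(@code) directly also produces valid (just longer)
output, which is the sense in which the optimization stage is optional.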

The signatures look something like

    Parse::compile                    : string (the regex) -> expression tree
    PreOptimize::pre_rewrite_optimize : expression tree -> expression tree
    Rewrite::rewrite                  : expression tree -> array of op structs
    Optimize::optimize                : array of op structs -> array of op structs
    CodeGen::output                   : array of op structs -> array of parrot assembly strings

So all optimization steps have to produce the same type as they
consume. But different data structures are relevant at different
points in the processing (PreOptimize should probably be called
Regex::TreeOptimize, and Optimize Regex::PeepholeOptimize).

On the other hand, using multiple internal representations means that
the parsing/rendering library has to support all of them, and I'm not
sure that it gains a lot. (And if you don't use a common library, then
adding something like source code line number metadata will be hell;
you'll be changing the API.)

My vote is "Filters: yes. Text: not sure."

[Simon/Dan: can I check in my regex compiler under languages/regex?]
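
For comparison, the common-library workaround for pure text filters
could be sketched roughly as below. This is an illustration under
stated assumptions, not a proposal for the real assembler: the
parse_line/render_line names, the "# meta: key=value" trailer, and the
drop_noops filter are all invented here, and the naive comma splitting
would break on string arguments containing commas or '#'. It does show
why the metadata ends up simple, and why adding a new key (say, a
source line number) is painless only when every filter goes through
the same two routines.

```perl
use strict;
use warnings;

# Hypothetical shared library: every text filter parses and renders
# through these two subs. The "# meta: k=v ..." trailer is an invented
# convention for this sketch, not part of real Parrot assembly.
sub parse_line {
    my ($line) = @_;
    my %meta;
    if ($line =~ s/\s*#\s*meta:\s*(.*)$//) {
        %meta = map { my ($k, $v) = split /=/, $_, 2; ($k => $v) }
                split ' ', $1;
    }
    $line =~ s/\s+$//;
    my ($op, $rest) = split ' ', $line, 2;
    my @args = defined $rest ? split(/\s*,\s*/, $rest) : ();
    return { op => $op, args => \@args, meta => \%meta };
}

sub render_line {
    my ($insn) = @_;
    my $text = $insn->{op};
    $text .= ' ' . join(', ', @{ $insn->{args} }) if @{ $insn->{args} };
    my $meta = $insn->{meta};
    if (%$meta) {
        $text .= '  # meta: '
               . join(' ', map { "$_=$meta->{$_}" } sort keys %$meta);
    }
    return $text;
}

# A text filter built on the shared library: text in, text out.
# It never looks at the metadata, yet the metadata survives.
sub drop_noops {
    return map  { render_line($_) }
           grep { $_->{op} ne 'noop' }
           map  { parse_line($_) } @_;
}

my @in = (
    'set I0, 5  # meta: line=1',
    'noop  # meta: line=2',
    'print I0  # meta: line=3',
);
my @out = drop_noops(@in);
print "$_\n" for @out;
```

Every filter written this way pays for a full parse and render on each
pass, which is the cost of keeping text as the interchange format.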