Re: Rewriting the assembler

Brian Wheeler Wed, 27 Feb 2002 12:51:06 -0800


On Wed, 2002-02-27 at 14:07, Simon Cozens wrote:
> I know some people have been talking about rewriting the assembler; I've
> had some more thoughts on this over the past couple of days.
> 
> First, I think that our assembler is going to be a reference implementation
> for those producing bytecode-emitting compilers. It does not need to be 
> fast, but it does need to be clear and easy to understand.
>


No arguments there.



> Some people have been talking about making the assembler more OO; in my
> opinion, this is a mistake. OO programming has the great advantage that you
> can abstract away a lot of the tricky bits, hiding the complexity. It however
> has the disadvantage that you hide away the complexity. Let's make this a
> nice, open, transparent program.
>
> The way I've thought about doing it so far is not unlike the Unix toolset
> model; small components doing a dedicated task. Assembly is essentially a 
> filtering process, and Unix filter techniques can be brought to bear on it.
> For instance, I've just written a little component which takes the assembly
> and expands the ops, turning "set" into "set_p_ic" or whatever it may be.

Interesting.

> 
> Granted, these components will share some library code, such as that to
> parse out a line of assembly source, but I think that specialized elements
> working on text is the way to go here.

What is wrong with a simple array (of arrays) that gets passed around?
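
A minimal sketch of the array-of-arrays idea, assuming each record is
[file, line, code, raw] in the same order the attached module uses. The
names to_records and strip_comments are hypothetical and exist only for
this illustration; they are not part of NewAssembler.pm.

```perl
use strict;
use warnings;

# Turn raw source text into records of [file, line, code, raw].
sub to_records {
    my($text,$file)=@_;
    my $line=0;
    return [ map { [$file, ++$line, $_, $_] } split(/\n/,$text) ];
}

# A filter phase: strip comments and surrounding whitespace from the code
# field, leaving the file, line and raw text untouched.
sub strip_comments {
    my($records)=@_;
    my @out;
    foreach my $rec (@$records) {
        my($file,$line,$code,$raw)=@$rec;
        $code=~s/\#.*//;
        $code=~s/^\s+//;
        $code=~s/\s+$//;
        push(@out,[$file,$line,$code,$raw]);
    }
    return \@out;
}

my $records=strip_comments(to_records("set I0, 1  # counter\nend\n","demo.pasm"));
foreach my $rec (@$records) {
    my($file,$line,$code)=@$rec;
    print "[$file:$line] $code\n";
}
```

Because every phase hands on the file and line along with the code, a
later phase can still report an error against the original source line.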


> 
> The real advantage of this method, other than making the overall design
> and process of the assembler easy to understand, is that we can slot in
> optimizations as additional filters at any stage of the assembler's operation.

There certainly are advantages...but there are disadvantages too.  With a
scalar holding a honking big chunk of text, it is going to be difficult to
identify which source line a given piece of code originally came from.

> 
> I'll produce a more specific PDD about how I'd like the assembler to look
> if this idea makes any sense to anyone other than me.
> 

Here's my hat, which I will now throw into the ring :)

* Yes, it's OO, just so we don't have a freaking ton of variables lying
around which could potentially conflict with someone who wants to embed
this into a compiler or something

* sort of does what Simon wants, in that there are separate phases,
which could have extra ones inserted.

* doesn't actually dump bytecode, but will happily parse queens.pasm and
most of the other things I've thrown at it, providing it doesn't have
macros or label arithmetic.
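
The "separate phases, which could have extra ones inserted" point can be
sketched roughly as below. This is an illustration under assumptions, not
how the attached module is organized: run_phases, drop_blank and
lowercase_ops are hypothetical names, and each phase is assumed to take
and return an array ref of [file, line, code, raw] records.

```perl
use strict;
use warnings;

# Phase: discard records whose code field is empty.
sub drop_blank {
    my($records)=@_;
    return [ grep { $_->[2] ne '' } @$records ];
}

# Phase: lowercase the first word (the op name) of each code field.
sub lowercase_ops {
    my($records)=@_;
    return [ map {
        my($file,$line,$code,$raw)=@$_;
        $code=~s/^(\S+)/lc($1)/e;
        [$file,$line,$code,$raw];
    } @$records ];
}

# Run the records through every phase in order.
sub run_phases {
    my($records,@phases)=@_;
    $records=$_->($records) foreach @phases;
    return $records;
}

my @phases=(\&drop_blank);
# An extra phase (say, an optimizer) is just another entry in the list.
push(@phases,\&lowercase_ops);

my $out=run_phases([
    ['demo.pasm',1,'SET I0, 1','SET I0, 1'],
    ['demo.pasm',2,'','# just a comment'],
    ['demo.pasm',3,'END','END'],
],@phases);
print "[$_->[0]:$_->[1]] $_->[2]\n" foreach @$out;
```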


Two files are here:

* a completely hideous "test-newassembler.pl" which calls the module.
* the 350 line NewAssembler.pm thingy itself.


This is just something I'm tinkering with.  Vomit on it if you must :)

Brian

package Parrot::NewAssembler;

use Carp;
use Parrot::Config;
use Parrot::Op;
use Parrot::OpLib::core;
use Parrot::PMC qw(%pmc_types);
use Parrot::PackFile;
use Parrot::Types;
use Symbol;


sub new {
    my($class,%args)=@_;
    my $self={

        # output file information
        'packfile'=>new Parrot::PackFile,
        'bytecode'=>'',
        'listing'=>'',

        # assembly
        'PC'=>0,        
        'opcodes'=>{},
        'errors'=>[],

        # constants
        'constants'=>{},
        'constant_data'=>[],

        # Label/fixup information
        'label'=>{},
        'last_label'=>'__start__',
        'fixup'=>{},

        # preprocessor data
        'macro'=>{},
        'equate'=>{},
        'include_path'=>[],


    };

   
    # Initialize opcode table.
    foreach my $op (@$Parrot::OpLib::core::ops) {
        $self->{'opcodes'}{$op->full_name}=$op;
    }


    return bless $self,$class;
}


sub error {
    my($self,$msg,$file,$line)=@_;
    push(@{$self->{errors}},"ERROR ($file,$line): $msg");
}


#
# preprocess:  preprocess assembler source, and convert into internal
#              format.  
#
sub preprocess {
    my($self,$scalar_code,$file,$line)=@_;
    my $code=[];
    my $counter=$line || 1;
    my @lines=split(/\n/,$scalar_code);
    while(scalar(@lines)) {
        my $l=shift(@lines); # line to pre-process
        my $rl=$l; # raw line
        $l=~s/\#.*//;
        $l=~s/^\s+//;
        $l=~s/\s+$//;

        # do equate substitution (if any)
        foreach (keys(%{$self->{equate}})) {
            $l=~s/\b$_\b/$self->{equate}{$_}/g;
        }
        
        # handle include directive
        if($l=~m/INCLUDE\s+['"](.+)["']/) {
            # an include directive has been found.
            my($filename)=$1;
            if(! -e $filename) {
                my $found=0;
                foreach my $path (@{$self->{include_path}}) {
                    if(-e "$path/$filename") {
                        $filename="$path/$filename";
                        $found=1;
                        last;
                    }
                }
                if(!$found) {
                    # report the line of the INCLUDE itself, not the start line.
                    $self->error("'$filename' not found.",$file,$counter);
                    return undef;
                }
            }
            if(!open(H,$filename)) {
                $self->error("cannot open '$filename': $!",$file,$counter);
                return undef;
            }
            my $c=join("",<H>);
            close(H);
            my $pp=$self->preprocess($c,$filename,1);
            push(@$code,[$file,$counter,'',$rl]);
            push(@$code,@$pp);
            $counter++;
            next;
        }

        # handle equate assignment.
        if($l=~m/^([_A-Za-z]\w*)\s+EQU\s+(.+)$/) {
            $self->{equate}{$1}=$2;
            $l=""; # clear line of code.
        }

        # handle macro definition
        if($l=~m/^([_A-Za-z]\w*)\s+MACRO(?:\s+(.+))?$/) {
            my($name,$args)=($1,$2);
            if(exists($self->{macro}{$name})) {
                # macro already exists!
            } else {
                

            }
        }


        # default to normal code: add it to the array.
        push(@$code,[$file,$counter,$l,$rl]);
        $counter++;
    }

   

    foreach (@$code) {
        my($file,$line,$c,$raw)=@$_;
        print "[$file:$line] $c == $raw\n";
    }

    return $code;
}




#
# assemble:  assemble 'pure' source code (no comments, macros, etc)
#
sub assemble {
    my($self,$code,$file,$line)=@_;
    if(ref($code) ne "ARRAY") {
        # inline parrot, convert to internal format.
        $code=$self->preprocess($code,$file,$line);
    }
    foreach my $data (@$code) {
        my($file,$line,$code,$rawcode)=@$data;
        if($code=~m/^(\S+):/) {
            # the line has a label.
            my($label)=lc($1);
            if($label=~m/^\$([_A-Z0-9]+)/i) {
                # local label, canonize it.
                $label=$1;
                $label=$self->{'last_label'}.".".$label;
            } else {
                # this is a global label, reset the 'last_label'.
                $self->{'last_label'}=$label;
            }

            if(exists($self->{label}{$label})) {
                # we've seen this label already.
                ## FIXME
            }
            $self->{label}{$label}=$self->{PC};

            # we can now do any needed fixups for this label.
            ## FIXME

            $code=~s/^(\S+:)\s*//; # remove the label.
        }

        # label arithmetic
        ## FIXME



        # constantize string args
        $code =~ s{([NU])?"(((\\")|[^"])*)"}{constantize_string($self,$2, $1)}egx;
        $code=~s/,/ /g;
        my($op,@args)=split(/\s+/,$code);
        next if($op eq ""); # this line only contained a label.
        # check if op is really a macro.




        my $arg_pc=$self->{PC}+1;
        foreach my $arg (@args) {
            $arg_pc+=1;
            next if($arg=~m/^\[/); # already been fixed up.

            # check for registers
            if($arg=~m/^([INPS])(\d+)$/i) {
                my($type,$num)=(lc $1,$2);
                if($num < 32) {
                    $arg="[$type:$num]";
                    next;
                }
            }
            
            # check for labels
            if($arg=~m/^([_A-Z][_A-Z0-9]*(\.[_A-Z0-9]*)?)/i) {
                # a global label, with optional local
                my($glabel,$llabel)=($1,$2);
                $glabel=lc($glabel);
                if(exists($self->{'label'}{$glabel})) {
                    $arg="[ic:$self->{'label'}{$glabel}]";
                } else {
                    push(@{$self->{'fixup'}{$glabel}},$arg_pc);
                    $arg="[ic:-123456789]";
                }
                next;
            } elsif($arg=~m/^\$([_A-Z0-9]+)/i) {
                # a local label...canonize it.
                my $label=$1;
                $label=lc($self->{'last_label'}.".".$label);
                if(exists($self->{'label'}{$label})) {
                    $arg="[ic:$self->{'label'}{$label}]";
                } else {
                    push(@{$self->{'fixup'}{$label}},$arg_pc);
                    $arg="[ic:-123456789]";
                }
                next;
            }

            # check for integers
            my $intval=constantize_integer($self,$arg);
            if(defined($intval)) {
                $arg="[ic:$intval]";
                next;
            }

            # default to numeric
            my $numval=constantize_number($self,$arg);
            $arg="[nc:$numval]";
        }
        $self->{PC}=$arg_pc;

        # at this point, all arguments should be fixed up.  Grab signature
        my(@sig);
        foreach (@args) {
            m/\[([^:]+):/;
            push(@sig,$1);
        }
        my $signature=$op.(scalar(@sig)?"_".join("_",@sig):"");

        if(exists $self->{'opcodes'}{$signature}) {
            $signature="[ok: $signature]";
            
            



        } else {
            $signature="[not found $signature]";
        }



        print "($file:$line:$self->{PC}): ",join("|",$signature,@args),"   $rawcode\n";


    }


}





sub constantize_string {
    my $self = shift;
    my $s = shift;
    my $p = shift || "";

    my %encodings=('' => 0, 'N' => 0, 'U' => 3);
    my %escape = (
                  'a'  => "\a",
                  'n'  => "\n",
                  'r'  => "\r",
                  't'  => "\t",
                  '\\' => '\\'
                  );

    my $e = $encodings{$p};

    confess if !defined $s || !defined $e;

    $s=~s/\\(0\d*)/chr(oct($1))/eg;
    $s=~s/\\x([0-9a-fA-F]{1,2})/chr(hex($1))/ge;
    $s=~s/\\([anrt\\])/$escape{$1}/ge;

    if(!exists($self->{constants}{s}{$s}{$e})) {
        push(@{$self->{constant_data}},['s',$s,$e]);
        $self->{constants}{s}{$s}{$e}=$#{$self->{constant_data}};
    }

    return "[sc:$self->{constants}{s}{$s}{$e}]";
}

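sub constantize_integer {
    my $self = shift;
    my $i = shift;

    # split off an optional sign; oct() does not understand one.
    my($sign,$digits)=$i=~m/^([+-]?)(.+)$/ or return undef;

    if ($digits =~ /^0b[01]+$/i || $digits =~ /^0x[0-9a-f]+$/i
        || $digits =~ /^0[0-7]*$/) {
      # oct() handles 0b..., 0x... and plain octal strings.
      $i = oct($digits);
      $i = -$i if $sign eq '-';
    }
    elsif ($digits =~ m/^\d+$/) {
      $i = $sign eq '-' ? -$digits : $digits+0;
    } else {
      return undef; # not an integer; the caller falls back to a number.
    }
    # XXX parrot cannot currently handle integers over 2 ** 31
    if( $i > (2 ** 31) || $i < -(2**31) ) {
      # no file/line information reaches this sub, so none can be reported.
      $self->error("Cannot have integer $i because its magnitude is greater than 2 ** 31.",'?','?');
    }
    return $i;
}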


sub constantize_number {
    my $self = shift;
    my $n = shift;
    if(!exists($self->{'constants'}{n}{$n})) {
        push(@{$self->{constant_data}},['n',$n]);
        $self->{constants}{n}{$n}=$#{$self->{constant_data}};
    }
    return $self->{'constants'}{n}{$n};
}




1;

__END__

#!/usr/bin/perl
use Parrot::NewAssembler;

$asm=new Parrot::NewAssembler;


if(@ARGV && $ARGV[0] ne "") {
    open(H,$ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
    $filedata=join("",<H>);
    close(H);
    $code=$asm->preprocess($filedata,$ARGV[0]);
    $asm->assemble($code);

} else {
my $p=$asm->preprocess('
queen_at EQU queen_fat
$000: set S0, "hello world\n"
set S1, "bye world\n"
test: set S2, "hello world\n"
set S3, S0
set N0, 1.3323
$000: set N1, N0
set N2, -3.23
set I0, 1
$001: set I1, I0
set I2, -1
branch $000
branch __START__.000
branch test
branch foo
foo: end
end

INCLUDE "../examples/assembly/queens.pasm"

');

$asm->assemble($p);

};
