This just about implements a jit for ARM. It doesn't actually do any ops in
assembler yet, except for end. It's named on the basis that it's for v3 or
later instructions. (I may have all the names slightly wonky, but IIRC v3
is ARM600 and later cores. StrongARM and ARM8 are v4, but the machine I've
got has other hardware that won't cope with the half word loads that v4
brings.) Strictly it's something like little endian, APCS 32, ARM v3
[is it even APCS-R? (ARM Procedure Call Standard). As it's using a frame
 pointer does that mean there's more that should be in the name?
 Not that gdb thinks that I got the frame pointer correct]

Would it be useful for parrot to be able to use the 32*32=>64 bit multiply
instructions that come in post ARM v3?

Problems that I remember encountering (comments in the code may indicate
more). Some of these were me not understanding things - it doesn't mean
that the current way is wrong, just that it wasn't obvious to me :-(

1: '}' is a necessary character in ARM assembler syntax (it closes register
   lists, as in the ldmea in Parrot_end), so jit2h.pl needs to be a bit
   smarter about deciding when to chop the end of a function.

2: There is no terse way to load arbitrary 32 bit constants into a register
   with ARM instructions. There are 2 usual methods:
   1: Put the constant in a constant pool within +- 4092 or so bytes of the
      PC, and load it with an offset from the PC.
   2: Make it with 1, 2 or 3 instructions. I believe the current conjecture
      is that any 32 bit value can be made with 3 ARM instructions. So far
      no-one has found a value that they couldn't make, but no-one has
      proved it possible, so there is no algorithm that lets a program
      generate the instructions to build an arbitrary constant.
   Either way, I found I was fighting the current jit, which expects (at
   worst) to be able to split a 32 bit constant into 2 (possibly unequal)
   halves stored in two machine instructions. To be more flexible the jit
   would need to know what some CPU registers contain (ie things like the
   current interpreter pointer), and be able to choose whether to get a
   value or pointer by arithmetic from a CPU register, by dereferencing a
   CPU register (possibly with offset) or by giving up and loading a
   constant.

   This will make more sense to anyone who gets hold of an ARM machine and
   then tries to write ops :-)

3: I wanted to put the pointer to the current interpreter in r7. This made
   the default precompiled "call" function have its branch somewhere wonky.
   It seems to me that Parrot::Jit->call should be returning a 2 item list:
   the bytecode, and the offset of the branching instruction in there.

4: I think in a RISC way, so expect the offset to be of the start of the
   instruction that needs butchering, not the byte within it. (How the sparc
   position was expressed confused me for a while).
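
To make point 2 concrete, here is a minimal sketch in C of the test a code
generator needs before it can use a constant directly: an ARM data
processing immediate is an 8 bit value rotated right by an even number of
bits. The function names (rol32, arm_encode_imm, arm_naive_insn_count) are
made up for illustration and are not part of the patch. The second function
only counts how many instructions a naive mov followed by orrs would take;
it ignores mvn/add/sub/rsb tricks, so it is an upper bound for that one
scheme and says nothing about the 3 instruction conjecture.

```c
#include <stdint.h>

/* Rotate a 32 bit value left by n bits (n is taken modulo 32). */
uint32_t rol32(uint32_t v, unsigned n) {
    n &= 31;
    return n ? (v << n) | (v >> (32 - n)) : v;
}

/* If value can be written as an 8 bit constant rotated right by an even
 * amount, store the 12 bit operand field (rot << 8 | imm8) in *field and
 * return 1. Otherwise return 0 and leave *field alone. */
int arm_encode_imm(uint32_t value, uint32_t *field) {
    unsigned rot;
    for (rot = 0; rot < 16; rot++) {
        /* Rotating left by 2*rot undoes a rotate right by 2*rot. */
        uint32_t imm8 = rol32(value, 2 * rot);
        if (imm8 <= 0xFF) {
            *field = (rot << 8) | imm8;
            return 1;
        }
    }
    return 0;
}

/* Hypothetical helper: number of instructions for a naive "mov, then orr
 * in the rest" sequence, peeling off one even-aligned 8 bit chunk at a
 * time from the bottom. Never more than 4, often not the minimum. */
int arm_naive_insn_count(uint32_t value) {
    int count = 0;
    if (value == 0)
        return 1;               /* mov rN, #0 */
    while (value != 0) {
        unsigned shift = 0;
        /* Find the lowest even bit position with a set bit in its pair. */
        while (((value >> shift) & 3) == 0)
            shift += 2;
        value &= ~(0xFFu << shift);
        count++;
    }
    return count;
}
```

For example 0xFF000000 encodes as a single immediate (field 0x4FF), while
0x102 does not, because its two set bits can't be covered by any 8 bit
window that starts on an even bit.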

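On points 3 and 4, this is a sketch of what patching a branch looks like
when the position is given as the start of the instruction. It uses the same
arithmetic as write_24() in the patch (the PC reads as the instruction
address plus 8, and the 24 bit field counts words), but takes plain 32 bit
addresses rather than host pointers so that it can be tried anywhere. The
names arm_patch_branch and arm_branch_target are hypothetical.

```c
#include <stdint.h>

/* Return insn with its 24 bit offset field rewritten so that a B/BL
 * located at insn_addr branches to target. The top 8 bits (condition and
 * opcode) are kept as they are. */
uint32_t arm_patch_branch(uint32_t insn, uint32_t insn_addr, uint32_t target) {
    /* Displacement is relative to PC, which reads as insn_addr + 8.
     * Unsigned arithmetic wraps, which gives the two's complement value
     * needed for backward branches. */
    uint32_t disp = target - (insn_addr + 8);
    return (insn & 0xFF000000u) | ((disp >> 2) & 0x00FFFFFFu);
}

/* Inverse of the above: where does the B/BL at insn_addr branch to? */
uint32_t arm_branch_target(uint32_t insn, uint32_t insn_addr) {
    uint32_t field = insn & 0x00FFFFFFu;
    /* Sign extend the 24 bit field to 32 bits. */
    if (field & 0x00800000u)
        field |= 0xFF000000u;
    return insn_addr + 8 + (field << 2);
}
```

A bl to itself at any address comes out as 0xEBFFFFFE, which is a handy
value to look for in objdump output when checking that the position is
right.
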
It's a slow beast, particularly with -g:

$ ./test_parrot  examples/assembly/mops.pbc       
Iterations:    100000000
Estimated ops: 200000000
Elapsed time:  109.129854
M op/s:        1.832679

This was the first working jit, with Fix_cpcf_call() as
  ldr r0, [r0]
  mov pc, r0

Iterations:    100000000
Estimated ops: 200000000
Elapsed time:  65.109552
M op/s:        3.071746

This is the slightly faster jit, with Fix_cpcf_call() as ldr pc, [r0]

Iterations:    100000000
Estimated ops: 200000000
Elapsed time:  60.948834
M op/s:        3.281441
Segmentation fault

Which dmesg reports as:

test_parrot: unhandled page fault at pc=0x00000000, lr=0x00000000 (bad address=0x00000000, code 0)

and I think it may be the irritating hardware bug, courtesy of a mistake by
Digital's engineers in the early StrongARMs, which causes problems when an
instruction that loads PC takes a page fault.

Anyway, it's not well tested, but it seems that just binning the runops loop
gets a 75% speedup. :-)

**Beware** - I've no idea if loading the addresses of registers actually
works. The .pm code is still from sun4Generic.pm

Nicholas Clark
-- 
EMCFT http://www.ccl4.org/~nick/CV.html

--- include/parrot/jit.h~       Tue Jan 29 14:05:45 2002
+++ include/parrot/jit.h        Thu Jan 31 16:52:40 2002
@@ -22,6 +22,10 @@
 static void write_32(char *instr_end, ptrcast_t value);
 typedef void (*jit_f)(void *int_reg, void *num_reg, void *str_reg);
 #endif
+#ifdef ARMV3
+typedef void (*jit_f)(void *int_reg, void *num_reg, void *str_reg,
+                      void *cur_interpreter);
+#endif
 
 
 #define MAX_SUBSTITUTION 3
--- /dev/null   Mon Jul 16 22:57:44 2001
+++ jit/armv3/core.jit  Thu Jan 31 16:53:24 2002
@@ -0,0 +1,15 @@
+;
+;   armv3_core.jit 
+;
+; $Id:  $
+;
+
+Parrot_end {
+       ldmea   fp, {r4, r5, r6, r7, fp, sp, pc}
+}
+
+Parrot_noop {
+        # Seems that as recognises this and assembles mov r0, r0 for a nop.
+       nop
+}
+
--- /dev/null   Mon Jul 16 22:57:44 2001
+++ lib/Parrot/Jit/armv3-linux.pm       Thu Jan 31 23:36:03 2002
@@ -0,0 +1,30 @@
+#
+# Parrot::Jit;
+#
+# $Id: $
+#
+
+package Parrot::Jit;
+
+use base qw(Parrot::Jit::armv3Generic);
+
+$OBJDUMP = "objdump -d";
+$AS      = "as";
+
+$OP_ARGUMENT_SIZE = 4; # Hack? I'm only using this for the pointer substitutions
+
+$Call_inmediate_arg_size = 12;
+$Call_address_arg_size = 16;
+$Call_start = 0;
+$Call_move = 0;
+#$Precompiled_call_position = 24; # This is the start of the BL instruction.
+# XXXX Hack. This is the start of the BL instruction if I use the r7 shortcut.
+# But jit2h.pl doesn't seem to have a very elegant way of returning this.
+$Precompiled_call_position = 16;
+# I think that Parrot::Jit::call should be returning a 2 item list - the
+# bytecode, and the offset of the call instruction in there. This would remove
+# all 5 of the above variables.
+%syscall_number = (
+);
+
+1;
--- /dev/null   Mon Jul 16 22:57:44 2001
+++ lib/Parrot/Jit/armv3Generic.pm      Fri Feb  1 00:51:45 2002
@@ -0,0 +1,299 @@
+#
+# Parrot::Jit::armv3Generic;
+#
+# $Id $
+#
+
+package Parrot::Jit::armv3Generic;
+
+use IO::File;
+
+use constant DEBUG   => 1;
+use Carp;
+
+use constant TMP_OBJ => "t.o";
+use constant TMP_AS  => "t.s";
+
+# Maybe 0x00000000 would be a better dummy instruction:
+#    0:   00000000        andeq   r0, r0, r0
+use constant DUMMY_INSTR => 'nop';
+
+my $Argument = '[\&\*][a-zA-Z_]+\[\d+\]';
+my $Pointer_Argument = '\&[a-zA-Z_]+\[\d+\]';
+my $Literal_Argument = '\*[a-zA-Z_]+\[\d+\]';
+
+# Base address of registers are stored in the following CPU registers
+my %register_base_map = ( INT => "r4", NUM => "r5", STR => "r6",
+                          '&INTERPRETER' => "r7" );
+my $DUMMY_ARG_P = '0xFFFFFFFF'; # I hope SWINV 0xFFFFFF is still not re-used
+my $DUMMY_ARG_L = '0xFFFFFFFF';
+
+
+sub init() {
+   my $start =<<END;
+       mov     ip, sp
+       stmfd   sp!, {r4, r5, r6, r7, fp, ip, lr, pc}
+       sub     fp, ip, #4
+       mov     r4, r0
+       mov     r5, r1
+       mov     r6, r2
+       mov     r7, r3
+END
+
+    $start = Parrot::Jit->Assemble($start);
+    return $start;
+}
+
+# No working system call!!!!!
+sub system_call($$$) {
+    my ($class,$arg_c,$arg_v,$sys_n) = @_;
+
+    die "No system calls yet\n";
+}
+
+sub call($$) {
+    my ($class,$argc,$argv) = @_;
+
+    die "$argc > 4 on ARM" if $argc > 4;
+
+    my ($k,$assembly,$j,$l);
+
+    # Bletch. Global variable:
+    $Parrot::Jit::Call_move = 0;
+
+    for($k = 0; $k < $argc; $k++) {
+        $argv =~ s/([VA])([\&\*][a-zA-Z_]+\[\d+\])//;
+        $j = $1;
+        $l = $2;
+
+       # ARM calling conventions - up to 4 arguments are passed in r0..r3
+        if (($l eq '&INTERPRETER[0]')) {
+            if ($j eq 'V') {
+                $assembly .= "mov r$k, r7\n";
+                $Parrot::Jit::Call_move -= 8;
+            } else {
+                $assembly .= "ldr r$k, [r7]\n";
+                $Parrot::Jit::Call_move -= 12;
+            }
+        } else {
+            # This is sick, but there isn't a clean way to do general 32 bit
+            # constants in arm without loading them from a constant pool.
+            # The JIT doesn't (yet) let us make a constant pool here, or
+            # do the usual clever tricks with mov/mvn/add/orr/sub/rsb
+            # (let alone really clever tricks with non-standard immediate
+            # constants to set the carry flag that I've never needed) hence
+            # this pipelinestall-tastic branch:
+
+            #     load register pc relative to here -+
+            # +-  branch round constant              |
+            # |   constant                         <-+
+            # +->
+            #     (where this next instruction may be load register from
+            #      the address it points to)
+            #
+            # The special case above for the pointer to the interpreter shows
+            # that it is a lot cleaner if we know how to make an address from
+            # something we already have in registers or the stack.
+            # I'm wondering if for ARM it would be better to pass in
+            # current bytecode + 4092 as argument 5, and if we need
+            # CUR_OPCODE calculate that as an offset from arg 5
+            # (which will be on the stack at [fp, #4] for a 5 arg function)
+            # if a block of bytecode exceeds (4092 * 2) bytes then add ARM
+            # instructions to shift our opcode pointer on by 4092 * 2.
+
+            $assembly .= "ldr r$k, [pc]\n add pc, pc, #0\n";
+            if ($j eq 'A') {
+                $assembly .= ".word L$l\n ldr r$k, [r$k]\n";
+            } else {
+                $assembly .= ".word $l\n";
+            }
+        }
+    }
+
+    # call and link to 24 bit offset
+
+    $assembly .= ".L1: bl L1\n";
+    return Parrot::Jit->Assemble($assembly);
+
+    # Alternatively, code of the form:
+    # adr     r0,  .L1
+    # ldmia   r0,  {r0, r1, r2} ; register list built by jit
+    # adr     r14, .L2          ; fake a bl
+    # b       <where ever>
+    # .L1:    r0 data
+    #         r1 data
+    #         r2 data
+    # .L2:                      ; next instruction - return point from func.
+
+    # would be much cleaner.
+    # (with ldr rN, [rN] for the deref arguments added in where needed,
+    # and code to stack things for > 4 arguments)
+}
+
+sub Fix_normal_call() {
+    return "";
+}
+
+sub Fix_cpcf_call() {
+    # return value from function cur_opcode in r0, jump to *cur_opcode, which
+    # has been modified by jit to contain corresponding jit code
+    return Parrot::Jit->Assemble("ldr pc, [r0]\n");
+}
+
+sub Assemble($) {
+    my ($class,$body) = @_;
+
+    my ($line, $ln, $assembler,$n,$s, $reg_class);
+    my (@special,@special_arg);
+
+    $ln = 0;
+    $body =~ s/([^\n]*)\n?//;
+    $line = $1;
+    $assembler = "";
+    while (defined($line)) {
+        $line =~ s/^\s*//;
+        if (($line =~ m/^J/) || ($line =~ m/^C/) || ($line =~ m/^F/)) {
+            $assembler .= DUMMY_INSTR . "\n";
+            # Store the special instruction in the line where it will go.
+            $special[$ln] = $line;
+        } elsif ($line =~ m/^S/) {
+            $line =~ m/\((\w+)\s*,\s*(\d+)\s*,\s*([^\)]*)\)\s*/;
+            $body = system_call($2,$3,$1) . $body;
+            $ln--;
+        } elsif (($reg_class) = $line =~ m/(INT|NUM|STR)_REG/) {
+           # map parrot register set to appropriate base CPU register
+            $n = 0;
+            # XXX check this.
+            $line =~ s/($Argument)/[ $register_base_map{$reg_class} + 1 ]/;
+           if(defined($1)) {
+               $special_arg[$ln][$n++] = $1;
+           }
+            $assembler .= $line . "\n";
+        } elsif (0 and  $line =~ m/\&INTERPRETER\[(\d+)\]/) {
+            # Address of current interpreter is stored in a register.
+            # XXX This doesn't work with the default code generation, because
+            # the default code generation expects to be writing instructions
+            # that load a 32 bit constant from a .word
+            if ($1 == 0) {
+                $line =~ s/(\&INTERPRETER)\[\d+\]/$register_base_map{$1}/;
+                warn "line is now '$line'";
+            } else {
+                die "Can't do control stack dereference yet";
+            }
+        } elsif ($line =~ m/\*JUMP_INT_CONST/) {
+            # XXX Sparc hangover
+           # This is good for 22-bit branches (which have a 24 bit range)
+           $n = 0;
+           $line =~ s/($Argument)/_$ln/;
+           if(defined($1)) {
+               $special_arg[$ln][$n++] = $1;
+           }
+            $assembler .= $line . "\n_$ln:\n";
+        } elsif ($line =~ m/$Argument/) {
+            $n = 0;
+
+            $line =~ s/L($Argument)/$DUMMY_ARG_L/;
+            $line =~ s/($Argument)/$DUMMY_ARG_P/;
+
+            if (defined($1)) {
+                $special_arg[$ln][$n++] = $1;
+            }
+            $assembler .= $line . "\n";
+        } else {
+            $assembler .= $line . "\n";
+        }
+        $ln++;
+       $line = undef;
+        if($body =~ m/([^\n]*)\n/){
+           $line = $1;
+           $body =~ s/[^\n]*\n//;
+       }
+    }
+
+    write_as($assembler,TMP_AS);
+    assemble(TMP_AS, TMP_OBJ);
+    return disassemble(TMP_OBJ,\@special_arg,\@special,$ln);
+}
+            
+sub write_as($$) {
+       my ($code, $target) = @_;
+
+       my $out = new IO::File "> $target"
+               or die "Could not write to $target: $!";
+
+    print $out <<'END';
+       .align  2
+       .global main
+        .type    main,function
+main:
+END
+  confess "no code" unless $code =~ /\w/s;
+    print $out $code;
+}
+
+sub assemble($$) {
+       my ($file, $obj) = @_;
+
+       print STDERR "Assembling:\n\n", (new IO::File $file)->getlines, "\n\n"
+               if DEBUG;
+
+       system $Parrot::Jit::AS." $file -o $obj";
+       die $Parrot::Jit::AS." $file failed" if (($? >> 8) != 0);
+}
+
+sub disassemble($$$$) {
+       my ($obj,$sa,$si,$l) = @_;
+
+    my ($result,@t);
+
+       print STDERR "Disassembly:\n\n" if DEBUG;
+
+       my $objdump = new IO::File $Parrot::Jit::OBJDUMP." $obj |"
+               or die "Could not run ".$Parrot::Jit::OBJDUMP." $obj: $!";
+
+       while (<$objdump>) { last if /main/ }
+
+    $ln = 0;
+       while (<$objdump>) {
+        if (m/^\s*$/) {
+            <$objdump>;
+            next;
+        }
+               my ($opcodes, $instr, $args) =
+                       /^\s* \w+: \s+ ( [0-9A-Fa-f]{8} ) \s+ (\w+) \s+ (.+)?/x;
+
+        if ((defined($opcodes)) && (defined($instr))) {
+            if (($instr eq DUMMY_INSTR) && defined(@$si[$ln])) {
+                $result .= @$si[$ln];
+            } else {
+                # Little endian assumption.
+                $opcodes =~ s/(..)(..)(..)(..)/\\x$4\\x$3\\x$2\\x$1/g;
+                if (defined(@$sa[$ln])) {
+                    $n = 0;
+                    # Look for our "special" address and "special" constant
+                    if ($opcodes =~ m/^(?:\\xFF){4}/i) {
+                        @t = @$sa[$ln];
+                        $s = $t[0][$n++];
+                        warn sprintf "Was %d '$opcodes'\n", length $opcodes;
+                        $opcodes =~ s/^(?:\\xFF){4}/$s/i;
+                        warn sprintf "Now %d '$opcodes'\n", length $opcodes;
+                    }
+                }
+                $result .= $opcodes;
+            }
+        }
+        $ln++;
+               print STDERR $_ if DEBUG;
+       }
+
+#    $result =~ s/\\x00 \\x00 (\\x.. \\x.. )JUMP\(([^\)]*)\)/JUMP($2) $1/;
+    
+       print STDERR "\n\nResult:\n\n" if DEBUG;
+
+       print STDERR $result . "\n\n" if DEBUG;
+
+    $result =~ s/\s//g;
+    return $result;
+}
+
+1;
--- Configure.pl~       Thu Jan 31 10:15:18 2002
+++ Configure.pl        Thu Jan 31 18:58:09 2002
@@ -152,6 +152,7 @@
 
 $jitarchname              =  "$cpuarch-$osname";
 $jitarchname                 =~ s/i[456]86/i386/i;
+$jitarchname                 =~ s/armv[34]l?/armv3/i;
 $jitarchname              =~ s/-(net|free|open)bsd$/-bsd/i;
 $jitcapable               = 0;
 
--- jit2h.pl~   Wed Jan 30 23:40:58 2002
+++ jit2h.pl    Thu Jan 31 14:57:38 2002
@@ -59,7 +59,7 @@
             $asm = "";
             next;
         }
-        if ($line =~ m/}/) {
+        if ($line =~ m/^}/) {
             $ops{$function} = Parrot::Jit->Assemble($asm);
             $function = undef;
             $body = undef;
--- jit.c~      Wed Jan 30 10:31:38 2002
+++ jit.c       Fri Feb  1 00:54:03 2002
@@ -8,6 +8,9 @@
 #include "parrot/jit.h"
 #include "parrot/jit_struct.h"
 
+#ifdef ARMV3
+static void write_24(void *instr_start, ptrcast_t value);
+#endif
 
 /* Don't ever count on any info here */
 
@@ -357,6 +360,12 @@
                                  ]
                       ); 
 
+            /* fprintf (stderr, "Want to branch22 to %p from instruction at %p\n",
+               address, &arena[v.info[i].position]); */
+#ifdef ARMV3
+            address = (INTVAL *)((char *)address - (arena + v.info[i].position) - 8);
+            write_24(&arena[v.info[i].position], (ptrcast_t)address);
+#else
 #ifdef SUN4
             address = (INTVAL *)((char *)address - (arena + v.info[i].position - 3));
             write_22(&arena[v.info[i].position], (ptrcast_t)address);
@@ -380,6 +389,7 @@
             
             memcpy(&arena[v.info[i].position],&address,OP_ARGUMENT_SIZE);
 #endif
+#endif
         }
         
         /* XXX the idea is to write all this functions in asm */
@@ -409,6 +419,12 @@
                         address = (INTVAL *)interpreter->op_func_table[*pc];
                         break;
             }
+            /* fprintf (stderr, "Want to branch to %p from instruction at %p\n",
+               address, &arena[v.info[i].position]); */
+#ifdef ARMV3
+            address = (INTVAL *)((char *)address - (arena + v.info[i].position) - 8);
+            write_24(&arena[v.info[i].position], (ptrcast_t)address);
+#else
 #ifdef SUN4
             address = (INTVAL *)((char *)address - (arena + v.info[i].position - 3));
             write_30(&arena[v.info[i].position], (ptrcast_t)address);
@@ -436,11 +452,17 @@
 
             memcpy(&arena[v.info[i].position],&address,OP_ARGUMENT_SIZE);
 #endif
+#endif
         }
 
         v = op_assembly[*pc].interpreter;
         for (i = 0; i < v.amount; i++)
         {
+#ifdef ARMV3
+            fprintf (stderr, "arm doesn't support interpreter relocations. Relocation should be replaced with copy or dereference of r7. (number=%d)\n",
+                     v.info[i].number);
+            exit (1);
+#endif
             switch(v.info[i].number) {
                 case 0: 
                         address = (INTVAL *)interpreter; 
@@ -493,6 +515,8 @@
         pc += ivalue;
     }
 
+    /* Dump out the assembler we built - useful to feed to objdump -d */
+    /* write (0, arena_start, size); */
     return (jit_f)arena_start;
 }
 
@@ -569,6 +593,16 @@
 
 #endif
 
+#ifdef ARMV3
+
+/* Write 24 bit immediate value into PC relative branches */
+static void write_24(void *instr_start, ptrcast_t value) {
+    unsigned long *instr = (unsigned long *) instr_start;
+    value >>= 2; /* divide displacement by 4 */
+    *instr = (*instr & 0xFF000000) | (value & 0x00FFFFFF);
+}
+
+#endif
 
 /*
  * Local variables:
--- interpreter.c~      Wed Jan 30 10:31:35 2002
+++ interpreter.c       Fri Feb  1 00:22:10 2002
@@ -308,6 +308,13 @@
                         (void *)&interpreter->num_reg.registers[0],
                         (void *)&interpreter->string_reg.registers[0]);
 #endif
+#ifdef ARMV3
+    (jit_code)((void *)(&interpreter->int_reg.registers[0]),
+               (void *)&interpreter->num_reg.registers[0],
+               (void *)&interpreter->string_reg.registers[0],
+               (void *)interpreter);
+               
+#endif
 
 #else
     return;
