This just about implements a jit for ARM. It doesn't actually do any ops in assembler yet, except for end. It's named on the basis that it's for v3 or later instructions. (I may have all the names slightly wonky, but IIRC v3 is ARM600 and later cores. StrongARM and ARM8 are v4, but the machine I've got has other hardware that won't cope with the halfword loads that v4 brings.) Strictly it's something like little endian, APCS 32, ARM v3. [Is it even APCS-R? (APCS is the ARM Procedure Call Standard.) As it's using a frame pointer, does that mean there's more that should be in the name? Not that gdb thinks that I got the frame pointer correct.]
Would it be useful for parrot to be able to use the 32*32=>64 bit multiply instructions that come in post ARM v3?

Problems that I remember encountering (comments in the code may indicate more). Part of these were understanding things - it doesn't mean that the current way is wrong, just that it wasn't obvious to me :-(

1: '}' is a necessary character in ARM assembler syntax, so jit2h.pl needs to be a bit smarter about deciding when to chop the end of a function.

2: There is no terse way to load arbitrary 32 bit constants into a register with ARM instructions. There are 2 usual methods:

   1: Put the constant in a constant pool within +- 4092 or so bytes of the PC, and load it with an offset from the PC.

   2: Make it with 1, 2 or 3 instructions. I believe that currently it is conjectured that it is possible to make any 32 bit value with 3 ARM instructions, and so far no-one has found any value that they couldn't make, but no-one has proved it possible and thereby made an algorithm that lets a program generate instructions to build a constant.

   Either way, I found I was fighting the current jit, which expects (at worst) to be able to split a 32 bit constant into 2 (possibly unequal) halves stored in two machine instructions. To be more flexible, the jit would need to know what some CPU registers contain (i.e. things like the current interpreter pointer), and be able to choose whether to get a value or pointer by arithmetic from a CPU register, by dereferencing a CPU register (possibly with an offset), or by giving up and loading a constant.

   This will make more sense to anyone who gets hold of an ARM machine and then tries to write ops :-)

3: I wanted to put the pointer to the current interpreter in r7. This made the default precompiled "call" function have its branch somewhere wonky. It seems to me that Parrot::Jit->call should be returning a 2 item list: the bytecode, and the offset of the branching instruction in there.
4: I think in a RISC way, so expect the offset to be of the start of the instruction that needs butchering, not the byte within it. (How the sparc position was expressed confused me for a while.)

It's a slow beast, particularly with -g:

$ ./test_parrot examples/assembly/mops.pbc
Iterations:    100000000
Estimated ops: 200000000
Elapsed time:  109.129854
M op/s:        1.832679

This was the first working jit, with Fix_cpcf_call() as

    ldr r0, [r0]
    mov pc, r0

Iterations:    100000000
Estimated ops: 200000000
Elapsed time:  65.109552
M op/s:        3.071746

This is the slightly faster jit, with Fix_cpcf_call() as

    ldr pc, [r0]

Iterations:    100000000
Estimated ops: 200000000
Elapsed time:  60.948834
M op/s:        3.281441
Segmentation fault

Which dmesg reports as:

test_parrot: unhandled page fault at pc=0x00000000, lr=0x00000000 (bad address=0x00000000, code 0)

and I think it may be the irritating hardware bug care of Digital's engineers' mistake in the early StrongARMs which causes problems on page faults that load PC.

Anyway, it's not very tested, but it seems that just binning the runops loop gets a 75% speedup. :-)

**Beware** - I've no idea if loading the addresses of registers actually works. The .pm code is still from sun4Generic.pm

Nicholas Clark
--
EMCFT http://www.ccl4.org/~nick/CV.html

--- include/parrot/jit.h~ Tue Jan 29 14:05:45 2002
+++ include/parrot/jit.h Thu Jan 31 16:52:40 2002
@@ -22,6 +22,10 @@
 static void write_32(char *instr_end, ptrcast_t value);
 typedef void (*jit_f)(void *int_reg, void *num_reg, void *str_reg);
 #endif
+#ifdef ARMV3
+typedef void (*jit_f)(void *int_reg, void *num_reg, void *str_reg,
+                      void *cur_interpreter);
+#endif
 #define MAX_SUBSTITUTION 3
--- /dev/null Mon Jul 16 22:57:44 2001
+++ jit/armv3/core.jit Thu Jan 31 16:53:24 2002
@@ -0,0 +1,15 @@
+;
+; armv3_core.jit
+;
+; $Id: $
+;
+
+Parrot_end {
+    ldmea fp, {r4, r5, r6, r7, fp, sp, pc}
+}
+
+Parrot_noop {
+    # Seems that as recognises this and assembles mov r0, r0 for a nop.
+    nop
+}
+
--- /dev/null Mon Jul 16 22:57:44 2001
+++ lib/Parrot/Jit/armv3-linux.pm Thu Jan 31 23:36:03 2002
@@ -0,0 +1,30 @@
+#
+# Parrot::Jit;
+#
+# $Id: $
+#
+
+package Parrot::Jit;
+
+use base qw(Parrot::Jit::armv3Generic);
+
+$OBJDUMP = "objdump -d";
+$AS = "as";
+
+$OP_ARGUMENT_SIZE = 4; # Hack? I'm only using this for the pointer substitutions
+
+$Call_inmediate_arg_size = 12;
+$Call_address_arg_size = 16;
+$Call_start = 0;
+$Call_move = 0;
+#$Precompiled_call_position = 24; # This is the start of the BL instruction.
+# XXXX Hack. This is the start of the BL instruction if I use the r7 shortcut.
+# But jit2h.pl doesn't seem to have a very elegant way of returning this.
+$Precompiled_call_position = 16;
+# I think that Parrot::Jit::call should be returning a 2 item list - the
+# bytecode, and the offset of the call instruction in there. This would remove
+# all 5 of the above variables.
+%syscall_number = (
+);
+
+1;
--- /dev/null Mon Jul 16 22:57:44 2001
+++ lib/Parrot/Jit/armv3Generic.pm Fri Feb 1 00:51:45 2002
@@ -0,0 +1,299 @@
+#
+# Parrot::Jit::armv3Generic;
+#
+# $Id $
+#
+
+package Parrot::Jit::armv3Generic;
+
+use IO::File;
+
+use constant DEBUG => 1;
+use Carp;
+
+use constant TMP_OBJ => "t.o";
+use constant TMP_AS => "t.s";
+
+# Maybe 0x00000000 would be a better dummy instruction:
+# 0: 00000000 andeq r0, r0, r0
+use constant DUMMY_INSTR => 'nop';
+
+my $Argument = '[\&\*][a-zA-Z_]+\[\d+\]';
+my $Pointer_Argument = '\&[a-zA-Z_]+\[\d+\]';
+my $Literal_Argument = '\*[a-zA-Z_]+\[\d+\]';
+
+# Base address of registers are stored in the following CPU registers
+my %register_base_map = ( INT => "r4", NUM => "r5", STR => "r6",
+                          '&INTERPRETER' => "r7" );
+my $DUMMY_ARG_P = '0xFFFFFFFF'; # I hope SWINV 0xFFFFFF is still not re-used
+my $DUMMY_ARG_L = '0xFFFFFFFF';
+
+
+sub init() {
+    my $start =<<END;
+    mov ip, sp
+    stmfd sp!, {r4, r5, r6, r7, fp, ip, lr, pc}
+    sub fp, ip, #4
+    mov r4, r0
+    mov r5, r1
+    mov r6, r2
+    mov r7, r3
+END
+
+    $start = Parrot::Jit->Assemble($start);
+    return $start;
+}
+
+# No working system call!!!!!
+sub system_call($$$) {
+    my ($class,$arg_c,$arg_v,$sys_n) = @_;
+
+    die "No system calls yet\n";
+}
+
+sub call($$) {
+    my ($class,$argc,$argv) = @_;
+
+    die "$argc > 4 on ARM" if $argc > 4;
+
+    my ($k,$assembly,$j,$l);
+
+    # Bletch. Global variable:
+    $Parrot::Jit::Call_move = 0;
+
+    for($k = 0; $k < $argc; $k++) {
+        $argv =~ s/([VA])([\&\*][a-zA-Z_]+\[\d+\])//;
+        $j = $1;
+        $l = $2;
+
+        # ARM calling conventions - for < 4 arguments uses r0..r3
+        if (($l eq '&INTERPRETER[0]')) {
+            if ($j eq 'V') {
+                $assembly .= "mov r$k, r7\n";
+                $Parrot::Jit::Call_move -= 8;
+            } else {
+                $assembly .= "ldr r$k, [r7]\n";
+                $Parrot::Jit::Call_move -= 12;
+            }
+        } else {
+            # This is sick, but there isn't a clean way to do general 32 bit
+            # constants in arm without loading them from a constant pool.
+            # The JIT doesn't (yet) let us make a constant pool here, or
+            # do the usual clever tricks with mov/mvn/add/orr/sub/rsb
+            # (let alone really clever tricks with non-standard immediate
+            # constants to set the carry flag that I've never needed) hence
+            # this pipelinestall-tastic branch:
+
+            # load register pc relative to here -+
+            # +- branch round constant           |
+            # |  constant                      <-+
+            # +->
+            # (where this next instruction may be load register from
+            # the address it points to)
+            #
+            # The special case above for the pointer to the interpreter shows
+            # that it is a lot cleaner if we know how to make an address from
+            # something we already have in registers or the stack.
+            # I'm wondering if for ARM it would be better to pass in
+            # current bytecode + 4092 as argument 5, and if we need
+            # CUR_OPCODE calculate that as an offset from arg 5
+            # (which will be on the stack at [fp, #4] for a 5 arg function)
+            # if a block of bytecode excedes (4092 * 2) bytes then add ARM
+            # instructions to shift our opcode pointer on by 4092 * 2.
+
+            $assembly .= "ldr r$k, [pc]\n add pc, pc, #0\n";
+            if ($j eq 'A') {
+                $assembly .= ".word L$l\n ldr r$k, [r$k]\n";
+            } else {
+                $assembly .= ".word $l\n";
+            }
+        }
+    }
+
+    # call and link to 24 bit offset
+
+    $assembly .= ".L1: bl L1\n";
+    return Parrot::Jit->Assemble($assembly);
+
+    # Alternatively, code of the form:
+    #      adr   r0, .L1
+    #      ldmia r0, {r0, r1, r2} ; register list built by jit
+    #      adr   r14, .L2         ; fake a bl
+    #      b     <where ever>
+    # .L1: r0 data
+    #      r1 data
+    #      r2 data
+    # .L2: ; next instruction - return point from func.
+
+    # would be much cleaner.
+    # (with ldr rN, [rN] for the deref arguments added in where needed,
+    # and code to stack things for > 4 arguments)
+}
+
+sub Fix_normal_call() {
+    return "";
+}
+
+sub Fix_cpcf_call() {
+    # return value from function cur_opcode in %o0, jump to *cur_opcode, which
+    # has been modified by jit to contain corresponding jit code
+    return Parrot::Jit->Assemble("ldr pc, [r0]\n");
+}
+
+sub Assemble($) {
+    my ($class,$body) = @_;
+
+    my ($line, $ln, $assembler,$n,$s, $reg_class);
+    my (@special,@special_arg);
+
+    $ln = 0;
+    $body =~ s/([^\n]*)\n?//;
+    $line = $1;
+    $assembler = "";
+    while (defined($line)) {
+        $line =~ s/^\s*//;
+        if (($line =~ m/^J/) || ($line =~ m/^C/) || ($line =~ m/^F/)) {
+            $assembler .= DUMMY_INSTR . "\n";
+            # Store the special instruction in the line where it will go.
+            $special[$ln] = $line;
+        } elsif ($line =~ m/^S/) {
+            $line =~ m/\((\w+)\s*,\s*(\d+)\s*,\s*([^\)]*)\)\s*/;
+            $body = system_call($2,$3,$1) . $body;
+            $ln--;
+        } elsif (($reg_class) = $line =~ m/(INT|NUM|STR)_REG/) {
+            # map parrot register set to appropriate base CPU register
+            $n = 0;
+            # XXX check this.
+            $line =~ s/($Argument)/[ $register_base_map{$reg_class} + 1 ]/;
+            if(defined($1)) {
+                $special_arg[$ln][$n++] = $1;
+            }
+            $assembler .= $line . "\n";
+        } elsif (0 and $line =~ m/\&INTERPRETER\[(\d+)\]/) {
+            # Address of current interpreter is stored in a register.
+            # XXX This doesn't work with the default code generation, because
+            # the default code generation expects to be writing instructions
+            # that load a 32 bit constant from a .word
+            if ($1 == 0) {
+                $line =~ s/(\&INTERPRETER)\[\d+\]/$register_base_map{$1}/;
+                warn "line is now '$line'";
+            } else {
+                die "Can't do control stack deference yet";
+            }
+        } elsif ($line =~ m/\*JUMP_INT_CONST/) {
+            # XXX Sparc hangover
+            # This is good for 22-bit branches (which have a 24 bit range)
+            $n = 0;
+            $line =~ s/($Argument)/_$ln/;
+            if(defined($1)) {
+                $special_arg[$ln][$n++] = $1;
+            }
+            $assembler .= $line . "\n_$ln:\n";
+        } elsif ($line =~ m/$Argument/) {
+            $n = 0;
+
+            $line =~ s/L($Argument)/$DUMMY_ARG_L/;
+            $line =~ s/($Argument)/$DUMMY_ARG_P/;
+
+            if (defined($1)) {
+                $special_arg[$ln][$n++] = $1;
+            }
+            $assembler .= $line . "\n";
+        } else {
+            $assembler .= $line . "\n";
+        }
+        $ln++;
+        $line = undef;
+        if($body =~ m/([^\n]*)\n/){
+            $line = $1;
+            $body =~ s/[^\n]*\n//;
+        }
+    }
+
+    write_as($assembler,TMP_AS);
+    assemble(TMP_AS, TMP_OBJ);
+    return disassemble(TMP_OBJ,\@special_arg,\@special,$ln);
+}
+
+sub write_as($$) {
+    my ($code, $target) = @_;
+
+    my $out = new IO::File "> $target"
+        or die "Could not write to $target: $!";
+
+    print $out <<'END';
+    .align 2
+    .global main
+    .type main,function
+main:
+END
+    confess "no code" unless $code =~ /\w/s;
+    print $out $code;
+}
+
+sub assemble($$) {
+    my ($file, $obj) = @_;
+
+    print STDERR "Assembling:\n\n", (new IO::File $file)->getlines, "\n\n"
+        if DEBUG;
+
+    system $Parrot::Jit::AS." $file -o $obj";
+    die $Parrot::Jit::AS." $file failed" if (($? >> 8) != 0);
+}
+
+sub disassemble($$$$) {
+    my ($obj,$sa,$si,$l) = @_;
+
+    my ($result,@t);
+
+    print STDERR "Disassembly:\n\n" if DEBUG;
+
+    my $objdump = new IO::File $Parrot::Jit::OBJDUMP." $obj |"
+        or die "Could not run ".$Parrot::Jit::OBJDUMP." $obj: $!";
+
+    while (<$objdump>) { last if /main/ }
+
+    $ln = 0;
+    while (<$objdump>) {
+        if (m/^\s*$/) {
+            <$objdump>;
+            next;
+        }
+        my ($opcodes, $instr, $args) =
+            /^\s* \w+: \s+ ( [0-9A-Fa-f]{8} ) \s+ (\w+) \s+ (.+)?/x;
+
+        if ((defined($opcodes)) && (defined($instr))) {
+            if (($instr eq DUMMY_INSTR) && defined(@$si[$ln])) {
+                $result .= @$si[$ln];
+            } else {
+                # Little endian assumption.
+                $opcodes =~ s/(..)(..)(..)(..)/\\x$4\\x$3\\x$2\\x$1/g;
+                if (defined(@$sa[$ln])) {
+                    $n = 0;
+                    # Look for our "special" address and "special" constant
+                    if ($opcodes =~ m/^(?:\\xFF){4}/i) {
+                        @t = @$sa[$ln];
+                        $s = $t[0][$n++];
+                        warn sprintf "Was %d '$opcodes'\n", length $opcodes;
+                        $opcodes =~ s/^(?:\\xFF){4}/$s/i;
+                        warn sprintf "Now %d '$opcodes'\n", length $opcodes;
+                    }
+                }
+                $result .= $opcodes;
+            }
+        }
+        $ln++;
+        print STDERR $_ if DEBUG;
+    }
+
+# $result =~ s/\\x00 \\x00 (\\x.. \\x.. )JUMP\(([^\)]*)\)/JUMP($2) $1/;
+
+    print STDERR "\n\nResult:\n\n" if DEBUG;
+
+    print STDERR $result . "\n\n" if DEBUG;
+
+    $result =~ s/\s//g;
+    return $result;
+}
+
+1;
--- Configure.pl~ Thu Jan 31 10:15:18 2002
+++ Configure.pl Thu Jan 31 18:58:09 2002
@@ -152,6 +152,7 @@
 $jitarchname = "$cpuarch-$osname";
 $jitarchname =~ s/i[456]86/i386/i;
+$jitarchname =~ s/armv[34]l?/armv3/i;
 $jitarchname =~ s/-(net|free|open)bsd$/-bsd/i;
 $jitcapable = 0;
--- jit2h.pl~ Wed Jan 30 23:40:58 2002
+++ jit2h.pl Thu Jan 31 14:57:38 2002
@@ -59,7 +59,7 @@
             $asm = "";
             next;
         }
-        if ($line =~ m/}/) {
+        if ($line =~ m/^}/) {
             $ops{$function} = Parrot::Jit->Assemble($asm);
             $function = undef;
             $body = undef;
--- jit.c~ Wed Jan 30 10:31:38 2002
+++ jit.c Fri Feb 1 00:54:03 2002
@@ -8,6 +8,9 @@
 #include "parrot/jit.h"
 #include "parrot/jit_struct.h"
+#ifdef ARMV3
+static void write_24(void *instr_start, ptrcast_t value);
+#endif
 /* Don't ever count on any info here */
@@ -357,6 +360,12 @@
             ]
         );
+        /* fprintf (stderr, "Want to branch22 to %p from instruction at %p\n",
+           address, &arena[v.info[i].position]); */
+#ifdef ARMV3
+        address = (INTVAL *)((char *)address - (arena + v.info[i].position) - 8);
+        write_24(&arena[v.info[i].position], (ptrcast_t)address);
+#else
 #ifdef SUN4
         address = (INTVAL *)((char *)address - (arena + v.info[i].position - 3));
         write_22(&arena[v.info[i].position], (ptrcast_t)address);
@@ -380,6 +389,7 @@
         memcpy(&arena[v.info[i].position],&address,OP_ARGUMENT_SIZE);
 #endif
+#endif
     }
     /* XXX the idea is to write all this functions in asm */
@@ -409,6 +419,12 @@
             address = (INTVAL *)interpreter->op_func_table[*pc];
             break;
         }
+        /* fprintf (stderr, "Want to branch to %p from instruction at %p\n",
+           address, &arena[v.info[i].position]); */
+#ifdef ARMV3
+        address = (INTVAL *)((char *)address - (arena + v.info[i].position) - 8);
+        write_24(&arena[v.info[i].position], (ptrcast_t)address);
+#else
 #ifdef SUN4
         address = (INTVAL *)((char *)address - (arena + v.info[i].position - 3));
         write_30(&arena[v.info[i].position], (ptrcast_t)address);
@@ -436,11 +452,17 @@
         memcpy(&arena[v.info[i].position],&address,OP_ARGUMENT_SIZE);
 #endif
+#endif
     }
     v = op_assembly[*pc].interpreter;
     for (i = 0; i < v.amount; i++) {
+#ifdef ARMV3
+        fprintf (stderr, "arm doesn't support interpreter relocations. Relocation
+should be replaced with copy or dereference of r7. (number=%d)\n",
+                 v.info[i].number);
+        exit (1);
+#endif
         switch(v.info[i].number) {
             case 0:
                 address = (INTVAL *)interpreter;
@@ -493,6 +515,8 @@
         pc += ivalue;
     }
+    /* Dump out the assembler we built - useful to feed to objdump -d */
+    /* write (0, arena_start, size); */
     return (jit_f)arena_start;
 }
@@ -569,6 +593,16 @@
 #endif
+#ifdef ARMV3
+
+/* Write 24 bit immediate value into PC relative branches */
+static void write_24(void *instr_start, ptrcast_t value) {
+    unsigned long *instr = (unsigned long *) instr_start;
+    value >>= 2; /* divide displacement by 4 */
+    *instr = (*instr & 0xFF000000) | (value & 0x00FFFFFF);
+}
+
+#endif
 /*
  * Local variables:
--- interpreter.c~ Wed Jan 30 10:31:35 2002
+++ interpreter.c Fri Feb 1 00:22:10 2002
@@ -308,6 +308,13 @@
                    (void *)&interpreter->num_reg.registers[0],
                    (void *)&interpreter->string_reg.registers[0]);
 #endif
+#ifdef ARMV3
+            (jit_code)((void *)(&interpreter->int_reg.registers[0]),
+                       (void *)&interpreter->num_reg.registers[0],
+                       (void *)&interpreter->string_reg.registers[0],
+                       (void *)interpreter);
+
+#endif
 #else
             return;
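
For anyone checking the branch fix-up arithmetic in the jit.c hunks above (the "- 8" followed by write_24()), here is a minimal standalone sketch of the same calculation. It is not part of the patch: patch_branch() and branch_target() are hypothetical names, and it uses uint32_t rather than ptrcast_t and unsigned long, on the assumption of a 32 bit ARM target where a B/BL instruction sees the PC as its own address plus 8 and holds a signed 24 bit word displacement in its low bits.

```c
#include <stdint.h>

/* Hypothetical helper, not in the patch: fill in the 24 bit displacement
 * field of an ARM B/BL instruction word that lives at instr_addr so that
 * it branches to target.  The top 8 bits (condition and opcode) are kept. */
uint32_t patch_branch(uint32_t instr, uint32_t instr_addr, uint32_t target)
{
    /* Byte displacement relative to PC, which reads as instr_addr + 8.
     * Unsigned arithmetic wraps mod 2^32, so backward branches work too. */
    uint32_t disp = target - (instr_addr + 8u);

    /* Divide by 4 (instructions are word aligned) and keep 24 bits. */
    return (instr & 0xFF000000u) | ((disp >> 2) & 0x00FFFFFFu);
}

/* Inverse of the above: recover the branch target from an encoded word. */
uint32_t branch_target(uint32_t instr, uint32_t instr_addr)
{
    uint32_t field = instr & 0x00FFFFFFu;

    if (field & 0x00800000u)
        field |= 0xFF000000u;       /* sign-extend the 24 bit field */

    return instr_addr + 8u + (field << 2);
}
```

As a sanity check, patch_branch(0xEB000000, 0x1000, 0x1000) gives 0xEBFFFFFE, which is how a bl to its own address is encoded. The round trip only holds for word aligned targets within the roughly +-32MB that 24 bits of word displacement can reach; the sketch does no range checking, and neither does write_24().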