Hi Jakob,
while LLVM is surely an interesting project, I think it would by far be
overkill for what we need here.
What I'm doing is extremely simple. I defined a (mostly) single-address
assembly language, with a rather basic set of instructions which I
believe can easily be mapped to different architectures. At least, it makes
no assumptions about arity, register sets or instruction formats.
And, most importantly, it is quite readable (as opposed to most current
assembly languages, including LLVM's)!
Although it is still too early to publish, let me show you two examples
from the current sources ('car' and 'if').
   # (c....r 'lst) -> any
   (code 'doCarE_E 2)
      push X
      ldX E
      ldE (E CDR)     # Evaluate CADR
      ldE (E)
      eval
      num? E          # Check list
      jnz lstErrEX
      ldE (E)         # Take CAR
      pop X
      ret
On x86-64 this expands to:
   .balign 16
   nop
   nop
   .global doCarE_E
doCarE_E:
   pushq %r13
   movq %rbx, %r13
   movq 8(%rbx), %rbx
   movq (%rbx), %rbx
   test $0x06, %bl
   jnz 2f
   test $0x08, %bl
   jnz 1f
   call evListE_E
   jmp 2f
1:
   movq (%rbx), %rbx
2:
   testb $0x06, %bl
   jnz lstErrEX
   movq (%rbx), %rbx
   popq %r13
   ret
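To make the tag tests in this expansion a little more concrete, here is a rough Python model of what the expanded 'eval', 'num?' and 'atom?' check. The meaning of the tag bits (0x06 for numbers, 0x08 for symbols, neither for cells) is inferred from the listing, so treat it as an assumption rather than a specification; `classify`, `eval_action` and `is_atom` are hypothetical names.

```python
# Rough model of the tag tests in the doCarE_E expansion.
# Assumption (read off the listing, not a spec): a word with any of the bits
# 0x06 set is a number, one with bit 0x08 set is a symbol, anything else a cell.

def classify(word: int) -> str:
    """Classify a tagged word the way the expanded `eval` does."""
    if word & 0x06:        # test $0x06, %bl ; jnz 2f
        return "number"
    if word & 0x08:        # test $0x08, %bl ; jnz 1f
        return "symbol"
    return "cell"


def eval_action(word: int) -> str:
    """Describe what the expanded `eval` does for a value with this tag."""
    kind = classify(word)
    if kind == "number":
        return "keep"              # numbers are left as they are
    if kind == "symbol":
        return "load value"        # movq (%rbx), %rbx
    return "call evListE_E"        # a cell is evaluated as a list


def is_atom(word: int) -> bool:
    """Model of `atom?`: testb $0x0E gives nz for anything that is not a cell."""
    return (word & 0x0E) != 0


if __name__ == "__main__":
    for w in (0x12, 0x28, 0x30):
        print(hex(w), classify(w), eval_action(w), is_atom(w))
```

Only the low byte is tested in the real code (%bl), but since all masks fit in one byte the model gives the same answers.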
'if' is a little longer:
   # (if 'any1 'any2 . prg) -> any
   (code 'doIfE_E 2)
      ldE (E CDR)           # Body
      push (E CDR)          # Push rest
      ldE (E)               # Get condition
      eval                  # Eval condition
      nil? E
      if ne                 # Non-NIL
         stE (AtSym)
         pop E              # Get rest
         ldE (E)            # Consequent
         eval/ret
      end
      xchX (S)              # Get rest in X
      ldX (X CDR)           # Else
      do
         ldE (X)
         ldX (X CDR)
         eval
         atom? X            # Atom?
      until nz              # Yes
      pop X
      ret
On x86-64 this becomes:
   .balign 16
   nop
   nop
   .global doIfE_E
doIfE_E:
   movq 8(%rbx), %rbx
   pushq 8(%rbx)
   movq (%rbx), %rbx
   test $0x06, %bl
   jnz 2f
   test $0x08, %bl
   jnz 1f
   call evListE_E
   jmp 2f
1:
   movq (%rbx), %rbx
2:
   cmpq $Nil, %rbx
   jz .125
   movq %rbx, AtSym
   popq %rbx
   movq (%rbx), %rbx
   test $0x06, %bl
   jnz ret
   test $0x08, %bl
   jz evListE_E
   movq (%rbx), %rbx
   ret
.125:
   xchg %r13, (%rsp)
   movq 8(%r13), %r13
.126:
   movq (%r13), %rbx
   movq 8(%r13), %r13
   test $0x06, %bl
   jnz 2f
   test $0x08, %bl
   jnz 1f
   call evListE_E
   jmp 2f
1:
   movq (%rbx), %rbx
2:
   testb $0x0E, %r13b
   jz .126
   popq %r13
   ret
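The structured 'if ... end' and 'do ... until' forms turn into local labels and inverted conditional jumps. The following Python sketch reproduces that shape for 'doIfE_E'; the `Lowerer` class, the condition-code names and starting the label counter at 125 are assumptions chosen to match the listing, not the actual assembler.

```python
# Hypothetical lowering of `if cc ... end` and `do ... until cc` into labels
# and conditional jumps, in the shape of the doIfE_E listing.
# Condition names and label numbering are assumptions.

JUMP = {"z": "jz", "nz": "jnz", "eq": "jz", "ne": "jnz", "c": "jc", "nc": "jnc"}
INVERSE = {"z": "nz", "nz": "z", "eq": "ne", "ne": "eq", "c": "nc", "nc": "c"}


class Lowerer:
    def __init__(self, first_label: int = 125):
        self.next_label = first_label
        self.open = []   # labels of blocks that are still open
        self.out = []    # emitted assembly lines

    def _new_label(self) -> str:
        label = f".{self.next_label}"
        self.next_label += 1
        return label

    def emit(self, line: str) -> None:
        self.out.append(line)

    def if_(self, cc: str) -> None:
        # Jump over the body when the condition does NOT hold.
        label = self._new_label()
        self.emit(f"{JUMP[INVERSE[cc]]} {label}")
        self.open.append(label)

    def end(self) -> None:
        self.emit(f"{self.open.pop()}:")

    def do(self) -> None:
        label = self._new_label()
        self.emit(f"{label}:")
        self.open.append(label)

    def until(self, cc: str) -> None:
        # Jump back to the loop head while the condition does NOT hold.
        self.emit(f"{JUMP[INVERSE[cc]]} {self.open.pop()}")


def doif_skeleton() -> list:
    low = Lowerer()
    low.emit("cmpq $Nil, %rbx")       # nil? E
    low.if_("ne")
    low.emit("...")                   # consequent (ends in eval/ret)
    low.end()
    low.do()
    low.emit("...")                   # evaluate the next else-expression
    low.emit("testb $0x0E, %r13b")    # atom? X
    low.until("nz")
    return low.out


if __name__ == "__main__":
    print("\n".join(doif_skeleton()))
```

The resulting skeleton has the same jumps and labels as the listing: 'jz .125' skips the consequent, and 'jz .126' loops until X is an atom.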
The mapping of the individual instructions is rather straightforward. On
x86-64, most of them expand to a single target instruction; only
higher-level ones like 'eval' produce short instruction sequences.
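As an illustration of this mostly one-to-one mapping, here is a small Python sketch that translates a few of the instructions used above. The register assignments (A=%rax, E=%rbx, X=%r13, S=%rsp) and CDR=8 are read off the listings; the `operand` and `translate` functions and the handling of immediates and global offsets are hypothetical.

```python
# Hypothetical one-to-one translator for a few DSL instructions.
# Register assignments are read off the listings (A=%rax, E=%rbx, X=%r13,
# S=%rsp) and CDR=8 from `(E CDR)` -> `8(%rbx)`; the rest is assumed.

REGS = {"A": "%rax", "E": "%rbx", "X": "%r13", "S": "%rsp"}
OFFSETS = {"CDR": 8}


def operand(src: str) -> str:
    """Translate a DSL operand into AT&T syntax."""
    if src in REGS:                      # Register
        return REGS[src]
    if src.isdigit():                    # Immediate (assumed form)
        return f"${src}"
    if src.startswith("(") and src.endswith(")"):
        parts = src[1:-1].split()
        offs = 0
        if len(parts) > 1:
            offs = OFFSETS[parts[1]] if parts[1] in OFFSETS else int(parts[1])
        if parts[0] in REGS:             # Indexed, optionally with offset
            base = f"({REGS[parts[0]]})"
            return f"{offs}{base}" if offs else base
        # Indirect through a global, as in `stE (AtSym)` -> `movq %rbx, AtSym`
        return f"{parts[0]}+{offs}" if offs else parts[0]
    raise ValueError(f"operand not covered by this sketch: {src}")


def translate(instr: str, src: str) -> str:
    """Expand one DSL instruction into a single x86-64 instruction."""
    if instr == "push":
        return f"pushq {operand(src)}"
    if instr == "pop":
        return f"popq {operand(src)}"
    op, reg = instr[:-1], instr[-1]
    if op == "ld":
        return f"movq {operand(src)}, {REGS[reg]}"
    if op == "st":
        return f"movq {REGS[reg]}, {operand(src)}"
    raise ValueError(f"instruction not covered by this sketch: {instr}")


if __name__ == "__main__":
    for instr, src in [("push", "X"), ("ldX", "E"), ("ldE", "(E CDR)"),
                       ("ldE", "(E)"), ("stE", "(AtSym)"), ("pop", "X")]:
        print(translate(instr, src))
```

Because the register is part of the instruction name, the translator only has to split off the last letter to know the target register.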
The machine register set is defined as:
   +---+---+---+---+---+---+---+---+
   |             A             | B |  \      [A]ccumulator
   +---+---+---+---+---+---+---+---+   D     [B]yte register
   |               C               |  /      [C]ount register
   +---+---+---+---+---+---+---+---+         [D]ouble register
   |               E               |         [E]xpression register
   +---+---+---+---+---+---+---+---+

   +---+---+---+---+---+---+---+---+
   |               X               |         [X] Index register
   +---+---+---+---+---+---+---+---+         [Y] Index register
   |               Y               |         [Z] Index register
   +---+---+---+---+---+---+---+---+
   |               Z               |
   +---+---+---+---+---+---+---+---+

   +---+---+---+---+---+---+---+---+
   |               L               |         [L]ink register
   +---+---+---+---+---+---+---+---+         [S]tack pointer
   |               S               |
   +---+---+---+---+---+---+---+---+

   +-------------------------------+
   |  [z]ero    [s]ign    [c]arry  |         [F]lags
   +-------------------------------+
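The 'D' register in this diagram is the pair A:C. A minimal Python sketch of that relationship, assuming (from the diagram and the ldD definition further down, which writes %rax and %rdx) that A holds the low and C the high 64-bit word, and that B is the lowest byte of A; `split_d`, `join_d` and `byte_b` are hypothetical helpers.

```python
# Minimal sketch of the double register. Assumptions: D is A:C with A as the
# low and C as the high 64-bit word, and B is the lowest byte of A
# (like %al within %rax).

from typing import Tuple

MASK64 = (1 << 64) - 1


def split_d(d: int) -> Tuple[int, int]:
    """Split a 128-bit D value into (A, C) = (low word, high word)."""
    return d & MASK64, (d >> 64) & MASK64


def join_d(a: int, c: int) -> int:
    """Combine A (low word) and C (high word) into the 128-bit D value."""
    return ((c & MASK64) << 64) | (a & MASK64)


def byte_b(a: int) -> int:
    """B is the lowest byte of A."""
    return a & 0xFF


if __name__ == "__main__":
    d = join_d(0x1234, 0x1)
    print(hex(d), [hex(w) for w in split_d(d)], hex(byte_b(0x1234)))
```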
Source Addressing Modes:
   ldA 1234             # Immediate
   ldA R                # Register
   ldA Label            # Direct
   ldA (R)              # Indexed
   ldA (R 8)            # Indexed with offset
   ldA (R OFFS)
   ldA (Global)         # Indirect
   ldA (Global OFFS)    # Indirect with offset

Destination Addressing Modes:
   stA (Global)         # Indirect
   stA (Global OFFS)    # Indirect with offset
   ldA R                # Register
   stA (R)              # Indexed
   stA (R 8)            # Indexed with offset
   stA (R OFFS)

Target Addressing Modes:
   jmp 1234             # Absolute
   jmp Label
   jmp (R)              # Indexed
   jmp (Global)         # Indirect
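To show how the source operand syntax alone distinguishes these modes, here is a hypothetical Python classifier. It assumes that register names are the single letters of the register set and that any other bare symbol is a label or a global; 'R', 'Label', 'Global' and 'OFFS' in the table are placeholders, so the example uses 'X' as a concrete register.

```python
# Sketch of a classifier for the source addressing modes listed above.
# Assumption: register names are the single letters from the register set,
# and any other bare symbol is a label or global.

REGISTERS = set("ABCDEXYZLS")


def addressing_mode(operand: str) -> str:
    """Name the source addressing mode of a DSL operand."""
    if operand.lstrip("-").isdigit():
        return "Immediate"
    if not operand.startswith("("):
        return "Register" if operand in REGISTERS else "Direct"
    parts = operand.strip("()").split()
    base_is_register = parts[0] in REGISTERS
    if len(parts) == 1:
        return "Indexed" if base_is_register else "Indirect"
    return "Indexed with offset" if base_is_register else "Indirect with offset"


if __name__ == "__main__":
    for op in ["1234", "X", "Label", "(X)", "(X 8)", "(X OFFS)",
               "(Global)", "(Global OFFS)"]:
        print(f"{op:15} {addressing_mode(op)}")
```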
The whole thing is so simple mostly because there is only a single word
size (i.e. 64 bits) for all instructions, with the exception of the 'B'
register for byte operations.
The instructions take the form of

   ldA something

instead of

   ld A, something

The reason for this is to have a separate instruction for each register
(ldA, ldB, ldX etc.), which probably makes it easier to output completely
different instruction sequences depending on the target architecture and
its requirements or restrictions on individual registers.
On x86-64, the corresponding definitions look like this:
   (asm ldA (Src Mode)
      (prinst "movq" Src "%rax") )
   ...
   (asm ldD (Src Mode)
      (prinst "movq" Src "%rax")
      (prinst "movq" (highWord Src) "%rdx") )
   ...
   (asm orA (Src Mode)
      (if (and (num? Mode) (>= 255 Mode))
         (prinst "orb" Src "%al")
         (prinst "or" Src "%rax") ) )
and so on.
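For readers less familiar with PicoLisp, here is a rough Python analogue of these 'asm' definitions: a table with one emitter per register-specific instruction. `prinst`, `asm`, `byte_immediate` and `emit` are hypothetical stand-ins, and treating the byte case as "an immediate that fits in 0..255" is an interpretation of '(>= 255 Mode)'.

```python
# Python analogue of the (asm ...) definitions: one emitter per
# register-specific instruction, kept in a table. All names here are
# hypothetical stand-ins for the PicoLisp helpers.

from typing import Callable, Dict, List

OUTPUT: List[str] = []
ASM: Dict[str, Callable[[str], None]] = {}


def prinst(op: str, *args: str) -> None:
    """Record one instruction in AT&T order: op src, dst."""
    OUTPUT.append(f"{op} {', '.join(args)}")


def asm(name: str):
    """Register the decorated function as the emitter for `name`."""
    def register(fn):
        ASM[name] = fn
        return fn
    return register


def byte_immediate(src: str) -> bool:
    """True if `src` is an immediate like '$8' whose value fits in 0..255."""
    if not src.startswith("$"):
        return False
    try:
        return 0 <= int(src[1:], 0) <= 255
    except ValueError:
        return False


@asm("ldA")
def ld_a(src: str) -> None:
    prinst("movq", src, "%rax")


@asm("ldX")
def ld_x(src: str) -> None:
    prinst("movq", src, "%r13")


@asm("orA")
def or_a(src: str) -> None:
    # OR with a value below 256 only touches the low byte, so `orb` suffices.
    if byte_immediate(src):
        prinst("orb", src, "%al")
    else:
        prinst("or", src, "%rax")


def emit(name: str, src: str) -> str:
    """Run the emitter for `name` and return the line it produced."""
    ASM[name](src)
    return OUTPUT[-1]


if __name__ == "__main__":
    print(emit("ldA", "(%rbx)"))
    print(emit("orA", "$8"))
    print(emit("orA", "$4096"))
```

Keeping one emitter per register name means a port to another architecture can special-case a single register without touching the others.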
Well, I hope this explains my intentions a little.
Cheers,
- Alex