well, my last comment may have been a little harsh on register VMs.

so, on each side there are costs and gains.
for this, I will focus on interpreters, since this is the main area where the debate is relevant.

stack VM, pros:
typically very simple to target;
instruction stream is usually fairly dense and easy to standardize;

stack VM, cons:
more difficult to eliminate common subexpressions or apply other more aggressive optimizations;
typically, many of the executed operations are redundant (they only shuffle values on and off the stack).

register VM, pros:
interpreter can be faster than for a stack machine;
more highly optimized instruction streams can be produced;
a less complicated JIT process may be needed;

register VM, cons:
some overhead may result if there is a mismatch between the interpreter and the underlying arch;
for the added complexity in targeting the thing, one is not likely to get that much "real" advantage in terms of performance;
a register machine will often require a more complex instruction set than a stack machine.

so, for general purpose bytecode designs, I would still opt for a stack-machine based interpreter.
where raw interpreter speed matters most, a register machine may do better.
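
to make the trade-off above a bit more concrete, here is a minimal sketch of both interpreter styles running z=x+y. the opcode names and byte layouts are made up for illustration (they are not from the JVM or any real VM); the only point is that the stack form dispatches four instructions where the register form dispatches one, while each register instruction carries more operand bytes.

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical opcodes, not taken from any real VM */
enum { S_HALT, S_LOAD, S_ADD, S_STORE };   /* stack form */
enum { R_HALT, R_ADD };                    /* register form */

/* stack interpreter: every operand passes through an operand stack.
   returns the number of instructions dispatched (not counting halt). */
int run_stack(const uint8_t *code, int32_t *locals)
{
    int32_t stack[16];
    int sp = 0, n = 0;

    for (;;) {
        switch (*code++) {
        case S_LOAD:                    /* push locals[k] */
            assert(sp < 16);
            stack[sp++] = locals[*code++];
            break;
        case S_ADD:                     /* pop two, push the sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case S_STORE:                   /* pop into locals[k] */
            assert(sp >= 1);
            locals[*code++] = stack[--sp];
            break;
        default:                        /* S_HALT or unknown opcode */
            return n;
        }
        n++;
    }
}

/* register interpreter: operands are named directly in the instruction.
   returns the number of instructions dispatched (not counting halt). */
int run_reg(const uint8_t *code, int32_t *regs)
{
    int n = 0;

    for (;;) {
        switch (*code++) {
        case R_ADD:                     /* regs[dest] = regs[left] + regs[right] */
            regs[code[0]] = regs[code[1]] + regs[code[2]];
            code += 3;
            break;
        default:                        /* R_HALT or unknown opcode */
            return n;
        }
        n++;
    }
}
```

with locals/registers 1 and 2 holding x and y, the stack program is "load 1; load 2; add; store 3" (7 bytes, 4 dispatches) and the register program is "add 3, 1, 2" (4 bytes, 1 dispatch). whether the saved dispatches pay for the extra decoding depends on the host arch.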

we can also note that most major stack-based VMs place rigid restrictions on the behavior of the opcode stream, such as requiring that for each execution of a given opcode, the local stack frame have exactly the same layout, ...

possible solution:
the canonical bytecode is not directly interpreted, but is always translated (like in .NET); the bytecode may then be JITed to native machine code, or trans-compiled to an internal interpreted form (likely a register machine).

potentially, this translation process could perform optimizations (such as restructuring the instruction stream and eliminating common subexpressions), or it could be a simple and direct mapping (say, directly mapping stack elements to registers).
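
the "simple and direct mapping" case can be sketched as follows. this assumes straight-line code only (no branches) and a tiny made-up opcode set; RegInsn, translate() and the opcode names are all hypothetical. locals keep their index as register numbers, and operand-stack slot i becomes register nlocals+i, which only works because the stack depth at each instruction is statically known.

```c
#include <assert.h>
#include <stdint.h>

enum { S_HALT, S_LOAD, S_ADD, S_STORE };   /* hypothetical stack opcodes */
enum { R_HALT, R_MOV, R_ADD };             /* hypothetical register opcodes */

typedef struct { uint8_t op, dest, left, right; } RegInsn;

/* translate straight-line stack code into 3-address code.
   returns the number of RegInsn written, or -1 on malformed input
   or when out[] is too small. */
int translate(const uint8_t *code, int nlocals, RegInsn *out, int max_out)
{
    int depth = 0;      /* simulated operand-stack depth */
    int n = 0;

    assert(code && out && nlocals >= 0);
    for (;;) {
        if (n >= max_out)
            return -1;
        switch (*code++) {
        case S_LOAD:    /* push local k  ->  mov top, k */
            out[n++] = (RegInsn){ R_MOV, (uint8_t)(nlocals + depth), *code++, 0 };
            depth++;
            break;
        case S_ADD:     /* pop 2, push 1  ->  add s0, s0, s1 */
            if (depth < 2)
                return -1;
            depth--;
            out[n++] = (RegInsn){ R_ADD,
                                  (uint8_t)(nlocals + depth - 1),
                                  (uint8_t)(nlocals + depth - 1),
                                  (uint8_t)(nlocals + depth) };
            break;
        case S_STORE:   /* pop into local k  ->  mov k, top */
            if (depth < 1)
                return -1;
            depth--;
            out[n++] = (RegInsn){ R_MOV, *code++, (uint8_t)(nlocals + depth), 0 };
            break;
        case S_HALT:
            out[n++] = (RegInsn){ R_HALT, 0, 0, 0 };
            return n;
        default:
            return -1;
        }
    }
}
```

for "load 1; load 2; add; store 3" with nlocals=4 this gives "mov 4, 1; mov 5, 2; add 4, 4, 5; mov 3, 4", which is correct but full of copies. folding those into a single "add 3, 1, 2" is the kind of cleanup the optimizing variant of the translator would do.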

in my project, since I will likely be interpreting several different bytecode formats (JBC-canonical, JBC-modified, AVM2, ...), it could make sense to use an internal regularized interpreter, and I could modify my existing plans to use a register VM here. I had initially considered using a modified form of JBC, but a register VM may make more sense.

this could also simplify the task of blurring the line between the native-compiled and interpreted parts of the project, given there is no real obstacle to making a register-based interpreter act much the same as the processor...

hypothetical design:
registers are allocated in frames (like the locals and stack-frames in the JVM);
instructions are 32 bits wide, and typically follow a generic 3-address form;
there will be several groups of registers differentiated by type;
"linking" may be used in place of the highly indirect structure of the JVM;

instruction words are 32-bit, native endianness, with the opcode in the low 8 bits, and args in the higher bits. the forms are:
A    op dest, left, right
B    op dest, immed16
C    op immed24
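
a minimal sketch of packing and unpacking these three forms. the post only fixes the opcode in the low 8 bits; the order of the remaining fields (dest in bits 8-15, left in 16-23, right in 24-31) and all function names here are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* form A: op dest, left, right */
uint32_t enc_a(uint8_t op, uint8_t dest, uint8_t left, uint8_t right)
{
    return (uint32_t)op | (uint32_t)dest << 8 |
           (uint32_t)left << 16 | (uint32_t)right << 24;
}

/* form B: op dest, immed16 */
uint32_t enc_b(uint8_t op, uint8_t dest, uint16_t imm16)
{
    return (uint32_t)op | (uint32_t)dest << 8 | (uint32_t)imm16 << 16;
}

/* form C: op immed24 */
uint32_t enc_c(uint8_t op, uint32_t imm24)
{
    assert(imm24 < (1u << 24));         /* must fit in 24 bits */
    return (uint32_t)op | imm24 << 8;
}

/* field extraction, as an interpreter's decode step would do it */
uint8_t  insn_op(uint32_t w)    { return (uint8_t)(w & 0xFF); }
uint8_t  insn_dest(uint32_t w)  { return (uint8_t)((w >> 8) & 0xFF); }
uint8_t  insn_left(uint32_t w)  { return (uint8_t)((w >> 16) & 0xFF); }
uint8_t  insn_right(uint32_t w) { return (uint8_t)(w >> 24); }
uint16_t insn_imm16(uint32_t w) { return (uint16_t)(w >> 16); }
uint32_t insn_imm24(uint32_t w) { return w >> 8; }
```

since the interpreter fetches whole 32-bit words and uses shifts, the same code works on either byte order; endianness only matters when an instruction stream is written on one machine and read on another, in which case the loader would have to byte-swap each word.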

register groups:
I    integer (32 bit)
L    long (64 bit)
P    pointer / reference
F    float
D    double
X    128-bit (int128, float128, vectors, ...)
Y    wider variable-sized registers

one can allocate some number of each, and the groups are non-overlapping.
opcodes will thus be type specialized.

add_i 3, 7, 9    //add I7 and I9 putting result into I3
load_ip 4, 1    //load int from P1 and store in I4
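
a rough sketch of what a frame with separate register groups and a type-specialized dispatch loop could look like. the group sizes, opcode numbers and field layout are assumptions; the X and Y groups are left out to keep it short, and load_ip is assumed to use form A with the pointer register in the "left" field.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_I = 16, NUM_L = 4, NUM_P = 8, NUM_F = 4, NUM_D = 4 };
enum { OP_HALT, OP_ADD_I, OP_LOAD_IP };     /* hypothetical opcode numbers */

/* one frame: a separate, non-overlapping register file per type group */
typedef struct {
    int32_t I[NUM_I];
    int64_t L[NUM_L];
    void   *P[NUM_P];
    float   F[NUM_F];
    double  D[NUM_D];
} Frame;

/* run until OP_HALT (or an unknown opcode) is reached */
void run(Frame *f, const uint32_t *pc)
{
    for (;;) {
        uint32_t w = *pc++;
        uint8_t a = (uint8_t)((w >> 8) & 0xFF);    /* dest  */
        uint8_t b = (uint8_t)((w >> 16) & 0xFF);   /* left  */
        uint8_t c = (uint8_t)(w >> 24);            /* right */

        switch (w & 0xFF) {
        case OP_ADD_I:      /* add_i a, b, c: Ia = Ib + Ic */
            assert(a < NUM_I && b < NUM_I && c < NUM_I);
            f->I[a] = f->I[b] + f->I[c];
            break;
        case OP_LOAD_IP:    /* load_ip a, b: Ia = *(int *)Pb */
            assert(a < NUM_I && b < NUM_P && f->P[b]);
            f->I[a] = *(const int32_t *)f->P[b];
            break;
        default:            /* OP_HALT or unknown opcode */
            return;
        }
    }
}
```

because the opcode itself says which group each operand lives in, the register numbers need no type tag, and the same number (say 1) can name I1, P1 or D1 depending on the instruction.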

how to "best" transfer function arguments?...

the current leading idea is to use the lower registers for receiving args, and the higher registers for composing arguments for outgoing calls (this saves having to use a stack for composing arguments...).

for example:
int foo(int x, int y)
{
   int z;
   z=bar(x, y, x+y);
   return z;
}

I registers in foo's frame:
I0  I1  I2  I3  I4  I5  I6
-   x   y   z   x   y   (x+y)

mov_i 4, 1
mov_i 5, 2
add_i 6, 1, 2
call_i 3, "Baz/bar(iii)i"
ret_i 3

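JVM equivalent, for comparison:
aload 0    //'this', needed by invokevirtual
iload 1
iload 2
iload 1
iload 2
iadd
invokevirtual Baz/bar(III)I
istore 3
iload 3
ireturn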

or such...
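
a minimal sketch of the argument-passing idea, assuming the call copies the caller's outgoing window (I4..I6 in the example) into the callee's I1..In and writes the result back to the caller's dest register. how the interpreter learns where the window starts is not settled in the above, so it is passed in explicitly here; Func, call_i() and the body of bar are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MAX_I = 16 };

typedef struct { int32_t I[MAX_I]; } Frame;

typedef struct {
    int      nargs;                 /* number of int args, received in I1..In */
    int32_t (*body)(Frame *);       /* stand-in for the callee's bytecode */
} Func;

/* call_i dest, callee: copy the caller's outgoing window I[arg_base..]
   into a fresh callee frame at I1..In, run it, store the result in I[dest] */
void call_i(Frame *caller, int dest, int arg_base, const Func *callee)
{
    Frame f;

    assert(callee->nargs >= 0 && callee->nargs < MAX_I);
    assert(arg_base >= 0 && arg_base + callee->nargs <= MAX_I);
    assert(dest >= 0 && dest < MAX_I);
    memset(&f, 0, sizeof f);
    memcpy(&f.I[1], &caller->I[arg_base],
           (size_t)callee->nargs * sizeof(int32_t));
    caller->I[dest] = callee->body(&f);
}

/* hypothetical bar(a, b, c); the body is invented so arg order is visible */
int32_t bar_body(Frame *f)
{
    return f->I[1] * 100 + f->I[2] * 10 + f->I[3];
}

/* foo from the example, hand-translated instruction by instruction */
int32_t foo_body(Frame *f)
{
    static const Func bar = { 3, bar_body };

    f->I[4] = f->I[1];              /* mov_i 4, 1 */
    f->I[5] = f->I[2];              /* mov_i 5, 2 */
    f->I[6] = f->I[1] + f->I[2];    /* add_i 6, 1, 2 */
    call_i(f, 3, 4, &bar);          /* call_i 3, bar */
    return f->I[3];                 /* ret_i 3 */
}
```

the copy could be avoided by overlapping frames, so that the caller's outgoing window physically becomes the callee's low registers, but that would constrain how frames are laid out in memory.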

fonc mailing list
