Bryan --

> > 4) Accessing the necessary registers as currently written (from the
> > interpreter struct.)
>
> The added benchmarks are the caching of the interpreter's register groups
> within the runops_*_core. (You can't cache the register set itself, as
> functions may manipulate the register stack.)
The Crystalizing Loader proposal I just made would work better if the
addresses of the current registers were always the same and pushing regs
onto stacks made copies, rather than having the current reg file become
the new set of regs. I don't know enough right now about how that stuff
works to see how hard it would be to make that change, and whether it
would entail additional cost or the same cost (does the current
implementation leave the regs with their current values?). If it doesn't
add cost, it seems like both what you are working on and what I'm
thinking about would benefit from such a change.

> One of the more interesting discoveries? Adding a 'default:' case to the
> switch slowed down the Linux runs by several percent.

I'm interested to know if there's a way to turn the op funcs into chunks
of code that longjmp around (or something equivalent) so we can get rid
of function call overhead for simple ops (complex ops could consist
primarily of a function call internally). In this scheme, the
crystalizing loader puts the address to jump to in place of the opcode,
each opcode jumps to the location in the next opcode field when it is
done, and the 'end' opcode is replaced by a well-known location that
terminates the runops core. This isn't too hard to imagine in assembly
language, but implementing it in portable C probably isn't for the faint
of heart.

Regards,

-- Gregor
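
P.S. Here's a minimal sketch of the kind of thing I mean, using GCC's
labels-as-values (computed goto) extension as the "something equivalent"
-- non-portable, and the op names and the tiny hand-built "program" are
made up for illustration only, not anything from the real loader or
runops cores:

  /* Threaded dispatch sketch (GCC computed goto).  Each slot in the op
   * stream holds the address of the code that implements that op, as if
   * the crystalizing loader had already replaced opcodes with addresses,
   * so finishing an op just jumps through the next slot -- no call/return. */
  #include <stdio.h>

  int
  main(void)
  {
      void *program[] = {
          &&op_print, &&op_print, &&op_end   /* "crystalized" op stream */
      };
      void **pc = program;                   /* program counter */

      goto **pc;                             /* enter the runops core */

  op_print:
      printf("op in slot %ld\n", (long)(pc - program));
      pc++;                                  /* step to the next slot... */
      goto **pc;                              /* ...and thread into it */

  op_end:
      return 0;                              /* well-known exit location */
  }

A portable fallback would be a big switch inside the core, but then we
are back to paying dispatch cost on every op (and, as you note above,
even an extra 'default:' case isn't free).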