Bryan --

> > 4) Accessing the necessary registers as currently written (from the
> > interpreter struct.)
>
> The added benchmarks are the caching of the interpreter's register groups
> within the runops_*_core. (You can't cache the register set itself, as
> functions may manipulate the register stack.)
The Crystalizing Loader proposal I just made would work better if the
addresses of the current registers were always the same and pushing regs
onto stacks made copies, rather than having the current reg file become
the new set of regs. I don't know enough right now about how that stuff
works to see how hard it would be to make that change, and whether it
would entail additional cost or the same cost (does the current
implementation leave the regs with their current values?). If it doesn't
add cost, it seems like both what you are working on and what I'm
thinking about would benefit from such a change.

> One of the more interesting discoveries? Adding a 'default:' case to the
> switch slowed down the Linux runs by several percent.

I'm interested to know if there's a way to turn the op funcs into chunks
of code that longjmp around (or something equivalent) so we can get rid
of function call overhead for simple ops (complex ops could consist
primarily of a function call internally). In this scheme, the
crystalizing loader puts the address to jump to in place of the opcode,
each opcode jumps to the location in the next opcode field when it is
done, and the 'end' opcode is replaced by a well-known location that
terminates the runops core. This isn't too hard to imagine in assembly
language, but implementing it in portable C probably isn't for the faint
of heart.

Regards,

-- Gregor
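
P.S. Here's a minimal sketch of the kind of thing I mean, using GCC's
labels-as-values (computed goto) extension as the "something equivalent"
-- non-portable, and the op names and the tiny hand-built "program" are
made up for illustration only, not anything from the real loader or
runops cores:

  /* Threaded dispatch sketch (GCC computed goto).  Each slot in the op
   * stream holds the address of the code that implements that op, as if
   * the crystalizing loader had already replaced opcodes with addresses,
   * so finishing an op just jumps through the next slot -- no call/return. */
  #include <stdio.h>

  int
  main(void)
  {
      void *program[] = {
          &&op_print, &&op_print, &&op_end   /* "crystalized" op stream */
      };
      void **pc = program;                   /* program counter */

      goto **pc;                             /* enter the runops core */

  op_print:
      printf("op in slot %ld\n", (long)(pc - program));
      pc++;                                  /* step to the next slot... */
      goto **pc;                              /* ...and thread into it */

  op_end:
      return 0;                              /* well-known exit location */
  }

A portable fallback would be a big switch inside the core, but then we
are back to paying dispatch cost on every op (and, as you note above,
even an extra 'default:' case isn't free).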