Cachegrind, of course, reports that the memcpy in the register push/pop is the culprit; pushN/popN take almost double the time of the others.
I think there was some discussion a while ago about whether we could use sliding register windows.
I'd rather not have the window, but...
Saving and restoring all the registers is obviously a waste of time in many cases. My assumption is that the compilers won't emit saveall/restoreall instructions unless they're really needed, and in most cases they won't be, so I think part of the timing's excessive.
Having said that, since the lower half of each register set holds parameters and shouldn't be restored over, it seems sensible that it shouldn't be saved either. I think we may be better served by halving the size of the frame on the register stacks, adding pushtop, pushbottom, poptop, and popbottom, and tossing the half-pop ops (well, they'd get renamed to poptop). saveall and restoreall, along with the push ops, will stay; they'll just transparently do a pushbottom and a pushtop operation.
Not a big deal--the only reason it's not done is that it has JIT repercussions, and I wanted you and Daniel to have a chance to bring up problems with the scheme before I went and broke the JIT.
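
To make the half-frame idea concrete, here's a rough sketch as a toy model in Python rather than the real C. Everything in it is an assumption for illustration: the RegisterSet class, the 32-register set size, and the method names (borrowed from the proposed op names) are not the actual interpreter code. It only shows that a stack frame holds half a register set, and that a full push becomes a pushbottom followed by a pushtop.

```python
# Toy model of the proposed half-frame register stack.
# Hypothetical names and sizes; not the interpreter's real data structures.

NUM_REGS = 32          # assumed number of registers in one set
HALF = NUM_REGS // 2   # a stack frame holds only this many registers


class RegisterSet:
    """One register set (e.g. the integer registers) plus its save stack."""

    def __init__(self):
        self.regs = [0] * NUM_REGS
        self.stack = []  # each entry is a half-size frame of HALF values

    def pushbottom(self):
        # Save the lower half (the parameter registers).
        self.stack.append(self.regs[:HALF])

    def pushtop(self):
        # Save the upper half.
        self.stack.append(self.regs[HALF:])

    def popbottom(self):
        # Restore the lower half from the most recent frame.
        self.regs[:HALF] = self.stack.pop()

    def poptop(self):
        # Restore the upper half from the most recent frame.
        self.regs[HALF:] = self.stack.pop()

    def push(self):
        # The full push stays, but is just two half-frame pushes.
        self.pushbottom()
        self.pushtop()

    def pop(self):
        # Frames come off in the reverse order they went on.
        self.poptop()
        self.popbottom()


def demo():
    """Caller saves only the upper half across a call that clobbers everything."""
    ints = RegisterSet()
    ints.regs = list(range(NUM_REGS))
    ints.pushtop()                  # copy 16 registers instead of 32
    ints.regs = [-1] * NUM_REGS     # stand-in for the callee's work
    ints.poptop()                   # lower half keeps the callee's values
    return ints.regs
```

In demo() the lower half still holds whatever the callee left there, so nothing is restored over it, and the save copied half as much data as a full frame would.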
--
Dan
--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk