We've two big issues with sub calling.
The first is that the CPS style chews through continuation objects at a massive rate, and using them means that everything needs to be COW.
The second is that with the register file, we're doing a lot of copying of data on each sub call and, while code can (though at the moment really doesn't) restrict which sets of registers get saved, that doesn't work for indirect calls to bytecode, via vtable functions and whatnot.
I've solutions of a sort for both problems.
For the first, we've got the infrastructure in place to do what we need. A return continuation can be recycled *if* we know it hasn't been used. With the return continuation register, that's now easy. First, we remove the return continuation from the registers the called sub/method sees. The continuation's still there, in the return register, but it's not in the general register set. Next, we distinguish between actual and potential continuations.
A return continuation is a *potential* continuation. That is, it *can* be a full continuation, but until something actually does something to it, most of its continuation-ness can be deferred. So unless something takes a real continuation or fetches the return continuation out of the return continuation register (thus turning it from potential to actual), the continuation can be safely recycled once invoked and doesn't have to go COW-marking stacks or anything.
This does mean that we'd want to encourage people to use the <invoke>cc ops to call a function or method (as they're the only ones that can safely create a potential continuation) and use the return op to invoke the return continuation in the return continuation register (so it can then go recycle the continuation after extracting all the good bits).
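To make the potential/actual distinction concrete, here's a rough sketch in Python rather than real Parrot code. Every name in it (ReturnContinuation, Interp, invokecc, fetch_return_cc, ret, free_list) is made up for illustration, and it ignores nested calls entirely. It only shows the bookkeeping: a continuation made by an invokecc-style call gets recycled on return unless something fetched it out of the return register first.

```python
class ReturnContinuation:
    """Hypothetical stand-in for a return continuation object."""
    def __init__(self, return_address):
        self.return_address = return_address
        self.actual = False  # still only a *potential* continuation


class Interp:
    """Toy interpreter with a return continuation register (assumed layout)."""
    def __init__(self):
        self.return_cc = None   # the return continuation register
        self.free_list = []     # continuations that are safe to reuse
        self.allocated = 0      # how many continuation objects we ever built

    def invokecc(self, return_address):
        # Reuse a recycled continuation if we have one, else allocate.
        if self.free_list:
            cc = self.free_list.pop()
            cc.return_address = return_address
            cc.actual = False
        else:
            cc = ReturnContinuation(return_address)
            self.allocated += 1
        self.return_cc = cc

    def fetch_return_cc(self):
        # Pulling it out of the register turns it from potential to actual.
        self.return_cc.actual = True
        return self.return_cc

    def ret(self):
        cc = self.return_cc
        self.return_cc = None
        address = cc.return_address  # extract the good bits first
        if not cc.actual:
            self.free_list.append(cc)  # nobody kept it, so recycle
        return address
```

With this, a loop of plain call/return pairs only ever allocates one continuation object, while a call whose continuation got fetched leaves that object alone and the next call allocates a fresh one.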
Taking a continuation can *also* mark the current interpreter structure as dirty. That way when return is invoked, if the interpreter *isn't* dirty we can immediately recycle the register file if we choose to hang the current register file off a pointer.
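A sketch of how the dirty flag might work, again in Python with invented names (RegisterFile, Interp, call, take_continuation, ret, free_files). It assumes the register file hangs off a pointer, and it makes one conservative assumption of my own: once a callee is dirty, the caller is treated as dirty too, since a full continuation captures the whole call chain.

```python
class RegisterFile:
    """Hypothetical register file; only integer registers are modelled."""
    def __init__(self):
        self.ints = [0] * 32


class Interp:
    def __init__(self):
        self.regs = RegisterFile()  # current register file, via a "pointer"
        self.dirty = False          # set once a real continuation is taken
        self.free_files = []        # register files that can be reused
        self.saved = []             # stack of (caller regs, caller dirty)

    def call(self):
        self.saved.append((self.regs, self.dirty))
        if self.free_files:
            self.regs = self.free_files.pop()
            self.regs.ints = [0] * 32   # wipe stale contents before reuse
        else:
            self.regs = RegisterFile()
        self.dirty = False

    def take_continuation(self):
        # Something now holds a reference to this register file.
        self.dirty = True
        return self.regs

    def ret(self):
        callee_regs, callee_dirty = self.regs, self.dirty
        self.regs, caller_dirty = self.saved.pop()
        # Assumption: dirtiness propagates up to the caller.
        self.dirty = caller_dirty or callee_dirty
        if not callee_dirty:
            self.free_files.append(callee_regs)  # safe to recycle now
```

The point is just that return has one cheap flag to check: clean means the register file goes straight back on the free list, dirty means we leave it for whoever captured it.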
For the register copying problem, we've a couple of options. At the moment I'm leaning towards re-abstracting out the register file and hanging it off a pointer in the interpreter structure. We can allocate a new register file when making a sub call and only copy the relevant bits into it as a sort of extra bonus for speed. (And copy back only the relevant bits on return.) The two downsides to this are that it does slow access to the actual registers, which'll impact the interpreter by a bit, and it will completely invalidate all the existing JIT code.
I'll note that I'm not sure that the register copying issue will truly be an issue in most code, if proper save/restore sets are done, which could be helped by hints passed to the PIR compiler by whatever compiler modules are being used. On the other hand, looking at the -t output from The Work Project, I see a near-insane amount of bytecoded vtable method calling, so I'm willing to accept that it'll be an issue for many people. (I'd not have the problem if my data types had their vtable functions written in C. I chose not to for implementation speed reasons and because the dynloading stuff was badly broken when I needed them. I'm reasonably sure that the big languages (perl/python/ruby/tcl (Hi Wil)) will have their basic PMC classes all done up in C.)
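Roughly what I mean by hanging the register file off a pointer, sketched in Python. The names (Interp, call, ret, NUM_REGS) and the register numbers are invented for the example and don't correspond to the real calling conventions. A call allocates a fresh register file and copies in only the argument registers; a return copies back only the result registers and repoints the interpreter at the caller's file.

```python
NUM_REGS = 32  # assumed size, for illustration only


class Interp:
    def __init__(self):
        self.regs = [0] * NUM_REGS  # "pointer" to the current register file
        self.frames = []            # callers' register files

    def call(self, arg_regs):
        caller = self.regs
        callee = [0] * NUM_REGS
        for r in arg_regs:          # copy in only the relevant bits
            callee[r] = caller[r]
        self.frames.append(caller)
        self.regs = callee          # swap the pointer, no bulk save

    def ret(self, result_regs):
        callee = self.regs
        caller = self.frames.pop()
        for r in result_regs:       # copy back only the relevant bits
            caller[r] = callee[r]
        self.regs = caller
```

Everything the caller had in registers that weren't passed is untouched on return, without anybody having saved or restored it. The cost is the extra indirection on every register access, which is the slowdown mentioned above.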
I think this makes some sense, but I'm kinda sick at the moment, so it may not. (OTOH, being ill is likely why I'm looking at this again, so we take the good with the bad...)
--
Dan
--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
