Re: YARL - yet another run loop: CSwitch

Dan Sugalski Sat, 15 Feb 2003 12:00:23 -0800

At 12:13 AM +0100 2/15/03, Leopold Toetsch wrote:

Dan Sugalski wrote:
At 5:36 PM +0100 2/8/03, Leopold Toetsch wrote:
[ threaded JIT/prederef ]
Ouch, yes. So does JIT.
So JIT/prederefed code must be separated for threads.
Yup. This was a decision made a long time ago, back when Daniel started the first JIT stuff.

Not necessarily so now, it seems. Addressing registers by (thread + index), as outlined in Jerome's recent reply to the YARL (switched run loop) thread, is equally fast on e.g. i386. Dunno if other $archs have a similar addressing mode. But as all recent processors are now (very) pipelined, one extra argument in the op stream only increases code size, not execution time.

Right, but that's for now, not necessarily forever.
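
For readers following along, the (thread + index) addressing discussed above can be sketched in C roughly as follows. This is only an illustration under assumptions: `thread_ctx`, `add_op` and `exec_add` are hypothetical names, not Parrot's actual structures. The point is that the op stream holds register indices rather than absolute addresses, so one prederefed stream could be shared by several threads.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_INT_REGS = 32 };

/* Hypothetical per-thread register file; not Parrot's real layout. */
typedef struct {
    long int_reg[NUM_INT_REGS];
} thread_ctx;

/* Operands are stored as register indices, not absolute addresses,
   so the same op can run against any thread's registers. */
typedef struct {
    size_t dest, src1, src2;
} add_op;

/* Each access is base (thread) + index; on i386 this maps onto a
   single base-plus-displacement addressing mode. */
static void exec_add(thread_ctx *t, const add_op *op)
{
    assert(op->dest < NUM_INT_REGS);
    assert(op->src1 < NUM_INT_REGS);
    assert(op->src2 < NUM_INT_REGS);
    t->int_reg[op->dest] = t->int_reg[op->src1] + t->int_reg[op->src2];
}
```

Calling `exec_add` with two different `thread_ctx` values and the same `add_op` leaves each thread with its own result, which is the property that would let threads share prederefed code.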

It's one of the reasons I haven't been too worried about speeding up the core loop. I've been figuring we'll end up with three:

*) JIT
*) CGoto
*) Old indirect dispatch
In terms of possible $arch/compiler features, the fastest now are:

 - JIT
 - CGP (makes CGoto obsolete)
 - Switched Prederef (not in CVS)

but plain function call is needed e.g. for JIT - now.

Right, but that's something that'll get slowly phased out as the JIT gets more mature and more opcodes get JITted. There's also the potential for the JIT to get really aggressive, if we can find folks with both the talent and the time to do it.

and leave it at that. When Gregor was working on the prederef I figured we'd use it as the third, since the JIT was new and I wasn't sure it'd be possible to get it as a good general solution, but it's developed so much that I'm not sure it's worth more loop development. (I could, of course, be wrong... :)

JIT (known by that acronym, though it isn't really "just in time" in parrot) is a very $arch-dependent feature. I did speed up mul_i_ic by 2-50 for some constants today, which a different $arch doesn't yet have implemented.

The CGP core is really fast for all compilers that have computed goto - and honestly, the code that HLs emit will much resemble code that is not well suited for JIT.

I'm not sure we'll come across anything that's less well suited to the JIT than to a CG core, though I suppose there are potential code density issues.

One of the big things I'm concerned about is a proliferation of core loops, and the impact on the size of running programs. I want to keep the number of cores that force preprocessing the input bytecode to a minimum if at all possible. We also need to deal with those compilers that don't have computed goto, a feature that is definitely not C89 compatible.

What I'd like is to keep us at four potential cores (since I realized I forgot one):

1) Plain function dispatch
2) Switch core (for compilers with no computed goto)
3) CG core
4) JIT
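
As a rough illustration of how the first three cores in this list differ, here is a minimal sketch over a made-up three-op instruction set. The opcodes, the `interp` struct and the `run_*` functions are all assumptions for the example, not Parrot's real code, and there is no bounds checking. All three loops run the same unmodified bytecode; only the dispatch mechanism changes. The computed-goto version relies on the GCC "labels as values" extension, so it is guarded by `__GNUC__`.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;

/* Hypothetical instruction set, for illustration only. */
enum { OP_END = 0, OP_SET = 1, OP_ADD = 2 };

typedef struct { long reg[4]; } interp;

/* 1) Plain function dispatch: one C function per op, each returning
      the next pc (NULL means stop). */
typedef const opcode_t *(*op_func)(interp *, const opcode_t *);

static const opcode_t *op_end(interp *in, const opcode_t *pc)
{ (void)in; (void)pc; return NULL; }
static const opcode_t *op_set(interp *in, const opcode_t *pc)
{ in->reg[pc[1]] = pc[2]; return pc + 3; }
static const opcode_t *op_add(interp *in, const opcode_t *pc)
{ in->reg[pc[1]] += in->reg[pc[2]]; return pc + 3; }

static const op_func op_table[] = { op_end, op_set, op_add };

static void run_function_core(interp *in, const opcode_t *pc)
{
    while (pc != NULL)
        pc = op_table[*pc](in, pc);
}

/* 2) Switch core: portable C89-style dispatch, no function calls. */
static void run_switch_core(interp *in, const opcode_t *pc)
{
    for (;;) {
        switch (*pc) {
        case OP_SET: in->reg[pc[1]] = pc[2];           pc += 3; break;
        case OP_ADD: in->reg[pc[1]] += in->reg[pc[2]]; pc += 3; break;
        case OP_END:
        default:     return;
        }
    }
}

/* 3) CG core: each op body jumps straight to the next op's label.
      Uses the GCC computed-goto extension, which is not standard C. */
#if defined(__GNUC__)
static void run_cgoto_core(interp *in, const opcode_t *pc)
{
    static void *const labels[] = { &&do_end, &&do_set, &&do_add };
    goto *labels[*pc];
do_set: in->reg[pc[1]] = pc[2];           pc += 3; goto *labels[*pc];
do_add: in->reg[pc[1]] += in->reg[pc[2]]; pc += 3; goto *labels[*pc];
do_end: return;
}
#endif
```

None of these loops touches the bytecode, which matches the constraint that only the JIT may rewrite the stream it executes.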

with only the JIT allowed to rewrite the bytecode stream that it executes. I do realize that this rules out some potential code, and I'm not happy about that, but I'm worried that we're going to end up with a dozen different cores, all of which are only half-maintained.

Having said that, if the core building is completely automatic, and we can find an easy way (i.e. one that requires very little programmer thought and effort) to build and test any random collection of cores, I'm OK with that.
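
One possible shape for that kind of automatic core building, sketched with a C preprocessor "X-macro" list. This is an assumption made for illustration only and is not how Parrot generates its cores; the op names and the `OP_LIST` macro are hypothetical. Each op body is written once, and both a function-dispatch core and a switch core are generated from the same list, so adding an op does not require touching either loop by hand.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;
typedef struct { long reg[4]; } interp;

/* Single source of truth: X(name, body). Each body advances pc itself.
   Hypothetical ops; bodies must not contain top-level commas. */
#define OP_LIST(X) \
    X(SET, in->reg[pc[1]] = pc[2]; pc += 3;) \
    X(ADD, in->reg[pc[1]] += in->reg[pc[2]]; pc += 3;)

/* Opcode numbers generated from the list; OP_END comes last. */
#define X_ENUM(name, body) OP_##name,
enum { OP_LIST(X_ENUM) OP_END };

/* Generated function core: one function per op plus a table. */
typedef const opcode_t *(*op_func)(interp *, const opcode_t *);

#define X_FUNC(name, body) \
    static const opcode_t *fn_##name(interp *in, const opcode_t *pc) \
    { body return pc; }
OP_LIST(X_FUNC)

#define X_ENTRY(name, body) fn_##name,
static const op_func fn_table[] = { OP_LIST(X_ENTRY) };

static void run_function_core(interp *in, const opcode_t *pc)
{
    while (*pc != OP_END)
        pc = fn_table[*pc](in, pc);
}

/* Generated switch core: the same bodies pasted into case labels. */
#define X_CASE(name, body) case OP_##name: { body } break;
static void run_switch_core(interp *in, const opcode_t *pc)
{
    for (;;) {
        switch (*pc) {
        OP_LIST(X_CASE)
        default: return; /* OP_END or an unknown opcode */
        }
    }
}
```

Because both cores come from one list, running the same program through each and comparing register state gives a cheap cross-check between cores.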
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
[EMAIL PROTECTED] have teddy bears and even
teddy bears get drunk
