On Saturday 06 October 2001 09:07 pm, Gregor N. Purdy wrote:
> > But before we go jumping the gun, let's see what straight registers
> > do. {dum de dum de dum...} Runs about the same for me. (A shade slower
> > on Linux.)
>
> Could you elaborate on this statement please? I'm not sure I follow...
Oh, since I wasn't doing any register stack manipulation, I pointed to
the register set itself (to save another level of indirection) to see if
that would indeed improve performance on register manipulation. The x86
ran 1/10 second slower, and the SPARC was unchanged. (So there's no real
performance gain to point at for working off the bottom instead of the
top.)

> > > I'm interested to know if there's a way to turn the op funcs into
> > > chunks of code that longjmp around (or something equivalent) so we can
> > > get rid of function call overhead for simple ops (complex ops could
> > > consist primarily of a function call internally).
> >
> > But argument passing? In theory, you'd just be coding by hand what the
> > platform's calling semantics already provide you. (More or less.)
>
> There's no argument passing, because the args are on the stream.
> Everything is in the byte code stream. You jump to a fixed-up address.
> The code there knows the PC within the byte code, so it messes with
> its args (fixed-up pointers to regs and constants) and then jumps to
> the address that's been fixed up in place of the next op's opcode
> (after updating the PC). No argument passing. Unless I've missed
> something...

Well, yes. Argument passing. Whether they're on the stack or in the
stream. (In this case, the stream *is* the stack, sort of.) I'm just
saying that, in essence, all the jumping that you'd be coding, with the
arguments in the stream (vice the stack), is more or less simply
reinventing the calling semantics of whatever hardware you're on. At
some point, though, we will have to trade maintainability and sanity for
speed. ;-)

> > > In this case, the crystalizing loader puts the address to jump to in
> > > place of the opcode, and opcodes jump to the location in the next
> > > opcode field when they are done, and the 'end' opcode is replaced by
> > > a well-known location that terminates the runops core.
> >
> > Saving the dereference of the opcode type.
> > Yes, I'm reserving judgement on this (whilst I ponder it.)
>
> Yeah, I want to save (really amortize) all those dereferences and also
> save the function call overhead for all simple ops (as I said before,
> complex ops that need temporary variables and such would probably be
> moved to functions, and the code at the jump target would call that
> function with appropriate arg passing and then get back to the same
> business as the rest of the ops by updating the PC and jumping to the
> next op func body).

Well, the simple ops switch is all inlined. (No function calls.) But
you lose the ability to truly cache those addresses, so you can't call
them directly. And attempting to discern between an already-converted
address and a simple op will lose any ground you've gained.

(But some caching of the dereferences is good. On the x86, where
registers are scarce, I just squeezed another 1.3 million ops/sec by
doing that. But the same trick on the SPARC (which has the registers to
cache it automatically) suffered a performance hit from the overhead of
storing the dereference.)

-- 
Bryan C. Warnock
[EMAIL PROTECTED]