On Tuesday 09 October 2001 07:20 am, Paolo Molaro wrote:
> On 10/07/01 Bryan C. Warnock wrote:
> > while (*pc) {
> >     switch (*pc) {
> >     }
> > }
>

Should have been "while (pc)".  Oops.

> With the early mono interpreter I observed a 10% slowdown when I
> checked that the instruction pointer was within the method's code:
> that is not a check you need on every opcode dispatch, but only with
> branches, so it makes sense to have a simple while (1) loop.

Yes, it's amazing how fast "hell-bent" is.  ;-)  In my pre-Parrot testing, I 
did simply do a while (1).  Since then, I've used whatever was already coded.  
(Mostly.)  However, I'd need a ruling from Dan on what is acceptable
for the opcode dispatch loop.  I've simply used the existing code to provide 
a baseline behavior. 
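
A minimal sketch of the "check only on branches" idea, for illustration.  
The three opcodes, the absolute-index jump operand, and the function name 
are all hypothetical (neither Parrot's nor mono's real ones), and the sketch 
assumes straight-line code always ends in OP_END.

```c
#include <stddef.h>

/* Hypothetical opcode set, for illustration only. */
enum { OP_END = 0, OP_INC = 1, OP_JUMP = 2 };

/* Runs `code` (of length `len`) and returns the final accumulator,
 * or -1 on an out-of-range branch target or an unknown opcode. */
long run_unchecked(const int *code, size_t len)
{
    const int *pc = code;
    long acc = 0;

    while (1) {                 /* no bounds test on every dispatch */
        switch (*pc) {
        case OP_END:
            return acc;
        case OP_INC:
            acc++;
            pc++;
            break;
        case OP_JUMP:
            /* The operand is an absolute index into `code`.  Only here,
             * where pc can move arbitrarily, is the range checked. */
            if (pc[1] < 0 || (size_t)pc[1] >= len)
                return -1;
            pc = code + pc[1];
            break;
        default:
            return -1;
        }
    }
}
```

Running the program { OP_INC, OP_INC, OP_JUMP, 5, OP_INC, OP_INC, OP_END } 
through run_unchecked() skips the INC at index 4 and pays for a range check 
only at the jump.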

>
> About the goto label feature discussed elsewhere, if the dispatch
> loop is emitted at compile time, there is no compatibility problem
> with non-gcc compilers, since we know what compiler we are going to use.
> I got speedups in the 10-20% range with dispatch-intensive benchmarks in
> mono. It can also be coded in a way similar to the switch code
> with a couple of defines, if needed, so that the same code compiles
> on both gcc and strict ANSI compilers.
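
For reference, the "couple of defines" approach described above might look 
roughly like the sketch below.  It uses gcc's labels-as-values extension 
when available and falls back to a plain switch otherwise.  The macro names, 
the INS_* opcodes, and FORCE_SWITCH are made up for illustration, and the 
goto variant assumes the bytecode contains only valid opcodes.

```c
/* Hypothetical three-opcode machine, for illustration only. */
enum { INS_END = 0, INS_INC = 1, INS_DEC = 2 };

#if defined(__GNUC__) && !defined(FORCE_SWITCH)
  /* gcc "labels as values": each handler jumps straight to the next one. */
  #define DISPATCH_TABLE \
      static void *labels[] = { &&do_INS_END, &&do_INS_INC, &&do_INS_DEC };
  #define BEGIN_DISPATCH  goto *labels[*pc];
  #define CASE(op)        do_##op:
  #define NEXT            goto *labels[*pc]
  #define END_DISPATCH
#else
  /* Strict ANSI fallback: an ordinary loop around a switch. */
  #define DISPATCH_TABLE
  #define BEGIN_DISPATCH  for (;;) { switch (*pc) {
  #define CASE(op)        case op:
  #define NEXT            break
  #define END_DISPATCH    } }
#endif

/* Interprets the opcode stream at `pc` and returns the accumulator. */
long run_dispatch(const int *pc)
{
    long acc = 0;
    DISPATCH_TABLE
    BEGIN_DISPATCH
    CASE(INS_END) return acc;
    CASE(INS_INC) acc++; pc++; NEXT;
    CASE(INS_DEC) acc--; pc++; NEXT;
    END_DISPATCH
}
```

The handler bodies are written once.  Only the five macros differ between 
the two builds, so compiling with -DFORCE_SWITCH gives the switch version 
for comparison on the same compiler.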

I also tested (previously; I need to hit it again) replacing the loop, 
switch, and breaks with a lot of gotos and labels.

LOOP:
    /* function look up, if need be */
    switch (*pc) {
        case (1) : { /* yada yada yada */; goto LOOP; }
        ...
    }

It improved the speed of non-optimized code, because you didn't jump to the 
end of the switch simply to jump back to the loop conditional.  But I didn't 
see any additional improvements with optimized code, because the optimizers 
take care of that for you.  (Well, really, they put another copy of the 
while condition at the bottom.)  
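
A compilable version of the fragment above, as a minimal sketch.  The 
BC_* opcodes and the function name are hypothetical stand-ins.  Each case 
jumps straight back to LOOP instead of breaking out of the switch and 
re-testing a loop condition.

```c
/* Hypothetical opcode set, for illustration only. */
enum { BC_END = 0, BC_INC = 1, BC_ADD2 = 2 };

/* Interprets the opcode stream at `pc`; returns the accumulator,
 * or -1 if an unknown opcode is encountered. */
long run_goto_loop(const int *pc)
{
    long acc = 0;

LOOP:
    switch (*pc) {
    case BC_END:
        return acc;
    case BC_INC:
        acc += 1;
        pc++;
        goto LOOP;      /* back to the dispatch point, no break */
    case BC_ADD2:
        acc += 2;
        pc++;
        goto LOOP;
    default:
        return -1;
    }
}
```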

> > seems simpler, but introduces some potential page (or at least
> > i-cache(?)) thrashing, as you've got to do a significant jump just in
> > order to jump again.  The opcode comparison, followed by a small jump,
> > behaves much nicer.
>
> ... but adds a comparison even for opcodes that don't need it.
> As with the check for the program counter, it's a check that
> not all the opcodes need and as such should be left out of the
> fast path. This means that the first 200 and something opcodes
> are the most common ones _and_ the ones that need to be fast:
> there is no point in avoiding a jump for calls to exit(),
> read() etc (assuming those need opcodes at all).

Well, you've got to represent them somehow.  I haven't come up with any 
clever (or non-clever, for that matter) way of doing that.  We toyed 
earlier with the idea of contextual switches - of changing the opcode loop 
itself as the context of the opcode stream changes.  Part of that's been 
done already, with the trace and bounds checking.  Currently, though, the 
opcode loops are too heavy to make that efficient.

>
> The problem here is to make sure we really need the opcode swap
> functionality, it's really something that is going to kill
> dispatch performance.
> If a module wants to change the meaning of, eg the + operator,
> it can simply request the compiler to insert a call to a
> subroutine, instead of changing the meaning assigned to the
> VM opcode. The compiler is free to inline the sub, of course,
> just don't cripple the normal case with unnecessary overhead
> and let the special case pay the price of flexibility.
> Of course, if the special case is not so special, a _new_
> opcode can be introduced, but there is really no reason to
> change the meaning of an opcode on the fly, IMHO.
> Comment, or flame, away.

But how are you going to introduce the new opcode?  Recompile Perl?  
Unacceptable.  We understand that from a classic language perspective, we're 
slow and lumbering.  We're Perl.  We need that flexibility.  We're trying to 
make that flexibility as fast as possible.

I've got three different opcode loops so far for Parrot.  (Linux(x86)/gcc, 
Solaris(SPARC)/Forte, and Solaris(SPARC)/gcc).  I've tried most every 
combination I can think of (am still working off the last couple, as a 
matter of fact).  (Particularly ever since I hit an inexplicable 
slowdown after adding a default case.)  Take nothing for granted, and try it all.  
I've posted some of my more ridiculous failures, and have posted what I have 
found to be my best numbers.  Anyone is free to come up with a better, 
faster solution that meets the requirements.

-- 
Bryan C. Warnock
[EMAIL PROTECTED]
