On 10/07/01 Bryan C. Warnock wrote:
> while (*pc) {
>     switch (*pc) {
>     }
> }

With the early Mono interpreter I observed a 10% slowdown when I
checked on every dispatch that the instruction pointer was within
the method's code: that check is only needed on branches, not on
every opcode, so it makes sense to use a simple while (1) loop.
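Something along these lines, just as a sketch (the opcode names,
run() and the error handling are all invented for the example; only
the branch opcode validates the new pc):

#include <stdio.h>
#include <stdlib.h>

enum { OP_RET, OP_ADD, OP_BRANCH };   /* invented opcode numbers */

static void
run (const int *code, int len)
{
    const int *pc = code;

    while (1) {
        switch (*pc) {
        case OP_ADD:
            /* ordinary opcodes never pay for a bounds check
             * (the actual work is omitted here) */
            pc++;
            break;
        case OP_BRANCH:
            /* only a branch can send pc somewhere bogus,
             * so only a branch validates it */
            pc += pc [1];
            if (pc < code || pc >= code + len) {
                fprintf (stderr, "branch outside the method's code\n");
                exit (1);
            }
            break;
        case OP_RET:
            return;
        }
    }
}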

Regarding the goto-label (computed goto) feature discussed elsewhere:
if the dispatch loop is generated at build time there is no
compatibility problem with non-gcc compilers, since we know which
compiler we are going to use. I saw speedups in the 10-20% range on
dispatch-intensive benchmarks in Mono. If needed, it can also be
written much like the switch code, with a couple of defines, so that
the same source compiles on both gcc and strict ANSI compilers.
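Something like this minimal sketch (the opcode names and macros are
invented; the gcc build uses labels as values, anything else falls
back to the plain switch):

enum { OP_NOP, OP_INC, OP_RET, OP_LAST };     /* invented opcodes */

#ifdef __GNUC__
/* gcc: labels as values, dispatch jumps straight to the next handler */
#define SWITCH(op)   goto *dispatch_table [(op)];
#define CASE(l)      LABEL_ ## l:
#define BREAK        goto *dispatch_table [*pc]
#else
/* strict ANSI C: the same source expands to a plain switch */
#define SWITCH(op)   switch (op)
#define CASE(l)      case l:
#define BREAK        break
#endif

static int
run (const int *pc)
{
    int acc = 0;
#ifdef __GNUC__
    static void *dispatch_table [OP_LAST] = {
        &&LABEL_OP_NOP, &&LABEL_OP_INC, &&LABEL_OP_RET
    };
#endif

    while (1) {
        SWITCH (*pc) {
        CASE (OP_NOP)
            pc++;
            BREAK;
        CASE (OP_INC)
            acc++;
            pc++;
            BREAK;
        CASE (OP_RET)
            return acc;
        }
    }
}

The gcc-specific bits stay behind the defines, so a strict ANSI build
simply gets the switch version, while the gcc build skips the extra
loop branch entirely.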

> I don't see (on simple inspection) a default case, which implies that all 
> functions would be in the switch.  There are three problems with that.  First, 
> you can't then swap out (or add) opcode functions, which compilation units 
> need to do.  They're all fixed, unless you handle opcode differentiation 
> within each case.  (And there are some other problems associated with that, too, 
> but they're secondary.)  Second, the switch will undoubtedly grow too large 
> to be efficient.  A full switch with as few as 512 branches already shows 
> signs of performance degradation. [1]  Third, any function calls that you 
> then *do* have to make, come at the expense of the switching branch, on top 
> of normal function call overhead.
> 
> I've found [2] that the fastest solution (on the platforms I've tested) are 
> within the family:
> 
> while (*pc) {
>     if (*pc > CORE_OPCODE_NUMBER) {
>         pc = func_table[*pc]();
>     } else {
>         switch (*pc) {
>             /* ... core opcode cases ... */
>         }
>     }
> }
> 
> That keeps the switch branching small.  Doing this:
> 
> while (*pc) {
>     switch (*pc) {
>         case ...: ...
>         default: pc = func_table[*pc]();
>     }
> }
> 
> seems simpler, but introduces some potential page (or at least i-cache(?)) 
> thrashing, as you've got to do a significant jump just in order to jump 
> again.  The opcode comparison, followed by a small jump, behaves much nicer.

... but it adds a comparison even for opcodes that don't need it.
As with the program counter check, it's a check that not all the
opcodes need, so it should be kept out of the fast path. The idea is
to number the opcodes so that the first 200-odd are both the most
common ones _and_ the ones that need to be fast: there is no point
in avoiding a jump for calls to exit(), read() and so on (assuming
those need opcodes at all).
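Roughly like this, as a sketch only (all the names are invented):
the hot opcodes come first and live in the switch, and only the rare
ones pay for the indirect call:

typedef const int *(*op_func) (const int *pc);

enum {
    OP_RET, OP_ADD,          /* hot opcodes: handled directly in the switch */
    OP_FIRST_RARE,           /* everything from here on is rare */
    OP_EXIT = OP_FIRST_RARE, OP_READ
};

/* one entry per rare opcode, filled in elsewhere */
extern op_func rare_table [];

static void
run (const int *pc)
{
    while (1) {
        switch (*pc) {
        case OP_ADD:
            /* the common opcodes pay neither an extra comparison
             * nor an indirect call */
            pc++;
            break;
        case OP_RET:
            return;
        default:
            /* only the rare opcodes go through the function table */
            pc = rare_table [*pc - OP_FIRST_RARE] (pc);
            break;
        }
    }
}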

The real question is whether we actually need the opcode-swap
functionality: it is the kind of feature that kills dispatch
performance.
If a module wants to change the meaning of, e.g., the + operator,
it can simply ask the compiler to insert a call to a subroutine
instead of changing the meaning assigned to the VM opcode. The
compiler is free to inline the sub, of course; just don't cripple
the normal case with unnecessary overhead, and let the special case
pay the price of its flexibility.
Of course, if the special case turns out not to be so special, a
_new_ opcode can be introduced, but there is really no reason to
change the meaning of an opcode on the fly, IMHO.
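Just to sketch what the compiler could emit (the opcode, register
and sub names are all invented for the example):

enum { OP_RET, OP_ADD, OP_CALL_SUB };   /* invented opcodes */
enum { REG_A, REG_B, REG_C };           /* invented register numbers */
enum { SUB_MY_ADD };                    /* index of the module's add sub */

/* plain case: c = a + b uses the fixed OP_ADD */
static const int plain_add [] = {
    OP_ADD, REG_A, REG_B, REG_C,
    OP_RET
};

/* overloaded case: the compiler emits a sub call (which it is free
 * to inline later) instead of redefining OP_ADD for everybody */
static const int overloaded_add [] = {
    OP_CALL_SUB, SUB_MY_ADD, REG_A, REG_B, REG_C,
    OP_RET
};
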
Comment, or flame, away.

lupus

-- 
-----------------------------------------------------------------
[EMAIL PROTECTED]                                     debian/rules
[EMAIL PROTECTED]                             Monkeys do it better
