1) Assuming a core set of unoverrideable opcodes 0-128 (so I don't need to 
differentiate between core and alternate opcodes.)
2) Maintaining each operation as a block (so that any necessary variables 
are declared locally to each case.)
3) Incrementing the pc pointer directly.
4) Accessing the necessary registers as currently written (from the 
interpreter struct.)
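
A minimal sketch of what points 1-4 amount to, not the actual patch.  The 
opcode names, the CORE_OPS cutoff, the register layout, and the demo program 
are all made up for illustration; only the shape (a switch for core ops, each 
case its own block, pc bumped directly, registers read from the interpreter 
struct, and a function table for everything else) follows the list above.

```c
#include <assert.h>
#include <stdio.h>

typedef long opcode_t;

#define NUM_REGS 32
#define CORE_OPS 128            /* assumed cutoff: ops below this are core */
#define EXT_OPS  1              /* hypothetical number of non-core ops */

typedef struct Interp {
    long int_reg[NUM_REGS];     /* hypothetical integer register file */
} Interp;

/* Non-core ops keep the function-table calling convention. */
typedef opcode_t *(*op_func_t)(opcode_t *pc, Interp *interp);

/* Hypothetical opcode numbers, for illustration only. */
enum { OP_END = 0, OP_SET_I_IC = 1, OP_ADD_I = 2, OP_LT_I = 3,
       OP_EXT_PRINT = CORE_OPS };

static opcode_t *ext_print(opcode_t *pc, Interp *interp) {
    printf("%ld\n", interp->int_reg[pc[1]]);
    return pc + 2;
}

static op_func_t ext_table[EXT_OPS] = { ext_print };

void run(opcode_t *pc, Interp *interp) {
    for (;;) {
        switch (*pc) {
        case OP_END:
            return;
        case OP_SET_I_IC: {     /* Ix = constant */
            interp->int_reg[pc[1]] = pc[2];
            pc += 3;
            break;
        }
        case OP_ADD_I: {        /* Ix = Iy + Iz; local lives in this block */
            long sum = interp->int_reg[pc[2]] + interp->int_reg[pc[3]];
            interp->int_reg[pc[1]] = sum;
            pc += 4;
            break;
        }
        case OP_LT_I: {         /* if Ix < Iy, branch by relative offset */
            if (interp->int_reg[pc[1]] < interp->int_reg[pc[2]])
                pc += pc[3];
            else
                pc += 4;
            break;
        }
        default:                /* anything non-core goes through the table */
            assert(*pc >= CORE_OPS && *pc < CORE_OPS + EXT_OPS);
            pc = ext_table[*pc - CORE_OPS](pc, interp);
            break;
        }
    }
}

/* Counts I0 up to 10 in a loop, prints it via the non-core op, returns it. */
long demo(void) {
    opcode_t program[] = {
        OP_SET_I_IC, 0, 0,      /* I0 = 0  */
        OP_SET_I_IC, 1, 1,      /* I1 = 1  */
        OP_SET_I_IC, 2, 10,     /* I2 = 10 */
        OP_ADD_I, 0, 0, 1,      /* I0 = I0 + I1 */
        OP_LT_I, 0, 2, -4,      /* if I0 < I2, jump back to the add */
        OP_EXT_PRINT, 0,
        OP_END
    };
    Interp interp = { {0} };
    run(program, &interp);
    return interp.int_reg[0];
}
```

Calling demo() from a main() runs the loop entirely inside the switch and 
only leaves it for the one non-core print op.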

Benchmarks on test.pasm:

Linux 2.4.7, Athlon 1GHz, gcc 2.96 -O2 long/double/long 
Function table: 31,712,475 ops/sec
Switch hybrid: 39,215,686 ops/sec (+24%)

Solaris 8, UltraSPARC IIe 502MHz, Forte C 6.02 -fast long/double/long
Function table: 13,181,019 ops/sec
Switch hybrid: 18,416,206 ops/sec (+40%)

This is relatively consistent with my pre-Parrot testing.  If the model 
holds, reserving 256 (vice 128, which we're almost at) will reduce the 
difference slightly.  (Obviously, by clustering the most often used codes at 
the front, you'd probably get better performance, since you're no longer 
traipsing all about memory.  Currently, for instance, comparisons and branches 
are 40-60 code blocks away, while 'end' (which occurs once) is at offset 0.  
The ops used in this test are mostly up front.)

-- 
Bryan C. Warnock
[EMAIL PROTECTED]
