1) Assuming a core set of unoverrideable opcodes 0-128 (so I don't need to differentiate between core and alternate opcodes).
2) Maintaining each operation as a block (so that any necessary variables are declared locally to each case).
3) Incrementing the pc pointer directly.
4) Accessing the necessary registers as currently written (from the interpreter struct).
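For readers comparing the two dispatch styles, here is a minimal C sketch under the four assumptions above: one C function per op called through a table, versus a single loop whose cases are blocks that bump `pc` directly and read registers from the interpreter struct. Everything here is hypothetical (`opcode_t`, `struct interp`, the five ops and their encodings are not Parrot's real definitions), and reading "hybrid" as "switch for the core ops, table for anything else" is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode type and interpreter state, not Parrot's actual ones. */
typedef long opcode_t;

struct interp {
    long int_reg[8];              /* integer registers, accessed via the struct */
};

/* Hypothetical core ops; in the post, core ops occupy 0-128. */
enum { OP_END, OP_SET_I, OP_ADD_I, OP_DEC_I, OP_IF_I, NUM_OPS };

/* ---- Function-table dispatch: one indirect call per op ---- */
typedef opcode_t *(*op_func_t)(opcode_t *pc, struct interp *in);

static opcode_t *op_end(opcode_t *pc, struct interp *in)   { (void)pc; (void)in; return NULL; }
static opcode_t *op_set_i(opcode_t *pc, struct interp *in) { in->int_reg[pc[1]] = pc[2]; return pc + 3; }
static opcode_t *op_add_i(opcode_t *pc, struct interp *in) { in->int_reg[pc[1]] = in->int_reg[pc[2]] + in->int_reg[pc[3]]; return pc + 4; }
static opcode_t *op_dec_i(opcode_t *pc, struct interp *in) { in->int_reg[pc[1]]--; return pc + 2; }
/* Branch offset is relative to the start of the IF op. */
static opcode_t *op_if_i(opcode_t *pc, struct interp *in)  { return in->int_reg[pc[1]] ? pc + pc[2] : pc + 3; }

static const op_func_t op_table[NUM_OPS] = { op_end, op_set_i, op_add_i, op_dec_i, op_if_i };

void run_func_table(opcode_t *pc, struct interp *in) {
    while (pc != NULL) {
        assert(*pc >= 0 && *pc < NUM_OPS);
        pc = op_table[*pc](pc, in);   /* op returns the next pc, NULL on end */
    }
}

/* ---- Switch hybrid: core ops inlined as case blocks ---- */
void run_switch(opcode_t *pc, struct interp *in) {
    for (;;) {
        switch (*pc) {
        case OP_END:
            return;
        case OP_SET_I: {              /* each case is a block, so locals stay local */
            in->int_reg[pc[1]] = pc[2];
            pc += 3;                  /* pc incremented directly, no return value */
            break;
        }
        case OP_ADD_I: {
            long sum = in->int_reg[pc[2]] + in->int_reg[pc[3]];
            in->int_reg[pc[1]] = sum;
            pc += 4;
            break;
        }
        case OP_DEC_I: {
            in->int_reg[pc[1]]--;
            pc += 2;
            break;
        }
        case OP_IF_I: {
            long cond = in->int_reg[pc[1]];
            pc += cond ? pc[2] : 3;
            break;
        }
        default:
            /* Fallback for ops without a case (e.g. non-core ones) goes through
               the table; every op in this sketch has a case, so it is unused here. */
            assert(*pc >= 0 && *pc < NUM_OPS);
            pc = op_table[*pc](pc, in);
            if (pc == NULL)
                return;
            break;
        }
    }
}
```

Both loops run the same bytecode; the switch version trades the per-op indirect call and return for a jump within one function, which is where a difference like the one measured below would plausibly come from.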
Benchmarks on test.pasm:

  Linux 2.4.7, Athlon 1GHz, gcc 2.96 -O2, long/double/long
    Function table: 31,712,475 ops/sec
    Switch hybrid:  39,215,686 ops/sec (+24%)

  Solaris 8, UltraSPARC IIe 502MHz, Forte C 6.02 -fast, long/double/long
    Function table: 13,181,019 ops/sec
    Switch hybrid:  18,416,206 ops/sec (+40%)

This is relatively consistent with my pre-Parrot testing. If the model holds, reserving 256 opcodes (vice 128, which we're almost at) will reduce the difference slightly.

(Obviously, by clustering the most often used codes at the front, you'd probably get better performance, since you're no longer traipsing all about memory. Currently, for instance, comparisons and branches are 40-60 code blocks away, while 'end' (which occurs once) is at offset 0. The ops used in this test are mostly up front.)

--
Bryan C. Warnock