Below are some considerations WRT current opcode count.

leo
Too many opcodes

gcc 2.95.4 doesn't compile the switch core with optimization enabled.
People have repeatedly reported trouble with the CGoto core - and now
the CGP core is just as big and compiles just as slowly.

Not to mention the pain (and the extra cups of coffee) it takes here
to recompile an optimized Parrot on my AMD 800 - and I'm doing that
frequently, believe me.

We have to reduce the opcode count drastically.

1) Opcode variants with constants

Dan has already stated that all binary opcodes with two constant
arguments can go away. The same applies to compare ops. Imcc can
handle that (and mostly does already).
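
A minimal sketch of why the const/const variants are not needed: the
assembler can evaluate such an operation once and emit a plain "set"
with the result. fold_binary_const is a hypothetical name chosen for
this illustration; it is not imcc's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not imcc code): fold a binary numeric op whose
 * two operands are both constants. Returns 1 and stores the value in
 * *result on success; returns 0 if the op name is unknown or the fold
 * is unsafe (division by zero), so the caller keeps the original op. */
int fold_binary_const(const char *op, double a, double b, double *result)
{
    assert(op != NULL && result != NULL);
    if (strcmp(op, "add") == 0) { *result = a + b; return 1; }
    if (strcmp(op, "sub") == 0) { *result = a - b; return 1; }
    if (strcmp(op, "mul") == 0) { *result = a * b; return 1; }
    if (strcmp(op, "div") == 0 && b != 0.0) { *result = a / b; return 1; }
    return 0;
}
```

With something like this, "add N0, 1.0, 2.0" would become
"set N0, 3.0" at assemble time, and the run core never sees a
two-constant opcode.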

2) Opcode variants with mixed arguments

Honestly

   acos Nx, Iy

and tons of other such opcodes are just overkill. If I want a numeric
result, I just pass in a numeric argument. If people really want
that, imcc already has some hooks to turn the above into

   set $N0, Iy
   acos Nx, $N0

or convert an int constant to a double constant.

And the above opcode isn't just one opcode, it's two, due to
constant/non-constant argument addressing.
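
To make the multiplication explicit, here is a small sketch that counts
the concrete opcodes generated for one op from the number of addressing
forms each argument accepts. count_variants is a made-up helper for
this illustration, and the form counts in the usage note are
assumptions, not numbers taken from the ops files.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of concrete opcodes for one op, given
 * how many addressing forms (e.g. register, constant) each argument
 * accepts. The total is the product over all arguments. */
unsigned count_variants(const unsigned *forms_per_arg, size_t nargs)
{
    unsigned total = 1;
    size_t i;

    assert(forms_per_arg != NULL || nargs == 0);
    for (i = 0; i < nargs; i++)
        total *= forms_per_arg[i];
    return total;
}
```

For "acos Nx, Iy" the destination has one form and the source has two
(register or constant), which gives the two opcodes mentioned above. A
binary op with two such sources would already give four.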

3) Function-like opcodes

Stat, gmtime, seek, tell, send, poll, recv, gcd, lcm, pack, rand,
split, sleep, and what not are all functions in C or Perl and any
other language I know. These are *not* opcodes in any hardware CPU I
know (maybe VAXen have them ;)
And for most of these the small speed gain doesn't warrant an opcode.

4) A scheme for calling functions.

a) we need a class for a namespace, e.g. the interpreter (Python might
   have a "math" object for the call below:)

   $P0 = getinterp

b) we do a method call

   $N0 = $P0."sin"(3.14)

c) add a method to classes/ParrotInterpreter.pmc:

    METHOD FLOATVAL sin(FLOATVAL f) {
        return sin(f);
    }

d) and add the signature "dIOd" to call_list.txt.

e) a table of builtins
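
As a rough sketch of what the table of builtins in e) could look like
on the C side: a name is mapped to a function pointer with a known
signature, and the call goes through the table. All names here
(func_d_d, builtins, find_builtin, call_builtin) are hypothetical, and
square/negate are stand-ins for sin and friends so that the sketch
builds without libm.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Signature "d(d)": takes a double, returns a double. */
typedef double (*func_d_d)(double);

/* Stand-ins for sin(), cos() etc., so the sketch needs no libm. */
static double square(double x) { return x * x; }
static double negate(double x) { return -x; }

struct builtin {
    const char *name;
    func_d_d    fn;
};

/* Hypothetical table; a real one would list sin, cos, ... */
static const struct builtin builtins[] = {
    { "square", square },
    { "negate", negate },
};

/* Find a builtin by name; returns NULL if there is none. */
func_d_d find_builtin(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof builtins / sizeof builtins[0]; i++) {
        if (strcmp(builtins[i].name, name) == 0)
            return builtins[i].fn;
    }
    return NULL;
}

/* Call a builtin by name; *ok is set to 0 if the name is unknown. */
double call_builtin(const char *name, double arg, int *ok)
{
    func_d_d fn = find_builtin(name);

    assert(ok != NULL);
    *ok = (fn != NULL);
    return fn ? fn(arg) : 0.0;
}
```

The linear search by name on every call is the slow part of this
sketch.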


Quite easy and straightforward - and I can hear everyone crying
loudly: SLOW.

5) Ok - let's look (unoptimized build - see above ;) and parrot -C
(-j is the same, except that PIC is only hacked partially into -C)

Timings for 1 Meg calls of the sine function, as an opcode [1] and as
a method call [2]:

  sin opcode:                 0.23 s
  sin method:                 3.20 s

Ok, too slow, man. But here comes the PIC [4]:

  sin method PIC:             0.50 s
  sin method PIC no I0..I5:   0.37 s   [3]
  PIC w inlining:             0.42 s
  PIC w inlining no I0..I5:   0.29 s   [3]

So, it's slightly slower, but not by much. Actually, with the vastly
reduced run core size, average execution speed could even increase due
to fewer cache misses. But anyway, the small advantage of all these
opcodes isn't worth the pain.

If you are unsure what PIC is, grep for the subject in p6i or consult
the recent summary, which has a link too.
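
For readers who want the idea in a few lines: a PIC (polymorphic
inline cache, as the term is commonly used) remembers the result of a
method lookup at the call site, so that only the first call pays for
the lookup. The sketch below is a simplified, monomorphic version that
skips the receiver type check a real PIC needs. call_site,
lookup_method and call_cached are hypothetical names, not Parrot's
API, and twice is a stand-in for sin.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef double (*func_d_d)(double);

static int slow_lookups = 0;   /* counts how often the slow path runs */

static double twice(double x) { return 2.0 * x; }

/* Slow path: stand-in for a full method lookup by name. */
static func_d_d lookup_method(const char *name)
{
    slow_lookups++;
    return strcmp(name, "twice") == 0 ? twice : NULL;
}

/* One cache slot per call site (hypothetical layout). */
struct call_site {
    const char *name;     /* method name used at this call site */
    func_d_d    cached;   /* NULL until the first successful lookup */
};

/* Call through the cache; only the first call does the lookup. */
double call_cached(struct call_site *site, double arg)
{
    assert(site != NULL);
    if (site->cached == NULL)
        site->cached = lookup_method(site->name);
    assert(site->cached != NULL);   /* unknown method: give up */
    return site->cached(arg);
}

/* Expose the counter so the effect of the cache can be observed. */
int slow_lookup_count(void) { return slow_lookups; }
```

After two calls through the same call_site, slow_lookup_count() is
still 1. That is the effect behind the drop from the plain method
timing to the PIC timings above.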

Thanks for considering this approach,
leo


[1] opcode loop
    n = 3.14
lp:
    $N0 = sin n
    dec i
    if i goto lp


[2] method call loop
    n = 3.14
    $P0 = getinterp
lp:
    $N0 = $P0."sin"( n )
    dec i
    if i goto lp

[3] handcrafted code, which imcc can emit when it knows that a
builtin NCI function with a known signature is being called:

lp:
    set N5, n
    callmethodcc "sin"
    dec i
    if i, lp
    # result in N5

[4] The opcode function - please note that for the non-inlined
case, one function fits all opcodes with the same signature.
Additionally the call overhead can be reduced by omitting the
interpreter and the object argument.


PC_METH_CALL_n_n:
    {
        FLOATVAL num;
#if PIC_INLINE
        num = REG_NUM(5);
        REG_NUM(5) = sin(num);
#else
        Parrot_PIC *pic;
        typedef FLOATVAL (*func_dd)(Interp*, PMC*, FLOATVAL);
        func_dd f;

        pic = (Parrot_PIC *) cur_opcode[1];
        num = REG_NUM(5);
        f = (func_dd)pic->f.real_function;
        REG_NUM(5) = (f)(0, 0, num);
#endif
        goto *((void*)*(cur_opcode += 2));
    }

And we could provide a few opcodes with fixed signatures so that
function call register passing (in N5) isn't needed.

e.g.

   call_dd("sin", Ndest, Nsrc)
