Re: Too many opcodes

2004-12-03 Thread Leopold Toetsch
Dan Sugalski [EMAIL PROTECTED] wrote:

   2) The assembler and PIR compiler need to be taught appropriate
 transforms

Any objections if I handled unary opcodes with constant arguments inside
IMCC? We still have opcodes like:

   sin_n_nc    # sin Nx, 3.14

The created code would be

   set Nx, 0.001593...

This applies only to numeric constants with N registers.

leo


Re: Too many opcodes

2004-11-30 Thread Leopold Toetsch
Dan Sugalski [EMAIL PROTECTED] wrote:

 ... The answer isn't to reduce the op count. The
 answer's to make the cores manageable, which doesn't require tossing
 ops out.

It seems that it was a bit unclear what my patches did. The confusion
seems to arise from the usage of the term opcode. I used it in the
sense: it's handled directly by the run core. The switched core has
a case statement for it, the CGoto core has an entry in its address
table, and the JIT emits a machine code equivalent.

Your usage of opcode seems to be the outer view of a programmer: can I
write:

  acos Nx, Iy

or

  add Nx, Iy, Nz

 It's perfectly fine for a good chunk of the ops to not be in the main
 switch or cgoto loop, and have to be dispatched as indirect
 functions,

That's exactly what I've written in that mail.

   1) Op functions tagged (either in their definitions for all
 permutations, or in the ops numbering metadata file for individual
 functions) as to whether they're in the core loop or not. Ones that
 aren't hit the switch's default: case (and the cgoto core's
 equivalent, and the JIT's perfectly capable of handling this too) and
 get dispatched indirectly.

This is mainly for the function- or method-like opcodes I presume.

   2) The assembler and PIR compiler need to be taught appropriate
 transforms, which then *could* allow for add N2, I3, N3 to be
 turned into add N2, N3, I3 if we decide that in commutative IxN ops
 it's OK to make them NxI and so on. (Comparisons too, up to a point
 -- we can't do this with PMCs)

Yep, that's what my patch did. And I did *not* touch PMCs.

   3) The loadable opcode library stuff needs to be double-checked to
 make sure it works right, so we can create loadable libraries and
 actually load them in

   4) The metadata in packfiles to indicate which loadable opcode
 libraries are in force for code in each segment needs to be
 double-checked to make sure it works right

Let's postpone the loadable ops stuff a bit. We first have to lay out
where they are in force, what happens with multiple threads, and so on.

 The list of opcode functions is going to grow a lot, and there's
 really no reason that it shouldn't. With proper infrastructure there
 just isn't any need for there to be a difference between opcode
 functions and library functions.

Ok. And I've made a proposal for the infrastructure too. Please re-read
that mail and the two about PIC.

leo


PIC again (was: Too many opcodes)

2004-11-30 Thread Leopold Toetsch
Leopold Toetsch [EMAIL PROTECTED] wrote:

 4) A scheme for calling functions.

 a) we need a class for a namespace, e.g. the interpreter (Python might
have a math object for the call below:)

$P0 = getinterp

 b) we do a method call

$N0 = $P0.sin(3.14)

 c) add a method to classes/ParrotInterpreter.pmc:

 METHOD FLOATVAL sin(FLOATVAL f) {
 return sin(f);
 }

 d) and add the signature dIOd to call_list.txt.

 e) a table of builtins


 Quite easy and straightforward - and I hear you all loudly crying - SLOW.

 5) Ok - let's look (unoptimized build - see above ;) and parrot -C
 (-j is the same, except that PIC is only hacked partially into -C)

 Timings for 1 Meg sinus function opcodes [1] and methods [2]

   sin opcode: 0.23 s
   sin method: 3.20 s

 Ok, too slow man. But here comes the PIC [4]:

   sin method PIC: 0.50 s
   sin method PIC no I0..I5: 0.37 s   [3]

And if that's a C function which can be looked up via Parrot_dlsym [5],
the function can be called directly:

  sin method PIC no I0..I5: 0.31 s

[5] f = Parrot_dlsym(NULL, "sin");

If that doesn't work with the OS, the method is still there as a fallback.

The whole PIC functionality currently needs 10 opcodes (the method call
is just duplicated here, as the code isn't integrated):

static void * pic_ops_addr[] = {
PC_MMD_OP_ppp,
PC_MMD_OP_ppp_PASM,
PC_MMD_OP_ppi,
PC_MMD_OP_ppi_PASM,
PC_MMD_OP_ppn,
PC_MMD_OP_ppn_PASM,
/* TBD i_p_p */

PC_METH_CALL_s,
PC_METH_CALL_sc,
PC_CALL_nn,
PC_CALL_nn_C
};

That's all that is needed to do any MMD function (either overridden or
in C), any method call (again PASM or builtin), and almost all trig and
similar opcodes. Basically we need 2 entries per function signature,
that's all (the _C variant isn't strictly needed, but it saves one
function call).

Again, I'm not speaking of any changes to the surface. I'm still speaking
of the internal implementation that handles these opcodes.

The assembler syntax doesn't change:

   $N0 = sin 3.14

But the run core just gets:

   $N0 = Pclass.sin(3.14)

where Pclass just defines the namespace in which this function is
looked up, e.g. math.sin(3.14) for Python.

leo


Re: PIC again (was: Too many opcodes)

2004-11-30 Thread Dan Sugalski
[Snip]
This is interesting. After we're functionally complete we can revisit it.
--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Too many opcodes

2004-11-29 Thread Dan Sugalski
At 9:20 AM +0100 11/24/04, Leopold Toetsch wrote:
Too many opcodes
Bluntly, no. Not too many opcodes.
This has been an ongoing issue. I'm well aware that you've wanted to 
trim down the opcode count for ages and replace a lot of ops with 
functions using a lightweight calling convention. Well, we already 
*have* that. We call them (wait for it) *opcodes*. That's one of the 
really big points of all this. You're micro-optimizing things, and 
you're not going the right way with it.

Yes, I'm well aware that the computed goto and switch cores are big, 
and problematic. The answer isn't to reduce the op count. The 
answer's to make the cores manageable, which doesn't require tossing 
ops out. It requires being somewhat careful with what ops we put *in*.

It's perfectly fine for a good chunk of the ops to not be in the main 
switch or cgoto loop, and have to be dispatched as indirect 
functions, the same as any opcode function from a loadable opcode 
library is. (Hell, some of these can go into a loadable opcode 
library if we want, to make sure the infrastructure works, including 
the packfile metadata that indicates which loadable op libraries need 
to be loaded) I'm also fine with making some of the ops phantom 
opcodes, ones that the assembler quietly rewrites. That's fine too, 
and something I'd like to get in.

So, short answer: Ops aren't going away.
Longer answer: We need to add in the following facilities:
 1) Op functions tagged (either in their definitions for all 
permutations, or in the ops numbering metadata file for individual 
functions) as to whether they're in the core loop or not. Ones that 
aren't hit the switch's default: case (and the cgoto core's 
equivalent, and the JIT's perfectly capable of handling this too) and 
get dispatched indirectly.

 2) The assembler and PIR compiler need to be taught appropriate 
transforms, which then *could* allow for add N2, I3, N3 to be 
turned into add N2, N3, I3 if we decide that in commutative IxN ops 
it's OK to make them NxI and so on. (Comparisons too, up to a point 
-- we can't do this with PMCs)

 3) The loadable opcode library stuff needs to be double-checked to 
make sure it works right, so we can create loadable libraries and 
actually load them in

 4) The metadata in packfiles to indicate which loadable opcode 
libraries are in force for code in each segment needs to be 
double-checked to make sure it works right

 5) The ops file to C converter needs to have a knockout list so we 
can note which combinations aren't supported (and believe me, I fully 
plan on trimming hard, but only *after* we're functionally complete) 
or, if we'd rather, it can respect the ops numbering list and just 
not generate ops not on it.

Once this is done the only difference between 'real' opcodes and 
fixed-arg low-level functions is which are in the switch/cgoto/jit 
cores and which aren't, something that should be transparent to the 
bytecode and tunable as we need to. Which is as it should be.

The list of opcode functions is going to grow a lot, and there's 
really no reason that it shouldn't. With proper infrastructure there 
just isn't any need for there to be a difference between opcode 
functions and library functions.
--
Dan



Re: Too many opcodes

2004-11-29 Thread Dan Sugalski
At 8:46 PM -0500 11/29/04, Dan Sugalski wrote:
It requires being somewhat careful with what ops we put *in*.
And since I wasn't clear (this stuff always becomes obvious only 
after I send things...), I meant in the switch/cgoto/jit core loop, 
not what ops are actually ops.
--
Dan



Re: Too many opcodes

2004-11-25 Thread Leopold Toetsch
Leopold Toetsch [EMAIL PROTECTED] wrote:

 3) Function-like opcodes

 Stat, gmtime, seek, tell, send, poll, recv, gcd, lcm, pack, rand,
 split, sleep, and what not are all functions in C or perl and any
 other language I know. These are *not* opcodes in any hardware CPU I
 know of (maybe VAXen have them ;)

Mumbling to myself: there is of course another argument why these
opcodes shouldn't be opcodes. It's called JIT. You could of course say,
ok, the JIT core is an optimization.

We have currently:

$ perl build_tools/list_unjitted.pl i386    # [1]
...
Not jitted: 1316
Jitted: 217
Total ops: 1533

While some of these non-JITted opcodes can and will be done (e.g. the
is_cmp_i_x_x non-branching compare ops), the vast majority of opcodes
will never be JITted. Each function call opcode would need work. It's a
PITA.

OTOH, when I have a table of builtin functions with function signatures
and a method call syntax, just that one method call opcode has to be
done. That's all.
That's all.

The JIT run core is only fast for a *sequence* of JITted opcodes. One or
two integer operations interrupted by a non-JITted function call don't
speed up at all, because the JIT core first has to load CPU registers
from Parrot registers and then store CPU registers back before the
function call.

Having so many un-JITtable opcodes prohibits an efficient JIT core.

leo

[1] this tool should be in tools/dev and it's inaccurate, as ops not
included in ops/ops.num aren't listed and JITted vtable functions are
missing too, but anyway the magnitude of the counts is ok.


Too many opcodes

2004-11-24 Thread Leopold Toetsch
Below are some considerations WRT current opcode count.
leo
Too many opcodes

gcc 2.95.4 doesn't compile the switch core optimized. People have
repeatedly reported troubles with the CGoto core - now the CGP
core is just as big and compiles just as slowly.

I'm not speaking of the pain (and the additional coffee cups) it takes
to recompile Parrot optimized on my AMD 800 - and I'm doing that
frequently, believe me.

We have to reduce the opcode count drastically.

1) Opcode variants with constants

Dan has already stated that all binary opcodes with two constant
arguments can go away. The same applies to compare ops. Imcc can
handle that (and mostly does already).

2) Opcode variants with mixed arguments

Honestly

   acos Nx, Iy

and tons of other such opcodes are just overkill. If I want a numeric
result, I just pass in a numeric argument. If people really want
that, imcc already has hooks to turn the above into

   set $N0, Iy
   acos Nx, $N0

or to convert an int constant to a double constant.

And the above opcode isn't just one opcode - it's two, due to
constant/non-constant argument addressing.

3) Function-like opcodes

Stat, gmtime, seek, tell, send, poll, recv, gcd, lcm, pack, rand,
split, sleep, and what not are all functions in C or perl and any
other language I know. These are *not* opcodes in any hardware CPU I
know of (maybe VAXen have them ;)
And most of these don't warrant the little speed gain as an opcode.

4) A scheme for calling functions.

a) we need a class for a namespace, e.g. the interpreter (Python might
   have a math object for the call below:)

   $P0 = getinterp

b) we do a method call

   $N0 = $P0.sin(3.14)

c) add a method to classes/ParrotInterpreter.pmc:

METHOD FLOATVAL sin(FLOATVAL f) {
return sin(f);
}

d) and add the signature dIOd to call_list.txt.

e) a table of builtins


Quite easy and straightforward - and I hear you all loudly crying - SLOW.

5) Ok - let's look (unoptimized build - see above ;) and parrot -C
(-j is the same, except that PIC is only hacked partially into -C)

Timings for 1 Meg sinus function opcodes [1] and methods [2]

  sin opcode: 0.23 s
  sin method: 3.20 s

Ok, too slow man. But here comes the PIC [4]:

  sin method PIC: 0.50 s
  sin method PIC no I0..I5: 0.37 s   [3]
  PIC w inlining: 0.42 s
  PIC w inlining no I0..I5: 0.29 s   [3]

So, it's slightly slower, but not much. Actually, with the vastly
reduced run core size, average execution speed could even increase due
to fewer cache misses. But anyway, the small advantage for all these
opcodes isn't worth the pain.

If you are unsure what PIC is, grep for the subject in p6i or consult
the recent summary, which has a link too.

Thanks for considering this approach,
leo


[1] opcode loop
n = 3.14
lp:
$N0 = sin n
dec i
if i goto lp


[2] method call loop
n = 3.14
$P0 = getinterp
lp:
$N0 = $P0.sin( n )
dec i
if i goto lp

[3] handcrafted code, which imcc can emit when it's known that a
builtin NCI function with a known signature is called:

lp:
set N5, n
callmethodcc sin
dec i
if i, lp
# result in N5

[4] The opcode function - please note that for the non-inlined
case, one function fits all opcodes with the same signature.
Additionally, the call overhead can be reduced by omitting the
interpreter and the object argument.


PC_METH_CALL_n_n:
{
FLOATVAL num;
#if PIC_INLINE
num = REG_NUM(5);
REG_NUM(5) = sin(num);
#else
Parrot_PIC *pic;
typedef FLOATVAL (*func_dd)(Interp*, PMC*, FLOATVAL);
func_dd f;

pic = (Parrot_PIC *) cur_opcode[1];
num = REG_NUM(5);
f = (func_dd)pic->f.real_function;
REG_NUM(5) = (f)(0, 0, num);
#endif
goto *((void*)*(cur_opcode += 2));
}

And we could provide a few opcodes with fixed signatures so that
function call register passing (in N5) isn't needed.

e.g.

   call_dd(sin, Ndest, Nsrc)


Re: Too many opcodes

2004-11-24 Thread Leopold Toetsch
Nicholas Clark [EMAIL PROTECTED] wrote:
 On Wed, Nov 24, 2004 at 09:20:42AM +0100, Leopold Toetsch wrote:

 2) Opcode variants with mixed arguments

 Honestly

acos Nx, Iy

 and tons of other such opcodes are just overkill.

 Heck, why do we even have transcendental maths ops that take integer
 arguments or return integer results?

We only have the former. Returning integers would be sillier still.

 ... Can't we kill the lot?

Well, sure. But:

$ tail -1 ops/ops.num
get_repr_s_p    1532

We additionally have ~50 unblessed opcodes in experimental.ops. Now
tossing just the integer variants of these transcendentals reduces the
opcode count by 50.

 For everything that's intrinsically a function on real numbers, just
 have N and P register variants.

Ehem, that increases the opcode count. And how do you override an opcode?
What about:

  $P0 = new Complex
  $P0 = 1 + 2i
  $P1 = sin $P0   # now what

I've shown a way to get rid of all these function-like opcodes.

  use overload 'sin' => \&my_sin;

becomes trivial then. 'sin' is a method call, always. And there is of
course Python:

  r = math.sin(s)

 Nicholas Clark

leo