Re: Too many opcodes
Dan Sugalski [EMAIL PROTECTED] wrote:

2) The assembler and PIR compiler need to be taught appropriate transforms

Any objections if I handled unary opcodes with constant arguments inside IMCC? We still have opcodes like:

    sin_n_nc    # sin Nx, 3.14

The created code would be:

    set Nx, 0.001593...

Only numeric constants with N registers.

leo
Re: Too many opcodes
Dan Sugalski [EMAIL PROTECTED] wrote:

... The answer isn't to reduce the op count. The answer's to make the cores manageable, which doesn't require tossing ops out.

It seems that it was a bit unclear what my patches did. The confusion seems to arise from the usage of the term opcode. I used opcode in the sense: it's handled directly by the run core. The switched core has a case statement for it, the CGoto core has an entry in its address table, and the JIT emits a machine code equivalent. Your usage of opcode seems to be the outer view of a programmer: can I write

    acos Nx, Iy

or

    add Nx, Iy, Nz

It's perfectly fine for a good chunk of the ops to not be in the main switch or cgoto loop, and have to be dispatched as indirect functions,

That's exactly what I've written in that mail.

1) Op functions tagged (either in their definitions for all permutations, or in the ops numbering metadata file for individual functions) as to whether they're in the core loop or not. Ones that aren't hit the switch's default: case (and the cgoto core's equivalent, and the JIT's perfectly capable of handling this too) and get dispatched indirectly.

This is mainly for the function- or method-like opcodes, I presume.

2) The assembler and PIR compiler need to be taught appropriate transforms, which then *could* allow for add N2, I3, N3 to be turned into add N2, N3, I3 if we decide that in commutative IxN ops it's OK to make them NxI and so on. (Comparisons too, up to a point -- we can't do this with PMCs)

Yep, that's what my patch did. And I did *not* touch PMCs.

3) The loadable opcode library stuff needs to be double-checked to make sure it works right, so we can create loadable libraries and actually load them in 4) The metadata in packfiles to indicate which loadable opcode libraries are in force for code in each segment needs to be double-checked to make sure it works right

Let's postpone the loadable ops stuff a bit. We first have to lay out where they are in force, what about multiple threads, and so on.

The list of opcode functions is going to grow a lot, and there's really no reason that it shouldn't. With proper infrastructure there just isn't any need for there to be a difference between opcode functions and library functions.

Ok. And I've made a proposal for the infrastructure too. Please read that mail again, and the two about PIC.

leo
PIC again (was: Too many opcodes)
Leopold Toetsch [EMAIL PROTECTED] wrote:

4) A scheme for calling functions.

a) we need a class for a namespace, e.g. the interpreter (Python might have a math object for the call below):

    $P0 = getinterp

b) we do a method call:

    $N0 = $P0.sin(3.14)

c) add a method to classes/ParrotInterpreter.pmc:

    METHOD FLOATVAL sin(FLOATVAL f) {
        return sin(f);
    }

d) and add the signature dIOd to call_list.txt.

e) a table of builtins

Quite easy and straightforward - and I hear all loudly crying - SLOW.

5) Ok - let's look (unoptimized build - see above ;) and parrot -C (-j is the same, except that PIC is only hacked partially into -C). Timings for 1 Meg sinus function opcodes [1] and methods [2]:

    sin opcode:                 0.23 s
    sin method:                 3.20 s

Ok, too slow man. But here comes the PIC [4]:

    sin method PIC:             0.50 s
    sin method PIC no I0..I5    0.37 s [3]

And, if that's a C function, which can be looked up via Parrot_dlsym [5], the function can be called directly:

    sin method PIC no I0..I5    0.31 s [5]

[5] f = Parrot_dlsym(NULL, "sin");

If that doesn't work with the OS, the method is still there as a fallback.

The whole PIC functionality currently has 10 opcodes (the method call is just duplicated here as the code isn't integrated):

    static void * pic_ops_addr[] = {
        PC_MMD_OP_ppp,   PC_MMD_OP_ppp_PASM,
        PC_MMD_OP_ppi,   PC_MMD_OP_ppi_PASM,
        PC_MMD_OP_ppn,   PC_MMD_OP_ppn_PASM,
        /* TBD i_p_p */
        PC_METH_CALL_s,  PC_METH_CALL_sc,
        PC_CALL_nn,      PC_CALL_nn_C
    };

That's all that is needed to do any MMD function either overridden or in C, any method call (again PASM or builtin), and almost all trig and alike opcodes. Basically we need 2 entries per function signature, that's all (the _C variant isn't strictly needed, but it saves one function call).

Again I'm not speaking of any changes to the surface. I'm still speaking of the internal implementation to handle these opcodes. The assembler syntax doesn't change:

    $N0 = sin 3.14

But the run core just gets:

    $N0 = Pclass.sin(3.14)

where Pclass just defines the namespace where this function is searched for, e.g. math.sin(3.14) for Python.

leo
Re: PIC again (was: Too many opcodes)
[Snip]

This is interesting. After we're functionally complete we can revisit it.

-- Dan

--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
Re: Too many opcodes
At 9:20 AM +0100 11/24/04, Leopold Toetsch wrote:

Too many opcodes

Bluntly, no. Not too many opcodes.

This has been an ongoing issue. I'm well aware that you've wanted for ages to trim down the opcode count and replace a lot of the opcodes with functions with a lightweight calling convention. Well, we already *have* that. We call them (wait for it) *opcodes*. That's one of the really big points of all this. You're micro-optimizing things, and you're not going the right way with it.

Yes, I'm well aware that the computed goto and switch cores are big, and problematic. The answer isn't to reduce the op count. The answer's to make the cores manageable, which doesn't require tossing ops out. It requires being somewhat careful with what ops we put *in*.

It's perfectly fine for a good chunk of the ops to not be in the main switch or cgoto loop, and have to be dispatched as indirect functions, the same as any opcode function from a loadable opcode library is. (Hell, some of these can go into a loadable opcode library if we want, to make sure the infrastructure works, including the packfile metadata that indicates which loadable op libraries need to be loaded) I'm also fine with making some of the ops phantom opcodes, ones that the assembler quietly rewrites. That's fine too, and something I'd like to get in.

So, short answer: Ops aren't going away.

Longer answer: We need to add in the following facilities:

1) Op functions tagged (either in their definitions for all permutations, or in the ops numbering metadata file for individual functions) as to whether they're in the core loop or not. Ones that aren't hit the switch's default: case (and the cgoto core's equivalent, and the JIT's perfectly capable of handling this too) and get dispatched indirectly.

2) The assembler and PIR compiler need to be taught appropriate transforms, which then *could* allow for add N2, I3, N3 to be turned into add N2, N3, I3 if we decide that in commutative IxN ops it's OK to make them NxI and so on. (Comparisons too, up to a point -- we can't do this with PMCs)

3) The loadable opcode library stuff needs to be double-checked to make sure it works right, so we can create loadable libraries and actually load them in

4) The metadata in packfiles to indicate which loadable opcode libraries are in force for code in each segment needs to be double-checked to make sure it works right

5) The ops file to C converter needs to have a knockout list so we can note which combinations aren't supported (and believe me, I fully plan on trimming hard, but only *after* we're functionally complete) or, if we'd rather, it can respect the ops numbering list and just not generate ops not on it.

Once this is done the only difference between 'real' opcodes and fixed-arg low-level functions is which are in the switch/cgoto/jit cores and which aren't, something that should be transparent to the bytecode and tunable as we need to. Which is as it should be. The list of opcode functions is going to grow a lot, and there's really no reason that it shouldn't. With proper infrastructure there just isn't any need for there to be a difference between opcode functions and library functions.

-- Dan
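Dan's point 1) - ops outside the core loop falling through to the switch's default: case and getting dispatched indirectly - can be sketched roughly like this (the opcode numbers, table, and signatures are all made up for illustration; Parrot's real dispatch differs):

```c
#include <assert.h>

/* Hypothetical opcode numbers: the first two live in the core switch,
 * everything above them is dispatched through a function table, as a
 * loadable opcode library's ops would be. */
enum { OP_ADD, OP_SUB, OP_GCD, N_OPS };

typedef long (*op_func)(long, long);

static long op_gcd(long a, long b)
{
    while (b) { long t = a % b; a = b; b = t; }
    return a;
}

/* Table covering the non-core ops. */
static op_func op_table[N_OPS] = { [OP_GCD] = op_gcd };

long dispatch(int op, long a, long b)
{
    switch (op) {
    case OP_ADD: return a + b;       /* in the core loop */
    case OP_SUB: return a - b;       /* in the core loop */
    default:                         /* not in the core loop: */
        return op_table[op](a, b);   /* indirect function call */
    }
}
```

The bytecode is identical either way; whether an op sits in the switch or in the table is purely a core-size tuning knob, which is the point being made here.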
Re: Too many opcodes
At 8:46 PM -0500 11/29/04, Dan Sugalski wrote:

It requires being somewhat careful with what ops we put *in*.

And since I wasn't clear (this stuff always becomes obvious only after I send things...), I meant in the switch/cgoto/jit core loop, not what ops are actually ops.

-- Dan
Re: Too many opcodes
Leopold Toetsch [EMAIL PROTECTED] wrote:

3) Function-like opcodes

Stat, gmtime, seek, tell, send, poll, recv, gcd, lcm, pack, rand, split, sleep, and what not are all functions in C or Perl and any other language I know. These are *not* opcodes in any hardware CPU I know (maybe VAXen have it ;)

Mumbling to myself: there is of course another argument why these opcodes shouldn't be opcodes. It's called JIT. You could of course say, ok, the JIT core is an optimization. We currently have:

    $ perl build_tools/list_unjitted.pl i386    # [1]
    ...
    Not jitted: 1316
    Jitted:      217
    Total ops:  1533

While some of these non-JITted opcodes can and will be done (e.g. the is_cmp_i_x_x non-branching compare ops), the vast majority of opcodes will never be JITted. Each function call opcode would need work. It's a PITA. OTOH, when I have a table of builtin functions with function signatures and a method-call syntax, just that one method call opcode has to be done. That's all.

The JIT run core is only fast for a *sequence* of JITted opcodes. One or two integer operations interrupted by a non-JITted function call don't speed up at all, because the JIT core first has to load CPU registers from Parrot registers and then store the CPU registers back before the function call. Having so many un-JITtable opcodes prohibits an efficient JIT core.

leo

[1] this tool should be in tools/dev and it's inaccurate, as ops not included in ops/ops.num aren't listed and JITted vtable functions are missing too, but anyway the magnitude of the counts is ok.
Too many opcodes
Below are some considerations WRT the current opcode count.

leo

Too many opcodes

gcc 2.95.4 doesn't compile the switch core optimized. People have repeatedly reported troubles with the CGoto core - now the CGP core is as big and compiles as slowly. I'm not speaking of the pain (and the additional coffee cups) it takes here to recompile Parrot optimized on my AMD 800 - and I'm doing that frequently, believe me. We have to reduce the opcode count drastically.

1) Opcode variants with constants

Dan has already stated that all binary opcodes with two constant arguments can go away. The same applies to compare ops. Imcc can handle that (and does it already, mostly).

2) Opcode variants with mixed arguments

Honestly, acos Nx, Iy and tons of other such opcodes are just overkill. If I want a numeric result, I just pass in a numeric argument. If people really want that, imcc already has some hooks to create from the above:

    set $N0, Iy
    acos Nx, $N0

or convert an int constant to a double constant. And the above opcode isn't just one, it's two, due to constant/non-constant argument addressing.

3) Function-like opcodes

Stat, gmtime, seek, tell, send, poll, recv, gcd, lcm, pack, rand, split, sleep, and what not are all functions in C or Perl and any other language I know. These are *not* opcodes in any hardware CPU I know (maybe VAXen have it ;) And most of these don't warrant the little speed gain as an opcode.

4) A scheme for calling functions.

a) we need a class for a namespace, e.g. the interpreter (Python might have a math object for the call below):

    $P0 = getinterp

b) we do a method call:

    $N0 = $P0.sin(3.14)

c) add a method to classes/ParrotInterpreter.pmc:

    METHOD FLOATVAL sin(FLOATVAL f) {
        return sin(f);
    }

d) and add the signature dIOd to call_list.txt.

e) a table of builtins

Quite easy and straightforward - and I hear all loudly crying - SLOW.
5) Ok - let's look (unoptimized build - see above ;) and parrot -C (-j is the same, except that PIC is only hacked partially into -C). Timings for 1 Meg sinus function opcodes [1] and methods [2]:

    sin opcode:                 0.23 s
    sin method:                 3.20 s

Ok, too slow man. But here comes the PIC [4]:

    sin method PIC:             0.50 s
    sin method PIC no I0..I5    0.37 s [3]
    PIC w inlining:             0.42 s
    PIC w inlining no I0..I5    0.29 s [3]

So, it's slightly slower, but not much. Actually, with the vastly reduced run core size, average execution speed could increase due to fewer cache misses. But anyway, the small advantage for all these opcodes isn't worth the pain.

If you are unsure what PIC is, grep for the subject in p6i or consult the recent summary, which has a link too.

Thanks for considering this approach,
leo

[1] opcode loop:

    n = 3.14
    lp:
        $N0 = sin n
        dec i
        if i goto lp

[2] method call loop:

    n = 3.14
    $P0 = getinterp
    lp:
        $N0 = $P0.sin( n )
        dec i
        if i goto lp

[3] handcrafted code, which imcc can emit when it's known that a builtin NCI function with a known signature is called:

    lp:
        set N5, n
        callmethodcc sin
        dec i
        if i, lp
        # result in N5

[4] The opcode function - please note that for the non-inlined case, one function fits all opcodes with the same signature. Additionally the call overhead can be reduced by omitting the interpreter and the object argument.

    PC_METH_CALL_n_n:
    {
        FLOATVAL num;
    #if PIC_INLINE
        num = REG_NUM(5);
        REG_NUM(5) = sin(num);
    #else
        Parrot_PIC *pic;
        typedef FLOATVAL (*func_dd)(Interp*, PMC*, FLOATVAL);
        func_dd f;

        pic = (Parrot_PIC *) cur_opcode[1];
        num = REG_NUM(5);
        f = (func_dd)pic->f.real_function;
        REG_NUM(5) = (f)(0, 0, num);
    #endif
        goto *((void*)*(cur_opcode += 2));
    }

And we could provide a few opcodes with fixed signatures so that function call register passing (in N5) isn't needed, e.g.

    call_dd(sin, Ndest, Nsrc)
Re: Too many opcodes
Nicholas Clark [EMAIL PROTECTED] wrote:

On Wed, Nov 24, 2004 at 09:20:42AM +0100, Leopold Toetsch wrote:

2) Opcode variants with mixed arguments

Honestly acos Nx, Iy and tons of other such opcodes are just overkill.

Heck, why do we even have transcendental maths ops that take integer arguments or return integer results?

We have only the former. Returning integers would be sillier still.

... Can't we kill the lot?

Well, sure. But:

    $ tail -1 ops/ops.num
    get_repr_s_p    1532

We additionally have ~50 unblessed opcodes in experimental.ops. Now, tossing just the integer variants of these transcendentals reduces the opcode count by 50.

For everything that's intrinsically a function on real numbers, just have N and P register variants.

Ehem, that increases the opcode count. And how do you override an opcode? What about:

    $P0 = new Complex
    $P0 = 1 + 2i
    $P1 = sin $P0    # now what

I've shown a way to get rid of all these function-like opcodes.

    use overload 'sin' => \&my_sin;

becomes trivial then. 'sin' is a method call, always. And there is of course Python:

    r = math.sin(s)

Nicholas Clark

leo