Leo --
Ah. It seems the point of divergence is slow_core vs. cg_core, et al.
As you have figured out, I've been referring to performance of the non-cg,
non-prederef, non-JIT (read: "slow" ;) core.
I don't know much about the CG core, but prederef and JIT should be able
to work with dynamic optables. For prederef and JIT, optable mucking does
expire your prederefed and JITted blocks (in general), but for
conventional use (preamble setup), you don't pay a price during mainline
execution once you've set up your optable. You only pay an additional cost
if your program is dynamic enough to muck with its optable in the middle
somewhere, in which case you have to re-prederef or re-JIT stuff (and a
use tax like that seems appropriate to me).
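To make the expiry idea concrete, here's a rough sketch (the names and
structures are hypothetical, not Parrot's actual ones): stamp each
prederefed or JITted block with the generation of the optable it was
built against, bump the generation whenever the optable is mucked with,
and rebuild stale blocks lazily:

    /* Hypothetical sketch only -- not Parrot's actual data structures. */
    #include <stddef.h>

    typedef void *(*opfunc_t)(void *cur_op, void *interp);

    typedef struct {
        opfunc_t *table;       /* opcode number -> op function     */
        size_t    n_ops;
        unsigned  generation;  /* bumped on every optable mutation */
    } optable_t;

    typedef struct {
        void     *code;        /* prederefed or JITted native block */
        unsigned  built_for;   /* optable generation at build time  */
    } block_t;

    /* Mucking with the optable expires every compiled block at once. */
    void optable_set(optable_t *ot, size_t opnum, opfunc_t fn) {
        ot->table[opnum] = fn;
        ot->generation++;
    }

    /* Mainline execution pays one compare per block entry; the
     * re-prederef/re-JIT happens only after the table has changed. */
    void *block_code(block_t *b, optable_t *ot,
                     void *(*rebuild)(block_t *, optable_t *)) {
        if (b->built_for != ot->generation) {
            b->code      = rebuild(b, ot);
            b->built_for = ot->generation;
        }
        return b->code;
    }

With a preamble-only setup, the generation never changes after startup,
so the check always succeeds and mainline execution runs at full speed.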
Of all the cores, the CG core is the most "crystallized" (rigid), so it
stands to reason that it would not be a good match for dynamic optables.
While I don't think I'm sophisticated enough to pull it off on my own, I
do think it should be possible to use what was learned building the JIT
system to construct the equivalent of a CG core on the fly, given the
program's structure. I think the information and basic capabilities are already
there: The JIT system knows already how to compile a sequence of ops to
machine code -- using this plus enough know-how to plop in the right JMP
instructions pretty much gets you there. A possible limitation to the
coolness, here: I think the JIT system bails out for the non-inline ops
and just calls the opfunc (please forgive if my understanding of what JIT
does and doesn't do is out of date). I think the CG core doesn't have to
take the hit of that extra indirection for non-inline ops. If so, then the
hypothetical dynamic core construction (DCC) approach just described would
approach the speed of the CG core, but would fall somewhat short on
workloads that involve lots of non-inline ops (FWIW, there are more inline
ops than not in the current *.ops files).
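To illustrate the indirection I mean, here is a toy contrast of the two
dispatch styles in plain C (this uses GCC's computed goto and trivially
fake op bodies; it is not Parrot code):

    #include <stdio.h>

    enum { OP_INC, OP_DEC, OP_END };

    static long acc;
    static void op_inc(void) { acc++; }
    static void op_dec(void) { acc--; }

    /* Function-call dispatch: one indirect call per op -- the hit the
     * JIT fallback takes for non-inline ops. */
    static void run_calls(const int *prog) {
        static void (*const funcs[])(void) = { op_inc, op_dec };
        for (; *prog != OP_END; prog++)
            funcs[*prog]();
    }

    /* Computed-goto dispatch: jump straight from one op body to the
     * next, CG-core style, with no call/return overhead. */
    static void run_cgoto(const int *prog) {
        static const void *const labels[] = { &&l_inc, &&l_dec, &&l_end };
        goto *labels[*prog];
    l_inc: acc++; goto *labels[*++prog];
    l_dec: acc--; goto *labels[*++prog];
    l_end: return;
    }

    int main(void) {
        const int prog[] = { OP_INC, OP_INC, OP_DEC, OP_END };
        acc = 0; run_calls(prog); printf("calls: %ld\n", acc);
        acc = 0; run_cgoto(prog); printf("cgoto: %ld\n", acc);
        return 0;
    }

The JMP-stitching approach amounts to generating the second style on the
fly for inline ops, and falling back to the first style everywhere else.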
Then, you get CG(-esque) speed along with the dynamic capabilities. It's
cheating, to be sure, but I like that kind of cheating. :) Further,
DCC would work with dynamically loaded oplibs (presumably using purely the
JIT-func-call technique, although I suppose it's possible to do even
better), where the CG core would not.
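For the dynamic-oplib side, a rough sketch of what loading one might look
like (only dlopen/dlsym are real here; the parrot_oplib_init entry point
and the structures are invented for illustration):

    /* Hypothetical loader sketch; link with -ldl on glibc systems. */
    #include <dlfcn.h>
    #include <stdio.h>

    typedef void *(*opfunc_t)(void *cur_op, void *interp);

    typedef struct {
        const char *name;
        opfunc_t    func;
    } op_entry_t;

    /* Invented entry point each oplib would export: hands back its op
     * entries and reports how many there are. */
    typedef const op_entry_t *(*oplib_init_t)(size_t *n_ops);

    int load_oplib(const char *path, opfunc_t *optable, size_t base) {
        void *h = dlopen(path, RTLD_NOW);
        if (!h) { fprintf(stderr, "%s\n", dlerror()); return -1; }

        oplib_init_t init = (oplib_init_t)dlsym(h, "parrot_oplib_init");
        if (!init) { fprintf(stderr, "%s\n", dlerror()); return -1; }

        size_t n;
        const op_entry_t *ops = init(&n);
        for (size_t i = 0; i < n; i++)
            optable[base + i] = ops[i].func;  /* append past static ops */
        return 0;
    }

Once the new opfuncs sit in the optable, the JIT-func-call technique can
dispatch to them like any other non-inline op.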
It would be interesting to see where DCC would fit on the performance
spectrum compared to JIT, for mops.pasm and for other examples with
broader op usage...
Regards,
-- Gregor
Leopold Toetsch <[EMAIL PROTECTED]>
11/04/2002 08:45 AM
To: [EMAIL PROTECTED]
cc: Brent Dax <[EMAIL PROTECTED]>, "'Andy Dougherty'" <[EMAIL PROTECTED]>,
    Josh Wilmes <[EMAIL PROTECTED]>, "'Perl6 Internals'" <[EMAIL PROTECTED]>
Subject: Re: Need for fingerprinting? [was: Re: What to do if Digest::MD5
    is unavailable?]
[EMAIL PROTECTED] wrote:
> Leo --
>
> ... Optable build time is not a function of program
> size, but rather of optable size
Ok, I see that, but ...
> I don't think it remains a problem how to run ops from different oplibs
> _fast_.
.... the problem is that as soon as there are dynamic oplibs, they can't
be run in the CGoto core, which is normally the fastest core when
execution time depends on opcode dispatch time. JIT is (much) faster
for almost-integer-only code, e.g. mops.pasm, but for more complex
programs involving PMCs, JIT is currently slower.
> ... Op lookup is already fast ...
I rewrote find_op to build a lookup hash at runtime, when it's needed.
This is 2-3 times faster than the find_op with the static lookup table
in the core_ops.c file.
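Roughly like this (a minimal sketch of the idea, not the actual patch;
op_names and n_core_ops stand in for the data in core_ops.c):

    #include <stdlib.h>
    #include <string.h>

    #define HASH_SIZE 4096          /* power of two, > number of ops */

    extern const char *op_names[];  /* opcode number -> full op name */
    extern const int   n_core_ops;

    typedef struct bucket {
        const char    *name;
        int            opnum;
        struct bucket *next;
    } bucket_t;

    static bucket_t *buckets[HASH_SIZE];
    static int       hash_built = 0;

    static unsigned str_hash(const char *s) {
        unsigned h = 5381;
        while (*s)
            h = h * 33 + (unsigned char)*s++;
        return h & (HASH_SIZE - 1);
    }

    int find_op(const char *name) {
        if (!hash_built) {          /* built once, when first needed */
            for (int i = 0; i < n_core_ops; i++) {
                bucket_t *b = malloc(sizeof *b);
                unsigned  h = str_hash(op_names[i]);
                b->name  = op_names[i];
                b->opnum = i;
                b->next  = buckets[h];
                buckets[h] = b;
            }
            hash_built = 1;
        }
        for (bucket_t *b = buckets[str_hash(name)]; b; b = b->next)
            if (strcmp(b->name, name) == 0)
                return b->opnum;
        return -1;                  /* unknown op */
    }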
> ... After the
> preamble, while the program is running, the cost of having a dynamic
> optable is absolutely *nil*, whether the ops in question were statically
> or dynamically loaded (if you don't see that, then either I'm very wrong,
> or I haven't given you the right mental picture of what I'm talking
> about).
The cost is only almost *nil* if program execution time doesn't depend
on opcode dispatch time. E.g. mops.pasm spends ~50% of its execution
time on opcode dispatch even in cg_core (i.e. the computed goto core),
and running the normal fast_core slows it down by ~30%.
This might or might not be true for RL applications, but I hope that the
optimizer will bring average programs near the relations above.
Nevertheless I see the need for dynamic oplibs. If e.g. a program pulls
in obscure.ops, it can just as well pay the penalty for using those ops.
> Regards,
>
> -- Gregor
leo