On Fri, Dec 21, 2001 at 12:03:51AM +0000, Tom Hughes wrote:

> It looks like it is going to need some work before it can work for
> other instruction sets though, at least for RISC systems where the
> operands are typically encoded with the opcode as part of a single
> word and the range of immediate constants is often restricted.
> 
> I'm thinking it will need some way of indicating field widths and
> shifts for the operands and opcode so they can be merged into an
> instruction word and also some way of handling a constant pool so
> that arbitrary addresses can be loaded using PC relative loads.

Another thing that struck me on reading it was:

=item C<B<&IR>>I<n>

Place the address of the C<INTVAL> register specified in the I<n>th argument.


RISC chips have lots of general-purpose registers. It's likely that there
will be enough spare ones that several can be mapped to Parrot registers.
If, say, 4 are available, it would be useful to be able to say that an op
requires the values of rN and rM, and modifies rD. The JIT compiler would
then make a sandwich: code to read N and M into two of the real CPU
registers, the op as the filling, and then some more code to write D back
to memory. However, if the JIT can see that N is already in a CPU register
from the previous op, or that D is going to be used and modified by the next
op, it can skip or defer some of those memory reads and writes.

[And provided the descriptions are this helpful, the JIT doesn't have to do
this immediately. It becomes possible to write a better optimising JIT that
makes sandwiches with multiple fillings or even Scooby Snacks, while the
initial JIT insists that the only recipe available is bread, 1 filling, bread.]

The mops benchmark will be fast if its inner loop

REDO:   sub    I4, I4, I3
        if     I4, REDO

maps to

REDO:
        load I4 from memory (which will be in the L1 cache)
        load I3 from memory
        I4 = I4 - I3
        store I4 to memory
        
        load I4 from memory
        is it non-zero?
        goto REDO if so


it will be slightly faster if it maps to

REDO:
        load I4 from memory (which will be in the L1 cache)
        load I3 from memory
        I4 = I4 - I3
        store I4 to memory

        # I4 still in a CPU register
        is it non-zero?
        goto REDO if so

and faster still if the JIT can see how to hoist the loads and the store out
of the loop:

        load I4 from memory
        load I3 from memory
REDO:
        I4 = I4 - I3

        is it non-zero?
        goto REDO if so

        store I4 to memory

(does threading mess this idea up?)

Nicholas Clark
