I explained very badly. The issue is not spilling (at the parrot 
level)

The problem is: if you only pick the highest priority parrot registers 
and put them in real registers you are losing oportunities where 
copying the date once will save you from copying it many times. You 
are, in some sense, underspilling.

Let's see an example. Imagine you are compilling this imc, to be run 
in a machine which has 3 registers free (after temporaries):

set $I1, 1
add $I1, $I1, 1
print $I1

set $I2, 1
add $I2, $I2, 1
print $I2

set $I3, 1
add $I3, $I3, 1
print $I3

set $I4, 1
add $I4, $I4, 1
print $I4

set $I5, 1
add $I5, $I5, 1
print $I5

print $I1
print $I2
print $I3
print $I4
print $I5

Very silly code indeed, but you get the idea.

Since we have only 5 vars, imcc would turn this into:

set I1, 1
add I1, I1, 1
print I1

set I2, 1
add I2, I2, 1
print I2

set I3, 1
add I3, I3, 1
print I3

set I4, 1
add I4, I4, 1
print I4

set I5, 1
add I5, I5, 1
print I5

print I1
print I2
print I3
print I4
print I5

Now, assuming you put registers I1-I3 in real registers, what would it 
take to execute this code in JIT?

It would have to move the values of I4 and I5 from memory to registers 
a total of 10 times (4 saves and 6 restores if you assume the JIT is 
smart)

[This particular example could be improved by making the jit look if 
the same parrot register is going to be used in the next op, but 
that's not the point]

But, if IMCC knew that there were really only 3 registers in the 
machine, it would generate:

set I1, 1
add I1, I1, 1
print I1

set I2, 1
add I2, I2, 1
print I2

set I3, 1
add I3, I3, 1
print I3

fast_save I3, 1

set I3, 1
add I3, I3, 1
print I3

fast_save I3, 2

set I3, 1
add I3, I3, 1
print I3

fast_save I3, 3

print I1
print I2
fast_restore I3, 3
print I3
fast_restore I3, 2
print I3
fast_restore I3, 1
print I3

When running this code in the JIT, it would only require 6 moves (3 
saves, 3 restores): exactly the ones generated by imcc. 

In reality this would be even better, because as you have the garantee 
of having the data already in real registers you need less 
temporaries and so have more machine registers free.

> So the final goal could be, to emit these load/stores too, which
> then could be optimized to avoid duplicate loading/storing. Or imcc
> could emit a register move, if in the next instruction the parrot
> register is used again.

Yes, that's the idea: making imcc generate the loads/stores, using the 
info about how many registers are actually available in the real 
machine _and_ its own knowledge about the program flow.

An even better goal would be to have imcc know how many temporaries 
every JITed op requires, and use this information during register 
allocation.

All this is obviously machine dependent: the code generated should 
only run in the machine it was compiled for. So we should always keep 
the original imc code in case we copy the pbc file to another 
machine.

-angel


Reply via email to