On 30/10/2013 3:02 PM, Richard Henderson wrote:
On 10/30/2013 02:08 PM, Sebastian Macke wrote:
Do you have a publicly accessible tree with all your patches applied?
I'd like to re-read the logic in the proper context.
Since you are the second person to ask for it:
https://github.com/s-macke/qemu/tree/or32-optimize
Ok, the logic as written is correct as far as I can see.
It's a little convoluted though, and I think there may be a way to
streamline it. But I'll have to think about that some more.
r~
Yeah, it is. Because of the delay slot and the way QEMU does its
translation, it is hard to see and distinguish all the different code
paths, and to tell which information is available when.
At the moment the biggest time eater and the most complex part of the
code is the branching.
1. l.sf..... <- set flag if the condition is fulfilled (a setcond instruction)
2. l.bf <- branch if flag (a brcond instruction or, with the last
suggestion you gave, a movcond instruction)
3. Delay-slot instruction, which could fail.
4. Actual jump. We need a branch here for the two different slots for
chaining, and we need all the information from point 2.
So at the moment we put three branches/conditions in the translated
code. One setcond, one movcond and one brcond.
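
To make the four points easier to follow, here is a rough sketch in plain C. It is not TCG code and not what the patches generate. ToyCpu and the toy_* functions are made-up names for illustration only, and the delay-slot instruction is assumed to be an l.addi that never fails.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy state, not QEMU's real CPU state structure. */
typedef struct {
    uint32_t gpr[32];
    uint32_t pc;       /* address of the next instruction to run */
    bool     flag;     /* models the compare flag set by l.sf* */
    uint32_t jmp_pc;   /* branch destination latched by l.bf */
} ToyCpu;

/* Point 1: l.sfeq rA,rB. One setcond in the translated code. */
static void toy_sfeq(ToyCpu *cpu, int ra, int rb)
{
    cpu->flag = (cpu->gpr[ra] == cpu->gpr[rb]);
}

/* Point 2: l.bf. With movcond: select the destination, do not jump yet. */
static void toy_bf(ToyCpu *cpu, uint32_t bf_pc, int32_t offset_insns)
{
    uint32_t target = bf_pc + (uint32_t)offset_insns * 4u;
    uint32_t fallthrough = bf_pc + 8u;   /* skip l.bf and its delay slot */
    cpu->jmp_pc = cpu->flag ? target : fallthrough;
}

/* Point 3: delay-slot instruction (here l.addi rD,rA,imm). Returns false
 * to model a failure; this toy version never fails. */
static bool toy_delay_slot_addi(ToyCpu *cpu, int rd, int ra, int32_t imm)
{
    cpu->gpr[rd] = cpu->gpr[ra] + (uint32_t)imm;
    return true;
}

/* Point 4: the actual jump. One brcond picks one of two exits so that
 * each exit can be chained to its own successor block.
 * Returns the exit slot used: 1 = taken, 0 = not taken. */
static int toy_jump(ToyCpu *cpu, uint32_t bf_pc, int32_t offset_insns)
{
    uint32_t target = bf_pc + (uint32_t)offset_insns * 4u;
    int slot = (cpu->jmp_pc == target) ? 1 : 0;
    cpu->pc = cpu->jmp_pc;
    return slot;
}
```

The point of the sketch is that the decision is made in toy_bf but can only be acted on in toy_jump, after the delay slot has run, which is why the destination has to be carried across in jmp_pc.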
In principle, points 1 and 2 could be fused with some coding effort. But I
am not sure whether one brcond is faster than one setcond+movcond.
And maybe the branching in point 4 could be avoided by some internal
change in the way QEMU does its block chaining.
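
For the idea of fusing points 1 and 2, a minimal sketch of the two shapes in plain C (again not TCG code; both function names and the equality condition are assumptions chosen for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/* Current shape: a setcond produces the flag, a movcond consumes it. */
static uint32_t next_pc_two_step(uint32_t a, uint32_t b,
                                 uint32_t target, uint32_t fallthrough,
                                 bool *flag_out)
{
    bool flag = (a == b);                 /* setcond */
    *flag_out = flag;                     /* flag stays visible afterwards */
    return flag ? target : fallthrough;   /* movcond on the flag */
}

/* Fused shape: compare and select in one conditional operation. */
static uint32_t next_pc_fused(uint32_t a, uint32_t b,
                              uint32_t target, uint32_t fallthrough)
{
    return (a == b) ? target : fallthrough;
}
```

Both return the same destination. The fused form does not produce the flag, though, so if anything later reads the flag it would still have to be computed separately, which is one reason the gain is not obvious.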