[Qemu-devel] Expensive emulation of CPU condition flags

Shuang Zhai Thu, 30 Jun 2016 20:45:06 -0700

Hi everyone.


In running an ARMv7 guest on an x86 host, we observed that a guest instruction 
affecting condition flags is often translated into 10+ host instructions. The 
reason seems to be the way that the frontend emulates the condition flags. For 
instance:


Target ARM instruction:

cmp  r9, 0x21 ;


IR instructions:

movi_i32 tmp5,$0x21
sub_i32 NF,r9,tmp5
mov_i32 ZF,NF
setcond_i32 CF,r9,tmp5,geu
xor_i32 VF,NF,r9
xor_i32 tmp7,r9,tmp5
and_i32 VF,VF,tmp7
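
To make the cost concrete, the IR above can be read as the following C. This is
only a sketch of what the listed ops compute, not code taken from QEMU: the
struct and function names are made up for illustration, and the flag encoding
(N and V in bit 31, Z set when ZF is zero, C as 0 or 1) is an assumption about
how the frontend stores its flag variables.

```c
#include <stdint.h>

/* Hypothetical container for the four flag variables named in the IR. */
typedef struct {
    uint32_t NF;  /* assumed: N flag is bit 31 of this value */
    uint32_t ZF;  /* assumed: Z flag is set when this value is 0 */
    uint32_t CF;  /* assumed: C flag stored as 0 or 1 */
    uint32_t VF;  /* assumed: V flag is bit 31 of this value */
} arm_flags;

/* Mirrors the IR for "cmp rn, imm" one op at a time. */
static arm_flags cmp_flags(uint32_t rn, uint32_t imm)
{
    arm_flags f;
    f.NF = rn - imm;                  /* sub_i32 NF,r9,tmp5 */
    f.ZF = f.NF;                      /* mov_i32 ZF,NF */
    f.CF = rn >= imm;                 /* setcond_i32 CF,r9,tmp5,geu */
    f.VF = (f.NF ^ rn) & (rn ^ imm);  /* xor_i32, xor_i32, and_i32 */
    return f;
}

/* Helpers that decode each flag under the encoding assumed above. */
static int flag_n(arm_flags f) { return (int)(f.NF >> 31); }
static int flag_z(arm_flags f) { return f.ZF == 0; }
static int flag_c(arm_flags f) { return (int)f.CF; }
static int flag_v(arm_flags f) { return (int)(f.VF >> 31); }
```

Every one of the four results is written out, whether or not a later guest
instruction reads it, which is where the stores in the host code come from.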


Host x86 instructions:


sub    $0x21,%ebx
mov    %ebx,0x208(%r14)
mov    %ebx,%r12d
mov    %r12d,0x20c(%r14)
cmp    $0x21,%ebp
setae  %r13b
movzbl %r13b,%r13d
mov    %r13d,0x200(%r14)
xor    %ebp,%ebx
xor    $0x21,%ebp
and    %ebp,%ebx
mov    %ebx,0x204(%r14)


Imagine a tight loop in which a cmp instruction computes the termination 
condition: this can be quite expensive. Lazy evaluation does not seem to help 
here.
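
For readers unfamiliar with the term, lazy evaluation of flags generally means
recording the operands of the flag-setting operation and deriving a flag only
when something reads it. The following is a generic sketch of that idea, not
QEMU's ARM frontend; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical record of the last compare: only the operands are saved. */
typedef struct {
    uint32_t lhs;
    uint32_t rhs;
} lazy_cmp;

/* Executing "cmp lhs, rhs" lazily just remembers the two operands. */
static lazy_cmp record_cmp(uint32_t lhs, uint32_t rhs)
{
    lazy_cmp c = { lhs, rhs };
    return c;
}

/* Each flag is derived on demand from the saved operands. */
static int lazy_z(lazy_cmp c) { return c.lhs == c.rhs; }
static int lazy_c(lazy_cmp c) { return c.lhs >= c.rhs; }  /* no borrow */
static int lazy_n(lazy_cmp c) { return (int)((c.lhs - c.rhs) >> 31); }
static int lazy_v(lazy_cmp c)
{
    uint32_t res = c.lhs - c.rhs;
    /* Signed overflow: operand signs differ and the result sign differs from lhs. */
    return (int)(((res ^ c.lhs) & (c.lhs ^ c.rhs)) >> 31);
}
```

In this form a conditional branch on "equal" would only pay for lazy_z, at the
price of keeping both operands alive until the next flag-setting instruction.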


We wonder whether any optimization exists for this, e.g., directly mapping the 
frontend's flags to those of the backend. Any suggestions are appreciated.


Shuang
