Hi everyone.
When running an ARMv7 guest on an x86 host, we observed that a guest instruction that affects the condition flags is often translated into 10+ host instructions. The reason seems to be the way the frontend emulates the condition flags. For instance:

Target ARM instruction:

    cmp r9, 0x21

IR instructions:

    movi_i32 tmp5,$0x21
    sub_i32 NF,r9,tmp5
    mov_i32 ZF,NF
    setcond_i32 CF,r9,tmp5,geu
    xor_i32 VF,NF,r9
    xor_i32 tmp7,r9,tmp5
    and_i32 VF,VF,tmp7

Host x86 instructions:

    sub $0x21,%ebx
    mov %ebx,0x208(%r14)
    mov %ebx,%r12d
    mov %r12d,0x20c(%r14)
    cmp $0x21,%ebp
    setae %r13b
    movzbl %r13b,%r13d
    mov %r13d,0x200(%r14)
    xor %ebp,%ebx
    xor $0x21,%ebp
    and %ebp,%ebx
    mov %ebx,0x204(%r14)

Imagine a tight loop in which a cmp instruction computes the termination condition: this can be pretty expensive, and lazy evaluation does not seem to help here. We wonder whether any optimization exists, e.g., directly mapping the frontend flags to those of the backend.

Any suggestions are appreciated.

Shuang