A bug reported by Hartmut Penner visible on s390 host and ppc64 guest:

IN:
0x00000000001ff378:  lwz     r3,436(r7)
0x00000000001ff37c:  lis     r0,-17874
0x00000000001ff380:  ori     r0,r0,35747
0x00000000001ff384:  mulhwu  r4,r3,r0
0x00000000001ff388:  rlwinm  r5,r4,29,3,31
0x00000000001ff38c:  rldicr  r6,r5,4,59
0x00000000001ff390:  rlwinm  r4,r4,31,1,29
0x00000000001ff394:  subf    r4,r4,r6
0x00000000001ff398:  subf    r4,r5,r4
0x00000000001ff39c:  subf    r3,r4,r3
0x00000000001ff3a0:  cmpwi   r3,0
0x00000000001ff3a4:  bne-    0x1ff494
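
For orientation, the guest block above reads like a compiler-generated "r3 % 11": the lis/ori pair builds 0xba2e8ba3, which is (2^35 + 1) / 11, and the shifts and subtracts rebuild 11 * (r3 / 11). The sketch below is only my reading of the listing, written as C; guest_mod11 and check_guest_mod11 are hypothetical names, not anything in QEMU or the guest. The point is that mulhwu is an unsigned 32x32 multiply, so it only gives the right answer if its inputs really are zero-extended.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a C rendering of guest insns 0x1ff37c..0x1ff39c. */
static uint32_t guest_mod11(uint32_t r3)
{
    const uint32_t r0 = 0xba2e8ba3u;                     /* lis + ori          */
    uint32_t hi  = (uint32_t)(((uint64_t)r3 * r0) >> 32); /* mulhwu r4,r3,r0   */
    uint32_t q   = hi >> 3;                              /* rlwinm r5,r4,29,3,31 */
    uint32_t q16 = q << 4;                               /* rldicr r6,r5,4,59  */
    uint32_t q4  = (hi >> 1) & 0x7ffffffcu;              /* rlwinm r4,r4,31,1,29 */
    uint32_t q11 = q16 - q4 - q;                         /* subf, subf: 11 * q */
    return r3 - q11;                                     /* subf r3,r4,r3      */
}

/* Compare against the C remainder operator on a few boundary values. */
static void check_guest_mod11(void)
{
    static const uint32_t samples[] = { 0u, 1u, 10u, 11u, 12u, 0xffffffffu };
    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        assert(guest_mod11(samples[i]) == samples[i] % 11u);
    }
}
```

Calling check_guest_mod11() from a test harness exercises the boundary cases; the arithmetic is done modulo 2^32 here, which matches the low 32 bits that the final cmpwi looks at.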

Excerpting the relevant opcodes from the op and op_opt dump:

 ---- 0x1ff378
 qemu_ld32u r3,tmp2,$0x1
 ---- 0x1ff37c
 movi_i64 r0,$0xffffffffba2e0000
 ---- 0x1ff384
 mov_i32 tmp0,r3
 mov_i32 tmp1,r0
 ext32u_i64 tmp2,tmp0

becomes

 ---- 0x1ff378
 qemu_ld32u r3,tmp2,$0x1
 ---- 0x1ff380
 movi_i64 r0,$0xffffffffba2e8ba3
 ---- 0x1ff384
 mov_i32 tmp0,r3
 *** mov_i64 tmp2,tmp0 ***

We dropped the ext32u opcode because we "proved" the value was already
zero-extended.  Except we allowed that proof to go through a 32-bit
temporary, which is illegal.

Ideally we'd have transformed that last opcode to

 mov_i64 tmp2,r3

which would have been fine, but we have no optimization pass that looks
across multiple tcg opcodes.

This specific bug may only be visible on s390.  It is unique in that it
has a 32-bit register move operation that does not modify the high bits
of the 64-bit register, unlike x86_64 and aarch64, which zero the high
bits, or sparc64 and ppc64, which copy all 64 bits even for a 32-bit move.

But the point that the high bits of a 32-bit operation must be considered
to be garbage still stands.

Hartmut reports success after this patch set, but didn't explicitly give
me a Tested-by.  ;-)


r~


Richard Henderson (2):
  tcg/optimize: Move updating of gen_opc_buf into tcg_opt_gen_mov*
  tcg/optimize: Remember garbage high bits for 32-bit ops

 tcg/optimize.c | 150 +++++++++++++++++++++++++++++++--------------------------
 1 file changed, 82 insertions(+), 68 deletions(-)

-- 
1.9.0
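
To make the failure mode described above concrete, here is a minimal C model of it. This is a sketch under stated assumptions, not the optimizer code: host_mov_i32 models a host whose 32-bit move leaves the destination's high half untouched (the s390 behaviour described above), and all function names are hypothetical. It shows that the op sequence with ext32u_i64 yields a clean 64-bit value, while the op_opt sequence with mov_i64 passes stale high bits through.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed host behaviour: a 32-bit move writes only the low 32 bits and
   keeps whatever was in the high 32 bits of the destination register. */
static uint64_t host_mov_i32(uint64_t dst_old, uint64_t src)
{
    return (dst_old & 0xffffffff00000000ull) | (src & 0xffffffffull);
}

/* Zero-extend the low 32 bits to 64 bits. */
static uint64_t host_ext32u_i64(uint64_t v)
{
    return v & 0xffffffffull;
}

/* op dump:     mov_i32 tmp0,r3 ; ext32u_i64 tmp2,tmp0 */
static uint64_t tmp2_before_opt(uint64_t tmp0_stale, uint64_t r3)
{
    uint64_t tmp0 = host_mov_i32(tmp0_stale, r3);
    return host_ext32u_i64(tmp0);
}

/* op_opt dump: mov_i32 tmp0,r3 ; mov_i64 tmp2,tmp0 */
static uint64_t tmp2_after_opt(uint64_t tmp0_stale, uint64_t r3)
{
    uint64_t tmp0 = host_mov_i32(tmp0_stale, r3);
    return tmp0; /* the garbage high bits of tmp0 leak into tmp2 */
}

static void check_high_bits_model(void)
{
    const uint64_t stale = 0xdeadbeef00000000ull; /* leftover register contents */
    const uint64_t r3 = 0x12345678ull;            /* zero-extended by qemu_ld32u */

    assert(tmp2_before_opt(stale, r3) == 0x12345678ull);
    assert(tmp2_after_opt(stale, r3) == 0xdeadbeef12345678ull);
    /* With a clean host register the bug is invisible. */
    assert(tmp2_after_opt(0, r3) == 0x12345678ull);
}
```

The last assertion is why the problem can hide: the optimized sequence is only wrong when the host register backing tmp0 happens to hold nonzero high bits.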