https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97747
            Bug ID: 97747
           Summary: missed combine opt with logical ops after zero extended load
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wilson at gcc dot gnu.org
  Target Milestone: ---

Consider this testcase

struct
{
  unsigned int a : 1;
  unsigned int b : 1;
  unsigned int c : 1;
  unsigned int d : 1;
  unsigned int pad1 : 28;
} s;

void
sub (void)
{
  s.a = 1;
  s.c = 1;
}

Compiling with -O2 -S for ARM I get

sub:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        movw    r2, #:lower16:.LANCHOR0
        movt    r2, #:upper16:.LANCHOR0
        ldrb    r3, [r2]        @ zero_extendqisi2
        bic     r3, r3, #5
        orr     r3, r3, #5
        strb    r3, [r2]
        bx      lr

The bic bit-clear instruction is obviously unnecessary.

In the combine dump file I see that we have

(insn 9 7 11 2 (set (reg:SI 120)
        (and:SI (reg:SI 119 [ MEM <unsigned char> [(struct *)&sD.5619] ])
            (const_int -6 [0xfffffffffffffffa]))) "tmp.c":13:7 90 {*arm_andsi3_insn}
     (expr_list:REG_DEAD (reg:SI 119 [ MEM <unsigned char> [(struct *)&sD.5619] ])
        (nil)))
(insn 11 9 13 2 (set (reg:SI 122)
        (ior:SI (reg:SI 120)
            (const_int 5 [0x5]))) "tmp.c":13:7 106 {*iorsi3_insn}
     (expr_list:REG_DEAD (reg:SI 120)
        (nil)))

And the combiner does:

Trying 9 -> 11:
    9: r120:SI=r119:SI&0xfffffffffffffffa
      REG_DEAD r119:SI
   11: r122:SI=r120:SI|0x5
      REG_DEAD r120:SI
Failed to match this instruction:
(set (reg:SI 122)
    (ior:SI (and:SI (reg:SI 119 [ MEM <unsigned char> [(struct *)&sD.5619] ])
            (const_int 250 [0xfa]))
        (const_int 5 [0x5])))

The problem here is that the ARM port generated a zero_extend for the load
byte, so combine knows that r120 has only 8 nonzero bits, it modified the -6 to
250 and then fails to notice that the and operation can be folded away because
in SImode the operation is no longer redundant with the modified constant.

On targets that do not generate the zero_extend, the and -6 operation gets
optimized away in combine.
For instance, with the current RISC-V port I get

sub:
        lui     a4,%hi(s)
        lbu     a5,%lo(s)(a4)
        ori     a5,a5,5
        sb      a5,%lo(s)(a4)
        ret

This likely fails on any target where movqi generates a zero extended load.
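
To make the arithmetic behind the redundancy concrete, here is a small standalone C sketch. It is not GCC code and does not model how combine works internally. The helper names and_is_redundant and forms_agree_for_all_bytes are hypothetical, chosen only for illustration. The idea it shows: with the narrowed constant 250, the and only looks redundant once the known nonzero bits (0xff for a zero extended byte) are taken into account, whereas with the original -6 it is redundant in full SImode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code: returns true when the AND in
   (x & and_mask) | ior_mask can be dropped, given that x is known to
   be zero in every bit outside nonzero_bits.  */
static bool
and_is_redundant (uint32_t and_mask, uint32_t ior_mask, uint32_t nonzero_bits)
{
  /* Bits the AND would clear that might actually be set in x.  */
  uint32_t cleared = ~and_mask & nonzero_bits;

  /* The AND is redundant if the IOR sets every such bit again.  */
  return (cleared & ~ior_mask) == 0;
}

/* Hypothetical check: compare the original, narrowed, and folded forms
   for every value a zero extended byte load can produce.  */
static bool
forms_agree_for_all_bytes (void)
{
  for (uint32_t x = 0; x <= 0xff; x++)
    {
      uint32_t original = (x & (uint32_t) -6) | 5; /* and -6, then ior 5 */
      uint32_t narrowed = (x & 250) | 5;           /* and 250, then ior 5 */
      uint32_t folded = x | 5;                     /* ior 5 only */

      if (original != folded || narrowed != folded)
        return false;
    }
  return true;
}
```

With these definitions, and_is_redundant (250, 5, 0xff) holds, but and_is_redundant (250, 5, 0xffffffff) does not, which mirrors the failure described above: once -6 has been rewritten to 250, the fold is only valid if the nonzero bits information is applied again.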