https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97747

            Bug ID: 97747
           Summary: missed combine opt with logical ops after zero
                    extended load
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wilson at gcc dot gnu.org
  Target Milestone: ---

Consider this testcase:
struct
{
  unsigned int a : 1;
  unsigned int b : 1;
  unsigned int c : 1;
  unsigned int d : 1;
  unsigned int pad1 : 28;
} s;

void
sub (void)
{
  s.a = 1;
  s.c = 1;
}

Compiling with -O2 -S for ARM I get
sub:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        movw    r2, #:lower16:.LANCHOR0
        movt    r2, #:upper16:.LANCHOR0
        ldrb    r3, [r2]        @ zero_extendqisi2
        bic     r3, r3, #5
        orr     r3, r3, #5
        strb    r3, [r2]
        bx      lr
The bic (bit-clear) instruction is unnecessary: it clears bits 0 and 2, which
the following orr immediately sets again.

In the combine dump file I see that we have
(insn 9 7 11 2 (set (reg:SI 120)
        (and:SI (reg:SI 119 [ MEM <unsigned char> [(struct  *)&sD.5619] ])
            (const_int -6 [0xfffffffffffffffa]))) "tmp.c":13:7 90 {*arm_andsi3_insn}
     (expr_list:REG_DEAD (reg:SI 119 [ MEM <unsigned char> [(struct  *)&sD.5619] ])
        (nil)))
(insn 11 9 13 2 (set (reg:SI 122)
        (ior:SI (reg:SI 120)
            (const_int 5 [0x5]))) "tmp.c":13:7 106 {*iorsi3_insn}
     (expr_list:REG_DEAD (reg:SI 120)
        (nil)))

And the combiner does:
Trying 9 -> 11:
    9: r120:SI=r119:SI&0xfffffffffffffffa
      REG_DEAD r119:SI
   11: r122:SI=r120:SI|0x5
      REG_DEAD r120:SI
Failed to match this instruction:
(set (reg:SI 122)
    (ior:SI (and:SI (reg:SI 119 [ MEM <unsigned char> [(struct  *)&sD.5619] ])
            (const_int 250 [0xfa]))
        (const_int 5 [0x5])))

The problem here is that the ARM port generates a zero_extend for the byte
load, so combine knows that r119 has only 8 possibly nonzero bits.  It
therefore narrows the -6 to 250, and then fails to notice that the and
operation can be folded away: in SImode, -6 | 5 is all ones, which makes the
and redundant, but 250 | 5 is only 0xff, so with the modified constant the
operation no longer looks redundant.
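
The arithmetic behind this can be sketched in C.  This is only an illustration
of the mask reasoning, not combine's actual code; and_is_redundant and
count_byte_mismatches are hypothetical helper names, and the nonzero_bits
argument stands in for whatever combine knows about the operand.

```c
#include <stdint.h>

/* Hypothetical helper, not a GCC function: (x & and_mask) | ior_mask
   equals x | ior_mask whenever every bit of x that may be nonzero is
   either kept by the AND or set by the IOR.  */
int
and_is_redundant (uint32_t and_mask, uint32_t ior_mask, uint32_t nonzero_bits)
{
  return ((and_mask | ior_mask) & nonzero_bits) == nonzero_bits;
}

/* Count the byte values x in [0, 255] for which dropping the AND would
   change the result.  Zero means the AND is redundant for any
   zero-extended byte.  */
int
count_byte_mismatches (uint32_t and_mask, uint32_t ior_mask)
{
  int mismatches = 0;
  for (uint32_t x = 0; x <= 0xff; x++)
    if (((x & and_mask) | ior_mask) != (x | ior_mask))
      mismatches++;
  return mismatches;
}
```

With the original constant, and_is_redundant (0xfffffffa, 5, 0xffffffff) is
true.  With the narrowed constant tested against the full SImode mask,
and_is_redundant (0xfa, 5, 0xffffffff) is false, which corresponds to the
failed match above.  Taking the known nonzero bits into account,
and_is_redundant (0xfa, 5, 0xff) is true again, and
count_byte_mismatches (0xfa, 5) returns 0.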

On targets that do not generate the zero_extend, the and -6 operation gets
optimized away in combine.  For instance, with the current RISC-V port I get
sub:
        lui     a4,%hi(s)
        lbu     a5,%lo(s)(a4)
        ori     a5,a5,5
        sb      a5,%lo(s)(a4)
        ret

This likely fails on any target where movqi generates a zero-extended load.
