https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86209
            Bug ID: 86209
           Summary: Peephole does not happen because the type of zero/sign
                    extended operands is not the same.
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sameerad at gcc dot gnu.org
  Target Milestone: ---

While implementing a peephole2 that combines loads/stores of shorter types into
a load/store of a larger type, the following testcase was found for aarch64.
The peephole does not happen for it because the type of the zero/sign extended
operands is not the same.

Test program:

unsigned short subus (unsigned short *array)
{
  return array[0] + array[1];
}

Expander generated RTL:

(insn 6 3 7 2 (set (reg:HI 96)
        (mem:HI (reg/v/f:DI 94 [ array ]) [1 *array_4(D)+0 S2 A16]))
     (nil))
(insn 7 6 8 2 (set (reg:HI 97)
        (mem:HI (plus:DI (reg/v/f:DI 94 [ array ])
                (const_int 2 [0x2])) [1 MEM[(short unsigned int *)array_4(D) + 2B]+0 S2 A16]))
     (nil))
(insn 8 7 9 2 (set (reg:SI 99)
        (subreg:SI (reg:HI 97) 0))
     (nil))
(insn 9 8 10 2 (set (reg:SI 98)
        (plus:SI (subreg:SI (reg:HI 96) 0)
            (reg:SI 99)))
     (expr_list:REG_EQUAL (plus:SI (subreg:SI (reg:HI 96) 0)
            (subreg:SI (reg:HI 97) 0))
        (nil)))

The combiner combines insns 7 and 8 to generate a zero extension to SImode:

(insn 8 7 9 2 (set (reg:SI 99 [ MEM[(short unsigned int *)array_4(D) + 2B] ])
        (zero_extend:SI (mem:HI (plus:DI (reg/v/f:DI 94 [ array ])
                    (const_int 2 [0x2])) [1 MEM[(short unsigned int *)array_4(D) + 2B]+0 S2 A16]))) {*zero_extendhisi2_aarch64}
     (expr_list:REG_DEAD (reg/v/f:DI 94 [ array ])
        (nil)))

The reload pass removes the SUBREGs, which hold the information about the
desired type. As a result, the remaining HImode load (insn 6) is zero extended
to DImode, while insn 8 is still zero extended to SImode:

(insn 8 7 6 2 (set (reg:SI 1 x1 [orig:99 MEM[(short unsigned int *)array_4(D) + 2B] ] [99])
        (zero_extend:SI (mem:HI (plus:DI (reg/v/f:DI 0 x0 [orig:94 array ] [94])
                    (const_int 2 [0x2])) [1 MEM[(short unsigned int *)array_4(D) + 2B]+0 S2 A16]))) {*zero_extendhisi2_aarch64}
     (nil))
(insn 6 8 9 2 (set (reg:DI 0 x0)
        (zero_extend:DI (mem:HI (reg/v/f:DI 0 x0 [orig:94 array ] [94]) [1 *array_4(D)+0 S2 A16]))) {*zero_extendhidi2_aarch64}
     (nil))
(insn 9 6 14 2 (set (reg:SI 0 x0 [98])
        (plus:SI (reg:SI 0 x0 [orig:96 *array_4(D) ] [96])
            (reg:SI 1 x1 [orig:99 MEM[(short unsigned int *)array_4(D) + 2B] ] [99]))) {*addsi3_aarch64}
     (nil))
(insn 14 9 15 2 (set (reg/i:HI 0 x0)
        (reg:HI 0 x0 [98])) {*movhi_aarch64}
     (nil))
(insn 15 14 17 2 (use (reg/i:HI 0 x0))
     (nil))
(note 17 15 18 NOTE_INSN_DELETED)
(note 18 17 0 NOTE_INSN_DELETED)

Now, as the two memory accesses are extended to different modes (SImode and
DImode), they cannot be combined by the peephole. Because of this, even when
sched_fusion has brought the loads/stores closer together, they cannot be
merged.
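
For illustration only, here is a source-level sketch of the transformation the peephole is meant to achieve. The two adjacent halfword loads are replaced by a single 32-bit load, and each halfword is then extracted by zero extension. subus_merged is a hypothetical hand-written equivalent, not compiler output and not the peephole2 pattern itself. It assumes a 16-bit unsigned short. The comments naming array[0] and array[1] assume a little-endian target, although the sum is the same either way.

```c
#include <stdint.h>
#include <string.h>

/* Testcase from the report: two adjacent HImode loads. */
unsigned short subus(unsigned short *array)
{
    return array[0] + array[1];
}

/* Hypothetical merged form (illustrative name, not from the report):
   one 4-byte load covering both elements, then two zero extensions.
   Assumes sizeof(unsigned short) == 2.  */
unsigned short subus_merged(unsigned short *array)
{
    uint32_t both;

    /* memcpy avoids aliasing/alignment problems; compilers normally
       turn this into a single 32-bit load.  */
    memcpy(&both, array, sizeof both);

    uint32_t lo = both & 0xffffu;  /* array[0] on little-endian */
    uint32_t hi = both >> 16;      /* array[1] on little-endian */

    /* Truncate to unsigned short, as the original return does. */
    return (unsigned short)(lo + hi);
}
```

Both functions return the same value for any input. The point of the sketch is that, after the merge, both halves come from the same wider value, so both extensions need to agree on a mode. That is exactly what the SImode/DImode mismatch above prevents.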