[Bug target/100694] PPC: initialization of __int128 is very inefficient
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100694

--- Comment #6 from HaoChen Gui ---
I made a patch to convert the ashift into a move when the second operand is
const0_rtx. With the patch, the expand dump looks just like aarch64's, but the
problem is still there.

I tested the patch with SPECint. All the object files are identical to the
base build, so it seems the shift by zero is always optimized away by later
passes anyway.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100694

HaoChen Gui changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |guihaoc at gcc dot gnu.org

--- Comment #5 from HaoChen Gui ---
(In reply to Segher Boessenkool from comment #4)
> On aarch64 we have (in expand):
>
> ;; i_4 = i_3 << 64;
>
> (insn 10 9 11 (set (subreg:DI (reg/v:TI 94 [ i ]) 8)
>         (subreg:DI (reg/v:TI 93 [ i ]) 0)) "100694.c":4:6 -1
>      (nil))
>
> (insn 11 10 0 (set (subreg:DI (reg/v:TI 94 [ i ]) 0)
>         (const_int 0 [0])) "100694.c":4:6 -1
>      (nil))
>
> But on rs6000 we get:
>
> ;; i_4 = i_3 << 64;
>
> (insn 10 9 11 (set (subreg:DI (reg/v:TI 119 [ i ]) 0)
>         (ashift:DI (subreg:DI (reg/v:TI 118 [ i ]) 8)
>             (const_int 0 [0]))) "100694.c":4:6 -1
>      (nil))
>
> (insn 11 10 0 (set (subreg:DI (reg/v:TI 119 [ i ]) 8)
>         (const_int 0 [0])) "100694.c":4:6 -1
>      (nil))
>
> What the what.

On rs6000, insn 10 is optimized by the forward propagation pass.

test.c.261r.fwprop1:

(insn 10 5 11 2 (set (subreg:DI (reg/v:TI 119 [ i ]) 8)
        (reg/v:DI 122 [ hi ])) "test.c":4:6 670 {*movdi_internal64}
     (expr_list:REG_DEAD (reg:DI 126 [ i ])

It seems aarch64 optimizes it already at the expand pass.

Now the problem is that the "ior" operation is done in TImode on rs6000, while
on aarch64 it is done as two operations on subreg:DI. The subreg pass can
decompose a register that is only ever accessed through subregs. If the ior
were done on two subreg:DI on rs6000, it could be optimized by the subreg pass.

on rs6000:

(insn 14 13 15 2 (set (reg:TI 125 [ i ])
        (ior:TI (reg:TI 124 [ lo ])
            (reg/v:TI 119 [ i ]))) "test.c":5:6 494 {*boolti3_internal}

on aarch64:

(insn 21 20 22 2 (set (reg:DI 100)
        (ior:DI (subreg:DI (reg:TI 99) 0)
            (subreg:DI (reg/v:TI 94 [ i ]) 0))) "/app/example.c":5:6 521 {iordi3}

(insn 23 22 24 2 (set (reg:DI 101)
        (ior:DI (subreg:DI (reg:TI 99) 8)
            (subreg:DI (reg/v:TI 94 [ i ]) 8))) "/app/example.c":5:6 521 {iordi3}
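
For readers following along, the difference between the single ior:TI and the two ior:DI insns can be sketched at the C level. This is only an illustration of the two shapes, not what either back end emits; the type alias `u128` and the helper names `low_half`, `high_half`, `or_whole` and `or_by_halves` are made up for this sketch and do not come from the bug report.

```c
#include <stdint.h>

/* Hypothetical alias; unsigned __int128 is a GCC extension on 64-bit targets. */
typedef unsigned __int128 u128;

/* Hypothetical helpers that pick out one 64-bit half, roughly what a
   subreg:DI of a TImode register denotes. */
static uint64_t low_half(u128 v)  { return (uint64_t)v; }
static uint64_t high_half(u128 v) { return (uint64_t)(v >> 64); }

/* OR performed as one 128-bit operation (the ior:TI shape). */
u128 or_whole(u128 a, u128 b)
{
    return a | b;
}

/* OR performed separately on each 64-bit half (the two ior:DI shape).
   Once every use goes through a half, each half can live in its own
   64-bit register. */
u128 or_by_halves(u128 a, u128 b)
{
    uint64_t lo = low_half(a) | low_half(b);
    uint64_t hi = high_half(a) | high_half(b);
    return ((u128)hi << 64) | lo;
}
```

Both functions compute the same value; the point is only that the second form never needs the 128-bit value as a single unit until the final result.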
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100694

--- Comment #4 from Segher Boessenkool ---
On aarch64 we have (in expand):

;; i_4 = i_3 << 64;

(insn 10 9 11 (set (subreg:DI (reg/v:TI 94 [ i ]) 8)
        (subreg:DI (reg/v:TI 93 [ i ]) 0)) "100694.c":4:6 -1
     (nil))

(insn 11 10 0 (set (subreg:DI (reg/v:TI 94 [ i ]) 0)
        (const_int 0 [0])) "100694.c":4:6 -1
     (nil))

But on rs6000 we get:

;; i_4 = i_3 << 64;

(insn 10 9 11 (set (subreg:DI (reg/v:TI 119 [ i ]) 0)
        (ashift:DI (subreg:DI (reg/v:TI 118 [ i ]) 8)
            (const_int 0 [0]))) "100694.c":4:6 -1
     (nil))

(insn 11 10 0 (set (subreg:DI (reg/v:TI 119 [ i ]) 8)
        (const_int 0 [0])) "100694.c":4:6 -1
     (nil))

What the what.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100694

--- Comment #3 from Segher Boessenkool ---
Should this not be handled by the subreg passes?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100694

Roger Sayle changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com

--- Comment #2 from Roger Sayle ---
On x86, I proposed tackling this type of poor code generation issue for TImode
operations by introducing a zero_extendditi2 pattern. Currently rs6000.md
(also) doesn't provide a zero extension operation from DImode to TImode, so the
middle-end expands things using SUBREGs, which unfortunately interferes with
combine's ability to optimize things.

Improving x86_64's TImode operations is still a work in progress, but a patch
for issues similar to rs6000.md's was posted here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596165.html

Perhaps similar zero_extend and *concat operations would help on powerpc*?
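
As a rough source-level picture of what such a zero extension and concatenation compute, here is a minimal sketch. The testcase itself is not quoted in these comments, so `make_u128` is only a guess at its shape, based on the `i_4 = i_3 << 64` statement and the ior with `lo` seen in the dumps. The names `u128`, `zext_di_to_ti` and `make_u128` are hypothetical, and unsigned types are an assumption made here to keep the shift well defined.

```c
#include <stdint.h>

/* Hypothetical alias; unsigned __int128 is a GCC extension on 64-bit targets. */
typedef unsigned __int128 u128;

/* The value a DImode -> TImode zero extension produces:
   the low 64 bits are x, the high 64 bits are zero. */
static inline u128 zext_di_to_ti(uint64_t x)
{
    return (u128)x;
}

/* Assumed shape of the code under discussion: build a 128-bit value
   from two 64-bit halves. Semantically this is a concatenation hi:lo. */
u128 make_u128(uint64_t hi, uint64_t lo)
{
    u128 i = zext_di_to_ti(hi);
    i <<= 64;   /* move hi into the upper half, lower half becomes 0 */
    i |= lo;    /* fill the lower half */
    return i;
}
```

Ideally a function of this shape would need no more than two register moves, since each input already is one half of the result.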
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100694

Segher Boessenkool changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-05-20
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Segher Boessenkool ---
The important difference between powerpc64 and aarch64 is that the store
is in TImode for powerpc64, but as two DImode stores for aarch64, right
after expand already (and before expand the code was identical).

Confirmed.