[Bug target/100694] PPC: initialization of __int128 is very inefficient

2022-07-28 Thread guihaoc at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100694

--- Comment #6 from HaoChen Gui ---
I made a patch that converts the ashift to a move when the second operand is
const0_rtx. With the patch, the expand dump looks just like aarch64's, but the
problem is still there.
I tested the patch with SPECint: all the object files are identical to the
base. It seems the shift is always optimized by later passes.

[Bug target/100694] PPC: initialization of __int128 is very inefficient

2022-07-25 Thread guihaoc at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100694

HaoChen Gui changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |guihaoc at gcc dot gnu.org

--- Comment #5 from HaoChen Gui ---
(In reply to Segher Boessenkool from comment #4)
> On aarch64 we have (in expand):
> 
> ;; i_4 = i_3 << 64;
> 
> (insn 10 9 11 (set (subreg:DI (reg/v:TI 94 [ i ]) 8)
> (subreg:DI (reg/v:TI 93 [ i ]) 0)) "100694.c":4:6 -1
>  (nil))
> 
> (insn 11 10 0 (set (subreg:DI (reg/v:TI 94 [ i ]) 0)
> (const_int 0 [0])) "100694.c":4:6 -1
>  (nil))
> 
> But on rs6000 we get:
> 
> ;; i_4 = i_3 << 64;
> 
> (insn 10 9 11 (set (subreg:DI (reg/v:TI 119 [ i ]) 0)
> (ashift:DI (subreg:DI (reg/v:TI 118 [ i ]) 8)
> (const_int 0 [0]))) "100694.c":4:6 -1
>  (nil))
> 
> (insn 11 10 0 (set (subreg:DI (reg/v:TI 119 [ i ]) 8)
> (const_int 0 [0])) "100694.c":4:6 -1
>  (nil))
> 
> What the what.

On rs6000, insn 10 is optimized by the forward propagation pass.
test.c.261r.fwprop1:
(insn 10 5 11 2 (set (subreg:DI (reg/v:TI 119 [ i ]) 8)
(reg/v:DI 122 [ hi ])) "test.c":4:6 670 {*movdi_internal64}
 (expr_list:REG_DEAD (reg:DI 126 [ i ])

It seems aarch64 already does this optimization in the expand pass.

Now the problem is that the "ior" operation is done in TImode on rs6000, while
it is done on two subreg:DI halves on aarch64.  The subreg pass can decompose a
register that is only ever accessed through subregs. If the ior were done on
two subreg:DI halves on rs6000, it could be optimized by the subreg pass.

On rs6000:
(insn 14 13 15 2 (set (reg:TI 125 [ i ])
(ior:TI (reg:TI 124 [ lo ])
(reg/v:TI 119 [ i ]))) "test.c":5:6 494 {*boolti3_internal}

On aarch64:
(insn 21 20 22 2 (set (reg:DI 100)
(ior:DI (subreg:DI (reg:TI 99) 0)
(subreg:DI (reg/v:TI 94 [ i ]) 0))) "/app/example.c":5:6 521 {iordi3}
(insn 23 22 24 2 (set (reg:DI 101)
(ior:DI (subreg:DI (reg:TI 99) 8)
(subreg:DI (reg/v:TI 94 [ i ]) 8))) "/app/example.c":5:6 521 {iordi3}
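
For readers less used to RTL dumps, the contrast between the two "ior" forms
above can be sketched in plain C. This is only an illustration, not code from
the bug report: the names u128, or_wide and or_halves are made up here, and
unsigned __int128 is assumed to be available (GCC or Clang on a 64-bit target).
or_wide corresponds roughly to the single TImode ior, or_halves to the two
DImode iors on the 64-bit halves.

```c
#include <stdint.h>

typedef unsigned __int128 u128;   /* hypothetical shorthand, GCC/Clang extension */

/* One 128-bit OR, analogous to a single ior in TImode. */
u128 or_wide(u128 a, u128 b)
{
    return a | b;
}

/* The same result computed as two independent 64-bit ORs, analogous to
   two ior:DI operations on the halves of the 128-bit values. */
u128 or_halves(u128 a, u128 b)
{
    uint64_t lo = (uint64_t)a | (uint64_t)b;                  /* low halves  */
    uint64_t hi = (uint64_t)(a >> 64) | (uint64_t)(b >> 64);  /* high halves */
    return ((u128)hi << 64) | lo;
}
```

Both functions return the same value for every input. In the second form each
half is visible on its own, so a half that is known to be zero can in principle
be simplified independently of the other.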

[Bug target/100694] PPC: initialization of __int128 is very inefficient

2022-07-06 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100694

--- Comment #4 from Segher Boessenkool ---
On aarch64 we have (in expand):

;; i_4 = i_3 << 64;

(insn 10 9 11 (set (subreg:DI (reg/v:TI 94 [ i ]) 8)
(subreg:DI (reg/v:TI 93 [ i ]) 0)) "100694.c":4:6 -1
 (nil))

(insn 11 10 0 (set (subreg:DI (reg/v:TI 94 [ i ]) 0)
(const_int 0 [0])) "100694.c":4:6 -1
 (nil))

But on rs6000 we get:

;; i_4 = i_3 << 64;

(insn 10 9 11 (set (subreg:DI (reg/v:TI 119 [ i ]) 0)
(ashift:DI (subreg:DI (reg/v:TI 118 [ i ]) 8)
(const_int 0 [0]))) "100694.c":4:6 -1
 (nil))

(insn 11 10 0 (set (subreg:DI (reg/v:TI 119 [ i ]) 8)
(const_int 0 [0])) "100694.c":4:6 -1
 (nil))

What the what.
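
As a reading aid for the two expansions of "i_3 << 64" above, here is a small C
sketch of the same operation, once as a 128-bit shift and once on 64-bit
halves. It is an illustration under assumptions, not part of the report: the
names u128, shl64_wide and shl64_halves are hypothetical, and unsigned __int128
is assumed to be supported by the compiler. Which subreg byte offset (0 or 8)
names the high half depends on the target's endianness, so the sketch talks
about "high" and "low" halves instead.

```c
#include <stdint.h>

typedef unsigned __int128 u128;   /* hypothetical shorthand, GCC/Clang extension */

/* The operation as one 128-bit shift. */
u128 shl64_wide(u128 i)
{
    return i << 64;
}

/* The same result built from 64-bit halves: the old low half becomes the
   new high half, and the new low half is zero. */
u128 shl64_halves(u128 i)
{
    uint64_t lo = (uint64_t)i;    /* low 64 bits of the input */
    uint64_t new_hi = lo << 0;    /* a shift by 0 is just a move */
    uint64_t new_lo = 0;
    return ((u128)new_hi << 64) | new_lo;
}
```

The "lo << 0" line mirrors the ashift by const_int 0 in the rs6000 dump; writing
it as a plain copy gives the form seen in the aarch64 dump.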

[Bug target/100694] PPC: initialization of __int128 is very inefficient

2022-07-04 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100694

--- Comment #3 from Segher Boessenkool ---
Should this not be handled by the subreg passes?

[Bug target/100694] PPC: initialization of __int128 is very inefficient

2022-07-04 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100694

Roger Sayle changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com

--- Comment #2 from Roger Sayle ---
On x86, I proposed tackling this type of poor code generation issue for TImode
operations by introducing a zero_extendditi2 pattern.  Currently rs6000.md
(also) doesn't provide a zero extension operation from DImode to TImode, so the
middle-end expands things using SUBREGs, which unfortunately interferes with
combine's ability to optimize things.  Improving x86_64's TImode operations
is still a work in progress, but a patch for issues similar to rs6000.md's was
posted here: https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596165.html
Perhaps similar zero_extend and *concat operations would help on powerpc*?
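
At the C level, the two building blocks mentioned above (a DImode to TImode
zero extension, and a concatenation of two DImode values) look roughly like the
sketch below. This is only meant to show what the operations compute; it says
nothing about how the machine description patterns would be written. The names
u128, zext_di_to_ti and concat_di are hypothetical, and unsigned __int128 is
assumed to be available.

```c
#include <stdint.h>

typedef unsigned __int128 u128;   /* hypothetical shorthand, GCC/Clang extension */

/* Zero extension from 64 bits to 128 bits: the high half is all zeros. */
u128 zext_di_to_ti(uint64_t x)
{
    return (u128)x;
}

/* Concatenation of two 64-bit values into one 128-bit value,
   expressed with two zero extensions, a shift and an OR. */
u128 concat_di(uint64_t hi, uint64_t lo)
{
    return (zext_di_to_ti(hi) << 64) | zext_di_to_ti(lo);
}
```

Since the high half of zext_di_to_ti(lo) and the low half of the shifted value
are both known to be zero, the whole of concat_di amounts to placing hi and lo
in the two halves of the result.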

[Bug target/100694] PPC: initialization of __int128 is very inefficient

2021-05-20 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100694

Segher Boessenkool changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-05-20
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Segher Boessenkool ---
The important difference between powerpc64 and aarch64 is that the store
is done in TImode on powerpc64, but as two DImode stores on aarch64, already
right after expand (before expand the code was identical).

Confirmed.