[Bug target/103316] PowerPC: Gimple folding of int128 comparisons produces suboptimal code

2022-06-16 Thread guihaoc at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103316

HaoChen Gui  changed:

           What    |Removed                        |Added

             Status|NEW                            |RESOLVED
           Assignee|unassigned at gcc dot gnu.org  |guihaoc at gcc dot gnu.org
         Resolution|---                            |FIXED

--- Comment #18 from HaoChen Gui  ---
Fixed by r13-1131

[Bug target/103316] PowerPC: Gimple folding of int128 comparisons produces suboptimal code

2022-06-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103316

--- Comment #17 from CVS Commits  ---
The master branch has been updated by HaoChen Gui :

https://gcc.gnu.org/g:8d1c6e7038b0c281ac2678f2f615806a7aac9174

commit r13-1131-g8d1c6e7038b0c281ac2678f2f615806a7aac9174
Author: Haochen Gui 
Date:   Mon May 30 09:12:34 2022 +0800

rs6000: add V1TI into vector comparison expand [PR103316]

This patch adds V1TI mode into a new mode iterator used in vector
comparison, shift and rotation expands.  It also merges some vector comparison,
shift and rotation expands for V1TI and other vector integer modes as they have
similar patterns.  The expands for V1TI only are removed.

gcc/
PR target/103316
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Enable
gimple folding for RS6000_BIF_VCMPEQUT, RS6000_BIF_VCMPNET,
RS6000_BIF_CMPGE_1TI, RS6000_BIF_CMPGE_U1TI, RS6000_BIF_VCMPGTUT,
RS6000_BIF_VCMPGTST, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI.
* config/rs6000/vector.md (VEC_IC): New mode iterator.  Add support
for new Power10 V1TI instructions.
(vec_cmp<mode><mode>): Set mode iterator to VEC_IC.
(vec_cmpu<mode><mode>): Likewise.
(vector_nlt<mode>): Set mode iterator to VEC_IC.
(vector_nltv1ti): Remove.
(vector_gtu<mode>): Set mode iterator to VEC_IC.
(vector_gtuv1ti): Remove.
(vector_nltu<mode>): Set mode iterator to VEC_IC.
(vector_nltuv1ti): Remove.
(vector_geu<mode>): Set mode iterator to VEC_IC.
(vector_ngt<mode>): Likewise.
(vector_ngtv1ti): Remove.
(vector_ngtu<mode>): Set mode iterator to VEC_IC.
(vector_ngtuv1ti): Remove.
(vector_gtu_<mode>_p): Set mode iterator to VEC_IC.
(vector_gtu_v1ti_p): Remove.
(vrotl<mode>3): Set mode iterator to VEC_IC.  Emit insns for V1TI.
(vrotlv1ti3): Remove.
(vashr<mode>3): Set mode iterator to VEC_IC.  Emit insns for V1TI.
(vashrv1ti3): Remove.

gcc/testsuite/
PR target/103316
* gcc.target/powerpc/pr103316.c: New.
* gcc.target/powerpc/fold-vec-cmp-int128.c: New.
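
The builtins named in the ChangeLog above are comparisons on a single 128-bit
vector lane.  As a rough scalar reference for what such comparisons compute
(this is not the folding code, and reading the builtin names as eq, ne, ge, gt
and le in signed and unsigned flavours is an assumption), the sketch below
models them with the `__int128` extension of GCC and Clang, which is only
available on 64-bit targets.  All function and type names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar stand-ins for one V1TI lane.  */
typedef __int128 s128;
typedef unsigned __int128 u128;

/* A vector comparison yields all ones for true and all zeros for false.  */
static u128 mask (int cond) { return cond ? ~(u128) 0 : (u128) 0; }

/* Equality and inequality do not depend on signedness.  */
static u128 cmpeq_128 (u128 a, u128 b) { return mask (a == b); }
static u128 cmpne_128 (u128 a, u128 b) { return mask (a != b); }

/* Ordered comparisons come in signed and unsigned variants.  */
static u128 cmpgt_s128 (s128 a, s128 b) { return mask (a > b); }
static u128 cmpgt_u128 (u128 a, u128 b) { return mask (a > b); }
static u128 cmpge_s128 (s128 a, s128 b) { return mask (a >= b); }
static u128 cmpge_u128 (u128 a, u128 b) { return mask (a >= b); }
static u128 cmple_s128 (s128 a, s128 b) { return mask (a <= b); }
static u128 cmple_u128 (u128 a, u128 b) { return mask (a <= b); }
```

The signed and unsigned variants differ only when the top bit is set: the
all-ones bit pattern is -1 for the signed functions and the largest value for
the unsigned ones.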

[Bug target/103316] PowerPC: Gimple folding of int128 comparisons produces suboptimal code

2022-03-07 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103316

--- Comment #16 from Segher Boessenkool  ---
There are many patterns that use VEC_I, and not all have a V1TI variant
currently, so adding V1TI to it is not suitable for now.  It is better to
add a new iterator for now.

This whole thing desperately needs a big cleanup :-(

[Bug target/103316] PowerPC: Gimple folding of int128 comparisons produces suboptimal code

2022-03-06 Thread guihaoc at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103316

HaoChen Gui  changed:

           What    |Removed                     |Added

                 CC|                            |guihaoc at gcc dot gnu.org

--- Comment #15 from HaoChen Gui  ---
I drafted a patch that defines separate expanders for V1TI, and it works. I
wonder whether I should instead add V1TI into VEC_I and define a unified
expander for V16QI, V8HI, V4SI, V2DI and V1TI. Some insn patterns should then
be merged as well. Please advise. Thanks a lot.

[Bug target/103316] PowerPC: Gimple folding of int128 comparisons produces suboptimal code

2021-11-19 Thread wschmidt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103316

--- Comment #14 from Bill Schmidt  ---
(In reply to Segher Boessenkool from comment #13)
> (In reply to Bill Schmidt from comment #11)
> > As I mentioned privately, we could do with an audit of our implementation of
> > standard patterns in general, since  we tend to find such missing cases more
> > often than I'd like...
> 
> It would be great if there was some standard way of seeing what targets
> implement (and do not implement!) what.  It will be hugely useful when
> implementing a new port for example, but also great when maintaining one.

I agree.  Some tooling around this would be a big benefit to the community. 
Maybe some GSOC person would like to do that. :)

[Bug target/103316] PowerPC: Gimple folding of int128 comparisons produces suboptimal code

2021-11-19 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103316

--- Comment #13 from Segher Boessenkool  ---
(In reply to Bill Schmidt from comment #11)
> As I mentioned privately, we could do with an audit of our implementation of
> standard patterns in general, since  we tend to find such missing cases more
> often than I'd like...

It would be great if there was some standard way of seeing what targets
implement (and do not implement!) what.  It will be hugely useful when
implementing a new port for example, but also great when maintaining one.

[Bug target/103316] PowerPC: Gimple folding of int128 comparisons produces suboptimal code

2021-11-19 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103316

--- Comment #12 from Segher Boessenkool  ---
When is the lowering done currently?  Only for the ops that cannot be done any
other way, with things merged back to an __int128 immediately after that?

If that is what is going on, then that is unfortunate, but it may be the best
we can do, yes.  Sigh.

[Bug target/103316] PowerPC: Gimple folding of int128 comparisons produces suboptimal code

2021-11-19 Thread wschmidt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103316

--- Comment #11 from Bill Schmidt  ---
As I mentioned privately, we could do with an audit of our implementation of
standard patterns in general, since  we tend to find such missing cases more
often than I'd like...

[Bug target/103316] PowerPC: Gimple folding of int128 comparisons produces suboptimal code

2021-11-19 Thread wschmidt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103316

--- Comment #10 from Bill Schmidt  ---
FWIW, I think the vector lowering pass is reasonable.  These things always look
horrible in isolation, but the lowering allows more optimization when the
target doesn't really support the data type.

This is just an oversight in our back end, and once we correct it, this should
all fall out nicely (I hope).

[Bug target/103316] PowerPC: Gimple folding of int128 comparisons produces suboptimal code

2021-11-19 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103316

--- Comment #9 from Segher Boessenkool  ---
(In reply to Richard Biener from comment #7)
> > I still think it would be best if Gimple did *never* split data.  It
> > simply does not know enough about the machine and what the eventual
> > machine code will be like to do so advantageously.  This is the kind
> > of thing that RTL can do much better, is much better positioned to do
> > (and in fact it does do this, in all subregN passes).
> 
> Well, we need to be able to RTL expand the GIMPLE and vector lowering
> will ensure we can.  Otherwise we'll just ICE ;)

Aha.  But RTL can handle this itself already.  There is just a pass ordering
problem maybe?

[Bug target/103316] PowerPC: Gimple folding of int128 comparisons produces suboptimal code

2021-11-19 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103316

--- Comment #8 from Segher Boessenkool  ---
Btw:

>   mfvsrd 9,34
>   mfvsrld 8,34
>   mfvsrd 11,35
>   mfvsrld 10,35
>   li 7,1
>   cmpd 0,9,11
>   bgt 0,.L2
>   cmpld 0,9,11
>   beq 0,.L5
> .L3:
>   li 7,0

The fall-through here makes the code worse.

> .L2:
>   subfic 10,7,0
>   subfe 11,11,11

And we shouldn't generate this for p10 at all anyway!  Something
with setbc would be better.

If there were no branches here RTL could have made the code a bit
more reasonable again, but with branches, no such chance :-(
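
For illustration only, a branch-free form of the same signed 128-bit
greater-than test can be sketched in portable C as below.  The struct and
function names are hypothetical, and nothing here claims what GCC would emit
for it; the point is only that each intermediate is a plain 0/1 flag (the kind
of value a set-on-condition instruction such as setbc produces) combined with
logical operations instead of control flow.

```c
#include <stdint.h>

/* Hypothetical stand-in for one 128-bit lane held in two 64-bit GPRs:
   a signed high half and an unsigned low half.  */
typedef struct {
  int64_t  hi;
  uint64_t lo;
} i128_halves;

/* Signed 128-bit a > b without branches.  Returns the 64-bit mask
   (all ones or all zeros) that would fill both halves of the result.  */
static uint64_t
cmpgt_s128_branchless (i128_halves a, i128_halves b)
{
  uint64_t hi_gt = a.hi > b.hi;    /* signed compare of the high halves */
  uint64_t hi_eq = a.hi == b.hi;   /* high halves tie */
  uint64_t lo_gt = a.lo > b.lo;    /* unsigned compare of the low halves */

  /* Greater if the high half wins, or the high halves tie and the
     low half wins.  */
  uint64_t gt = hi_gt | (hi_eq & lo_gt);

  /* Turn 0/1 into 0/all-ones.  */
  return (uint64_t) 0 - gt;
}
```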

[Bug target/103316] PowerPC: Gimple folding of int128 comparisons produces suboptimal code

2021-11-19 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103316

--- Comment #7 from Richard Biener  ---
(In reply to Segher Boessenkool from comment #6)
> Ah, now I see.  Thanks!
> 
> Power10 has some new 128-bit insns (and p9 and p8 did before, too).
> 
> I still think it would be best if Gimple did *never* split data.  It
> simply does not know enough about the machine and what the eventual
> machine code will be like to do so advantageously.  This is the kind
> of thing that RTL can do much better, is much better positioned to do
> (and in fact it does do this, in all subregN passes).

Well, we need to be able to RTL expand the GIMPLE and vector lowering
will ensure we can.  Otherwise we'll just ICE ;)

[Bug target/103316] PowerPC: Gimple folding of int128 comparisons produces suboptimal code

2021-11-19 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103316

--- Comment #6 from Segher Boessenkool  ---
Ah, now I see.  Thanks!

Power10 has some new 128-bit insns (and p9 and p8 did before, too).

I still think it would be best if Gimple did *never* split data.  It
simply does not know enough about the machine and what the eventual
machine code will be like to do so advantageously.  This is the kind
of thing that RTL can do much better, is much better positioned to do
(and in fact it does do this, in all subregN passes).

[Bug target/103316] PowerPC: Gimple folding of int128 comparisons produces suboptimal code

2021-11-19 Thread wschmidt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103316

--- Comment #5 from Bill Schmidt  ---
At first glance, this is probably because vector.md's vec_cmp<mode><mode>
expander isn't defined for V1TImode.  It probably needs to be changed
to use VEC_IP rather than VEC_I and implement all the cases for 128-bit.

[Bug target/103316] PowerPC: Gimple folding of int128 comparisons produces suboptimal code

2021-11-19 Thread wschmidt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103316

--- Comment #4 from Bill Schmidt  ---
Above was compiled with -O2 -mcpu=power10.

[Bug target/103316] PowerPC: Gimple folding of int128 comparisons produces suboptimal code

2021-11-19 Thread wschmidt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103316

--- Comment #3 from Bill Schmidt  ---
Sure.  Consider:

#include <altivec.h>

vector bool __int128
foo (vector signed __int128 a, vector signed __int128 b)
{
  return vec_cmpgt (a, b);
}

With gimple folding we emulate in 64-bit mode:

        mfvsrd 9,34
        mfvsrld 8,34
        mfvsrd 11,35
        mfvsrld 10,35
        li 7,1
        cmpd 0,9,11
        bgt 0,.L2
        cmpld 0,9,11
        beq 0,.L5
.L3:
        li 7,0
.L2:
        subfic 10,7,0
        subfe 11,11,11
        mtvsrdd 34,11,10
        blr
        .p2align 4,,15
.L5:
        cmpld 0,8,10
        bgt 0,.L2
        b .L3

The desired code generation is just

vcmpgtsq 2,2,3
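
To make the emulated sequence above easier to follow, here is a minimal C
sketch of the same decision structure: compare the high doublewords as signed
values, fall back to an unsigned compare of the low doublewords when they tie,
then turn the 0/1 result into an all-zeros or all-ones mask.  The types and
the function name are hypothetical and exist only for this illustration; this
is not what the compiler generates internally.

```c
#include <stdint.h>

/* Hypothetical stand-in for one 128-bit lane split into 64-bit halves,
   as the mfvsrd/mfvsrld moves do: signed high half, unsigned low half.  */
typedef struct {
  int64_t  hi;
  uint64_t lo;
} i128_halves;

/* One 128-bit mask lane: all ones for true, all zeros for false.  */
typedef struct {
  uint64_t hi;
  uint64_t lo;
} mask128;

/* Signed 128-bit a > b, decided in 64-bit pieces with branches.  */
static mask128
cmpgt_s128_emulated (i128_halves a, i128_halves b)
{
  uint64_t gt;

  if (a.hi > b.hi)          /* signed compare of the high halves */
    gt = 1;
  else if (a.hi == b.hi)    /* high halves tie: the low halves decide */
    gt = a.lo > b.lo;       /* unsigned compare of the low halves */
  else
    gt = 0;

  /* Turn 0/1 into 0/all-ones and replicate it into both halves.  */
  uint64_t m = (uint64_t) 0 - gt;
  mask128 r = { m, m };
  return r;
}
```

With the single-instruction form, all of this collapses into one vector
compare on the 128-bit lane.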

[Bug target/103316] PowerPC: Gimple folding of int128 comparisons produces suboptimal code

2021-11-19 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103316

--- Comment #2 from Segher Boessenkool  ---
Do you maybe have some simple example (of what we generate, and what you say
it should be)?

[Bug target/103316] PowerPC: Gimple folding of int128 comparisons produces suboptimal code

2021-11-18 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103316

Richard Biener  changed:

           What    |Removed                     |Added

   Target Milestone|12.0                        |---

[Bug target/103316] PowerPC: Gimple folding of int128 comparisons produces suboptimal code

2021-11-18 Thread wschmidt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103316

Bill Schmidt  changed:

           What    |Removed                     |Added

           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
                 CC|                            |bergner at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org
     Ever confirmed|0                           |1
             Target|                            |powerpc*-*-*
   Last reconfirmed|                            |2021-11-18
   Target Milestone|---                         |12.0

--- Comment #1 from Bill Schmidt  ---
Confirmed.