[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails

2020-03-30 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

--- Comment #12 from Christophe Lyon  ---

> Can you open a new bugreport?

Sure, I filed PR94401

[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails

2020-03-30 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

--- Comment #11 from rguenther at suse dot de  ---
On Mon, 30 Mar 2020, clyon at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332
> 
> Christophe Lyon  changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |clyon at gcc dot gnu.org
> 
> --- Comment #10 from Christophe Lyon  ---
> Hi,
> This caused a regression on aarch64:
> FAIL: gcc.dg/vect/pr92420.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/pr92420.c execution test

Can you open a new bugreport?

[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails

2020-03-30 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

Christophe Lyon  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clyon at gcc dot gnu.org

--- Comment #10 from Christophe Lyon  ---
Hi,
This caused a regression on aarch64:
FAIL: gcc.dg/vect/pr92420.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr92420.c execution test

[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails

2020-03-27 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

Kewen Lin  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Kewen Lin  ---
Should be fixed by latest trunk on ppc64le P9.

[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails

2020-03-27 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:8d689cf43b501a2f5c077389adbb6d2bfa530ca9

commit r10-7415-g8d689cf43b501a2f5c077389adbb6d2bfa530ca9
Author: Kewen Lin 
Date:   Fri Mar 27 04:51:12 2020 -0500

Fix PR90332 by extending half size vector mode

As PR90332 shows, eliminating the scalar epilogue peeling for gaps
currently requires a vec_init optab that builds the vector from two
half-size vector modes.  Power does not support vector modes such as
V8QI, so it cannot provide an optab like vec_initv16qiv8qi.  It can,
however, use an existing scalar mode such as DI to initialize the
desired vector mode.  This patch extends the existing support to
cover that case; evaluated on Power9, it gives the expected 1.9%
speed-up on SPEC2017 525.x264_r.

As Richi suggested, add a function vector_vector_composition_type
to refactor the existing related code and to reuse it elsewhere.

Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8 and P9,
as well as x86_64-redhat-linux.

gcc/ChangeLog

2020-03-27  Kewen Lin  

PR tree-optimization/90332
* tree-vect-stmts.c (vector_vector_composition_type): New function.
(get_group_load_store_type): Adjust to call
vector_vector_composition_type,
extend it to construct with scalar types.
(vectorizable_load): Likewise.
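
For readers following the thread, the fallback described in the commit message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in tree-vect-stmts.c: the struct, the enum and the function name are all hypothetical, and the two booleans merely stand in for querying the vec_init optab.

```c
#include <stdbool.h>

/* Hypothetical model of what a target offers; not a GCC data structure.  */
struct target_caps_sketch
{
  bool half_vector_init;  /* e.g. something like vec_initv16qiv8qi exists */
  bool scalar_int_init;   /* e.g. something like vec_initv2didi exists */
};

/* Hypothetical result: which kind of piece the vector is composed from.  */
enum piece_kind_sketch
{
  PIECE_NONE,         /* no composition possible, keep the scalar epilogue */
  PIECE_HALF_VECTOR,  /* build from two half-size vectors */
  PIECE_SCALAR_INT    /* build from two integers of half the vector size */
};

/* Prefer half-size vector pieces; otherwise fall back to integer pieces
   of the same size; otherwise give up.  */
static enum piece_kind_sketch
choose_piece_sketch (const struct target_caps_sketch *caps)
{
  if (caps->half_vector_init)
    return PIECE_HALF_VECTOR;
  if (caps->scalar_int_init)
    return PIECE_SCALAR_INT;
  return PIECE_NONE;
}
```

With this model, a target that only has the integer variant would get PIECE_SCALAR_INT instead of falling back to the scalar epilogue.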

[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails

2020-03-11 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

Kewen Lin  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linkw at gcc dot gnu.org

--- Comment #7 from Kewen Lin  ---
(In reply to Richard Biener from comment #5)
> I don't see a vec_initv16qiv8qi on power either, so that might be it -
> there's no effective target for building a vector from halves (and I
> wonder how code-generation fares here).
> 
> So an option is to simply xfail for all but x86_64-*-* and i?86-*-* ...
> 
> Or try more fancy code-generation options (build from two large integer
> modes, but I don't see vec_initv2didi either).

It's weird; I found that rs6000 does support vec_initv2didi:
gcc/insn-opinit.c:  { 0x2f0a36, CODE_FOR_vec_initv2didi },

[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails

2019-06-06 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

--- Comment #6 from ktkachov at gcc dot gnu.org ---
Author: ktkachov
Date: Thu Jun  6 13:59:07 2019
New Revision: 272002

URL: https://gcc.gnu.org/viewcvs?rev=272002&root=gcc&view=rev
Log:
[AArch64] PR tree-optimization/90332: Implement vec_init<M><N> where N is a
vector mode

This patch fixes the failing gcc.dg/vect/slp-reduc-sad-2.c testcase on
aarch64 by implementing a vec_init optab that can handle two half-width
vectors producing a full-width one by concatenating them.

In the gcc.dg/vect/slp-reduc-sad-2.c case it's a V8QI reg concatenated
with a V8QI const_vector of zeroes.
This can be implemented efficiently using the aarch64_combinez pattern
that just loads a D-register to make use of the implicit zero-extending
semantics of that load.
Otherwise it concatenates the two vectors using aarch64_simd_combine.

With this patch I'm seeing the effect from richi's original patch that
added gcc.dg/vect/slp-reduc-sad-2.c on aarch64, and 525.x264_r improves
by about 1.5%.

PR tree-optimization/90332
* config/aarch64/aarch64.c (aarch64_expand_vector_init):
Handle VALS containing two vectors.
* config/aarch64/aarch64-simd.md (*aarch64_combinez<mode>): Rename
to...
(@aarch64_combinez<mode>): ... This.
(*aarch64_combinez_be<mode>): Rename to...
(@aarch64_combinez_be<mode>): ... This.
(vec_init<mode><Vhalf>): New define_expand.
* config/aarch64/iterators.md (Vhalf): Handle V8HF.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/aarch64-simd.md
trunk/gcc/config/aarch64/aarch64.c
trunk/gcc/config/aarch64/iterators.md
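
As a rough illustration of the two cases the log above distinguishes (a half-width value combined with a zero upper half, versus a general concatenation of two halves), here is a plain C model. It is a sketch under assumptions: the struct types and function names are hypothetical, plain byte arrays stand in for vector registers, and placing the first half in the low-numbered bytes is an assumed lane order, not a statement about the aarch64 patterns.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for 64-bit and 128-bit vector registers.  */
typedef struct { uint8_t b[8]; } v8qi_sketch;
typedef struct { uint8_t b[16]; } v16qi_sketch;

/* General case: concatenate two half-width vectors into a full one.  */
static v16qi_sketch
concat_halves_sketch (v8qi_sketch lo, v8qi_sketch hi)
{
  v16qi_sketch v;
  memcpy (v.b, lo.b, sizeof lo.b);
  memcpy (v.b + sizeof lo.b, hi.b, sizeof hi.b);
  return v;
}

/* Special case: the upper half is all zeroes, so only 8 bytes are read
   from memory and the rest of the vector is cleared.  */
static v16qi_sketch
load_low_zero_high_sketch (const uint8_t *p)
{
  v16qi_sketch v;
  memset (v.b, 0, sizeof v.b);
  memcpy (v.b, p, 8);  /* reads exactly 8 bytes, never the gap */
  return v;
}
```

The second function never touches bytes 8..15 of the source, which is the property that makes the scalar epilogue unnecessary.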

[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails

2019-05-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

--- Comment #5 from Richard Biener  ---
I don't see a vec_initv16qiv8qi on power either, so that might be it -
there's no effective target for building a vector from halves (and I
wonder how code-generation fares here).

So an option is to simply xfail for all but x86_64-*-* and i?86-*-* ...

Or try more fancy code-generation options (build from two large integer
modes, but I don't see vec_initv2didi either).

[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails

2019-05-07 Thread seurer at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

--- Comment #4 from seurer at gcc dot gnu.org ---
This still fails (just on power 9) even with the above change.  On all the
other powerpc64 targets it comes up as unsupported.

[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails

2019-05-07 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ktkachov at gcc dot gnu.org

--- Comment #3 from ktkachov at gcc dot gnu.org ---
btw, I vaguely remember trying out Richard's patch back in stage 3 on aarch64.
It didn't end up triggering there because we had not implemented the
vec_initv16qiv8qi optab on aarch64, that is, the construction of a v16qi
vector from v8qi initialisers.

[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails

2019-05-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

--- Comment #2 from Richard Biener  ---
Hmm, can't get the test to FAIL with a cross, somehow the dejagnu tests always
end up UNSUPPORTED.  The testcase for x86_64 has

/* With AVX256 or more we do not pull off the trick eliding the epilogue.  */
/* { dg-additional-options "-mprefer-avx128" { target { x86_64-*-* i?86-*-* } } } */

so we require the use of V16QImode -> V4SImode SAD with the V16QImode loads
split into two V8QImode ones.  There were insufficient dejagnu effective
targets to model the restriction in

+         /* If the gap splits the vector in half and the target
+            can do half-vector operations avoid the epilogue peeling
+            by simply loading half of the vector only.  Usually
+            the construction with an upper zero half will be elided.  */
+         dr_alignment_support alignment_support_scheme;
+         scalar_mode elmode = SCALAR_TYPE_MODE (TREE_TYPE (vectype));
+         machine_mode vmode;
+         if (overrun_p
+             && !masked_p
+             && (((alignment_support_scheme
+                   = vect_supportable_dr_alignment (first_dr_info, false)))
+                  == dr_aligned
+                 || alignment_support_scheme == dr_unaligned_supported)
+             && known_eq (nunits, (group_size - gap) * 2)
+             && mode_for_vector (elmode, (group_size - gap)).exists (&vmode)
+             && VECTOR_MODE_P (vmode)
+             && targetm.vector_mode_supported_p (vmode)
+             && (convert_optab_handler (vec_init_optab,
+                                        TYPE_MODE (vectype), vmode)
+                 != CODE_FOR_nothing))
+           overrun_p = false;

I see we probably need hw_misalign, so does

Index: gcc/testsuite/gcc.dg/vect/slp-reduc-sad-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-reduc-sad-2.c (revision 270899)
+++ gcc/testsuite/gcc.dg/vect/slp-reduc-sad-2.c (working copy)
@@ -25,5 +25,5 @@ int x264_pixel_sad_8x8( uint8_t *pix1, u

 /* { dg-final { scan-tree-dump "vect_recog_sad_pattern: detected" "vect" } } */
 /* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
-/* { dg-final { scan-tree-dump-not "access with gaps requires scalar epilogue loop" "vect" } } */
+/* { dg-final { scan-tree-dump-not "access with gaps requires scalar epilogue loop" "vect" { xfail { ! vect_hw_misalign } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */

fix everything?
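
To make the size condition in the quoted hunk concrete, here is a small C sketch. It is an assumption-laden illustration, not the actual testcase or vectorizer code: sad_8x8_sketch is only loosely modelled on an 8x8 SAD kernel, the fixed 16-byte row stride of the first operand is an assumption, and half_vector_load_suffices is a hypothetical helper that mirrors only the known_eq (nunits, (group_size - gap) * 2) part of the check (alignment, masking and optab support are ignored).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over an 8x8 block.  Rows of pix1 are assumed
   to be 16 bytes apart but only the first 8 bytes of each row are read, so
   the access group has size 16 with a trailing gap of 8.  */
static int
sad_8x8_sketch (const uint8_t *pix1, const uint8_t *pix2, int stride2)
{
  int sum = 0;
  for (int y = 0; y < 8; y++)
    {
      for (int x = 0; x < 8; x++)
        sum += abs (pix1[x] - pix2[x]);
      pix1 += 16;
      pix2 += stride2;
    }
  return sum;
}

/* Hypothetical helper: true when the gap splits the vector exactly in
   half, i.e. loading (group_size - gap) elements fills half of a vector
   with nunits elements.  */
static bool
half_vector_load_suffices (unsigned nunits, unsigned group_size, unsigned gap)
{
  return gap < group_size && nunits == (group_size - gap) * 2;
}
```

For a 16-element vector (V16QI), group_size 16 and gap 8 satisfy the check; with a 32-element vector they do not, which matches the testcase's use of -mprefer-avx128 on x86.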

[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails

2019-05-06 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

Christophe Lyon  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|powerpc64*-unknown-linux-gnu|powerpc64*-unknown-linux-gnu
                   |                            |aarch64
                 CC|                            |clyon at gcc dot gnu.org

--- Comment #1 from Christophe Lyon  ---
Seen on aarch64 too.