[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

--- Comment #12 from Christophe Lyon ---
> Can you open a new bugreport?

Sure, I filed PR94401
[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

--- Comment #11 from rguenther at suse dot de ---
On Mon, 30 Mar 2020, clyon at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332
>
> Christophe Lyon changed:
>
>            What    |Removed                     |Added
>                  CC|                            |clyon at gcc dot gnu.org
>
> --- Comment #10 from Christophe Lyon ---
> Hi,
> This caused a regression on aarch64:
> FAIL: gcc.dg/vect/pr92420.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/pr92420.c execution test

Can you open a new bugreport?
[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

Christophe Lyon changed:

           What    |Removed                     |Added
                 CC|                            |clyon at gcc dot gnu.org

--- Comment #10 from Christophe Lyon ---
Hi,
This caused a regression on aarch64:
FAIL: gcc.dg/vect/pr92420.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr92420.c execution test
[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

Kewen Lin changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Kewen Lin ---
Should be fixed by latest trunk on ppc64le P9.
[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

--- Comment #8 from CVS Commits ---
The master branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:8d689cf43b501a2f5c077389adbb6d2bfa530ca9

commit r10-7415-g8d689cf43b501a2f5c077389adbb6d2bfa530ca9
Author: Kewen Lin
Date:   Fri Mar 27 04:51:12 2020 -0500

    Fix PR90332 by extending half size vector mode

    As PR90332 shows, the current scalar epilogue peeling for gaps
    elimination requires expected vec_init optab with two half size
    vector mode.  On Power, we don't support vector mode like V8QI,
    so can't support optab like vec_initv16qiv8qi.  But we want to
    leverage existing scalar mode like DI to init the desirable
    vector mode.  This patch is to extend the existing support for
    Power, as evaluated on Power9 we can see expected 1.9% speed up
    on SPEC2017 525.x264_r.

    As Richi suggested, add one function
    vector_vector_composition_type to refactor existing related
    codes and also make use of it further.

    Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8 and P9,
    as well as x86_64-redhat-linux.

    gcc/ChangeLog

    2020-03-27  Kewen Lin

        PR tree-optimization/90332
        * tree-vect-stmts.c (vector_vector_composition_type): New function.
        (get_group_load_store_type): Adjust to call
        vector_vector_composition_type, extend it to construct with
        scalar types.
        (vectorizable_load): Likewise.
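
The idea in the commit message above, initialising a 16-byte vector from two 64-bit scalar (DI) pieces when the target has no 8-byte vector mode, can be illustrated in portable C. This is only a sketch of the data movement, not the vectorizer code: the type v16qi_sketch and both function names are hypothetical and do not appear in GCC.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 16 x 8-bit vector (V16QI).  */
typedef struct { uint8_t b[16]; } v16qi_sketch;

/* Build the 16-byte value from two 64-bit scalars, which is the
   effect of a V2DI construction that is then viewed as V16QI.
   memcpy keeps the byte order of memory, so this does not depend
   on the endianness of the host.  */
static v16qi_sketch
compose_from_two_di (uint64_t lo, uint64_t hi)
{
  v16qi_sketch v;
  memcpy (v.b, &lo, sizeof lo);
  memcpy (v.b + 8, &hi, sizeof hi);
  return v;
}

/* Load only the 8 bytes that are really accessed and fill the upper
   half with zeros, so nothing past the group is read.  */
static v16qi_sketch
load_half_zero_upper (const uint8_t *p)
{
  uint64_t lo;
  memcpy (&lo, p, sizeof lo);
  return compose_from_two_di (lo, 0);
}
```

Calling load_half_zero_upper on an 8-byte row yields those 8 bytes in lanes 0..7 and zeros in lanes 8..15, which is the shape the epilogue-free load needs.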
[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

Kewen Lin changed:

           What    |Removed                     |Added
                 CC|                            |linkw at gcc dot gnu.org

--- Comment #7 from Kewen Lin ---
(In reply to Richard Biener from comment #5)
> I don't see a vec_initv16qiv8qi on power either, so that might be it -
> there's no
> effective target for building a vector from halves (and I wonder how
> code-generation fares here).
>
> So an option is to simply xfail for all but x86_64-*-* and i?86-*-* ...
>
> Or try more fancy code-generation options (build from two large integer
> modes,
> but I don't see vec_initv2didi either).

It's weird, I found rs6000 has supported vec_initv2didi.

gcc/insn-opinit.c:  { 0x2f0a36, CODE_FOR_vec_initv2didi },
[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

--- Comment #6 from ktkachov at gcc dot gnu.org ---
Author: ktkachov
Date: Thu Jun  6 13:59:07 2019
New Revision: 272002

URL: https://gcc.gnu.org/viewcvs?rev=272002=gcc=rev
Log:
[AArch64] PR tree-optimization/90332: Implement vec_init<M><N> where N is a
vector mode

This patch fixes the failing gcc.dg/vect/slp-reduc-sad-2.c testcase on aarch64
by implementing a vec_init optab that can handle two half-width vectors
producing a full-width one by concatenating them.

In the gcc.dg/vect/slp-reduc-sad-2.c case it's a V8QI reg concatenated with a
V8QI const_vector of zeroes.  This can be implemented efficiently using the
aarch64_combinez pattern that just loads a D-register to make use of the
implicit zero-extending semantics of that load.  Otherwise it concatenates the
two vectors using aarch64_simd_combine.

With this patch I'm seeing the effect from richi's original patch that added
gcc.dg/vect/slp-reduc-sad-2.c on aarch64 and 525.x264_r improves by about 1.5%.

        PR tree-optimization/90332
        * config/aarch64/aarch64.c (aarch64_expand_vector_init):
        Handle VALS containing two vectors.
        * config/aarch64/aarch64-simd.md (*aarch64_combinez): Rename to...
        (@aarch64_combinez): ... This.
        (*aarch64_combinez_be): Rename to...
        (@aarch64_combinez_be): ... This.
        (vec_init): New define_expand.
        * config/aarch64/iterators.md (Vhalf): Handle V8HF.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/aarch64/aarch64-simd.md
    trunk/gcc/config/aarch64/aarch64.c
    trunk/gcc/config/aarch64/iterators.md
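
A scalar model may help show why concatenating a half-width vector with a vector of zeroes is harmless for this testcase: when both SAD operands have a zero upper half, the extra lanes contribute |0 - 0| = 0 to the sum. The sketch below is plain C rather than AArch64 code, and the names concat_with_zero, sad16 and sad_8_via_16 are made up for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Model of "V8QI reg concatenated with a V8QI const_vector of
   zeroes": lanes 0..7 come from HALF, lanes 8..15 are zero.  */
static void
concat_with_zero (uint8_t out[16], const uint8_t half[8])
{
  memcpy (out, half, 8);
  memset (out + 8, 0, 8);
}

/* Sum of absolute differences over all 16 lanes.  */
static unsigned
sad16 (const uint8_t a[16], const uint8_t b[16])
{
  unsigned sum = 0;
  for (int i = 0; i < 16; i++)
    sum += (unsigned) abs ((int) a[i] - (int) b[i]);
  return sum;
}

/* SAD of two 8-byte rows computed through the 16-lane operation.
   Only 8 bytes are read from each input pointer.  */
static unsigned
sad_8_via_16 (const uint8_t *p1, const uint8_t *p2)
{
  uint8_t a[16], b[16];
  concat_with_zero (a, p1);
  concat_with_zero (b, p2);
  return sad16 (a, b);
}
```

For rows {1,...,8} and {8,...,1} the absolute differences are 7, 5, 3, 1, 1, 3, 5, 7, so sad_8_via_16 returns 32, the same as an 8-lane SAD would.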
[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332 --- Comment #5 from Richard Biener --- I don't see a vec_initv16qiv8qi on power either, so that might be it - there's no effective target for building a vector from halves (and I wonder how code-generation fares here). So an option is to simply xfail for all but x86_64-*-* and i?86-*-* ... Or try more fancy code-generation options (build from two large integer modes, but I don't see vec_initv2didi either).
[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332 --- Comment #4 from seurer at gcc dot gnu.org --- This still fails (just on power 9) even with the above change. On all the other powerpc64 targets it comes up as unsupported.
[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
                 CC|                            |ktkachov at gcc dot gnu.org

--- Comment #3 from ktkachov at gcc dot gnu.org ---
btw, I vaguely remember trying out Richard's patch back in stage 3 on aarch64
and it didn't end up triggering there because we had not implemented the
vec_initv16qiv8qi optab on aarch64, that is a vector construction of a v16qi
vector from a v8qi initialiser.
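
The check quoted in comment #2 of this thread can be modelled as a plain predicate, which makes it easier to see why a missing vec_initv16qiv8qi optab keeps the optimization from triggering. This is a simplified sketch, not the GCC source: the function name is hypothetical, and the alignment and target queries are reduced to boolean parameters.

```c
#include <stdbool.h>

/* Simplified model: the scalar epilogue for a gap can be avoided
   when the accessed part of the group is exactly half a vector and
   the target can build a full vector from that half.  */
static bool
can_elide_gap_epilogue (unsigned nunits, unsigned group_size,
                        unsigned gap, bool overrun, bool masked,
                        bool alignment_ok,
                        bool target_has_half_vec_init)
{
  if (!overrun || masked || !alignment_ok)
    return false;
  if (gap > group_size)
    return false;               /* guard against unsigned wrap */
  if (nunits != (group_size - gap) * 2)
    return false;               /* gap does not split the vector in half */
  return target_has_half_vec_init;
}
```

With the illustrative values nunits = 16, group_size = 16 and gap = 8 the predicate holds only if the last argument is true, which corresponds to the target providing the half-vector vec_init optab.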
[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

--- Comment #2 from Richard Biener ---
Hmm, can't get the test to FAIL with a cross, somehow the dejagnu tests
always end up UNSUPPORTED.

The testcase for x86_64 has

/* With AVX256 or more we do not pull off the trick eliding the epilogue.  */
/* { dg-additional-options "-mprefer-avx128" { target { x86_64-*-* i?86-*-* } } } */

so we require the use of V16QImode -> V4SImode SAD with the V16QImode loads
split into two V8QImode ones.  There were insufficient dejagnu effective
targets to model the restriction in

+         /* If the gap splits the vector in half and the target
+            can do half-vector operations avoid the epilogue peeling
+            by simply loading half of the vector only.  Usually
+            the construction with an upper zero half will be elided.  */
+         dr_alignment_support alignment_support_scheme;
+         scalar_mode elmode = SCALAR_TYPE_MODE (TREE_TYPE (vectype));
+         machine_mode vmode;
+         if (overrun_p
+             && !masked_p
+             && (((alignment_support_scheme
+                   = vect_supportable_dr_alignment (first_dr_info, false)))
+                  == dr_aligned
+                 || alignment_support_scheme == dr_unaligned_supported)
+             && known_eq (nunits, (group_size - gap) * 2)
+             && mode_for_vector (elmode, (group_size - gap)).exists (&vmode)
+             && VECTOR_MODE_P (vmode)
+             && targetm.vector_mode_supported_p (vmode)
+             && (convert_optab_handler (vec_init_optab,
+                                        TYPE_MODE (vectype), vmode)
+                 != CODE_FOR_nothing))
+           overrun_p = false;

I see we probably need hw_misalign, so does

Index: gcc/testsuite/gcc.dg/vect/slp-reduc-sad-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-reduc-sad-2.c (revision 270899)
+++ gcc/testsuite/gcc.dg/vect/slp-reduc-sad-2.c (working copy)
@@ -25,5 +25,5 @@ int x264_pixel_sad_8x8( uint8_t *pix1, u
 /* { dg-final { scan-tree-dump "vect_recog_sad_pattern: detected" "vect" } } */
 /* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
-/* { dg-final { scan-tree-dump-not "access with gaps requires scalar epilogue loop" "vect" } } */
+/* { dg-final { scan-tree-dump-not "access with gaps requires scalar epilogue loop" "vect" { xfail { ! vect_hw_misalign } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */

fix everything?
[Bug tree-optimization/90332] New test case gcc.dg/vect/slp-reduc-sad-2.c in r270847 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90332

Christophe Lyon changed:

           What    |Removed                     |Added
             Target|powerpc64*-unknown-linux-gnu|powerpc64*-unknown-linux-gnu
                   |                            |aarch64
                 CC|                            |clyon at gcc dot gnu.org

--- Comment #1 from Christophe Lyon ---
Seen on aarch64 too.