[Bug testsuite/96109] [11 Regression] gcc.dg/vect/slp-47.c etc. FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96109 --- Comment #18 from Rainer Orth --- SPARC testsuite failures fixed for GCC 14.0.1.
[Bug testsuite/96109] [11 Regression] gcc.dg/vect/slp-47.c etc. FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96109

--- Comment #17 from GCC Commits ---
The master branch has been updated by Rainer Orth:

https://gcc.gnu.org/g:96b63fa255e343bb9b3e7f77302213a91ce96293

commit r14-9427-g96b63fa255e343bb9b3e7f77302213a91ce96293
Author: Rainer Orth
Date:   Mon Mar 11 15:45:17 2024 +0100

    testsuite: vect: Require vect_perm in several tests [PR114071, PR113557, PR96109]

    Several vectorization tests FAIL on 32 and 64-bit Solaris/SPARC:

    FAIL: gcc.dg/vect/pr37027.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 1
    FAIL: gcc.dg/vect/pr37027.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 1
    FAIL: gcc.dg/vect/pr37027.c scan-tree-dump-times vect "vectorized 1 loops" 1
    FAIL: gcc.dg/vect/pr37027.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
    FAIL: gcc.dg/vect/pr67790.c -flto -ffat-lto-objects scan-tree-dump vect "vectorizing stmts using SLP"
    FAIL: gcc.dg/vect/pr67790.c scan-tree-dump vect "vectorizing stmts using SLP"
    FAIL: gcc.dg/vect/slp-47.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 2
    FAIL: gcc.dg/vect/slp-47.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2
    FAIL: gcc.dg/vect/slp-48.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 2
    FAIL: gcc.dg/vect/slp-48.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2
    FAIL: gcc.dg/vect/slp-reduc-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 1
    FAIL: gcc.dg/vect/slp-reduc-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 1
    FAIL: gcc.dg/vect/slp-reduc-1.c scan-tree-dump-times vect "vectorized 1 loops" 1
    FAIL: gcc.dg/vect/slp-reduc-1.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
    FAIL: gcc.dg/vect/slp-reduc-2.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 1
    FAIL: gcc.dg/vect/slp-reduc-2.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 1
    FAIL: gcc.dg/vect/slp-reduc-2.c scan-tree-dump-times vect "vectorized 1 loops" 1
    FAIL: gcc.dg/vect/slp-reduc-2.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
    FAIL: gcc.dg/vect/slp-reduc-7.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 1
    FAIL: gcc.dg/vect/slp-reduc-7.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 1
    FAIL: gcc.dg/vect/slp-reduc-7.c scan-tree-dump-times vect "vectorized 1 loops" 1
    FAIL: gcc.dg/vect/slp-reduc-7.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
    FAIL: gcc.dg/vect/slp-reduc-8.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
    FAIL: gcc.dg/vect/slp-reduc-8.c scan-tree-dump vect "vectorized 1 loops"
    FAIL: gcc.dg/vect/vect-multi-peel-gaps.c -flto -ffat-lto-objects scan-tree-dump vect "LOOP VECTORIZED"
    FAIL: gcc.dg/vect/vect-multi-peel-gaps.c scan-tree-dump vect "LOOP VECTORIZED"

    The dumps show variations of

    /vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/pr37027.c:24:17: note: ==> examining statement: _4 = a[i_19].f2;
    /vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/pr37027.c:24:17: missed: unsupported vect permute { 1 0 3 2 5 4 }
    /vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/pr37027.c:24:17: missed: unsupported load permutation
    /vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/pr37027.c:27:17: missed: not vectorized: relevant stmt not supported: _4 = a[i_19].f2;

    so I think the tests should require vect_perm.  This is what this patch
    does.

    Tested on sparc-sun-solaris2.11 and i386-pc-solaris2.11.

    2024-02-22  Rainer Orth

            gcc/testsuite:
            PR tree-optimization/114071
            * gcc.dg/vect/pr37027.c: Require vect_perm.
            * gcc.dg/vect/pr67790.c: Likewise.
            * gcc.dg/vect/slp-reduc-1.c: Likewise.
            * gcc.dg/vect/slp-reduc-2.c: Likewise.
            * gcc.dg/vect/slp-reduc-7.c: Likewise.
            * gcc.dg/vect/slp-reduc-8.c: Likewise.

            PR tree-optimization/113557
            * gcc.dg/vect/vect-multi-peel-gaps.c (scan-tree-dump): Also
            require vect_perm.

            PR testsuite/96109
            * gcc.dg/vect/slp-47.c: Require vect_perm.
            * gcc.dg/vect/slp-48.c: Likewise.
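
For readers who have not used the directive, a test that depends on permute support is usually gated as in the sketch below. This is a hypothetical file written for illustration, not one of the tests touched by the commit: the function name swap_pairs, the array names and the size N are assumptions. The loop swaps adjacent elements, which is the kind of access that leads to a pair-swapping load permutation such as { 1 0 3 2 5 4 }.

```c
/* Hypothetical sketch of a vect test gated on permute support; not taken
   from the GCC testsuite.  { dg-do compile } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_perm } */

#define N 1024

int in[N], out[N];

/* Swap each adjacent pair of elements while copying.  Vectorizing the
   loads with SLP needs the lanes of each loaded vector to be exchanged
   pairwise, so targets without a suitable permute cannot vectorize it.  */
void __attribute__((noipa))
swap_pairs (void)
{
  for (int i = 0; i < N / 2; ++i)
    {
      out[2 * i] = in[2 * i + 1];
      out[2 * i + 1] = in[2 * i];
    }
}

/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
```

On a target where the vect_perm effective-target check fails, a file gated like this is reported as UNSUPPORTED rather than FAIL.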
[Bug testsuite/96109] [11 Regression] gcc.dg/vect/slp-47.c etc. FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96109

--- Comment #16 from Rainer Orth ---
The two tests still/again? FAIL on 32 and 64-bit Solaris/SPARC.  If I
understand the dumps correctly

/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/slp-47.c:9:21: note: ==> examining statement: _3 = y[_2];
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/slp-47.c:9:21: missed: unsupported vect permute { 1 0 3 2 5 4 }
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/slp-47.c:9:21: missed: unsupported load permutation
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/slp-47.c:11:17: missed: not vectorized: relevant stmt not supported: _3 = y[_2];

the tests should also require vect_perm?
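
To make the selector in the dump concrete: a permute { 1 0 3 2 5 4 } asks for result lane i to be taken from input lane sel[i], i.e. each adjacent pair of lanes is swapped. The helper below is a scalar model of that operation written for illustration only; apply_permute is a hypothetical name, not a GCC function.

```c
#include <stddef.h>

/* Scalar model of a single-input lane permute: out[i] = in[sel[i]].
   Hypothetical helper for illustration; not part of GCC.  */
void
apply_permute (const int *in, const unsigned *sel, size_t n, int *out)
{
  for (size_t i = 0; i < n; ++i)
    out[i] = in[sel[i]];
}
```

With sel = { 1, 0, 3, 2, 5, 4 } and in = { 10, 11, 12, 13, 14, 15 } the result is { 11, 10, 13, 12, 15, 14 }.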
[Bug testsuite/96109] [11 Regression] gcc.dg/vect/slp-47.c etc. FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96109

--- Comment #15 from Rainer Orth ---
Created attachment 57507
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57507&action=edit
current sparc-sun-solaris2.11 dumps
[Bug testsuite/96109] [11 Regression] gcc.dg/vect/slp-47.c etc. FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96109 --- Comment #14 from David Binderman --- Sorry wrong bug report.
[Bug testsuite/96109] [11 Regression] gcc.dg/vect/slp-47.c etc. FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96109

David Binderman changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dcb314 at hotmail dot com

--- Comment #13 from David Binderman ---
The bug first seems to appear sometime between g:93f803d53b5ccaab and
g:68f7cb6cf9e8b9f2, some 39 commits.
[Bug testsuite/96109] [11 Regression] gcc.dg/vect/slp-47.c etc. FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96109

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.4                        |11.5

--- Comment #12 from Jakub Jelinek ---
GCC 11.4 is being released, retargeting bugs to GCC 11.5.
[Bug testsuite/96109] [11 Regression] gcc.dg/vect/slp-47.c etc. FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96109

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.3                        |11.4

--- Comment #11 from Richard Biener ---
GCC 11.3 is being released, retargeting bugs to GCC 11.4.
[Bug testsuite/96109] [11 Regression] gcc.dg/vect/slp-47.c etc. FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96109

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[11/12 Regression]          |[11 Regression]
                   |gcc.dg/vect/slp-47.c etc.   |gcc.dg/vect/slp-47.c etc.
                   |FAIL                        |FAIL
      Known to fail|                            |11.2.0
      Known to work|                            |12.0

--- Comment #10 from Richard Biener ---
Should be fixed on trunk, backporting the whole series of changes is likely
not a good idea.
[Bug testsuite/96109] [11 Regression] gcc.dg/vect/slp-47.c etc. FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96109 --- Comment #6 from Richard Biener --- So the question is whether we can make vect_compute_data_ref_alignment vectype agnostic (to also fix the required same vectype for multi SLP nodes covering the same DRs) and instead make DR_MISALIGNMENT a function we pass vectype (and an element offset?) to, computing whether the step maintains alignment on the fly (and have the offset specify that "N-1 elements before the address" we do for negative step)? Basically only record info for the scalar access. Specifically I wonder whether this will work to determine the base_misaligned info and realigning the base.
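
As a rough illustration of the shape being asked about, the sketch below records only facts about the scalar access and derives the misalignment for a given vector alignment and element offset on demand. Everything here is an assumption made for the example: struct scalar_ref_info, misalignment_for and step_keeps_alignment are hypothetical names, none of them exist in GCC, and the real data-reference representation is considerably richer.

```c
/* Hypothetical sketch only; none of these names exist in GCC.  */

/* Facts recorded once per scalar access, independent of any vectype.  */
struct scalar_ref_info
{
  unsigned long base_misalign; /* Byte offset of the first scalar access
                                  from an address aligned to base_align.  */
  unsigned long base_align;    /* Known base alignment in bytes.  */
  long step;                   /* Bytes advanced per scalar iteration.  */
  unsigned long elem_size;     /* Size of one scalar element in bytes.  */
};

/* Misalignment in bytes of the access starting ELEM_OFFSET elements away
   from the scalar access, relative to VEC_ALIGN.  A negative ELEM_OFFSET
   models the "N-1 elements before the address" case for negative steps.
   Returns -1 when the base alignment is too weak to tell.  */
long
misalignment_for (const struct scalar_ref_info *r, unsigned long vec_align,
                  long elem_offset)
{
  if (r->base_align < vec_align)
    return -1;
  long addr = (long) r->base_misalign + elem_offset * (long) r->elem_size;
  long m = addr % (long) vec_align;
  /* C's % keeps the sign of the dividend, so normalize to [0, vec_align).  */
  return m < 0 ? m + (long) vec_align : m;
}

/* Nonzero if advancing by VF scalar iterations keeps the misalignment
   relative to VEC_ALIGN unchanged.  */
int
step_keeps_alignment (const struct scalar_ref_info *r, unsigned vf,
                      unsigned long vec_align)
{
  return (r->step * (long) vf) % (long) vec_align == 0;
}
```

The point of this design is that two SLP nodes covering the same data reference with different vector types could each query the misalignment they need instead of sharing one precomputed value.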
[Bug testsuite/96109] [11 Regression] gcc.dg/vect/slp-47.c etc. FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96109

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #5 from Richard Biener ---
(In reply to Richard Biener from comment #3)
> OK, it's indeed wrong which means we'd fall through the checks that prevent
> SPARC from vectorizing here but then we'll create an unaligned access
> anyway (because VMAT_STRIDED_SLP is too lazy to figure out appropriate
> alignment).  We're also assuming element alignment there.
>
> static double x[1024], y[1024];
>
> void __attribute__((noipa))
> foo ()
> {
>   for (int i = 0; i < 511; ++i)
>     {
>       x[2*i] = y[1022 - 2*i - 1];
>       x[2*i+1] = y[1022 - 2*i];
>     }
> }
>
> int main()
> {
>   for (int i = 0; i < 1024; ++i)
>     x[i] = 0, y[i] = i;
>   foo ();
>   for (int i = 0; i < 1022; ++i)
>     if (x[i] != y[1022 - (i^1)])
>       __builtin_abort ();
>   if (x[1022] != 0 || x[1023] != 0)
>     __builtin_abort ();
>   return 0;
> }

So for example on aarch64-linux with -O3 -mstrict-align -fno-vect-cost-model
we analyze this as

x.c:6:21: note:   vect_model_load_cost: aligned.
x.c:6:21: note:   vect_model_load_cost: inside_cost = 3, prologue_cost = 0 .

but correctly emit an unaligned load (dump with -gimple to see the alignment)

  _20 = + 8168ul;
...
  _19 = __PHI (__BB5: _16, __BB2: _20);
...
  _15 = __MEM ((double *)_19);

and correctly (but not optimally) expand it via extract-bit-field to

;; _15 = MEM [(double *)ivtmp_19];

(insn 12 11 13 (clobber (reg:V2DF 95 [ _15 ])) "x.c":8:17 -1
     (nil))

(insn 13 12 14 (set (subreg:DI (reg:V2DF 95 [ _15 ]) 0)
        (mem:DI (reg:DI 92 [ ivtmp.13 ]) [1 MEM [(double *)ivtmp_19]+0 S8 A64])) "x.c":8:17 -1
     (nil))

(insn 14 13 0 (set (subreg:DI (reg:V2DF 95 [ _15 ]) 8)
        (mem:DI (plus:DI (reg:DI 92 [ ivtmp.13 ])
                (const_int 8 [0x8])) [1 MEM [(double *)ivtmp_19]+8 S8 A64])) "x.c":8:17 -1
     (nil))

so we're not getting a runtime fail here but clearly the vectorizer's idea of
the alignment of the access is bogus (and if VMAT_STRIDED_SLP were less lazy
and trusted the computed alignment info we'd miscompile).  I guess we're
getting away with this because RTL expansion has fallback code to correctly
expand misaligned accesses on strict-align targets.  But clearly it's not
what the vectorizer costs (it also oddly costs a vec_construct when you
enable the cost model but doesn't emit any in the end).
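
The 8168 in the GIMPLE dump can be checked by hand. In the first iteration (i = 0) the loop reads y[1021] and y[1022], so a two-element vector load starts at y[1021]. The sketch below redoes that arithmetic; the helper names are hypothetical, and it assumes an 8-byte double and a 16-byte vector (V2DF), with y itself at least 16-byte aligned, which the report does not state.

```c
#include <stddef.h>

/* Byte offset from the start of y of the first element read by foo () in
   iteration ITER, i.e. y[1022 - 2*ITER - 1].  Hypothetical helper.  */
size_t
first_load_offset (size_t iter)
{
  return (1022 - 2 * iter - 1) * sizeof (double);
}

/* Misalignment of byte offset OFF with respect to a VEC_BYTES-wide
   vector, assuming the array base is VEC_BYTES-aligned.  */
size_t
misalign (size_t off, size_t vec_bytes)
{
  return off % vec_bytes;
}
```

Under those assumptions first_load_offset (0) is 1021 * 8 = 8168 and misalign (8168, 16) is 8, so the access is only element-aligned. Because each iteration moves the address by 16 bytes, the misalignment stays at 8 throughout, which matches the A64 memory attributes in the RTL rather than the "aligned" verdict from the cost analysis.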