[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

--- Comment #12 from GCC Commits ---
The master branch has been updated by Tamar Christina:

https://gcc.gnu.org/g:306713c953d509720dc394c43c0890548bb0ae07

commit r14-8393-g306713c953d509720dc394c43c0890548bb0ae07
Author: Tamar Christina
Date:   Wed Jan 24 15:56:50 2024 +

    AArch64: Do not allow SIMD clones with simdlen 1 [PR113552]

    The AArch64 vector PCS does not allow simd calls with simdlen 1,
    however due to a bug we currently do allow it for num == 0.

    This causes us to emit a symbol that doesn't exist and we fail to link.

    gcc/ChangeLog:

            PR tree-optimization/113552
            * config/aarch64/aarch64.cc
            (aarch64_simd_clone_compute_vecsize_and_simdlen): Block simdlen 1.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/113552
            * gcc.target/aarch64/pr113552.c: New test.
            * gcc.target/aarch64/simd_pcs_attribute-3.c: Remove bogus check.
[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

--- Comment #11 from GCC Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:d5d43dc399bb0f15084827c59a025189c630afdd

commit r14-8357-gd5d43dc399bb0f15084827c59a025189c630afdd
Author: Richard Biener
Date:   Tue Jan 23 12:53:04 2024 +0100

    tree-optimization/113552 - fix num_call accounting in simd clone vectorization

    The following avoids using exact_log2 on the number of SIMD clone calls
    to be emitted when vectorizing calls since that can easily be not
    a power of two in which case it will return -1.  For different simd
    clones the number of calls will differ by a multiply with a power of two
    only so using floor_log2 is good enough here.

            PR tree-optimization/113552
            * tree-vect-stmts.cc (vectorizable_simd_clone_call): Use
            floor_log2 instead of exact_log2 on the number of calls.
[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

--- Comment #10 from Richard Biener ---
I'll fix the exact_log2 issue.
[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

Tamar Christina changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org
             Status|NEW                           |ASSIGNED

--- Comment #9 from Tamar Christina ---
(In reply to Richard Biener from comment #7)
> So - the target should reject this clone or not generate it in the first
> place.  And of course the cost thing should be fixed which will likely mask
> the issue in the target.

Yeah, looks like there's a bug in
aarch64_simd_clone_compute_vecsize_and_simdlen that's also present on the
branches.

I'll submit a patch.
[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

--- Comment #8 from Richard Biener ---
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 09749ae3817..1ddbe7a2f6b 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4071,7 +4071,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
            || (nargs != simd_nargs))
          continue;
        if (num_calls != 1)
-         this_badness += exact_log2 (num_calls) * 4096;
+         this_badness += floor_log2 (num_calls) * 4096 + num_calls;
        if (n->simdclone->inbranch)
          this_badness += 8192;
        int target_badness = targetm.simd_clone.usable (n);

"fixes" it
[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

--- Comment #7 from Richard Biener ---
OK, maybe the costing is simply not taking into account that we chose the
simdlen == 1 variant which _does_ exist!  It's the chosen one:

4052      bestn = cgraph_node::get (simd_clone_info[0]);
(gdb) p bestn
$5 =
(gdb) p bestn->simdclone->simdlen
$6 = {coeffs = {1, 0}}

and it's usable

4077            int target_badness = targetm.simd_clone.usable (n);
4078            if (target_badness < 0)

(returns 0)

But note we do

4073            if (num_calls != 1)
4074              this_badness += exact_log2 (num_calls) * 4096;

which of course is quite bogus since we have 12 calls and exact_log2 will
return -1 here.  Maybe we want ceil_log2 here.

When we try the simdlen == 2 variant, that also turns out usable, but the
calculated badness is the same so we stick to the simdlen == 1 one.

So - the target should reject this clone or not generate it in the first
place.  And of course the cost thing should be fixed which will likely mask
the issue in the target.
[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #6 from Richard Biener ---
(In reply to Tamar Christina from comment #5)
> __attribute__ ((__simd__ ("notinbranch"), const))
> double cos (double);

So here the backend is then probably responsible to parse this into a
valid list of simdlen cases.

> void foo (float *a, double *b)
> {
>   for (int i = 0; i < 12; i+=3)
>     {
>       b[i] = cos (5.0 * a[i]);
>       b[i+1] = cos (5.0 * a[i+1]);
>       b[i+2] = cos (5.0 * a[i+2]);
>     }
> }
>
> Simple C example that shows the problem.
>
> This seems to happen when SLP succeeds and the group size is a non power of
> two.
> The vectorizer then unrolls to make it a power of two and during
> vectorization
> it seems to destroy the vector, make the call and reconstruct it.
>
> So this seems like an SLP vectorization bug. I can't seem to trigger it
> however on GCC < 14 since SLP consistently fails for all my examples because
> it tries a mode that's larger than the vector size.

On the 13 branch and x86_64 the above results in a large VF and using
_ZGVbN2v_cos, same on trunk.

> So It may be a GCC 14 only regression, but I think it's latent in the
> vectorizer.

I think there's sth odd with the backend here, but I can confirm the
behavior.  Note it analyzes and costs VF == 4 and V2DF resulting in 6
calls but then code generation comes along doing sth different!?
[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

Tamar Christina changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW

--- Comment #5 from Tamar Christina ---
__attribute__ ((__simd__ ("notinbranch"), const))
double cos (double);

void foo (float *a, double *b)
{
  for (int i = 0; i < 12; i+=3)
    {
      b[i] = cos (5.0 * a[i]);
      b[i+1] = cos (5.0 * a[i+1]);
      b[i+2] = cos (5.0 * a[i+2]);
    }
}

Simple C example that shows the problem.

This seems to happen when SLP succeeds and the group size is a non power of
two.  The vectorizer then unrolls to make it a power of two and during
vectorization it seems to destroy the vector, make the call and reconstruct
it.

So this seems like an SLP vectorization bug.  I can't seem to trigger it
however on GCC < 14 since SLP consistently fails for all my examples because
it tries a mode that's larger than the vector size.

So it may be a GCC 14 only regression, but I think it's latent in the
vectorizer.
[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

--- Comment #4 from Tamar Christina ---
(In reply to nsz from comment #2)
> is this fortran only?
>

No it should be C as well, I was just reducing from a Fortran workload that
failed so I can see what the vectorizer was doing.
[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

--- Comment #3 from Tamar Christina ---
(In reply to Richard Biener from comment #1)
> Hum, the vectorizer looks at the simd specs and if it says 1-lane variants
> (simdlen == 1) are available it will happily create them.
>

My understanding is that the spec just says "All SIMD variants are
available" but technically V1DF is FP not SIMD.

> Can you provide the testcase amended with the used SIMD "declarations"
> (as with the fortran syntax or with a C testcase)?

fair point:

!GCC$ builtin (cos) attributes simd (notinbranch)
      SUBROUTINE a(b)
      DIMENSION b(3,0)
      COMMON c
      DO 4 m=1,c
         DO 4 d=1,3
            b(d,m)=b(d,m)+COS(5.0D00*m)
 4    CONTINUE
      END
      DIMENSION e(53)
      DIMENSION f(6,91),g(6,91),h(6,91),
     *i(6,91),j(6,91),k(6,86)
      DIMENSION l(107)
      END

where just

aarch64-unknown-linux-gnu-gfortran -S -o - -Ofast -w cosmo.fppized3.f

is enough.
[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

nsz at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nsz at gcc dot gnu.org

--- Comment #2 from nsz at gcc dot gnu.org ---
is this fortran only?

glibc release is in a week, we can still do something (or backport a fix).

the vector abi does not allow 1 lane in this case
https://github.com/ARM-software/abi-aa/blob/main/vfabia64/vfabia64.rst#L867

c annotation:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/aarch64/fpu/bits/math-vector.h;h=04837bdcd7c0d0ce91192e09fc2d6614cae289c2;hb=HEAD

fortran annotation:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/aarch64/fpu/finclude/math-vector-fortran.h;h=92e15f0d6a758258f5728e628bbb2422b176fa95;hb=HEAD

i think the bug can be reproduced with older glibc by adding

!GCC$ builtin (cos) attributes simd (notinbranch)
[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
           Priority|P1                          |P2
                 CC|                            |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2024-01-23
             Status|UNCONFIRMED                 |WAITING
   Target Milestone|14.0                        |11.5

--- Comment #1 from Richard Biener ---
Hum, the vectorizer looks at the simd specs and if it says 1-lane variants
(simdlen == 1) are available it will happily create them.

Can you provide the testcase amended with the used SIMD "declarations"
(as with the fortran syntax or with a C testcase)?
[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

Tamar Christina changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0
           Priority|P3                          |P1
          Component|middle-end                  |tree-optimization