[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

--- Comment #12 from GCC Commits  ---
The master branch has been updated by Tamar Christina :

https://gcc.gnu.org/g:306713c953d509720dc394c43c0890548bb0ae07

commit r14-8393-g306713c953d509720dc394c43c0890548bb0ae07
Author: Tamar Christina 
Date:   Wed Jan 24 15:56:50 2024 +

AArch64: Do not allow SIMD clones with simdlen 1 [PR113552]

The AArch64 vector PCS does not allow simd calls with simdlen 1,
however due to a bug we currently do allow it for num == 0.

This causes us to emit a symbol that doesn't exist and we fail to link.

gcc/ChangeLog:

PR tree-optimization/113552
* config/aarch64/aarch64.cc
(aarch64_simd_clone_compute_vecsize_and_simdlen): Block simdlen 1.

gcc/testsuite/ChangeLog:

PR tree-optimization/113552
* gcc.target/aarch64/pr113552.c: New test.
* gcc.target/aarch64/simd_pcs_attribute-3.c: Remove bogus check.

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-23 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

--- Comment #11 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:d5d43dc399bb0f15084827c59a025189c630afdd

commit r14-8357-gd5d43dc399bb0f15084827c59a025189c630afdd
Author: Richard Biener 
Date:   Tue Jan 23 12:53:04 2024 +0100

tree-optimization/113552 - fix num_call accounting in simd clone
vectorization

The following avoids using exact_log2 on the number of SIMD clone calls
to be emitted when vectorizing calls since that can easily be not
a power of two in which case it will return -1.  For different simd
clones the number of calls will differ by a multiply with a power of two
only so using floor_log2 is good enough here.

PR tree-optimization/113552
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Use
floor_log2 instead of exact_log2 on the number of calls.

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

--- Comment #10 from Richard Biener  ---
I'll fix the exact_log2 issue.

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

Tamar Christina  changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org
             Status|NEW                           |ASSIGNED

--- Comment #9 from Tamar Christina  ---
(In reply to Richard Biener from comment #7)
> So - the target should reject this clone or not generate it in the first
> place.  And of course the cost thing should be fixed which will likely mask
> the issue in the target.

Yeah, looks like there's a bug in
aarch64_simd_clone_compute_vecsize_and_simdlen that's also present on the
branches.  I'll submit a patch.

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

--- Comment #8 from Richard Biener  ---
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 09749ae3817..1ddbe7a2f6b 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4071,7 +4071,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
                || (nargs != simd_nargs))
              continue;
            if (num_calls != 1)
-             this_badness += exact_log2 (num_calls) * 4096;
+             this_badness += floor_log2 (num_calls) * 4096 + num_calls;
            if (n->simdclone->inbranch)
              this_badness += 8192;
            int target_badness = targetm.simd_clone.usable (n);


"fixes" it

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

--- Comment #7 from Richard Biener  ---
OK, maybe the costing is simply not taking into account that we chose the
simdlen == 1 variant which _does_ exist!  It's the chosen one:

4052        bestn = cgraph_node::get (simd_clone_info[0]);
(gdb) p bestn
$5 = 
(gdb) p bestn->simdclone->simdlen 
$6 = {coeffs = {1, 0}}

and it's usable

4077        int target_badness = targetm.simd_clone.usable (n);
4078        if (target_badness < 0)

(returns 0)

But note we do

4073        if (num_calls != 1)
4074          this_badness += exact_log2 (num_calls) * 4096;

which of course is quite bogus since we have 12 calls and exact_log2 will
return -1 here.  Maybe we want ceil_log2 here.

When we try the simdlen == 2 variant, that also turns out usable, but
the calculated badness is the same, so we stick to the simdlen == 1 one.

So - the target should reject this clone or not generate it in the first
place.  And of course the cost thing should be fixed which will likely mask
the issue in the target.

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #6 from Richard Biener  ---
(In reply to Tamar Christina from comment #5)
> __attribute__ ((__simd__ ("notinbranch"), const))
> double cos (double);

So here the backend is then probably responsible to parse this into a valid
list of simdlen cases.

> void foo (float *a, double *b)
> {
>   for (int i = 0; i < 12; i+=3)
>     {
>       b[i] = cos (5.0 * a[i]);
>       b[i+1] = cos (5.0 * a[i+1]);
>       b[i+2] = cos (5.0 * a[i+2]);
>     }
> }
> 
> Simple C example that shows the problem.
> 
> This seems to happen when SLP succeeds and the group size is a non power of
> two.  The vectorizer then unrolls to make it a power of two and during
> vectorization it seems to destroy the vector, make the call and reconstruct
> it.
> 
> So this seems like an SLP vectorization bug.  I can't seem to trigger it
> however on GCC < 14 since SLP consistently fails for all my examples because
> it tries a mode that's larger than the vector size.

On the 13 branch and x86_64 the above results in a large VF and using
_ZGVbN2v_cos, same on trunk.

> So it may be a GCC 14 only regression, but I think it's latent in the
> vectorizer.

I think there's sth odd with the backend here, but I can confirm the
behavior.  Note it analyzes and costs VF == 4 and V2DF resulting in
6 calls but then code generation comes along doing sth different!?

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

Tamar Christina  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW

--- Comment #5 from Tamar Christina  ---
__attribute__ ((__simd__ ("notinbranch"), const))
double cos (double);

void foo (float *a, double *b)
{
  for (int i = 0; i < 12; i+=3)
    {
      b[i] = cos (5.0 * a[i]);
      b[i+1] = cos (5.0 * a[i+1]);
      b[i+2] = cos (5.0 * a[i+2]);
    }
}

Simple C example that shows the problem.

This seems to happen when SLP succeeds and the group size is a non power of
two.  The vectorizer then unrolls to make it a power of two and during
vectorization it seems to destroy the vector, make the call and reconstruct
it.

So this seems like an SLP vectorization bug.  I can't seem to trigger it
however on GCC < 14 since SLP consistently fails for all my examples because it
tries a mode that's larger than the vector size.

So it may be a GCC 14 only regression, but I think it's latent in the
vectorizer.

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

--- Comment #4 from Tamar Christina  ---
(In reply to nsz from comment #2)
> is this fortran only?
> 

No, it should affect C as well; I was just reducing from a Fortran workload
that failed so I could see what the vectorizer was doing.

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

--- Comment #3 from Tamar Christina  ---
(In reply to Richard Biener from comment #1)
> Hum, the vectorizer looks at the simd specs and if it says 1-lane variants
> (simdlen == 1) are available it will happily create them.
>

My understanding is that the spec just says "All SIMD variants are available"
but technically V1DF is FP not SIMD. 

> Can you provide the testcase amended with the used SIMD "declarations"
> (as with the fortran syntax or with a C testcase)?

fair point:

!GCC$ builtin (cos) attributes simd (notinbranch)

      SUBROUTINE a(b)
      DIMENSION b(3,0)
      COMMON c
      DO 4 m=1,c
         DO 4 d=1,3
         b(d,m)=b(d,m)+COS(5.0D00*m)
    4 CONTINUE
      END
      DIMENSION e(53)
      DIMENSION f(6,91),g(6,91),h(6,91),
     *  i(6,91),j(6,91),k(6,86)
      DIMENSION l(107)
      END

where just

aarch64-unknown-linux-gnu-gfortran -S -o - -Ofast -w cosmo.fppized3.f

is enough.

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-23 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

nsz at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nsz at gcc dot gnu.org

--- Comment #2 from nsz at gcc dot gnu.org ---
is this fortran only?

glibc release is in a week, we can still do something (or backport a fix).

the vector abi does not allow 1 lane in this case
https://github.com/ARM-software/abi-aa/blob/main/vfabia64/vfabia64.rst#L867

c annotation:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/aarch64/fpu/bits/math-vector.h;h=04837bdcd7c0d0ce91192e09fc2d6614cae289c2;hb=HEAD
fortran annotation:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/aarch64/fpu/finclude/math-vector-fortran.h;h=92e15f0d6a758258f5728e628bbb2422b176fa95;hb=HEAD

i think the bug can be reproduced with older glibc by adding

!GCC$ builtin (cos) attributes simd (notinbranch)

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
           Priority|P1                          |P2
                 CC|                            |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2024-01-23
             Status|UNCONFIRMED                 |WAITING
   Target Milestone|14.0                        |11.5

--- Comment #1 from Richard Biener  ---
Hum, the vectorizer looks at the simd specs and if it says 1-lane variants
(simdlen == 1) are available it will happily create them.

Can you provide the testcase amended with the used SIMD "declarations"
(as with the fortran syntax or with a C testcase)?

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

Tamar Christina  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0
           Priority|P3                          |P1
          Component|middle-end                  |tree-optimization