[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-04-15 Thread kugan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635

--- Comment #18 from kugan at gcc dot gnu.org ---
Also, can we set INT_MAX when there is no explicit safelen specified in OMP.
Something like:

--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -6975,14 +6975,11 @@ lower_rec_input_clauses (tree clauses, gimple_seq
*ilist, gimple_seq *dlist,
 {
   tree c = omp_find_clause (gimple_omp_for_clauses (ctx->stmt),
OMP_CLAUSE_SAFELEN);
-  poly_uint64 safe_len;
-  if (c == NULL_TREE
- || (poly_int_tree_p (OMP_CLAUSE_SAFELEN_EXPR (c), _len)
- && maybe_gt (safe_len, sctx.max_vf)))
+  if (c == NULL_TREE)
{
  c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_SAFELEN);
  OMP_CLAUSE_SAFELEN_EXPR (c) = build_int_cst (integer_type_node,
-  sctx.max_vf);
+  INT_MAX);
  OMP_CLAUSE_CHAIN (c) = gimple_omp_for_clauses (ctx->stmt);
  gimple_omp_for_set_clauses (ctx->stmt, c);
}

[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-04-15 Thread kugan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635

--- Comment #12 from kugan at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #11)
> (In reply to kugan from comment #9)
> > Looking at the options, looks to me that making loop->safelen a poly_in is
> > the way to go. (In reply to Jakub Jelinek from comment #4)
> > > The OpenMP safelen clause argument is a scalar integer, so using poly_int
> > > for something that must be an int doesn't make sense.
> > > Though, the above testcase actually doesn't use safelen clause, so safelen
> > > is there effectively infinity.
> > Thanks. I was looking at this to see if there is a way to handle this
> > differently. Looks to me that making loop->safelen a poly_int is the way to
> > handle at least the case when omp safelen clause is not provided.
> 
> Why?
> Then it just is INT_MAX value, which is a magic value that says that it is
> infinity.
> No need to say it is a poly_int infinity.

For this test case, omp_max_vf gets [16, 16] from the backend. This then
becomes 16. If we keep it as poly_int, it would pass maybe_lt (max_vf, min_vf))
after applying safelen?

[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-04-15 Thread kugan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635

--- Comment #10 from kugan at gcc dot gnu.org ---
Created attachment 57946
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57946=edit
patch

patch to make loop->safelen a poly_int

[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-04-15 Thread kugan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635

--- Comment #9 from kugan at gcc dot gnu.org ---
Looking at the options, looks to me that making loop->safelen a poly_in is the
way to go. (In reply to Jakub Jelinek from comment #4)
> The OpenMP safelen clause argument is a scalar integer, so using poly_int
> for something that must be an int doesn't make sense.
> Though, the above testcase actually doesn't use safelen clause, so safelen
> is there effectively infinity.
Thanks. I was looking at this to see if there is a way to handle this
differently. Looks to me that making loop->safelen a poly_int is the way to
handle at least the case when omp safelen clause is not provided. I am
interested in looking into this. Any suggestions? Here is a completely untested
diff that makes loop->safelen a poly_int.

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-04-10 Thread kugan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 114653, which changed state.

Bug 114653 Summary: Not vectorizing the loop with openmp reduction.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-04-10 Thread kugan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 CC||kugan at gcc dot gnu.org

--- Comment #8 from kugan at gcc dot gnu.org ---
*** Bug 114653 has been marked as a duplicate of this bug. ***

[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.

2024-04-10 Thread kugan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #6 from kugan at gcc dot gnu.org ---
Duplicate

*** This bug has been marked as a duplicate of bug 114635 ***

[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.

2024-04-10 Thread kugan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

--- Comment #5 from kugan at gcc dot gnu.org ---
ddd for the :
 ref_a: 
_57 = D.4803[_20];
  ref_b: 
D.4803[_20] = _ifc__174;

We get DDR_ARE_DEPENDENT (ddr) == chrec_dont_know. Hence apply_safelen ().

[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.

2024-04-09 Thread kugan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

--- Comment #4 from kugan at gcc dot gnu.org ---
This particular loop has loop->safelen set to 16. Does this mean this can never
be loop vectorized for VLA?

[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.

2024-04-09 Thread kugan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

--- Comment #3 from kugan at gcc dot gnu.org ---
For SVE mode in vect_analyze_loop_2, we have

(gdb) p min_vf
$15 = {coeffs = {4, 4}}
(gdb) p max_vf
$16 = 16

Thus maybe_lt (max_vf, min_vf)) is false. This results in bad data dependence.

[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.

2024-04-09 Thread kugan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

--- Comment #2 from kugan at gcc dot gnu.org ---
Thanks. I see the following in the log:
test.cpp:33:53: missed:   not vectorized: relevant stmt not supported: _54 =
.MASK_LOAD (_53, 32B, _171);
test.cpp:22:19: missed:  bad operation or unsupported loop bound.
test.cpp:22:19: note:  * Analysis  failed with vector mode V4SF


test.cpp:22:19: note:   === vect_analyze_data_ref_dependences ===
test.cpp:22:19: missed:  bad data dependence.
test.cpp:22:19: note:  * Analysis  failed with vector mode VNx16QI

test.cpp:33:53: missed:   not vectorized: relevant stmt not supported: _54 =
.MASK_LOAD (_53, 32B, _171);
test.cpp:22:19: missed:  bad operation or unsupported loop bound.
test.cpp:22:19: note:  * Analysis  failed with vector mode V8QI

test.cpp:22:19: note:   === vect_analyze_data_ref_dependences ===
test.cpp:22:19: missed:  bad data dependence.
test.cpp:22:19: note:  * Analysis  failed with vector mode VNx8QI

test.cpp:33:53: missed:   not vectorized: relevant stmt not supported: _54 =
.MASK_LOAD (_53, 32B, _171);
test.cpp:22:19: missed:  bad operation or unsupported loop bound.
test.cpp:22:19: note:  * Analysis  failed with vector mode V4HI

test.cpp:22:19: note:   === vect_analyze_data_ref_dependences ===
test.cpp:22:19: missed:  bad data dependence.
test.cpp:22:19: note:  * Analysis  failed with vector mode VNx4QI

test.cpp:33:53: missed:   not vectorized: relevant stmt not supported: _54 =
.MASK_LOAD (_53, 32B, _171);
test.cpp:22:19: missed:  bad operation or unsupported loop bound.
test.cpp:22:19: note:  * Analysis  failed with vector mode V2SI

test.cpp:22:19: note:   worklist: examine stmt: _57 = D.4803[_20];
test.cpp:22:19: note:   === vect_analyze_data_ref_dependences ===
test.cpp:22:19: missed:  bad data dependence.
test.cpp:22:19: note:  * Analysis  failed with vector mode VNx2QI
test.cpp:22:19: missed: couldn't vectorize loop
test.cpp:22:19: missed: bad data dependence.

[Bug middle-end/114653] New: Not vectoring the loop with openmp reduction.

2024-04-09 Thread kugan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

Bug ID: 114653
   Summary: Not vectoring the loop with openmp reduction.
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kugan at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57910
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57910=edit
testcase

Main loop in the attached test case is not vectorized with -fopenmp. It gets
vectorized with -fopenmp-simd.

In the case of -fopenmp reduction variables lax,lay,laz gets assigned to an
array. data reference calculation for this seem to fail. See:

offset from base address: (ssizetype) ((sizetype) _20 * 4)
constant offset from base address: 0
step: 0
base alignment: 16
base misalignment: 0
offset alignment: 4
step alignment: 128
base_object: D.4806[_20]
Creating dr for D.4808[_20]
analyze_innermost: Applying pattern match.pd:219, generic-match-1.cc:3190
test.cpp:37:9: missed:  failed: evolution of offset is not affine.


command used: 
 test.cpp -Ofast -fopenmp -mcpu=neoverse-v2


gcc -v:
Using built-in specs.
COLLECT_GCC=/home/kvivekananda/install/bin/gcc
COLLECT_LTO_WRAPPER=/home/kvivekananda/install/libexec/gcc/aarch64-unknown-linux-gnu/14.0.1/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../gcc/configure --enable-multiarch=yes
--enable-languages=c,c++,fortran,lto --disable-bootstrap
--prefix=/home/kvivekananda/install
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.1 20240314 (experimental) (GCC)

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-09 Thread kugan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #5 from kugan at gcc dot gnu.org ---
 -O3 -fno-tree-vectorize  and -O3 -fno-tree-vrp works. I looked at the ever
dump and it is not doing anything suspicious. Looks like range_info usage in
vectoriser is causing the problem.

[Bug libgomp/113698] GNU OpenMP with OMP_PROC_BIND alters thread affinity in a way that negatively affects performance

2024-02-09 Thread kugan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113698

--- Comment #4 from kugan at gcc dot gnu.org ---
Thanks for looking into this. The main reason we ere seeing performance issue
turned out to be due to glibc malloc issue in
https://sourceware.org/bugzilla/show_bug.cgi?id=30945

[Bug libgomp/113698] New: GNU OpenMP with OMP_PROC_BIND alters thread affinity in a way that negatively affects performance

2024-01-31 Thread kugan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113698

Bug ID: 113698
   Summary: GNU OpenMP with OMP_PROC_BIND alters thread affinity
in a way that negatively affects performance
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kugan at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57275
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57275=edit
testcase

When OMP_PROC_BIND=true it seems gomp set the affinity even before main()
starts. In particular, the main thread gets affinity 0x1 (i.e. pinned to the
first core). For the attached, I get

$ OMP_NUM_THREADS=72 ./a.out
[main thread affinity right after main()]. tid:ae511020
aff:...
duration: 402.949 msec

$ OMP_PROC_BIND=true OMP_NUM_THREADS=72 ./a.out
[main thread affinity right after main()]. tid:fffdded50020
aff:...0001
duration: 7879.59 msec

$ OMP_PROC_BIND=true OMP_NUM_THREADS=72 ./a.out
[main thread affinity right after main()]. tid:ae54c020
aff:...0001
duration: 311219 msec

Compiler options used:
gcc -O0 -fopenmp repro.c

gcc -v:


Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/aarch64-linux-gnu/11/lto-wrapper
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
11.3.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-11
--program-prefix=aarch64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-libquadmath --disable-libquadmath-support --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release
--build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu
--with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04)