[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635

--- Comment #18 from kugan at gcc dot gnu.org ---
Also, can we set INT_MAX when there is no explicit safelen specified in OMP?
Something like:

--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -6975,14 +6975,11 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
     {
       tree c = omp_find_clause (gimple_omp_for_clauses (ctx->stmt),
                                OMP_CLAUSE_SAFELEN);
-      poly_uint64 safe_len;
-      if (c == NULL_TREE
-         || (poly_int_tree_p (OMP_CLAUSE_SAFELEN_EXPR (c), &safe_len)
-             && maybe_gt (safe_len, sctx.max_vf)))
+      if (c == NULL_TREE)
        {
          c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_SAFELEN);
          OMP_CLAUSE_SAFELEN_EXPR (c) = build_int_cst (integer_type_node,
-                                                      sctx.max_vf);
+                                                      INT_MAX);
          OMP_CLAUSE_CHAIN (c) = gimple_omp_for_clauses (ctx->stmt);
          gimple_omp_for_set_clauses (ctx->stmt, c);
        }
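To illustrate why an INT_MAX default could matter, here is a minimal sketch of how a safelen value might limit the maximum vectorization factor. It is a simplified model and not GCC code: the names `kUnlimitedVf`, `apply_safelen_model` and `vf_check_needed` are hypothetical, and it assumes that safelen can only lower max_vf and that the later comparison against min_vf is only made once max_vf has actually been lowered.

```cpp
#include <cassert>
#include <climits>

// Hypothetical sentinel: "no dependence has limited the vectorization factor".
constexpr unsigned kUnlimitedVf = INT_MAX;

// Simplified model (an assumption, not GCC's implementation):
// safelen may lower max_vf but never raises it.
constexpr unsigned apply_safelen_model(unsigned max_vf, int safelen) {
  return (static_cast<unsigned>(safelen) < max_vf)
             ? static_cast<unsigned>(safelen)
             : max_vf;
}

// In this model the "max_vf versus min_vf" check is only needed
// when max_vf has been lowered below the sentinel.
constexpr bool vf_check_needed(unsigned max_vf) {
  return max_vf != kUnlimitedVf;
}

// Small runtime demonstration of the two cases discussed in the comment.
inline void demo_safelen_model() {
  // safelen defaulted to a backend-derived 16: max_vf is clamped to 16.
  assert(apply_safelen_model(kUnlimitedVf, 16) == 16u);
  assert(vf_check_needed(apply_safelen_model(kUnlimitedVf, 16)));
  // safelen defaulted to INT_MAX: max_vf stays unlimited, no check needed.
  assert(!vf_check_needed(apply_safelen_model(kUnlimitedVf, INT_MAX)));
}
```

Under these assumptions, a default of 16 leaves a finite max_vf that still has to be compared against min_vf, while a default of INT_MAX leaves max_vf untouched.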
[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635

--- Comment #12 from kugan at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #11)
> (In reply to kugan from comment #9)
> > Looking at the options, looks to me that making loop->safelen a poly_int is
> > the way to go. (In reply to Jakub Jelinek from comment #4)
> > > The OpenMP safelen clause argument is a scalar integer, so using poly_int
> > > for something that must be an int doesn't make sense.
> > > Though, the above testcase actually doesn't use safelen clause, so safelen
> > > is there effectively infinity.
> > Thanks. I was looking at this to see if there is a way to handle this
> > differently. Looks to me that making loop->safelen a poly_int is the way to
> > handle at least the case when omp safelen clause is not provided.
>
> Why?
> Then it just is INT_MAX value, which is a magic value that says that it is
> infinity.
> No need to say it is a poly_int infinity.

For this test case, omp_max_vf gets [16, 16] from the backend. This then becomes 16. If we keep it as poly_int, it would pass maybe_lt (max_vf, min_vf) after applying safelen?
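The question above can be made concrete with a small stand-alone model of a two-coefficient polynomial value. This is only a sketch: `Poly2` and `scalar` are hypothetical names, and `maybe_lt` here is a simplified re-implementation named after the predicate in the comment, not GCC's poly_int code. It assumes the run-time indeterminate x is a non-negative integer with no upper bound.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a two-coefficient poly_int:
// value = c0 + c1 * x, where x >= 0 is only known at run time.
struct Poly2 {
  std::uint64_t c0;
  std::uint64_t c1;
};

// A compile-time constant is a Poly2 whose x coefficient is zero.
constexpr Poly2 scalar(std::uint64_t v) { return Poly2{v, 0}; }

// True if a < b for at least one x >= 0.
// If a.c0 < b.c0 this already holds at x = 0; if a.c1 < b.c1 it holds
// for large enough x. Otherwise a >= b for every x.
constexpr bool maybe_lt(Poly2 a, Poly2 b) {
  return a.c1 < b.c1 || a.c0 < b.c0;
}

// Runtime demonstration using the values quoted in this thread.
inline void demo_poly_compare() {
  const Poly2 min_vf{4, 4};  // {coeffs = {4, 4}}
  // Scalar max_vf of 16: 16 < 4 + 4x once x > 3, so it may be less.
  assert(maybe_lt(scalar(16), min_vf));
  // Polynomial max_vf of [16, 16]: 16 + 16x is never less than 4 + 4x.
  assert(!maybe_lt(Poly2{16, 16}, min_vf));
}
```

In this model the scalar 16 may be smaller than a min_vf of {4, 4}, whereas [16, 16] kept as a polynomial value is never smaller.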
[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635

--- Comment #10 from kugan at gcc dot gnu.org ---
Created attachment 57946
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57946&action=edit
patch

patch to make loop->safelen a poly_int
[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635

--- Comment #9 from kugan at gcc dot gnu.org ---
Looking at the options, it looks to me that making loop->safelen a poly_int is the way to go.

(In reply to Jakub Jelinek from comment #4)
> The OpenMP safelen clause argument is a scalar integer, so using poly_int
> for something that must be an int doesn't make sense.
> Though, the above testcase actually doesn't use safelen clause, so safelen
> is there effectively infinity.

Thanks. I was looking at this to see if there is a way to handle this differently. It looks to me that making loop->safelen a poly_int is the way to handle at least the case when the omp safelen clause is not provided.

I am interested in looking into this. Any suggestions?

Here is a completely untested diff that makes loop->safelen a poly_int.
[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 114653, which changed state.

Bug 114653 Summary: Not vectorizing the loop with openmp reduction.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE
[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635

kugan at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kugan at gcc dot gnu.org

--- Comment #8 from kugan at gcc dot gnu.org ---
*** Bug 114653 has been marked as a duplicate of this bug. ***
[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

kugan at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #6 from kugan at gcc dot gnu.org ---
Duplicate

*** This bug has been marked as a duplicate of bug 114635 ***
[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

--- Comment #5 from kugan at gcc dot gnu.org ---
For the ddr of:

ref_a: _57 = D.4803[_20];
ref_b: D.4803[_20] = _ifc__174;

we get DDR_ARE_DEPENDENT (ddr) == chrec_dont_know. Hence apply_safelen () is used.
[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

--- Comment #4 from kugan at gcc dot gnu.org ---
This particular loop has loop->safelen set to 16. Does this mean this can never be loop vectorized for VLA?
[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

--- Comment #3 from kugan at gcc dot gnu.org ---
For SVE mode in vect_analyze_loop_2, we have

(gdb) p min_vf
$15 = {coeffs = {4, 4}}
(gdb) p max_vf
$16 = 16

Thus maybe_lt (max_vf, min_vf) is true, since 16 can be less than 4 + 4x for a large enough x. This results in "bad data dependence".
[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

--- Comment #2 from kugan at gcc dot gnu.org ---
Thanks. I see the following in the log:

test.cpp:33:53: missed: not vectorized: relevant stmt not supported: _54 = .MASK_LOAD (_53, 32B, _171);
test.cpp:22:19: missed: bad operation or unsupported loop bound.
test.cpp:22:19: note: * Analysis failed with vector mode V4SF
test.cpp:22:19: note: === vect_analyze_data_ref_dependences ===
test.cpp:22:19: missed: bad data dependence.
test.cpp:22:19: note: * Analysis failed with vector mode VNx16QI
test.cpp:33:53: missed: not vectorized: relevant stmt not supported: _54 = .MASK_LOAD (_53, 32B, _171);
test.cpp:22:19: missed: bad operation or unsupported loop bound.
test.cpp:22:19: note: * Analysis failed with vector mode V8QI
test.cpp:22:19: note: === vect_analyze_data_ref_dependences ===
test.cpp:22:19: missed: bad data dependence.
test.cpp:22:19: note: * Analysis failed with vector mode VNx8QI
test.cpp:33:53: missed: not vectorized: relevant stmt not supported: _54 = .MASK_LOAD (_53, 32B, _171);
test.cpp:22:19: missed: bad operation or unsupported loop bound.
test.cpp:22:19: note: * Analysis failed with vector mode V4HI
test.cpp:22:19: note: === vect_analyze_data_ref_dependences ===
test.cpp:22:19: missed: bad data dependence.
test.cpp:22:19: note: * Analysis failed with vector mode VNx4QI
test.cpp:33:53: missed: not vectorized: relevant stmt not supported: _54 = .MASK_LOAD (_53, 32B, _171);
test.cpp:22:19: missed: bad operation or unsupported loop bound.
test.cpp:22:19: note: * Analysis failed with vector mode V2SI
test.cpp:22:19: note: worklist: examine stmt: _57 = D.4803[_20];
test.cpp:22:19: note: === vect_analyze_data_ref_dependences ===
test.cpp:22:19: missed: bad data dependence.
test.cpp:22:19: note: * Analysis failed with vector mode VNx2QI
test.cpp:22:19: missed: couldn't vectorize loop
test.cpp:22:19: missed: bad data dependence.
[Bug middle-end/114653] New: Not vectoring the loop with openmp reduction.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

            Bug ID: 114653
           Summary: Not vectoring the loop with openmp reduction.
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kugan at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57910
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57910&action=edit
testcase

The main loop in the attached test case is not vectorized with -fopenmp. It gets vectorized with -fopenmp-simd. With -fopenmp, the reduction variables lax, lay, laz get assigned to an array. Data reference calculation for this seems to fail. See:

        offset from base address: (ssizetype) ((sizetype) _20 * 4)
        constant offset from base address: 0
        step: 0
        base alignment: 16
        base misalignment: 0
        offset alignment: 4
        step alignment: 128
        base_object: D.4806[_20]
Creating dr for D.4808[_20]
analyze_innermost: Applying pattern match.pd:219, generic-match-1.cc:3190
test.cpp:37:9: missed: failed: evolution of offset is not affine.

command used: test.cpp -Ofast -fopenmp -mcpu=neoverse-v2

gcc -v:
Using built-in specs.
COLLECT_GCC=/home/kvivekananda/install/bin/gcc
COLLECT_LTO_WRAPPER=/home/kvivekananda/install/libexec/gcc/aarch64-unknown-linux-gnu/14.0.1/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../gcc/configure --enable-multiarch=yes --enable-languages=c,c++,fortran,lto --disable-bootstrap --prefix=/home/kvivekananda/install
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.1 20240314 (experimental) (GCC)
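For readers without access to the attachment, a loop of the general shape described (three scalar reduction variables accumulated under an OpenMP reduction clause) might look like the sketch below. This is a hypothetical illustration, not the attached testcase: the function name, the `Accel` struct and the choice of pragma are assumptions, and it has not been verified that this exact form reproduces the missed vectorization.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type holding the three reduced components.
struct Accel {
  float x, y, z;
};

// Hypothetical kernel, NOT the attached testcase. Sums three component
// arrays into the scalars lax, lay, laz using an OpenMP reduction.
// Precondition: ay and az have at least ax.size() elements.
// Without -fopenmp or -fopenmp-simd the pragma is ignored and the loop
// runs serially with the same result for exactly representable inputs.
Accel sum_components(const std::vector<float>& ax,
                     const std::vector<float>& ay,
                     const std::vector<float>& az) {
  float lax = 0.0f, lay = 0.0f, laz = 0.0f;
  const std::size_t n = ax.size();
#pragma omp parallel for simd reduction(+ : lax, lay, laz)
  for (std::size_t i = 0; i < n; ++i) {
    lax += ax[i];
    lay += ay[i];
    laz += az[i];
  }
  return Accel{lax, lay, laz};
}

// Small runtime demonstration with exactly representable values.
inline void demo_sum_components() {
  const Accel r = sum_components({1.0f, 2.0f, 3.0f}, {4.0f, 5.0f, 6.0f},
                                 {7.0f, 8.0f, 9.0f});
  assert(r.x == 6.0f && r.y == 15.0f && r.z == 24.0f);
}
```

Compiling such a file once with -fopenmp and once with -fopenmp-simd, with vectorizer dumps enabled, is one way to compare how the reduction variables are lowered in each mode.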
[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #5 from kugan at gcc dot gnu.org ---
Both -O3 -fno-tree-vectorize and -O3 -fno-tree-vrp work. I looked at the evrp dump and it is not doing anything suspicious. It looks like the range_info usage in the vectoriser is causing the problem.
[Bug libgomp/113698] GNU OpenMP with OMP_PROC_BIND alters thread affinity in a way that negatively affects performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113698

--- Comment #4 from kugan at gcc dot gnu.org ---
Thanks for looking into this. The main reason we were seeing the performance issue turned out to be the glibc malloc issue in https://sourceware.org/bugzilla/show_bug.cgi?id=30945
[Bug libgomp/113698] New: GNU OpenMP with OMP_PROC_BIND alters thread affinity in a way that negatively affects performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113698

            Bug ID: 113698
           Summary: GNU OpenMP with OMP_PROC_BIND alters thread affinity in a
                    way that negatively affects performance
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kugan at gcc dot gnu.org
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57275
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57275&action=edit
testcase

When OMP_PROC_BIND=true, it seems gomp sets the affinity even before main() starts. In particular, the main thread gets affinity 0x1 (i.e. pinned to the first core).

For the attached, I get

$ OMP_NUM_THREADS=72 ./a.out
[main thread affinity right after main()]. tid:ae511020 aff:...
duration: 402.949 msec
$ OMP_PROC_BIND=true OMP_NUM_THREADS=72 ./a.out
[main thread affinity right after main()]. tid:fffdded50020 aff:...0001
duration: 7879.59 msec
$ OMP_PROC_BIND=true OMP_NUM_THREADS=72 ./a.out
[main thread affinity right after main()]. tid:ae54c020 aff:...0001
duration: 311219 msec

Compiler options used: gcc -O0 -fopenmp repro.c

gcc -v:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/aarch64-linux-gnu/11/lto-wrapper
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.3.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=aarch64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04)
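One way to observe the main thread's affinity at the start of main(), as the report describes, is to query the CPU mask of the calling thread. The sketch below is not the attached repro.c. It is a hypothetical Linux/glibc helper (the name `allowed_cpu_count` is an assumption) built on `sched_getaffinity`, and it only counts the CPUs in the mask rather than printing it.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for cpu_set_t helpers when not defined by default
#endif
#include <cassert>
#include <sched.h>

// Hypothetical helper: number of CPUs the calling thread is allowed to run
// on, or -1 if the affinity mask could not be read (Linux/glibc only).
int allowed_cpu_count() {
  cpu_set_t set;
  CPU_ZERO(&set);
  // pid 0 means "the calling thread".
  if (sched_getaffinity(0, sizeof(set), &set) != 0) {
    return -1;
  }
  return CPU_COUNT(&set);
}

// Small runtime demonstration: a running thread is allowed on at least
// one CPU whenever the mask can be read.
inline void demo_allowed_cpu_count() {
  const int n = allowed_cpu_count();
  assert(n == -1 || n >= 1);
}
```

Calling such a helper as the first statement of main(), with and without OMP_PROC_BIND=true, would show whether the main thread's mask has already been narrowed before user code runs. A mask of ...0001, as in the output above, corresponds to a count of 1.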