[Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

Tom de Vries changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
   Target Milestone|---                         |11.2
         Resolution|---                         |FIXED

--- Comment #8 from Tom de Vries ---
I tried backporting to releases/gcc-10, but ran into:
...
FAIL: libgomp.c/target-43.c (test for excess errors)
Excess errors:
unresolved symbol __sync_val_compare_and_swap_1
mkoffload: fatal error: /home/vries/oacc/trunk/install/offload-nvptx-none/bin//x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.
...

So I guess backporting stops at gcc-11.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #7 from CVS Commits ---
The releases/gcc-11 branch has been updated by Tom de Vries:

https://gcc.gnu.org/g:f94c6caac7f03815c26c03a532f834c37517519c

commit r11-8324-gf94c6caac7f03815c26c03a532f834c37517519c
Author: Tom de Vries
Date:   Wed Apr 28 16:00:01 2021 +0200

    [omp, simt] Fix expand_GOMP_SIMT_*

    When running the test-case included in this patch using an nvptx
    accelerator, it fails in execution.

    The problem is that the expansion of GOMP_SIMT_XCHG_BFLY is optimized away
    during pass_jump as "trivially dead insns".

    This is caused by this code in expand_GOMP_SIMT_XCHG_BFLY:
    ...
      class expand_operand ops[3];
      create_output_operand (&ops[0], target, mode);
      ...
      expand_insn (targetm.code_for_omp_simt_xchg_bfly, 3, ops);
    ...
    which doesn't guarantee that target is assigned to by the expanded insn.
    F.i., if target is:
    ...
    (gdb) call debug_rtx ( target )
    (subreg/s/u:QI (reg:SI 40 [ _61 ]) 0)
    ...
    then after expand_insn, we have:
    ...
    (gdb) call debug_rtx ( ops[0].value )
    (reg:QI 57)
    ...

    See commit 3af3bec2e4d "internal-fn: Avoid dropping the lhs of some calls
    [PR94941]" for a similar problem.

    Fix this in the same way, by adding:
    ...
      if (!rtx_equal_p (target, ops[0].value))
        emit_move_insn (target, ops[0].value);
    ...
    where applicable in the expand_GOMP_SIMT_* functions.

    Tested libgomp on x86_64 with nvptx accelerator.

    gcc/ChangeLog:

    2021-04-28  Tom de Vries

            PR target/100232
            * internal-fn.c (expand_GOMP_SIMT_ENTER_ALLOC)
            (expand_GOMP_SIMT_LAST_LANE, expand_GOMP_SIMT_ORDERED_PRED)
            (expand_GOMP_SIMT_VOTE_ANY, expand_GOMP_SIMT_XCHG_BFLY)
            (expand_GOMP_SIMT_XCHG_IDX): Ensure target is assigned to.

    (cherry picked from commit 4d7c874e2c64ebf7631049ace642d246843febae)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #6 from CVS Commits ---
The master branch has been updated by Tom de Vries:

https://gcc.gnu.org/g:4d7c874e2c64ebf7631049ace642d246843febae

commit r12-249-g4d7c874e2c64ebf7631049ace642d246843febae
Author: Tom de Vries
Date:   Wed Apr 28 16:00:01 2021 +0200

    [omp, simt] Fix expand_GOMP_SIMT_*

    When running the test-case included in this patch using an nvptx
    accelerator, it fails in execution.

    The problem is that the expansion of GOMP_SIMT_XCHG_BFLY is optimized away
    during pass_jump as "trivially dead insns".

    This is caused by this code in expand_GOMP_SIMT_XCHG_BFLY:
    ...
      class expand_operand ops[3];
      create_output_operand (&ops[0], target, mode);
      ...
      expand_insn (targetm.code_for_omp_simt_xchg_bfly, 3, ops);
    ...
    which doesn't guarantee that target is assigned to by the expanded insn.
    F.i., if target is:
    ...
    (gdb) call debug_rtx ( target )
    (subreg/s/u:QI (reg:SI 40 [ _61 ]) 0)
    ...
    then after expand_insn, we have:
    ...
    (gdb) call debug_rtx ( ops[0].value )
    (reg:QI 57)
    ...

    See commit 3af3bec2e4d "internal-fn: Avoid dropping the lhs of some calls
    [PR94941]" for a similar problem.

    Fix this in the same way, by adding:
    ...
      if (!rtx_equal_p (target, ops[0].value))
        emit_move_insn (target, ops[0].value);
    ...
    where applicable in the expand_GOMP_SIMT_* functions.

    Tested libgomp on x86_64 with nvptx accelerator.

    gcc/ChangeLog:

    2021-04-28  Tom de Vries

            PR target/100232
            * internal-fn.c (expand_GOMP_SIMT_ENTER_ALLOC)
            (expand_GOMP_SIMT_LAST_LANE, expand_GOMP_SIMT_ORDERED_PRED)
            (expand_GOMP_SIMT_VOTE_ANY, expand_GOMP_SIMT_XCHG_BFLY)
            (expand_GOMP_SIMT_XCHG_IDX): Ensure target is assigned to.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #5 from Tom de Vries ---
https://gcc.gnu.org/pipermail/gcc-patches/2021-April/569038.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #4 from Tom de Vries ---
This commit:
...
commit 3af3bec2e4d344bd54a134d8b2263f44d788c3d8
Author: Richard Sandiford
Date:   Mon May 4 21:21:16 2020 +0100

    internal-fn: Avoid dropping the lhs of some calls [PR94941]
...
adds:
...
       expand_insn (get_multi_vector_move (type, optab), 2, ops);
+      if (!rtx_equal_p (target, ops[0].value))
+        emit_move_insn (target, ops[0].value);
...
in expand_load_lanes_optab_fn and mentions:
...
create_output_operand coerces an output operand to the insn's
predicates, using a suggested rtx location if convenient.
But if that rtx location is actually required rather than
optional, the builder of the insn has to emit a move afterwards.

(We could instead add a new interface that does this automatically,
but that's future work.)

This PR shows that we were failing to emit the move for some of the
vector load internal functions.  I think there are other routines in
internal-fn.c that potentially have the same problem, but this patch
is supposed to be a conservative subset suitable for backporting
to GCC 10.
...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

Tom de Vries changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #3 from Tom de Vries ---
In expand_GOMP_SIMT_XCHG_BFLY, we have a subreg target:
...
(gdb) call debug_rtx ( target )
(subreg/s/u:QI (reg:SI 40 [ _61 ]) 0)
...

During expand_insn, the operands are legitimized, and this changes the state
of the output operand to:
...
(gdb) call debug_rtx ( ops[0].value )
(reg:QI 57)
...

So the value is written to reg 57, but never actually copied back to reg 40.

Tentative fix:
...
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index dd7173126fb..28ae3ed167a 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -361,6 +361,8 @@ expand_GOMP_SIMT_XCHG_BFLY (internal_fn, gcall *stmt)
   create_input_operand (&ops[2], idx, SImode);
   gcc_assert (targetm.have_omp_simt_xchg_bfly ());
   expand_insn (targetm.code_for_omp_simt_xchg_bfly, 3, ops);
+  if (ops[0].value != target)
+    emit_move_insn (target, ops[0].value);
 }
 
 /* Exchange between SIMT lanes according to given source lane index.  */
...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #2 from Tobias Burnus ---
(In reply to Tom de Vries from comment #1)
> Can you try the patch for PR81778 ?
> It's possible you're looking at a duplicate.

Unfortunately, it does not seem to make a difference - it still fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #1 from Tom de Vries ---
Can you try the patch for PR81778 ?
It's possible you're looking at a duplicate.