[Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'

2021-04-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
   Target Milestone|---                         |11.2
         Resolution|---                         |FIXED

--- Comment #8 from Tom de Vries  ---
I tried backporting to releases/gcc-10, but ran into:
...
FAIL: libgomp.c/target-43.c (test for excess errors)
Excess errors:
unresolved symbol __sync_val_compare_and_swap_1
mkoffload: fatal error:
/home/vries/oacc/trunk/install/offload-nvptx-none/bin//x86_64-pc-linux-gnu-accel-nvptx-none-gcc
returned 1 exit status
compilation terminated.
...

So I guess backporting stops at gcc-11.

[Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #7 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Tom de Vries:

https://gcc.gnu.org/g:f94c6caac7f03815c26c03a532f834c37517519c

commit r11-8324-gf94c6caac7f03815c26c03a532f834c37517519c
Author: Tom de Vries 
Date:   Wed Apr 28 16:00:01 2021 +0200

[omp, simt] Fix expand_GOMP_SIMT_*

When running the test-case included in this patch using an
nvptx accelerator, it fails in execution.

The problem is that the expansion of GOMP_SIMT_XCHG_BFLY is optimized away
during pass_jump as "trivially dead insns".

This is caused by this code in expand_GOMP_SIMT_XCHG_BFLY:
...
  class expand_operand ops[3];
  create_output_operand (&ops[0], target, mode);
  ...
  expand_insn (targetm.code_for_omp_simt_xchg_bfly, 3, ops);
...
which doesn't guarantee that target is assigned to by the expanded insn.

For instance, if target is:
...
(gdb) call debug_rtx ( target )
(subreg/s/u:QI (reg:SI 40 [ _61 ]) 0)
...
then after expand_insn, we have:
...
(gdb) call debug_rtx ( ops[0].value )
(reg:QI 57)
...

See commit 3af3bec2e4d "internal-fn: Avoid dropping the lhs of some
calls [PR94941]" for a similar problem.

Fix this in the same way, by adding:
...
  if (!rtx_equal_p (target, ops[0].value))
    emit_move_insn (target, ops[0].value);
...
where applicable in the expand_GOMP_SIMT_* functions.

Tested libgomp on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2021-04-28  Tom de Vries  

PR target/100232
* internal-fn.c (expand_GOMP_SIMT_ENTER_ALLOC)
(expand_GOMP_SIMT_LAST_LANE, expand_GOMP_SIMT_ORDERED_PRED)
(expand_GOMP_SIMT_VOTE_ANY, expand_GOMP_SIMT_XCHG_BFLY)
(expand_GOMP_SIMT_XCHG_IDX): Ensure target is assigned to.

(cherry picked from commit 4d7c874e2c64ebf7631049ace642d246843febae)

[Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Tom de Vries:

https://gcc.gnu.org/g:4d7c874e2c64ebf7631049ace642d246843febae

commit r12-249-g4d7c874e2c64ebf7631049ace642d246843febae
Author: Tom de Vries 
Date:   Wed Apr 28 16:00:01 2021 +0200

[omp, simt] Fix expand_GOMP_SIMT_*

When running the test-case included in this patch using an
nvptx accelerator, it fails in execution.

The problem is that the expansion of GOMP_SIMT_XCHG_BFLY is optimized away
during pass_jump as "trivially dead insns".

This is caused by this code in expand_GOMP_SIMT_XCHG_BFLY:
...
  class expand_operand ops[3];
  create_output_operand (&ops[0], target, mode);
  ...
  expand_insn (targetm.code_for_omp_simt_xchg_bfly, 3, ops);
...
which doesn't guarantee that target is assigned to by the expanded insn.

For instance, if target is:
...
(gdb) call debug_rtx ( target )
(subreg/s/u:QI (reg:SI 40 [ _61 ]) 0)
...
then after expand_insn, we have:
...
(gdb) call debug_rtx ( ops[0].value )
(reg:QI 57)
...

See commit 3af3bec2e4d "internal-fn: Avoid dropping the lhs of some
calls [PR94941]" for a similar problem.

Fix this in the same way, by adding:
...
  if (!rtx_equal_p (target, ops[0].value))
    emit_move_insn (target, ops[0].value);
...
where applicable in the expand_GOMP_SIMT_* functions.

Tested libgomp on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2021-04-28  Tom de Vries  

PR target/100232
* internal-fn.c (expand_GOMP_SIMT_ENTER_ALLOC)
(expand_GOMP_SIMT_LAST_LANE, expand_GOMP_SIMT_ORDERED_PRED)
(expand_GOMP_SIMT_VOTE_ANY, expand_GOMP_SIMT_XCHG_BFLY)
(expand_GOMP_SIMT_XCHG_IDX): Ensure target is assigned to.
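
For readers unfamiliar with the exchange named in the commit message above,
the following is a toy simulation of a butterfly (XOR-lane) sum reduction and
of what happens when the exchanged value never reaches the variable the
reduction reads. It is a sketch under assumptions, not GCC or nvptx code: the
function name butterfly_sum and the drop_exchange_result flag are
hypothetical, and the lanes are modelled as elements of a std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: each vector element is the private partial sum of one lane.
// In every round a lane adds the value held by its partner lane, where the
// partner is found by XOR-ing the lane number with a power-of-two mask.
// With drop_exchange_result set, the partner's value is "computed" but never
// stored where the addition reads it, loosely mimicking an exchange whose
// result lands in an unused temporary.
std::vector<int> butterfly_sum(std::vector<int> v, bool drop_exchange_result) {
  const std::size_t lanes = v.size();
  // The XOR pairing only covers all lanes when the count is a power of two.
  assert(lanes != 0 && (lanes & (lanes - 1)) == 0);

  for (std::size_t mask = 1; mask < lanes; mask <<= 1) {
    std::vector<int> received(lanes, 0);
    for (std::size_t lane = 0; lane < lanes; ++lane) {
      if (!drop_exchange_result)
        received[lane] = v[lane ^ mask];  // value from the partner lane
    }
    for (std::size_t lane = 0; lane < lanes; ++lane)
      v[lane] += received[lane];
  }
  return v;
}
```

When the exchange result is kept, every lane ends up holding the full sum;
when it is dropped, each lane keeps only its own contribution, which is one
way a reduction can yield a wrong value while nothing crashes.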

[Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'

2021-04-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #5 from Tom de Vries  ---
https://gcc.gnu.org/pipermail/gcc-patches/2021-April/569038.html

[Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'

2021-04-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #4 from Tom de Vries  ---
This commit:
...
commit 3af3bec2e4d344bd54a134d8b2263f44d788c3d8
Author: Richard Sandiford 
Date:   Mon May 4 21:21:16 2020 +0100

internal-fn: Avoid dropping the lhs of some calls [PR94941]
...
adds:
...
   expand_insn (get_multi_vector_move (type, optab), 2, ops);
+  if (!rtx_equal_p (target, ops[0].value))
+    emit_move_insn (target, ops[0].value);
...
in expand_load_lanes_optab_fn and mentions:
...
create_output_operand coerces an output operand to the insn's
predicates, using a suggested rtx location if convenient.
But if that rtx location is actually required rather than
optional, the builder of the insn has to emit a move afterwards.

(We could instead add a new interface that does this automatically,
but that's future work.)

This PR shows that we were failing to emit the move for some of the
vector load internal functions.  I think there are other routines in
internal-fn.c that potentially have the same problem, but this patch is
supposed to be a conservative subset suitable for backporting to GCC 10.
...
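
The contract described in the quoted text (the location passed for an output
operand is a suggestion, so a caller that requires it must copy the result
back) can be illustrated with a small toy model. Nothing below is GCC's API:
Loc, Machine, legitimize_output, expand_toy_insn and expand_with_copy_back
are hypothetical names, and the "narrow_view" flag is only a loose stand-in
for a target that the instruction cannot write directly.

```cpp
#include <cassert>
#include <vector>

// Toy model only: a "location" is an index into a small register file.
struct Loc {
  int reg;
  bool narrow_view;  // hypothetical: the insn cannot write this directly
};

struct Machine {
  std::vector<int> regs = std::vector<int>(64, 0);
  int next_tmp = 57;  // arbitrary first temporary register number
};

// Models an output operand: 'suggested' is only a hint. If the insn cannot
// write there, a fresh temporary is chosen instead.
Loc legitimize_output(Machine &m, Loc suggested) {
  if (!suggested.narrow_view)
    return suggested;
  return Loc{m.next_tmp++, false};
}

// Emits the toy insn: writes 'value' wherever the operand ended up and
// returns that location (the analogue of reading ops[0].value afterwards).
Loc expand_toy_insn(Machine &m, Loc suggested, int value) {
  Loc actual = legitimize_output(m, suggested);
  assert(actual.reg >= 0 && actual.reg < static_cast<int>(m.regs.size()));
  m.regs[actual.reg] = value;
  return actual;
}

// Caller-side fix: if the result landed elsewhere, copy it to the target.
void expand_with_copy_back(Machine &m, Loc target, int value) {
  Loc actual = expand_toy_insn(m, target, value);
  if (actual.reg != target.reg)
    m.regs[target.reg] = m.regs[actual.reg];
}
```

Calling expand_toy_insn alone with a narrow target leaves the target register
untouched, while expand_with_copy_back leaves the value in the register the
caller asked for.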

[Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'

2021-04-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #3 from Tom de Vries  ---
In expand_GOMP_SIMT_XCHG_BFLY, we have a subreg target:
...
(gdb) call debug_rtx ( target )
(subreg/s/u:QI (reg:SI 40 [ _61 ]) 0)
...

During expand_insn, the operands are legitimized, and this changes the state of
the output operand to:
...
(gdb) call debug_rtx ( ops[0].value )
(reg:QI 57)
...

So the value is written to reg 57, but never actually copied back to reg 40.

Tentative fix:
...
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index dd7173126fb..28ae3ed167a 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -361,6 +361,8 @@ expand_GOMP_SIMT_XCHG_BFLY (internal_fn, gcall *stmt)
   create_input_operand (&ops[2], idx, SImode);
   gcc_assert (targetm.have_omp_simt_xchg_bfly ());
   expand_insn (targetm.code_for_omp_simt_xchg_bfly, 3, ops);
+  if (ops[0].value != target)
+    emit_move_insn (target, ops[0].value);
 }

 /* Exchange between SIMT lanes according to given source lane index.  */
...
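
The tentative fix above compares the two rtx pointers, whereas the committed
fix uses rtx_equal_p. The thread does not state why; one plausible reading,
offered here only as an assumption, is that a structural comparison avoids a
redundant move when the two expressions are distinct objects describing the
same location. The toy sketch below shows the difference between the two
kinds of check. Node, node_equal and the needs_move_* helpers are
hypothetical and are not GCC's rtx or rtx_equal_p.

```cpp
#include <cassert>

// Toy expression node: a code and a register number. Not GCC's rtx.
struct Node {
  int code;
  int regno;
};

// Structural comparison: two distinct nodes are equal when their fields match.
bool node_equal(const Node *a, const Node *b) {
  return a == b || (a->code == b->code && a->regno == b->regno);
}

// Would a copy-back move be emitted if the check were pointer identity?
bool needs_move_by_pointer(const Node *target, const Node *result) {
  assert(target && result);
  return target != result;
}

// Would a copy-back move be emitted if the check were structural equality?
bool needs_move_by_structure(const Node *target, const Node *result) {
  assert(target && result);
  return !node_equal(target, result);
}
```

Both checks request a move when the result really lives elsewhere. They
differ only for two separate nodes with identical contents, where the pointer
check asks for a move that would copy a location onto itself.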

[Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'

2021-04-23 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #2 from Tobias Burnus  ---
(In reply to Tom de Vries from comment #1)
> Can you try the patch for PR81778 ?
> It's possible you're looking at a duplicate.

Unfortunately, it does not seem to make a difference; it still fails.

[Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'

2021-04-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #1 from Tom de Vries  ---
Can you try the patch for PR81778 ?

It's possible you're looking at a duplicate.