[Bug libgomp/108098] OpenMP/nvptx reverse offload execution test FAILs

2023-05-08 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108098

Tobias Burnus  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Tobias Burnus  ---
FIXED for GCC 13(.2) + mainline/14.

2023-05-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108098

--- Comment #6 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Tobias Burnus:

https://gcc.gnu.org/g:615b920553fd28e9d4732dedcd799227e82cc011

commit r13-7306-g615b920553fd28e9d4732dedcd799227e82cc011
Author: Tobias Burnus 
Date:   Fri May 5 11:27:32 2023 +0200

nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]

Seemingly, the PTX JIT of CUDA <= 10.2 replaces function pointers in global
variables with NULL if a translation unit does not contain any executable code.
It works with CUDA 11.1.  This commit is about reverse offload: having NULL
values disables the host side of reverse offload during image load.

The solution is the same as the one Thomas found for a related issue: adding a
dummy procedure. Cf. the PR of this issue and Thomas' patch
"nvptx: Support global constructors/destructors via 'collect2'"
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607749.html

As that approach also works here:

Co-authored-by: Thomas Schwinge 

gcc/
PR libgomp/108098

* config/nvptx/mkoffload.cc (process): Emit dummy procedure
alongside reverse-offload function table to prevent NULL values
of the function addresses.

(cherry picked from commit 4359724cba31b2645f6106266bef019c3d6ef16a)

2023-05-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108098

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Tobias Burnus:

https://gcc.gnu.org/g:4359724cba31b2645f6106266bef019c3d6ef16a

commit r14-491-g4359724cba31b2645f6106266bef019c3d6ef16a
Author: Tobias Burnus 
Date:   Fri May 5 11:27:32 2023 +0200

nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]

Seemingly, the PTX JIT of CUDA <= 10.2 replaces function pointers in global
variables with NULL if a translation unit does not contain any executable code.
It works with CUDA 11.1.  This commit is about reverse offload: having NULL
values disables the host side of reverse offload during image load.

The solution is the same as the one Thomas found for a related issue: adding a
dummy procedure. Cf. the PR of this issue and Thomas' patch
"nvptx: Support global constructors/destructors via 'collect2'"
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607749.html

As that approach also works here:

Co-authored-by: Thomas Schwinge 

gcc/
PR libgomp/108098

* config/nvptx/mkoffload.cc (process): Emit dummy procedure
alongside reverse-offload function table to prevent NULL values
of the function addresses.

2022-12-16 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108098

--- Comment #4 from Tobias Burnus  ---
Indeed, the following also seems to help with an older CUDA / JIT compiler;
it is motivated by Thomas' work.

If we are sure that CUDA 11.0 fixes it, we could generate that code only for:

  if (version2[0] < 7 || sm_ver2[0] < 8)

given that sm_80 has only been supported since CUDA 11.0 and, likewise, CUDA
11.0 introduced PTX ISA version 7.0.

--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -358,4 +358,9 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
   fprintf (out, "\"\n\t\".file 1 \\\"\\\"\"\n");

+  fprintf (out, "\n\t\".func __dummy$func ( );\"\n");
+  fprintf (out, "\t\".func __dummy$func ( )\"\n");
+  fprintf (out, "\t\"{\"\n");
+  fprintf (out, "\t\"}\"\n");
+
   size_t fidx = 0;
   for (id = func_ids; id; id = id->next)

2022-12-16 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108098

--- Comment #3 from Tobias Burnus  ---
The problem - at least when testing on a system with:
  NVIDIA-SMI 440.118.02   Driver Version: 440.118.02   CUDA Version: 10.2

seems to be that libgomp/plugin/plugin-nvptx.c's GOMP_OFFLOAD_load_image has:

   fn_entries == 3 and rev_fn_table != NULL (i.e. reverse offload is expected)

and then runs:

  r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module,
                         "$offload_func_table");
  if (r != CUDA_SUCCESS)
    GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
  assert (bytes == sizeof (uint64_t) * fn_entries);
  *rev_fn_table = GOMP_PLUGIN_malloc (sizeof (uint64_t) * fn_entries);
  r = CUDA_CALL_NOCHECK (cuMemcpyDtoH, *rev_fn_table, var, bytes);
  if (r != CUDA_SUCCESS)
    GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuda_error (r));

So far so good, but all entries are NULL. This then disables the checking for
reverse offload on the host side. (It is not quite clear to me why it doesn't
run into an endless loop on the device side.)

For that offload table, the generated PTX code for reverse-offload-1{,-aux}.c
is:

".version 6.0"
".target sm_35"
".file 1 \"\""
".extern .func tg_fn$_omp_fn$0$nohost$0 (.param .u64 %in_ar0);"
".extern .func main$_omp_fn$2$nohost$1 (.param .u64 %in_ar0);"
".visible .global .align 8 .u64 $offload_func_table[] = {"
"tg_fn$_omp_fn$0$nohost$0,"
"main$_omp_fn$2$nohost$1,"
"0,"
"0};\n";

This seems to be OK, and it works with CUDA 11.  It looks as if '>= sm_35' is
only one of the required criteria and that there are additional ones.

 * * *

I am fairly sure that it did work before, but it could well be that I only
checked that the device->host notification worked, without trying any actual
offload (and before adding the "all entries NULL -> no reverse offload" check).
When I later ran the actual offload tests, I might have skipped that machine.
Or I did something different back then, but I don't know what.

 * * *

In patch "nvptx: Support global constructors/destructors via 'collect2'",
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607749.html , Thomas
uses a dummy entry - possibly that would be also a solution:

+/* For example with old Nvidia Tesla K20c, Driver Version: 361.93.02, the
+   function pointers stored in the '__CTOR_LIST__', '__DTOR_LIST__' arrays
+   evidently evaluate to NULL in JIT compilation.  Avoiding the use of
+   assembler names ('write_list_with_asm') doesn't help, but defining a dummy
+   function next to the arrays apparently does work around this issue...

2022-12-16 Thread tschwinge at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108098

Thomas Schwinge  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-12-16

--- Comment #2 from Thomas Schwinge  ---
(In reply to Tom de Vries from comment #1)
> I'm not sure if it matters for triggering this problem

It doesn't:

> version:  440.118.02

Same set of FAILs.

2022-12-15 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108098

--- Comment #1 from Tom de Vries  ---
(In reply to Thomas Schwinge from comment #0)
> $ nvidia-smi
> [...]
> | NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2
> [...]
> |   0  Tesla K80  [...]
> [...]
> |   1  Tesla K80  [...]
> 

I'm not sure if it matters for triggering this problem, but if I look up this
board on the NVIDIA driver download page and select CUDA 10.2 and the
production branch, I get:
...
version:440.118.02
Release Date:   2020.9.30
...

Then using the "Beta and Older Drivers" I find the version you're using is:
...
version: 440.33.01
Release date:  November 19, 2019
...

Please always use the latest drivers when reporting a problem.