https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105042

            Bug ID: 105042
           Summary: [libgomp, GOMP_NVPTX_JIT=-O0] Openacc testsuite
                    failures when X runs on nvidia driver
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

I usually have only an nvidia-compute$n driver package installed, but sometimes
(as happened when I updated the system yesterday) x11-video-nvidia$n gets
installed as well, after which X runs on the nvidia card (instead of on the
built-in Intel graphics).

With such a setup, I run into a cluster of FAILs, all for GOMP_NVPTX_JIT=-O0:
...
$ grep ^FAIL: 2/libgomp.sum
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0 -DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vred2d-128.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0 -DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vred2d-128.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2 -DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/parallel-dims.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0 -DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/vred2d-128.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0 -DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/vred2d-128.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2 -DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-fortran/parallel-dims.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0 -DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-fortran/parallel-dims.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1 -DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.oacc-fortran/parallel-dims.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -Os -DGOMP_NVPTX_JIT=-O0 execution test
...
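
To see which tests the FAILs cluster on, the list can be tallied per source
file.  This is only a minimal sketch: the function name tally_fails is made up
for illustration, and it assumes one result per line, as in an actual .sum
file.

```shell
# tally_fails FILE: count FAIL lines per test source file in a DejaGnu
# .sum file and print "COUNT NAME", highest count first.
# Hypothetical helper; assumes each test result occupies a single line.
tally_fails () {
    awk '/^FAIL:/ {
             # $2 is the test path; keep only its last component.
             n = split($2, parts, "/")
             count[parts[n]]++
         }
         END { for (name in count) print count[name], name }' "$1" \
        | sort -k1,1nr -k2
}
```

Usage would be "tally_fails 2/libgomp.sum".  For the nine FAILs listed above,
that gives 4 for vred2d-128.c, 3 for parallel-dims.f90 and 2 for
parallel-dims.c.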

Note that this is with a patch from PR104423 applied, which runs the tests both
with default JIT optimization and with GOMP_NVPTX_JIT=-O0, hence the
-DGOMP_NVPTX_JIT=-O0 tag.  But the failures can also be reproduced by just
doing:
...
export GOMP_NVPTX_JIT=-O0
...
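
To compare the two settings directly on a single test executable, something
along these lines could be used.  It is a sketch only: run_both is a
hypothetical helper name, and the only thing taken from this report is the
GOMP_NVPTX_JIT=-O0 setting itself.

```shell
# run_both CMD [ARGS...]: run CMD once with GOMP_NVPTX_JIT unset (default JIT
# optimization) and once with GOMP_NVPTX_JIT=-O0, and print the exit status
# of each run.  Hypothetical helper, for illustration only.
run_both () {
    run_both_status=0
    # Subshell, so the caller's environment is left untouched.
    ( unset GOMP_NVPTX_JIT; "$@" ) || run_both_status=$?
    echo "default: exit status $run_both_status"

    run_both_status=0
    ( export GOMP_NVPTX_JIT=-O0; "$@" ) || run_both_status=$?
    echo "GOMP_NVPTX_JIT=-O0: exit status $run_both_status"
}
```

For example: "run_both ./parallel-dims.exe".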

It could be that the test-cases just need scaling down.  OTOH, it could also be
that there's an underlying problem that only surfaces when other processes,
specifically X, are running in parallel.

This is on a K2000 board with driver 470.103.01.

The board has 2GB of memory.  According to nvidia-smi, the X processes take a
couple of hundred MB, and ./parallel-dims.exe takes just 15MiB, so at first
glance this doesn't look like an out-of-board-memory problem.
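
The remaining headroom can be computed from nvidia-smi's query output rather
than read off by eye.  A minimal sketch, assuming the CSV format printed by the
--query-gpu option; free_mib is a made-up helper name:

```shell
# free_mib: read lines of the form "USED, TOTAL" (both in MiB) on stdin and
# print TOTAL - USED for each line, i.e. the board memory still available.
# Intended input (one line per GPU):
#   nvidia-smi --query-gpu=memory.used,memory.total \
#              --format=csv,noheader,nounits
# Hypothetical helper, for illustration only.
free_mib () {
    awk -F', *' '{ print $2 - $1 }'
}
```

Usage: pipe the nvidia-smi command from the comment into free_mib, once with X
idle and once while the test is running.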

I do observe reduced system responsiveness while running the tests, so maybe
it's the compute capacity rather than the memory that is exhausted.
