On Fri, Dec 10, 2021 at 10:39 AM Paul Lin <paul...@lbl.gov> wrote:

> Hi Mark,
>
> Regarding the error:
> "PETSC ERROR: cuda error 46 (cudaErrorDevicesUnavailable) : all
> CUDA-capable devices are busy or unavailable"
>
> how are you requesting a perlmutter compute node?
>
I can run CUDA jobs so I think it is a problem with the (hypre) build.

> thanks
> -paul
>
>
>
> On Fri, Dec 10, 2021 at 6:54 AM Mark Adams <mfad...@lbl.gov> wrote:
>
>> And more Perlmutter weirdness.
>>
>> If I configure with the above CRAY_ACCEL_TARGET=nvidia80 I get this
>> (configure.log) error. (some CUDA aware MPI related errors)
>>
>> But if I configure with CRAY_ACCEL_TARGET="" it gets into Kokkos and I
>> get this configure2.log with:
>>
>> #error -- unsupported pgc++ configuration! Only pgc++ 18, 19 and 20 are
>> supported!
>>
>> I have not seen this before.
>>
>> As for the first problem, if I load the cudatoolkit, which they say
>> you can do *or* set CRAY_ACCEL_TARGET=nvidia80, the problems go away or
>> maybe fails before it gets to the first error, but it fails.
>> I get the configure3 error that has these old warnings, but I'm not sure
>> why it failed exactly.
>>
>> This was sort of working yesterday. I did rebase today, but even when
>> working this has been fragile.
>>
>> Any suggestions?
>> Thanks,
>> Mark
>>
>>
>> On Thu, Dec 9, 2021 at 1:59 PM Mark Adams <mfad...@lbl.gov> wrote:
>>
>>> Well I found, accidentally, that turning CUDA aware MPI on with export
>>> CRAY_ACCEL_TARGET=nvidia80
>>> seems to have fixed this.
>>> Not sure what is going on.
>>>
>>> On Thu, Dec 9, 2021 at 11:21 AM Mark Adams <mfad...@lbl.gov> wrote:
>>>
>>>> I am getting this error. I have built this w/o hypre and the tests are
>>>> fine, including the CUDA tests.
>>>> Any ideas?
>>>>
>>>> I notice that the tests use -dm_mat_type aijcusparse with hypre.
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>> 08:13 nid003929 adams/fix_mat_ex5k= perlmutter:~/petsc$ make PETSC_DIR=/global/homes/m/madams/petsc PETSC_ARCH=arch-perlmutter-opt-nvidia-cuda check
>>>> Running check examples to verify correct installation
>>>> Using PETSC_DIR=/global/homes/m/madams/petsc and PETSC_ARCH=arch-perlmutter-opt-nvidia-cuda
>>>> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
>>>> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
>>>> 1,5c1,70
>>>> < lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
>>>> < 0 SNES Function norm 0.0406612
>>>> < 1 SNES Function norm 4.12227e-06
>>>> < 2 SNES Function norm 6.098e-11
>>>> < Number of SNES iterations = 2
>>>> ---
>>>> > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>> > [0]PETSC ERROR: GPU error
>>>> > [0]PETSC ERROR: cuda error 46 (cudaErrorDevicesUnavailable) : all CUDA-capable devices are busy or unavailable
>>>> > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>>> > [0]PETSC ERROR: Petsc Development GIT revision: v3.16.1-442-gebb4a459f5 GIT Date: 2021-12-08 08:59:23 -0500
>>>> > [0]PETSC ERROR: /global/u2/m/madams/petsc/src/snes/tutorials/./ex19 on a named nid003929 by madams Thu Dec 9 08:13:49 2021
>>>> > [0]PETSC ERROR: Configure options --CFLAGS=" -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4 -mp=gpu" --CXXFLAGS=" -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4 -mp=gpu" --FFLAGS=" -g -mp=gpu" --with-cc=cc --with-cxx=CC --with-fc=ftn --with-cudac=/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvcc --with-debugging=0 --download-hypre=1 --with-cuda=1 --with-cuda-arch=80 --with-mpiexec=srun --with-batch=0 --download-p4est=1 --with-zlib=1 --with-make-np=8 --prefix=/global/cfs/projectdirs/m3904/petsc/current/perlmutter-opt-nvidia21.9 PETSC_ARCH=arch-perlmutter-opt-nvidia-cuda
>>>> > [0]PETSC ERROR: #1 initialize() at /global/u2/m/madams/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:70
>>>> > [0]PETSC ERROR: #2 getDevice() at /global/u2/m/madams/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:360
>>>> > [0]PETSC ERROR: #3 PetscDeviceCreate() at /global/u2/m/madams/petsc/src/sys/objects/device/interface/device.cxx:102
>>>> > [0]PETSC ERROR: #4 PetscDeviceInitializeDefaultDevice_Internal() at /global/u2/m/madams/petsc/src/sys/objects/device/interface/device.cxx:266
>>>> > [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>> > [1]PETSC ERROR: GPU error
>>>> > [1]PETSC ERROR: cuda error 46 (cudaErrorDevicesUnavailable) : all CUDA-capable devices are busy or unavailable
>>>> > [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>>> > [1]PETSC ERROR: Petsc Development GIT revision: v3.16.1-442-gebb4a459f5 GIT Date: 2021-12-08 08:59:23 -0500
>>>> > [1]PETSC ERROR: /global/u2/m/madams/petsc/src/snes/tutorials/./ex19 on a named nid003929 by madams Thu Dec 9 08:13:49 2021
>>>> > [1]PETSC ERROR: Configure options --CFLAGS=" -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4 -mp=gpu" --CXXFLAGS=" -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4 -mp=gpu" --FFLAGS=" -g -mp=gpu" --with-cc=cc --with-cxx=CC --with-fc=ftn --with-cudac=/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvcc --with-debugging=0 --download-hypre=1 --with-cuda=1 --with-cuda-arch=80 --with-mpiexec=srun --with-batch=0 --download-p4est=1 --with-zlib=1 --with-make-np=8 --prefix=/global/cfs/projectdirs/m3904/petsc/current/perlmutter-opt-nvidia21.9 PETSC_ARCH=arch-perlmutter-opt-nvidia-cuda
>>>> > [1]PETSC ERROR: #1 initialize() at /global/u2/m/madams/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:70
>>>> > [1]PETSC ERROR: #2 getDevice() at /global/u2/m/madams/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:360
>>>> > [1]PETSC ERROR: #3 PetscDeviceCreate() at /global/u2/m/madams/petsc/src/sys/objects/device/interface/device.cxx:102
>>>> > [1]PETSC ERROR: #4 PetscDeviceInitializeDefaultDevice_Internal() at /global/u2/m/madams/petsc/src/sys/objects/device/interface/device.cxx:266
>>>> > [1]PETSC ERROR: #5 PetscDeviceInitialize() at /global/u2/m/madams/petsc/src/sys/objects/device/interface/device.cxx:227
>>>> > [1]PETSC ERROR: #6 PCCreate_HYPRE() at /global/u2/m/madams/petsc/src/ksp/pc/impls/hypre/hypre.c:2224
>>>> > [1]PETSC ERROR: #7 PCSetType() at /global/u2/m/madams/petsc/src/ksp/pc/interface/pcset.c:84
>>>> > [1]PETSC ERROR: #8 PCSetFromOptions() at /global/u2/m/madams/petsc/src/ksp/pc/interface/pcset.c:154
>>>> > [1]PETSC ERROR: #9 KSPSetFromOptions() at /global/u2/m/madams/petsc/src/ksp/ksp/interface/itcl.c:356
>>>> > [1]PETSC ERROR: #10 SNESSetFromOptions() at /global/u2/m/madams/petsc/src/snes/interface/snes.c:1113
>>>> > [1]PETSC ERROR: #11 main() at ex19.c:150
>>>> > [1]PETSC ERROR: PETSc Option Table entries:
>>>> > [1]PETSC ERROR: -da_refine 3
>>>> > [1]PETSC ERROR: -dm_mat_type aijcusparse
>>>> > [1]PETSC ERROR: -dm_vec_type cuda
>>>> > [0]PETSC ERROR: #5 PetscDeviceInitialize() at /global/u2/m/madams/petsc/src/sys/objects/device/interface/device.cxx:227
>>>> > [0]PETSC ERROR: #6 PCCreate_HYPRE() at /global/u2/m/madams/petsc/src/ksp/pc/impls/hypre/hypre.c:2224
>>>> > [0]PETSC ERROR: #7 PCSetType() at /global/u2/m/madams/petsc/src/ksp/pc/interface/pcset.c:84
>>>> > [0]PETSC ERROR: #8 PCSetFromOptions() at /global/u2/m/madams/petsc/src/ksp/pc/interface/pcset.c:154
>>>> > [0]PETSC ERROR: #9 KSPSetFromOptions() at /global/u2/m/madams/petsc/src/ksp/ksp/interface/itcl.c:356
>>>> > [0]PETSC ERROR: #10 SNESSetFromOptions() at /global/u2/m/madams/petsc/src/snes/interface/snes.c:1113
>>>> > [0]PETSC ERROR: #11 main() at ex19.c:150
>>>> > [0]PETSC ERROR: PETSc Option Table entries:
>>>> > [0]PETSC ERROR: -da_refine 3
>>>> > [0]PETSC ERROR: -dm_mat_type aijcusparse
>>>> > [0]PETSC ERROR: -dm_vec_type cuda
>>>> > [0]PETSC ERROR: -ksp_norm_type unpreconditioned
>>>> > [0]PETSC ERROR: -nox
>>>> > [0]PETSC ERROR: -nox_warning
>>>> > [0]PETSC ERROR: -pc_type hypre
>>>> > [0]PETSC ERROR: -snes_monitor_short
>>>> > [0]PETSC ERROR: -use_gpu_aware_mpi 0
>>>> > [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-ma...@mcs.anl.gov----------
>>>> > [1]PETSC ERROR: -ksp_norm_type unpreconditioned
>>>> > [1]PETSC ERROR: -nox
>>>> > [1]PETSC ERROR: -nox_warning
>>>> > [1]PETSC ERROR: -pc_type hypre
>>>> > [1]PETSC ERROR: -snes_monitor_short
>>>> > [1]PETSC ERROR: -use_gpu_aware_mpi 0
>>>> > [1]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-ma...@mcs.anl.gov----------
>>>> > MPICH Notice [Rank 0] [job id 832277.2] [Thu Dec 9 08:13:50 2021] [nid003929] - Abort(97) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 97) - process 0
>>>> >
>>>> > aborting job:
>>>> > application called MPI_Abort(MPI_COMM_WORLD, 97) - process 0
>>>> > MPICH Notice [Rank 1] [job id 832277.2] [Thu Dec 9 08:13:50 2021] [nid003929] - Abort(97) (rank 1 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 97) - process 1
>>>> >
>>>> > aborting job:
>>>> > application called MPI_Abort(MPI_COMM_WORLD, 97) - process 1
>>>> > srun: error: nid003929: task 1: Exited with exit code 255
>>>> > srun: launch/slurm: _step_signal: Terminating StepId=832277.2
>>>> > slurmstepd: error: *** STEP 832277.2 ON nid003929 CANCELLED AT 2021-12-09T16:13:50 ***
>>>> > srun: error: nid003929: task 0: Exited with exit code 255
>>>> /global/homes/m/madams/petsc/src/snes/tutorials
>>>> Possible problem with ex19 running with hypre, diffs above
>>>> =========================================
>>>> C/C++ example src/snes/tutorials/ex19 run successfully with cuda
>>>> Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
>>>>
>>>