Hmm, yes, I suspect the problem is simply that the GPU is too old, but perhaps there is a simple enough workaround available in the code, as you suggest. I will investigate further on Monday.

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

On Jan 6, 2023, at 09:55, Mark Lohry <[email protected]> wrote:


These cards do indeed not support cudaDeviceGetMemPool -- cudaDeviceGetAttribute on cudaDevAttrMemoryPoolsSupported returns false, meaning the device doesn't support cudaMallocAsync, so the first point of failure is the call to cudaDeviceGetMemPool in the initialization.

Would a workaround be to replace the cudaMallocAsync call with cudaMalloc and skip the mempool, or is that a bad idea?
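
For illustration only, the fallback asked about here could look roughly like the sketch below. It is not PETSc's code. FakeRuntime, DeviceAllocator and all of their member names are hypothetical stand-ins, used so that the logic compiles without a GPU; in real code they would presumably map to cudaDeviceGetAttribute with cudaDevAttrMemoryPoolsSupported, cudaMallocAsync and cudaMalloc. The idea is to query the capability once at initialization and route every later allocation accordingly, so that cudaDeviceGetMemPool is never reached on a card that reports no support.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for the CUDA runtime (an assumption for this sketch).
// Real code would call, respectively:
//   cudaDeviceGetAttribute(&v, cudaDevAttrMemoryPoolsSupported, device)
//   cudaMallocAsync(&ptr, n, stream)
//   cudaMalloc(&ptr, n)
struct FakeRuntime {
  bool pools_supported; // what the attribute query would report
  int  async_calls = 0; // counts stream-ordered allocations
  int  sync_calls  = 0; // counts plain allocations

  explicit FakeRuntime(bool supported) : pools_supported(supported) {}

  bool  memoryPoolsSupported(int /*device*/) const { return pools_supported; }
  void *mallocAsync(std::size_t n) { ++async_calls; return std::malloc(n); }
  void *mallocSync(std::size_t n) { ++sync_calls; return std::malloc(n); }
};

// Queries the capability once, then picks the allocation path per call.
template <typename Runtime>
class DeviceAllocator {
public:
  DeviceAllocator(Runtime &rt, int device)
    : rt_(rt), use_pool_(rt.memoryPoolsSupported(device)) {}

  // Stream-ordered path when pools are supported, plain path otherwise.
  void *allocate(std::size_t n) {
    return use_pool_ ? rt_.mallocAsync(n) : rt_.mallocSync(n);
  }

  bool usesPool() const { return use_pool_; }

private:
  Runtime &rt_;
  bool     use_pool_;
};
```

With FakeRuntime constructed as unsupported (the sm35/sm52 case described above), every allocate() call takes the plain path. A real version would need the same branch on the free side, and the plain path gives up stream ordering, so whether the trade-off is acceptable is a question for the PETSc developers.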

On Fri, Jan 6, 2023 at 9:17 AM Mark Lohry <[email protected]> wrote:
It built+ran fine on a different system with an sm75 arch. Is there a documented minimum version if that indeed is the cause?

One minor hiccup FYI -- compilation of hypre fails with cuda toolkit 12, due to cusparse removing csrsv2Info_t (although it's still referenced in their docs...) in favor of bsrsv2Info_t. Rolling back to cuda toolkit 11.8 worked.

On Thu, Jan 5, 2023 at 6:37 PM Junchao Zhang <[email protected]> wrote:
Jacob, is it because the cuda arch is too old? 

--Junchao Zhang


On Thu, Jan 5, 2023 at 4:30 PM Mark Lohry <[email protected]> wrote:
I'm seeing the same thing on the latest main with a different machine and an sm52 card, cuda 11.8. make check fails with the output below, where the indicated line 249 corresponds to PetscCallCUPM(cupmDeviceGetMemPool(&mempool, static_cast<int>(device->deviceId))); in the initialize function.


Running check examples to verify correct installation
Using PETSC_DIR=/home/mlohry/dev/petsc and PETSC_ARCH=arch-linux-c-debug
C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
2,17c2,46
<   0 SNES Function norm 2.391552133017e-01
<     0 KSP Residual norm 2.928487269734e-01
<     1 KSP Residual norm 1.876489580142e-02
<     2 KSP Residual norm 3.291394847944e-03
<     3 KSP Residual norm 2.456493072124e-04
<     4 KSP Residual norm 1.161647147715e-05
<     5 KSP Residual norm 1.285648407621e-06
<   1 SNES Function norm 6.846805706142e-05
<     0 KSP Residual norm 2.292783790384e-05
<     1 KSP Residual norm 2.100673631699e-06
<     2 KSP Residual norm 2.121341386147e-07
<     3 KSP Residual norm 2.455932678957e-08
<     4 KSP Residual norm 1.753095730744e-09
<     5 KSP Residual norm 7.489214418904e-11
<   2 SNES Function norm 2.103908447865e-10
< Number of SNES iterations = 2
---
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: GPU error
> [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported
> [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc!
> [0]PETSC ERROR: Option left: name:-mg_levels_ksp_max_it value: 3 source: command line
> [0]PETSC ERROR: Option left: name:-nox (no value) source: environment
> [0]PETSC ERROR: Option left: name:-nox_warning (no value) source: environment
> [0]PETSC ERROR: Option left: name:-pc_gamg_esteig_ksp_max_it value: 10 source: command line
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.18.3-352-g91c56366cb  GIT Date: 2023-01-05 17:22:48 +0000
> [0]PETSC ERROR: ./ex19 on a arch-linux-c-debug named osprey by mlohry Thu Jan  5 17:25:17 2023
> [0]PETSC ERROR: Configure options --with-cuda --with-mpi=1
> [0]PETSC ERROR: #1 initialize() at /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:249
> [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/cupmcontext.cu:10
> [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:247
> [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:260
> [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52
> [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84
> [0]PETSC ERROR: #7 GetHandleDispatch_() at /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:499
> [0]PETSC ERROR: #8 create() at /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:1069
> [0]PETSC ERROR: #9 VecCreate_SeqCUDA() at /home/mlohry/dev/petsc/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.cu:10
> [0]PETSC ERROR: #10 VecSetType() at /home/mlohry/dev/petsc/src/vec/vec/interface/vecreg.c:89
> [0]PETSC ERROR: #11 DMCreateGlobalVector_DA() at /home/mlohry/dev/petsc/src/dm/impls/da/dadist.c:31
> [0]PETSC ERROR: #12 DMCreateGlobalVector() at /home/mlohry/dev/petsc/src/dm/interface/dm.c:1023
> [0]PETSC ERROR: #13 main() at ex19.c:149


On Thu, Jan 5, 2023 at 3:42 PM Mark Lohry <[email protected]> wrote:
I'm trying to compile the cuda example

./config/examples/arch-ci-linux-cuda-double-64idx.py --with-cudac=/usr/local/cuda-11.5/bin/nvcc

and running make test passes the lazy variant (ok diff-sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-lazy), but the eager variant fails; its output is pasted below.

I get a similar error running my client code, pasted after that. There, when running with -info, it seems that some lazy initialization happens first, and I also call VecCreateSeqCUDA, which seems to have no issue.

Any idea? This happens to be with an -sm 3.5 device if it matters, otherwise it's a recent cuda compiler+driver.


petsc test code output:



not ok sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-eager # Error code: 97
# [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
# [0]PETSC ERROR: GPU error
# [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported
# [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
# [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022
# [0]PETSC ERROR: ../ex1 on a  named lancer by mlohry Thu Jan  5 15:22:33 2023
# [0]PETSC ERROR: Configure options --package-prefix-hash=/home/mlohry/petsc-hash-pkgs --with-make-test-np=2 --download-openmpi=1 --download-hypre=1 --download-hwloc=1 COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-cuda=1 --with-precision=double --with-clanguage=c --with-cudac=/usr/local/cuda-11.5/bin/nvcc PETSC_ARCH=arch-ci-linux-cuda-double-64idx
# [0]PETSC ERROR: #1 CUPMAwareMPI_() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:194
# [0]PETSC ERROR: #2 initialize() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:71
# [0]PETSC ERROR: #3 init_device_id_() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:290
# [0]PETSC ERROR: #4 getDevice() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/../impls/host/../impldevicebase.hpp:99
# [0]PETSC ERROR: #5 PetscDeviceCreate() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:104
# [0]PETSC ERROR: #6 PetscDeviceInitializeDefaultDevice_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:375
# [0]PETSC ERROR: #7 PetscDeviceInitializeTypeFromOptions_Private() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:499
# [0]PETSC ERROR: #8 PetscDeviceInitializeFromOptions_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:634
# [0]PETSC ERROR: #9 PetscInitialize_Common() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1001
# [0]PETSC ERROR: #10 PetscInitialize() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1267
# [0]PETSC ERROR: #11 main() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/tests/ex1.c:12
# [0]PETSC ERROR: PETSc Option Table entries:
# [0]PETSC ERROR: -default_device_type host
# [0]PETSC ERROR: -device_enable eager
# [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to [email protected]





solver code output:



[0] <sys> PetscDetermineInitialFPTrap(): Floating point trapping is off by default 0
[0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType host available, initializing
[0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice host initialized, default device id 0, view FALSE, init type lazy
[0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType cuda available, initializing
[0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice cuda initialized, default device id 0, view FALSE, init type lazy
[0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType hip not available
[0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType sycl not available
[0] <sys> PetscInitialize_Common(): PETSc successfully started: number of processors = 1
[0] <sys> PetscGetHostName(): Rejecting domainname, likely is NIS lancer.(none)
[0] <sys> PetscInitialize_Common(): Running on machine: lancer
# [Info] Petsc initialization complete.
# [Trace] Timing: Starting solver...
# [Info] RNG initial conditions have mean 0.000004, renormalizing.
# [Trace] Timing: PetscTimeIntegrator initialization...
# [Trace] Timing: Allocating Petsc CUDA arrays...
[0] <sys> PetscCommDuplicate(): Duplicating a communicator 2 3 max tags = 100000000
[0] <sys> configure(): Configured device 0
[0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 2 3
# [Trace] Timing: Allocating Petsc CUDA arrays finished in 0.015439 seconds.
[0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 2 3
[0] <sys> PetscCommDuplicate(): Duplicating a communicator 1 4 max tags = 100000000
[0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
[0] <dm> DMGetDMTS(): Creating new DMTS
[0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
[0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
[0] <dm> DMGetDMSNES(): Creating new DMSNES
[0] <dm> DMGetDMSNESWrite(): Copying DMSNES due to write
# [Info] Initializing petsc with ode23 integrator
# [Trace] Timing: PetscTimeIntegrator initialization finished in 0.016754 seconds.

[0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
[0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
[0] <device> PetscDeviceContextSetupGlobalContext_Private(): Initializing global PetscDeviceContext with device type cuda
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: GPU error
[0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022
[0]PETSC ERROR: maDG on a arch-linux2-c-opt named lancer by mlohry Thu Jan  5 15:39:14 2023
[0]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/bin/cc --with-cxx=/usr/bin/c++ --with-fc=0 --with-pic=1 --with-cxx-dialect=C++11 MAKEFLAGS=$MAKEFLAGS COPTFLAGS="-O3 -march=native" CXXOPTFLAGS="-O3 -march=native" --with-mpi=0 --with-debugging=no --with-cudac=/usr/local/cuda-11.5/bin/nvcc --with-cuda-arch=35 --with-cuda --with-cuda-dir=/usr/local/cuda-11.5/ --download-hwloc=1
[0]PETSC ERROR: #1 initialize() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:255
[0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/cupmcontext.cu:10
[0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:244
[0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:259
[0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52
[0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84
[0]PETSC ERROR: #7 PetscDeviceContextGetCurrentContextAssertType_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/include/petsc/private/deviceimpl.h:371
[0]PETSC ERROR: #8 PetscCUBLASGetHandle() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/cupmcontext.cu:23
[0]PETSC ERROR: #9 VecMAXPY_SeqCUDA() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu:261
[0]PETSC ERROR: #10 VecMAXPY() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/interface/rvector.c:1221
[0]PETSC ERROR: #11 TSStep_RK() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/impls/explicit/rk/rk.c:814
[0]PETSC ERROR: #12 TSStep() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3424
[0]PETSC ERROR: #13 TSSolve() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3814

