These cards indeed do not support cudaDeviceGetMemPool -- cudaDeviceGetAttribute on cudaDevAttrMemoryPoolsSupported returns false, meaning they don't support cudaMallocAsync, so the first point of failure is the call to cudaDeviceGetMemPool during initialization.
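For reference, that capability check can be reproduced standalone, along with the kind of fallback I have in mind -- this is just a sketch of the idea (query the attribute once, then branch), not PETSc's actual code path:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  int device = 0, supported = 0;
  // Query whether the device supports stream-ordered memory pools,
  // the capability behind cudaMallocAsync/cudaDeviceGetMemPool.
  cudaError_t err = cudaDeviceGetAttribute(&supported, cudaDevAttrMemoryPoolsSupported, device);
  if (err != cudaSuccess) {
    printf("attribute query failed: %s\n", cudaGetErrorString(err));
    return 1;
  }
  printf("cudaDevAttrMemoryPoolsSupported = %d\n", supported);

  void *ptr = nullptr;
  if (supported) {
    // Stream-ordered allocation from the device's default mempool.
    cudaMallocAsync(&ptr, 1024, 0);
    cudaFreeAsync(ptr, 0);
  } else {
    // Fallback for older arches (e.g. sm_35/sm_52): plain synchronous cudaMalloc.
    cudaMalloc(&ptr, 1024);
    cudaFree(ptr);
  }
  cudaDeviceSynchronize();
  return 0;
}
```

On the sm_35/sm_52 cards above this prints `cudaDevAttrMemoryPoolsSupported = 0` and takes the cudaMalloc branch.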
Would a workaround be to replace the cudaMallocAsync call with cudaMalloc and skip the mempool, or is that a bad idea?

On Fri, Jan 6, 2023 at 9:17 AM Mark Lohry <[email protected]> wrote:

> It built+ran fine on a different system with an sm75 arch. Is there a documented minimum version if that indeed is the cause?
>
> One minor hiccup FYI -- compilation of hypre fails with cuda toolkit 12, due to cusparse removing csrsv2Info_t (although it's still referenced in their docs...) in favor of bsrsv2Info_t. Rolling back to cuda toolkit 11.8 worked.
>
> On Thu, Jan 5, 2023 at 6:37 PM Junchao Zhang <[email protected]> wrote:
>
>> Jacob, is it because the cuda arch is too old?
>>
>> --Junchao Zhang
>>
>> On Thu, Jan 5, 2023 at 4:30 PM Mark Lohry <[email protected]> wrote:
>>
>>> I'm seeing the same thing on latest main with a different machine and -sm52 card, cuda 11.8. make check fails with the below, where the indicated line 249 corresponds to PetscCallCUPM(cupmDeviceGetMemPool(&mempool, static_cast<int>(device->deviceId))); in the initialize function.
>>>
>>> Running check examples to verify correct installation
>>> Using PETSC_DIR=/home/mlohry/dev/petsc and PETSC_ARCH=arch-linux-c-debug
>>> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
>>> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
>>> 2,17c2,46
>>> < 0 SNES Function norm 2.391552133017e-01
>>> < 0 KSP Residual norm 2.928487269734e-01
>>> < 1 KSP Residual norm 1.876489580142e-02
>>> < 2 KSP Residual norm 3.291394847944e-03
>>> < 3 KSP Residual norm 2.456493072124e-04
>>> < 4 KSP Residual norm 1.161647147715e-05
>>> < 5 KSP Residual norm 1.285648407621e-06
>>> < 1 SNES Function norm 6.846805706142e-05
>>> < 0 KSP Residual norm 2.292783790384e-05
>>> < 1 KSP Residual norm 2.100673631699e-06
>>> < 2 KSP Residual norm 2.121341386147e-07
>>> < 3 KSP Residual norm 2.455932678957e-08
>>> < 4 KSP Residual norm 1.753095730744e-09
>>> < 5 KSP Residual norm 7.489214418904e-11
>>> < 2 SNES Function norm 2.103908447865e-10
>>> < Number of SNES iterations = 2
>>> ---
>>> > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>> > [0]PETSC ERROR: GPU error
>>> > [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported
>>> > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc!
>>> > [0]PETSC ERROR: Option left: name:-mg_levels_ksp_max_it value: 3 source: command line
>>> > [0]PETSC ERROR: Option left: name:-nox (no value) source: environment
>>> > [0]PETSC ERROR: Option left: name:-nox_warning (no value) source: environment
>>> > [0]PETSC ERROR: Option left: name:-pc_gamg_esteig_ksp_max_it value: 10 source: command line
>>> > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>> > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.3-352-g91c56366cb GIT Date: 2023-01-05 17:22:48 +0000
>>> > [0]PETSC ERROR: ./ex19 on a arch-linux-c-debug named osprey by mlohry Thu Jan 5 17:25:17 2023
>>> > [0]PETSC ERROR: Configure options --with-cuda --with-mpi=1
>>> > [0]PETSC ERROR: #1 initialize() at /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:249
>>> > [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/cupmcontext.cu:10
>>> > [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:247
>>> > [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:260
>>> > [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52
>>> > [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84
>>> > [0]PETSC ERROR: #7 GetHandleDispatch_() at /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:499
>>> > [0]PETSC ERROR: #8 create() at /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:1069
>>> > [0]PETSC ERROR: #9 VecCreate_SeqCUDA() at /home/mlohry/dev/petsc/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.cu:10
>>> > [0]PETSC ERROR: #10 VecSetType() at /home/mlohry/dev/petsc/src/vec/vec/interface/vecreg.c:89
>>> > [0]PETSC ERROR: #11 DMCreateGlobalVector_DA() at /home/mlohry/dev/petsc/src/dm/impls/da/dadist.c:31
>>> > [0]PETSC ERROR: #12 DMCreateGlobalVector() at /home/mlohry/dev/petsc/src/dm/interface/dm.c:1023
>>> > [0]PETSC ERROR: #13 main() at ex19.c:149
>>>
>>> On Thu, Jan 5, 2023 at 3:42 PM Mark Lohry <[email protected]> wrote:
>>>
>>>> I'm trying to
>>>> compile the cuda example
>>>>
>>>> ./config/examples/arch-ci-linux-cuda-double-64idx.py --with-cudac=/usr/local/cuda-11.5/bin/nvcc
>>>>
>>>> and running make test passes the test ok diff-sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-lazy but the eager variant fails, pasted below.
>>>>
>>>> I get a similar error running my client code, pasted after. There when running with -info, it seems that some lazy initialization happens first, and I also call VecCreateSeqCuda which seems to have no issue.
>>>>
>>>> Any idea? This happens to be with an -sm 3.5 device if it matters, otherwise it's a recent cuda compiler+driver.
>>>>
>>>> petsc test code output:
>>>>
>>>> not ok sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-eager # Error code: 97
>>>> # [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>> # [0]PETSC ERROR: GPU error
>>>> # [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported
>>>> # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>>> # [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022
>>>> # [0]PETSC ERROR: ../ex1 on a named lancer by mlohry Thu Jan 5 15:22:33 2023
>>>> # [0]PETSC ERROR: Configure options --package-prefix-hash=/home/mlohry/petsc-hash-pkgs --with-make-test-np=2 --download-openmpi=1 --download-hypre=1 --download-hwloc=1 COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-cuda=1 --with-precision=double --with-clanguage=c --with-cudac=/usr/local/cuda-11.5/bin/nvcc PETSC_ARCH=arch-ci-linux-cuda-double-64idx
>>>> # [0]PETSC ERROR: #1 CUPMAwareMPI_() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:194
>>>> # [0]PETSC ERROR: #2 initialize() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:71
>>>> # [0]PETSC ERROR: #3 init_device_id_() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:290
>>>> # [0]PETSC ERROR: #4 getDevice() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/../impls/host/../impldevicebase.hpp:99
>>>> # [0]PETSC ERROR: #5 PetscDeviceCreate() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:104
>>>> # [0]PETSC ERROR: #6 PetscDeviceInitializeDefaultDevice_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:375
>>>> # [0]PETSC ERROR: #7 PetscDeviceInitializeTypeFromOptions_Private() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:499
>>>> # [0]PETSC ERROR: #8 PetscDeviceInitializeFromOptions_Internal() at
>>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:634
>>>> # [0]PETSC ERROR: #9 PetscInitialize_Common() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1001
>>>> # [0]PETSC ERROR: #10 PetscInitialize() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1267
>>>> # [0]PETSC ERROR: #11 main() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/tests/ex1.c:12
>>>> # [0]PETSC ERROR: PETSc Option Table entries:
>>>> # [0]PETSC ERROR: -default_device_type host
>>>> # [0]PETSC ERROR: -device_enable eager
>>>> # [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to [email protected]
>>>>
>>>> solver code output:
>>>>
>>>> [0] <sys> PetscDetermineInitialFPTrap(): Floating point trapping is off by default 0
>>>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType host available, initializing
>>>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice host initialized, default device id 0, view FALSE, init type lazy
>>>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType cuda available, initializing
>>>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice cuda initialized, default device id 0, view FALSE, init type lazy
>>>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType hip not available
>>>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType sycl not available
>>>> [0] <sys> PetscInitialize_Common(): PETSc successfully started: number of processors = 1
>>>> [0] <sys> PetscGetHostName(): Rejecting domainname, likely is NIS lancer.(none)
>>>> [0] <sys> PetscInitialize_Common(): Running on machine: lancer
>>>> # [Info] Petsc
>>>> initialization complete.
>>>> # [Trace] Timing: Starting solver...
>>>> # [Info] RNG initial conditions have mean 0.000004, renormalizing.
>>>> # [Trace] Timing: PetscTimeIntegrator initialization...
>>>> # [Trace] Timing: Allocating Petsc CUDA arrays...
>>>> [0] <sys> PetscCommDuplicate(): Duplicating a communicator 2 3 max tags = 100000000
>>>> [0] <sys> configure(): Configured device 0
>>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 2 3
>>>> # [Trace] Timing: Allocating Petsc CUDA arrays finished in 0.015439 seconds.
>>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 2 3
>>>> [0] <sys> PetscCommDuplicate(): Duplicating a communicator 1 4 max tags = 100000000
>>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>>>> [0] <dm> DMGetDMTS(): Creating new DMTS
>>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>>>> [0] <dm> DMGetDMSNES(): Creating new DMSNES
>>>> [0] <dm> DMGetDMSNESWrite(): Copying DMSNES due to write
>>>> # [Info] Initializing petsc with ode23 integrator
>>>> # [Trace] Timing: PetscTimeIntegrator initialization finished in 0.016754 seconds.
>>>>
>>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>>>> [0] <device> PetscDeviceContextSetupGlobalContext_Private(): Initializing global PetscDeviceContext with device type cuda
>>>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>> [0]PETSC ERROR: GPU error
>>>> [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported
>>>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>>> [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022
>>>> [0]PETSC ERROR: maDG on a arch-linux2-c-opt named lancer by mlohry Thu Jan 5 15:39:14 2023
>>>> [0]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/bin/cc --with-cxx=/usr/bin/c++ --with-fc=0 --with-pic=1 --with-cxx-dialect=C++11 MAKEFLAGS=$MAKEFLAGS COPTFLAGS="-O3 -march=native" CXXOPTFLAGS="-O3 -march=native" --with-mpi=0 --with-debugging=no --with-cudac=/usr/local/cuda-11.5/bin/nvcc --with-cuda-arch=35 --with-cuda --with-cuda-dir=/usr/local/cuda-11.5/ --download-hwloc=1
>>>> [0]PETSC ERROR: #1 initialize() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:255
>>>> [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/cupmcontext.cu:10
>>>> [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:244
>>>> [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:259
>>>> [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52
>>>> [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84
>>>> [0]PETSC ERROR: #7 PetscDeviceContextGetCurrentContextAssertType_Internal() at
>>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/include/petsc/private/deviceimpl.h:371
>>>> [0]PETSC ERROR: #8 PetscCUBLASGetHandle() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/cupmcontext.cu:23
>>>> [0]PETSC ERROR: #9 VecMAXPY_SeqCUDA() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu:261
>>>> [0]PETSC ERROR: #10 VecMAXPY() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/interface/rvector.c:1221
>>>> [0]PETSC ERROR: #11 TSStep_RK() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/impls/explicit/rk/rk.c:814
>>>> [0]PETSC ERROR: #12 TSStep() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3424
>>>> [0]PETSC ERROR: #13 TSSolve() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3814
