Jacob, is it because the CUDA arch is too old?
--Junchao Zhang
On Thu, Jan 5, 2023 at 4:30 PM Mark Lohry <[email protected]> wrote:
> I'm seeing the same thing on latest main with a different machine and
> -sm52 card, cuda 11.8. make check fails with the below, where the indicated
> line 249 corresponds to PetscCallCUPM(cupmDeviceGetMemPool(&mempool,
> static_cast<int>(device->deviceId))); in the initialize function.
>
>
> Running check examples to verify correct installation
> Using PETSC_DIR=/home/mlohry/dev/petsc and PETSC_ARCH=arch-linux-c-debug
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
> 2,17c2,46
> < 0 SNES Function norm 2.391552133017e-01
> < 0 KSP Residual norm 2.928487269734e-01
> < 1 KSP Residual norm 1.876489580142e-02
> < 2 KSP Residual norm 3.291394847944e-03
> < 3 KSP Residual norm 2.456493072124e-04
> < 4 KSP Residual norm 1.161647147715e-05
> < 5 KSP Residual norm 1.285648407621e-06
> < 1 SNES Function norm 6.846805706142e-05
> < 0 KSP Residual norm 2.292783790384e-05
> < 1 KSP Residual norm 2.100673631699e-06
> < 2 KSP Residual norm 2.121341386147e-07
> < 3 KSP Residual norm 2.455932678957e-08
> < 4 KSP Residual norm 1.753095730744e-09
> < 5 KSP Residual norm 7.489214418904e-11
> < 2 SNES Function norm 2.103908447865e-10
> < Number of SNES iterations = 2
> ---
> > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > [0]PETSC ERROR: GPU error
> > [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported
> > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc!
> > [0]PETSC ERROR: Option left: name:-mg_levels_ksp_max_it value: 3 source: command line
> > [0]PETSC ERROR: Option left: name:-nox (no value) source: environment
> > [0]PETSC ERROR: Option left: name:-nox_warning (no value) source: environment
> > [0]PETSC ERROR: Option left: name:-pc_gamg_esteig_ksp_max_it value: 10 source: command line
> > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.3-352-g91c56366cb GIT Date: 2023-01-05 17:22:48 +0000
> > [0]PETSC ERROR: ./ex19 on a arch-linux-c-debug named osprey by mlohry Thu Jan 5 17:25:17 2023
> > [0]PETSC ERROR: Configure options --with-cuda --with-mpi=1
> > [0]PETSC ERROR: #1 initialize() at /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:249
> > [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/cupmcontext.cu:10
> > [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:247
> > [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:260
> > [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52
> > [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84
> > [0]PETSC ERROR: #7 GetHandleDispatch_() at /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:499
> > [0]PETSC ERROR: #8 create() at /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:1069
> > [0]PETSC ERROR: #9 VecCreate_SeqCUDA() at /home/mlohry/dev/petsc/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.cu:10
> > [0]PETSC ERROR: #10 VecSetType() at /home/mlohry/dev/petsc/src/vec/vec/interface/vecreg.c:89
> > [0]PETSC ERROR: #11 DMCreateGlobalVector_DA() at /home/mlohry/dev/petsc/src/dm/impls/da/dadist.c:31
> > [0]PETSC ERROR: #12 DMCreateGlobalVector() at /home/mlohry/dev/petsc/src/dm/interface/dm.c:1023
> > [0]PETSC ERROR: #13 main() at ex19.c:149
>
>
> On Thu, Jan 5, 2023 at 3:42 PM Mark Lohry <[email protected]> wrote:
>
>> I'm trying to compile the cuda example
>>
>> ./config/examples/arch-ci-linux-cuda-double-64idx.py --with-cudac=/usr/local/cuda-11.5/bin/nvcc
>>
>> and running make test passes the test ok
>> diff-sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-lazy
>> but the eager variant fails, pasted below.
>>
>> I get a similar error running my client code, pasted after. There when
>> running with -info, it seems that some lazy initialization happens first,
>> and i also call VecCreateSeqCuda which seems to have no issue.
>>
>> Any idea? This happens to be with an -sm 3.5 device if it matters,
>> otherwise it's a recent cuda compiler+driver.
>>
>>
>> petsc test code output:
>>
>> not ok sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-eager # Error code: 97
>> # [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> # [0]PETSC ERROR: GPU error
>> # [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported
>> # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>> # [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022
>> # [0]PETSC ERROR: ../ex1 on a named lancer by mlohry Thu Jan 5 15:22:33 2023
>> # [0]PETSC ERROR: Configure options --package-prefix-hash=/home/mlohry/petsc-hash-pkgs --with-make-test-np=2 --download-openmpi=1 --download-hypre=1 --download-hwloc=1 COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-cuda=1 --with-precision=double --with-clanguage=c --with-cudac=/usr/local/cuda-11.5/bin/nvcc PETSC_ARCH=arch-ci-linux-cuda-double-64idx
>> # [0]PETSC ERROR: #1 CUPMAwareMPI_() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:194
>> # [0]PETSC ERROR: #2 initialize() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:71
>> # [0]PETSC ERROR: #3 init_device_id_() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:290
>> # [0]PETSC ERROR: #4 getDevice() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/../impls/host/../impldevicebase.hpp:99
>> # [0]PETSC ERROR: #5 PetscDeviceCreate() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:104
>> # [0]PETSC ERROR: #6 PetscDeviceInitializeDefaultDevice_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:375
>> # [0]PETSC ERROR: #7 PetscDeviceInitializeTypeFromOptions_Private() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:499
>> # [0]PETSC ERROR: #8 PetscDeviceInitializeFromOptions_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:634
>> # [0]PETSC ERROR: #9 PetscInitialize_Common() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1001
>> # [0]PETSC ERROR: #10 PetscInitialize() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1267
>> # [0]PETSC ERROR: #11 main() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/tests/ex1.c:12
>> # [0]PETSC ERROR: PETSc Option Table entries:
>> # [0]PETSC ERROR: -default_device_type host
>> # [0]PETSC ERROR: -device_enable eager
>> # [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to [email protected]
>>
>>
>> solver code output:
>>
>> [0] <sys> PetscDetermineInitialFPTrap(): Floating point trapping is off by default 0
>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType host available, initializing
>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice host initialized, default device id 0, view FALSE, init type lazy
>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType cuda available, initializing
>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice cuda initialized, default device id 0, view FALSE, init type lazy
>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType hip not available
>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType sycl not available
>> [0] <sys> PetscInitialize_Common(): PETSc successfully started: number of processors = 1
>> [0] <sys> PetscGetHostName(): Rejecting domainname, likely is NIS lancer.(none)
>> [0] <sys> PetscInitialize_Common(): Running on machine: lancer
>> # [Info] Petsc initialization complete.
>> # [Trace] Timing: Starting solver...
>> # [Info] RNG initial conditions have mean 0.000004, renormalizing.
>> # [Trace] Timing: PetscTimeIntegrator initialization...
>> # [Trace] Timing: Allocating Petsc CUDA arrays...
>> [0] <sys> PetscCommDuplicate(): Duplicating a communicator 2 3 max tags = 100000000
>> [0] <sys> configure(): Configured device 0
>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 2 3
>> # [Trace] Timing: Allocating Petsc CUDA arrays finished in 0.015439 seconds.
>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 2 3
>> [0] <sys> PetscCommDuplicate(): Duplicating a communicator 1 4 max tags = 100000000
>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>> [0] <dm> DMGetDMTS(): Creating new DMTS
>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>> [0] <dm> DMGetDMSNES(): Creating new DMSNES
>> [0] <dm> DMGetDMSNESWrite(): Copying DMSNES due to write
>> # [Info] Initializing petsc with ode23 integrator
>> # [Trace] Timing: PetscTimeIntegrator initialization finished in 0.016754 seconds.
>>
>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>> [0] <device> PetscDeviceContextSetupGlobalContext_Private(): Initializing global PetscDeviceContext with device type cuda
>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [0]PETSC ERROR: GPU error
>> [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported
>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>> [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022
>> [0]PETSC ERROR: maDG on a arch-linux2-c-opt named lancer by mlohry Thu Jan 5 15:39:14 2023
>> [0]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/bin/cc --with-cxx=/usr/bin/c++ --with-fc=0 --with-pic=1 --with-cxx-dialect=C++11 MAKEFLAGS=$MAKEFLAGS COPTFLAGS="-O3 -march=native" CXXOPTFLAGS="-O3 -march=native" --with-mpi=0 --with-debugging=no --with-cudac=/usr/local/cuda-11.5/bin/nvcc --with-cuda-arch=35 --with-cuda --with-cuda-dir=/usr/local/cuda-11.5/ --download-hwloc=1
>> [0]PETSC ERROR: #1 initialize() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:255
>> [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/cupmcontext.cu:10
>> [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:244
>> [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:259
>> [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52
>> [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84
>> [0]PETSC ERROR: #7 PetscDeviceContextGetCurrentContextAssertType_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/include/petsc/private/deviceimpl.h:371
>> [0]PETSC ERROR: #8 PetscCUBLASGetHandle() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/cupmcontext.cu:23
>> [0]PETSC ERROR: #9 VecMAXPY_SeqCUDA() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu:261
>> [0]PETSC ERROR: #10 VecMAXPY() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/interface/rvector.c:1221
>> [0]PETSC ERROR: #11 TSStep_RK() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/impls/explicit/rk/rk.c:814
>> [0]PETSC ERROR: #12 TSStep() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3424
>> [0]PETSC ERROR: #13 TSSolve() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3814
