Re: [petsc-users] cuda gpu eager initialization error cudaErrorNotSupported
Hmm, I suspect the problem is that the GPU is simply too old, yes, but perhaps there is a simple enough workaround available in the code as you suggest. I will investigate further on Monday.

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

On Jan 6, 2023, at 09:55, Mark Lohry wrote:

> These cards do indeed not support cudaDeviceGetMemPool -- cudaDeviceGetAttribute on cudaDevAttrMemoryPoolsSupported returns false, meaning the device doesn't support cudaMallocAsync, so the first point of failure is the call to cudaDeviceGetMemPool in the initialization.
>
> Would a workaround be to replace the cudaMallocAsync call with cudaMalloc and skip the mempool, or is that a bad idea?
>
> On Fri, Jan 6, 2023 at 9:17 AM Mark Lohry wrote:
>
>> It built+ran fine on a different system with an sm75 arch. Is there a documented minimum version if that indeed is the cause?
>>
>> One minor hiccup FYI -- compilation of hypre fails with cuda toolkit 12, due to cusparse removing csrsv2Info_t (although it's still referenced in their docs...) in favor of bsrsv2Info_t. Rolling back to cuda toolkit 11.8 worked.
>>
>> On Thu, Jan 5, 2023 at 6:37 PM Junchao Zhang wrote:
>>
>>> Jacob, is it because the cuda arch is too old?
>>>
>>> --Junchao Zhang
>>>
>>> On Thu, Jan 5, 2023 at 4:30 PM Mark Lohry wrote:
>>>
>>>> I'm seeing the same thing on latest main with a different machine and -sm52 card, cuda 11.8. make check fails with the below, where the indicated line 249 corresponds to PetscCallCUPM(cupmDeviceGetMemPool(, static_cast(device->deviceId))); in the initialize function.
Re: [petsc-users] cuda gpu eager initialization error cudaErrorNotSupported
These cards do indeed not support cudaDeviceGetMemPool -- cudaDeviceGetAttribute on cudaDevAttrMemoryPoolsSupported returns false, meaning the device doesn't support cudaMallocAsync, so the first point of failure is the call to cudaDeviceGetMemPool in the initialization.

Would a workaround be to replace the cudaMallocAsync call with cudaMalloc and skip the mempool, or is that a bad idea?

On Fri, Jan 6, 2023 at 9:17 AM Mark Lohry wrote:

> It built+ran fine on a different system with an sm75 arch. Is there a
> documented minimum version if that indeed is the cause?
>
> One minor hiccup FYI -- compilation of hypre fails with cuda toolkit 12,
> due to cusparse removing csrsv2Info_t (although it's still referenced in
> their docs...) in favor of bsrsv2Info_t. Rolling back to cuda toolkit 11.8
> worked.
>
> On Thu, Jan 5, 2023 at 6:37 PM Junchao Zhang wrote:
>
>> Jacob, is it because the cuda arch is too old?
>>
>> --Junchao Zhang
>>
>> On Thu, Jan 5, 2023 at 4:30 PM Mark Lohry wrote:
>>
>>> I'm seeing the same thing on latest main with a different machine and
>>> -sm52 card, cuda 11.8. make check fails with the below, where the indicated
>>> line 249 corresponds to PetscCallCUPM(cupmDeviceGetMemPool(,
>>> static_cast(device->deviceId))); in the initialize function.
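The workaround asked about in this message (query cudaDevAttrMemoryPoolsSupported, then use cudaMalloc instead of cudaMallocAsync when pools are unavailable) could look roughly like the sketch below. It is only an illustration under stated assumptions, not PETSc's code or the eventual fix: the names CHECK_CUDA, AllocPath, choose_alloc_path, device_supports_mempools, device_alloc, and device_free are hypothetical, and only the CUDA runtime calls themselves are real API.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper (not a PETSc macro): abort with a message on any CUDA error.
#define CHECK_CUDA(call)                                              \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      std::fprintf(stderr, "%s failed: %s\n", #call,                  \
                   cudaGetErrorString(err_));                         \
      std::exit(EXIT_FAILURE);                                        \
    }                                                                 \
  } while (0)

// Which allocation API to use for a given device.
enum class AllocPath { StreamOrdered, Blocking };

// Pure policy: stream-ordered allocation only if the device reports pool support.
inline AllocPath choose_alloc_path(int pools_supported) {
  return pools_supported ? AllocPath::StreamOrdered : AllocPath::Blocking;
}

// Returns nonzero if the device reports memory pool support.
// On the cards discussed in this thread the attribute is reported as false.
inline int device_supports_mempools(int device) {
  int supported = 0;
  CHECK_CUDA(cudaDeviceGetAttribute(&supported,
                                    cudaDevAttrMemoryPoolsSupported, device));
  return supported;
}

// Allocate with cudaMallocAsync when possible, otherwise fall back to cudaMalloc.
inline void *device_alloc(size_t bytes, cudaStream_t stream, AllocPath path) {
  void *ptr = nullptr;
  if (path == AllocPath::StreamOrdered) {
    CHECK_CUDA(cudaMallocAsync(&ptr, bytes, stream));
  } else {
    CHECK_CUDA(cudaMalloc(&ptr, bytes));
  }
  return ptr;
}

// Free with the API that matches the one used for allocation.
inline void device_free(void *ptr, cudaStream_t stream, AllocPath path) {
  if (path == AllocPath::StreamOrdered) {
    CHECK_CUDA(cudaFreeAsync(ptr, stream));
  } else {
    CHECK_CUDA(cudaFree(ptr));
  }
}
```

A caller would compute the path once per device, for example choose_alloc_path(device_supports_mempools(0)), and pass it to every device_alloc and device_free call. The trade-off is that cudaMalloc and cudaFree are not stream-ordered, and cudaFree synchronizes the device, so the fallback gives up asynchrony in exchange for running on older cards.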
Re: [petsc-users] cuda gpu eager initialization error cudaErrorNotSupported
It built+ran fine on a different system with an sm75 arch. Is there a documented minimum version if that indeed is the cause?

One minor hiccup FYI -- compilation of hypre fails with cuda toolkit 12, due to cusparse removing csrsv2Info_t (although it's still referenced in their docs...) in favor of bsrsv2Info_t. Rolling back to cuda toolkit 11.8 worked.

On Thu, Jan 5, 2023 at 6:37 PM Junchao Zhang wrote:

> Jacob, is it because the cuda arch is too old?
>
> --Junchao Zhang
>
> On Thu, Jan 5, 2023 at 4:30 PM Mark Lohry wrote:
>
>> I'm seeing the same thing on latest main with a different machine and
>> -sm52 card, cuda 11.8. make check fails with the below, where the indicated
>> line 249 corresponds to PetscCallCUPM(cupmDeviceGetMemPool(,
>> static_cast(device->deviceId))); in the initialize function.
>>
>> Running check examples to verify correct installation
>> Using PETSC_DIR=/home/mlohry/dev/petsc and PETSC_ARCH=arch-linux-c-debug
>> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
>> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
>> 2,17c2,46
>> < 0 SNES Function norm 2.391552133017e-01
>> < 0 KSP Residual norm 2.928487269734e-01
>> < 1 KSP Residual norm 1.876489580142e-02
>> < 2 KSP Residual norm 3.291394847944e-03
>> < 3 KSP Residual norm 2.456493072124e-04
>> < 4 KSP Residual norm 1.161647147715e-05
>> < 5 KSP Residual norm 1.285648407621e-06
>> < 1 SNES Function norm 6.846805706142e-05
>> < 0 KSP Residual norm 2.292783790384e-05
>> < 1 KSP Residual norm 2.100673631699e-06
>> < 2 KSP Residual norm 2.121341386147e-07
>> < 3 KSP Residual norm 2.455932678957e-08
>> < 4 KSP Residual norm 1.753095730744e-09
>> < 5 KSP Residual norm 7.489214418904e-11
>> < 2 SNES Function norm 2.103908447865e-10
>> < Number of SNES iterations = 2
>> ---
>> > [0]PETSC ERROR: - Error Message --
>> > [0]PETSC ERROR: GPU error
>> > [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported
Re: [petsc-users] cuda gpu eager initialization error cudaErrorNotSupported
Jacob, is it because the cuda arch is too old?

--Junchao Zhang

On Thu, Jan 5, 2023 at 4:30 PM Mark Lohry wrote:

> I'm seeing the same thing on latest main with a different machine and
> -sm52 card, cuda 11.8. make check fails with the below, where the indicated
> line 249 corresponds to PetscCallCUPM(cupmDeviceGetMemPool(,
> static_cast(device->deviceId))); in the initialize function.
>
> Running check examples to verify correct installation
> Using PETSC_DIR=/home/mlohry/dev/petsc and PETSC_ARCH=arch-linux-c-debug
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
> 2,17c2,46
> < 0 SNES Function norm 2.391552133017e-01
> < 0 KSP Residual norm 2.928487269734e-01
> < 1 KSP Residual norm 1.876489580142e-02
> < 2 KSP Residual norm 3.291394847944e-03
> < 3 KSP Residual norm 2.456493072124e-04
> < 4 KSP Residual norm 1.161647147715e-05
> < 5 KSP Residual norm 1.285648407621e-06
> < 1 SNES Function norm 6.846805706142e-05
> < 0 KSP Residual norm 2.292783790384e-05
> < 1 KSP Residual norm 2.100673631699e-06
> < 2 KSP Residual norm 2.121341386147e-07
> < 3 KSP Residual norm 2.455932678957e-08
> < 4 KSP Residual norm 1.753095730744e-09
> < 5 KSP Residual norm 7.489214418904e-11
> < 2 SNES Function norm 2.103908447865e-10
> < Number of SNES iterations = 2
> ---
> > [0]PETSC ERROR: - Error Message --
> > [0]PETSC ERROR: GPU error
> > [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported
> > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc!
Re: [petsc-users] cuda gpu eager initialization error cudaErrorNotSupported
I'm seeing the same thing on latest main with a different machine and -sm52 card, cuda 11.8. make check fails with the below, where the indicated line 249 corresponds to PetscCallCUPM(cupmDeviceGetMemPool(, static_cast(device->deviceId))); in the initialize function.

Running check examples to verify correct installation
Using PETSC_DIR=/home/mlohry/dev/petsc and PETSC_ARCH=arch-linux-c-debug
C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
2,17c2,46
< 0 SNES Function norm 2.391552133017e-01
< 0 KSP Residual norm 2.928487269734e-01
< 1 KSP Residual norm 1.876489580142e-02
< 2 KSP Residual norm 3.291394847944e-03
< 3 KSP Residual norm 2.456493072124e-04
< 4 KSP Residual norm 1.161647147715e-05
< 5 KSP Residual norm 1.285648407621e-06
< 1 SNES Function norm 6.846805706142e-05
< 0 KSP Residual norm 2.292783790384e-05
< 1 KSP Residual norm 2.100673631699e-06
< 2 KSP Residual norm 2.121341386147e-07
< 3 KSP Residual norm 2.455932678957e-08
< 4 KSP Residual norm 1.753095730744e-09
< 5 KSP Residual norm 7.489214418904e-11
< 2 SNES Function norm 2.103908447865e-10
< Number of SNES iterations = 2
---
> [0]PETSC ERROR: - Error Message --
> [0]PETSC ERROR: GPU error
> [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported
> [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc!
> [0]PETSC ERROR: Option left: name:-mg_levels_ksp_max_it value: 3 source: command line
> [0]PETSC ERROR: Option left: name:-nox (no value) source: environment
> [0]PETSC ERROR: Option left: name:-nox_warning (no value) source: environment
> [0]PETSC ERROR: Option left: name:-pc_gamg_esteig_ksp_max_it value: 10 source: command line
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.18.3-352-g91c56366cb GIT Date: 2023-01-05 17:22:48 +
> [0]PETSC ERROR: ./ex19 on a arch-linux-c-debug named osprey by mlohry Thu Jan 5 17:25:17 2023
> [0]PETSC ERROR: Configure options --with-cuda --with-mpi=1
> [0]PETSC ERROR: #1 initialize() at /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:249
> [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/cupmcontext.cu:10
> [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:247
> [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:260
> [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52
> [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84
> [0]PETSC ERROR: #7 GetHandleDispatch_() at /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:499
> [0]PETSC ERROR: #8 create() at /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:1069
> [0]PETSC ERROR: #9 VecCreate_SeqCUDA() at /home/mlohry/dev/petsc/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.cu:10
> [0]PETSC ERROR: #10 VecSetType() at /home/mlohry/dev/petsc/src/vec/vec/interface/vecreg.c:89
> [0]PETSC ERROR: #11 DMCreateGlobalVector_DA() at /home/mlohry/dev/petsc/src/dm/impls/da/dadist.c:31
> [0]PETSC ERROR: #12 DMCreateGlobalVector() at /home/mlohry/dev/petsc/src/dm/interface/dm.c:1023
> [0]PETSC ERROR: #13 main() at ex19.c:149

On Thu, Jan 5, 2023 at 3:42 PM Mark Lohry wrote:

> I'm trying to compile the cuda example
>
> ./config/examples/arch-ci-linux-cuda-double-64idx.py
> --with-cudac=/usr/local/cuda-11.5/bin/nvcc
>
> and running make test passes the test ok
> diff-sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-lazy
> but the eager variant fails, pasted below.
>
> I get a similar error running my client code, pasted after. There when
> running with -info, it seems that some lazy initialization happens first,
> and i also call VecCreateSeqCuda which seems to have no issue.
>
> Any idea? This happens to be with an -sm 3.5 device if it matters,
> otherwise it's a recent cuda compiler+driver.
>
> petsc test code output:
>
> not ok sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-eager # Error code: 97
> # [0]PETSC ERROR: - Error Message --
> # [0]PETSC ERROR: GPU error
> # [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported
> # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> # [0]PETSC ERROR: Petsc
[petsc-users] cuda gpu eager initialization error cudaErrorNotSupported
I'm trying to compile the cuda example

./config/examples/arch-ci-linux-cuda-double-64idx.py --with-cudac=/usr/local/cuda-11.5/bin/nvcc

and running make test passes the test ok diff-sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-lazy but the eager variant fails, pasted below.

I get a similar error running my client code, pasted after. There, when running with -info, it seems that some lazy initialization happens first, and I also call VecCreateSeqCUDA, which seems to have no issue.

Any idea? This happens to be with an -sm 3.5 device if it matters, otherwise it's a recent cuda compiler+driver.

petsc test code output:

not ok sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-eager # Error code: 97
# [0]PETSC ERROR: - Error Message --
# [0]PETSC ERROR: GPU error
# [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported
# [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
# [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022
# [0]PETSC ERROR: ../ex1 on a named lancer by mlohry Thu Jan 5 15:22:33 2023
# [0]PETSC ERROR: Configure options --package-prefix-hash=/home/mlohry/petsc-hash-pkgs --with-make-test-np=2 --download-openmpi=1 --download-hypre=1 --download-hwloc=1 COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-cuda=1 --with-precision=double --with-clanguage=c --with-cudac=/usr/local/cuda-11.5/bin/nvcc PETSC_ARCH=arch-ci-linux-cuda-double-64idx
# [0]PETSC ERROR: #1 CUPMAwareMPI_() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:194
# [0]PETSC ERROR: #2 initialize() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:71
# [0]PETSC ERROR: #3 init_device_id_() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:290
# [0]PETSC ERROR: #4 getDevice() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/../impls/host/../impldevicebase.hpp:99
# [0]PETSC ERROR: #5 PetscDeviceCreate() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:104
# [0]PETSC ERROR: #6 PetscDeviceInitializeDefaultDevice_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:375
# [0]PETSC ERROR: #7 PetscDeviceInitializeTypeFromOptions_Private() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:499
# [0]PETSC ERROR: #8 PetscDeviceInitializeFromOptions_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:634
# [0]PETSC ERROR: #9 PetscInitialize_Common() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1001
# [0]PETSC ERROR: #10 PetscInitialize() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1267
# [0]PETSC ERROR: #11 main() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/tests/ex1.c:12
# [0]PETSC ERROR: PETSc Option Table entries:
# [0]PETSC ERROR: -default_device_type host
# [0]PETSC ERROR: -device_enable eager
# [0]PETSC ERROR: End of Error Message ---send entire error message to petsc-ma...@mcs.anl.gov--

solver code output:

[0] PetscDetermineInitialFPTrap(): Floating point trapping is off by default 0
[0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType host available, initializing
[0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice host initialized, default device id 0, view FALSE, init type lazy
[0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType cuda available, initializing
[0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice cuda initialized, default device id 0, view FALSE, init type lazy
[0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType hip not available
[0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType sycl not available
[0] PetscInitialize_Common(): PETSc successfully started: number of processors = 1
[0] PetscGetHostName(): Rejecting domainname, likely is NIS lancer.(none)
[0] PetscInitialize_Common(): Running on machine: lancer
# [Info] Petsc initialization complete.
# [Trace] Timing: Starting solver...
# [Info] RNG initial conditions have mean 0.04, renormalizing.
# [Trace] Timing: PetscTimeIntegrator initialization...
# [Trace] Timing: Allocating Petsc CUDA arrays...
[0] PetscCommDuplicate(): Duplicating a communicator 2 3 max tags = 1
[0] configure(): Configured device 0
[0] PetscCommDuplicate(): Using internal PETSc