Re: [petsc-users] cuda gpu eager initialization error cudaErrorNotSupported

2023-01-06 Thread Jacob Faibussowitsch
Hmm, I suspect the problem is that the GPU is simply too old, yes, but
perhaps there is a simple enough workaround available in the code, as you
suggest. I will investigate further on Monday.

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

Re: [petsc-users] cuda gpu eager initialization error cudaErrorNotSupported

2023-01-06 Thread Mark Lohry
These cards do indeed not support cudaDeviceGetMemPool --
cudaDeviceGetAttribute on cudaDevAttrMemoryPoolsSupported returns false,
meaning the device doesn't support cudaMallocAsync, so the first point of
failure is the call to cudaDeviceGetMemPool in the initialization.

Would a workaround be to replace the cudaMallocAsync call with cudaMalloc
and skip the mempool, or is that a bad idea?
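
A minimal sketch of the check-and-fall-back idea asked about above; it is
not PETSc's code and not necessarily the fix that was adopted. The cuda*
calls are the CUDA runtime functions named in this message plus
cudaFree/cudaFreeAsync; AllocPath, choose_alloc_path, DeviceBuffer,
device_alloc and device_free are hypothetical names made up for
illustration. The CUDA part is compiled only under nvcc, so the decision
logic can be checked on a machine without a GPU.

```cpp
#include <cstddef>

// Hypothetical names throughout; only the cuda* calls are real CUDA runtime API.
enum class AllocPath { StreamOrdered, Plain };

// Pure decision: `pools_supported` is the value written by
// cudaDeviceGetAttribute(&v, cudaDevAttrMemoryPoolsSupported, device).
constexpr AllocPath choose_alloc_path(int pools_supported)
{
  return pools_supported != 0 ? AllocPath::StreamOrdered : AllocPath::Plain;
}

#if defined(__CUDACC__) // needs nvcc and a toolkit that defines the attribute (CUDA 11.2+)
#include <cuda_runtime.h>

struct DeviceBuffer {
  void     *ptr  = nullptr;
  AllocPath path = AllocPath::Plain; // remembered so the matching free is used
};

// Allocate `bytes`, using the memory pool only if the device reports support.
// Assumes `device` is the current device and `stream` was created on it.
inline cudaError_t device_alloc(DeviceBuffer *buf, std::size_t bytes, int device, cudaStream_t stream)
{
  int         supported = 0;
  cudaError_t err       = cudaDeviceGetAttribute(&supported, cudaDevAttrMemoryPoolsSupported, device);
  if (err != cudaSuccess) return err;

  buf->path = choose_alloc_path(supported);
  if (buf->path == AllocPath::StreamOrdered) return cudaMallocAsync(&buf->ptr, bytes, stream);
  return cudaMalloc(&buf->ptr, bytes); // this path never calls cudaDeviceGetMemPool
}

// Release a buffer with the free call that matches how it was allocated.
inline cudaError_t device_free(DeviceBuffer *buf, cudaStream_t stream)
{
  cudaError_t err = buf->path == AllocPath::StreamOrdered ? cudaFreeAsync(buf->ptr, stream) : cudaFree(buf->ptr);
  buf->ptr = nullptr;
  return err;
}
#endif
```

Given the attribute result reported above for these cards, this sketch
would take the cudaMalloc path on them. Whether skipping the pool is
acceptable inside PETSc itself is the open question in this message.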


Re: [petsc-users] cuda gpu eager initialization error cudaErrorNotSupported

2023-01-06 Thread Mark Lohry
It built+ran fine on a different system with an sm75 arch. Is there a
documented minimum version if that indeed is the cause?

One minor hiccup FYI -- compilation of hypre fails with cuda toolkit 12,
due to cusparse removing csrsv2Info_t (although it's still referenced in
their docs...) in favor of bsrsv2Info_t. Rolling back to cuda toolkit 11.8
worked.



Re: [petsc-users] cuda gpu eager initialization error cudaErrorNotSupported

2023-01-05 Thread Junchao Zhang
Jacob, is it because the cuda arch is too old?

--Junchao Zhang



Re: [petsc-users] cuda gpu eager initialization error cudaErrorNotSupported

2023-01-05 Thread Mark Lohry
I'm seeing the same thing on latest main with a different machine and -sm52
card, cuda 11.8. make check fails with the below, where the indicated line
249 corresponds to PetscCallCUPM(cupmDeviceGetMemPool(,
static_cast(device->deviceId)));   in the initialize function.


Running check examples to verify correct installation
Using PETSC_DIR=/home/mlohry/dev/petsc and PETSC_ARCH=arch-linux-c-debug
C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
2,17c2,46
<   0 SNES Function norm 2.391552133017e-01
< 0 KSP Residual norm 2.928487269734e-01
< 1 KSP Residual norm 1.876489580142e-02
< 2 KSP Residual norm 3.291394847944e-03
< 3 KSP Residual norm 2.456493072124e-04
< 4 KSP Residual norm 1.161647147715e-05
< 5 KSP Residual norm 1.285648407621e-06
<   1 SNES Function norm 6.846805706142e-05
< 0 KSP Residual norm 2.292783790384e-05
< 1 KSP Residual norm 2.100673631699e-06
< 2 KSP Residual norm 2.121341386147e-07
< 3 KSP Residual norm 2.455932678957e-08
< 4 KSP Residual norm 1.753095730744e-09
< 5 KSP Residual norm 7.489214418904e-11
<   2 SNES Function norm 2.103908447865e-10
< Number of SNES iterations = 2
---
> [0]PETSC ERROR: - Error Message --
> [0]PETSC ERROR: GPU error
> [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported
> [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc!
> [0]PETSC ERROR: Option left: name:-mg_levels_ksp_max_it value: 3 source: command line
> [0]PETSC ERROR: Option left: name:-nox (no value) source: environment
> [0]PETSC ERROR: Option left: name:-nox_warning (no value) source: environment
> [0]PETSC ERROR: Option left: name:-pc_gamg_esteig_ksp_max_it value: 10 source: command line
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.18.3-352-g91c56366cb  GIT Date: 2023-01-05 17:22:48 +
> [0]PETSC ERROR: ./ex19 on a arch-linux-c-debug named osprey by mlohry Thu Jan  5 17:25:17 2023
> [0]PETSC ERROR: Configure options --with-cuda --with-mpi=1
> [0]PETSC ERROR: #1 initialize() at /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:249
> [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/cupmcontext.cu:10
> [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:247
> [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:260
> [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52
> [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84
> [0]PETSC ERROR: #7 GetHandleDispatch_() at /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:499
> [0]PETSC ERROR: #8 create() at /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:1069
> [0]PETSC ERROR: #9 VecCreate_SeqCUDA() at /home/mlohry/dev/petsc/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.cu:10
> [0]PETSC ERROR: #10 VecSetType() at /home/mlohry/dev/petsc/src/vec/vec/interface/vecreg.c:89
> [0]PETSC ERROR: #11 DMCreateGlobalVector_DA() at /home/mlohry/dev/petsc/src/dm/impls/da/dadist.c:31
> [0]PETSC ERROR: #12 DMCreateGlobalVector() at /home/mlohry/dev/petsc/src/dm/interface/dm.c:1023
> [0]PETSC ERROR: #13 main() at ex19.c:149



[petsc-users] cuda gpu eager initialization error cudaErrorNotSupported

2023-01-05 Thread Mark Lohry
I'm trying to compile the cuda example

./config/examples/arch-ci-linux-cuda-double-64idx.py --with-cudac=/usr/local/cuda-11.5/bin/nvcc

and running make test passes the lazy test:

ok diff-sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-lazy

but the eager variant fails; its output is pasted below.

I get a similar error running my client code, pasted after that. There,
when running with -info, it seems that some lazy initialization happens
first, and I also call VecCreateSeqCUDA, which seems to have no issue.

Any idea? This happens to be with an -sm 3.5 device, if it matters;
otherwise it's a recent cuda compiler+driver.


petsc test code output:



not ok sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-eager # Error code: 97
# [0]PETSC ERROR: - Error Message --
# [0]PETSC ERROR: GPU error
# [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported
# [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
# [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022
# [0]PETSC ERROR: ../ex1 on a  named lancer by mlohry Thu Jan  5 15:22:33 2023
# [0]PETSC ERROR: Configure options --package-prefix-hash=/home/mlohry/petsc-hash-pkgs --with-make-test-np=2 --download-openmpi=1 --download-hypre=1 --download-hwloc=1 COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-cuda=1 --with-precision=double --with-clanguage=c --with-cudac=/usr/local/cuda-11.5/bin/nvcc PETSC_ARCH=arch-ci-linux-cuda-double-64idx
# [0]PETSC ERROR: #1 CUPMAwareMPI_() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:194
# [0]PETSC ERROR: #2 initialize() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:71
# [0]PETSC ERROR: #3 init_device_id_() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:290
# [0]PETSC ERROR: #4 getDevice() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/../impls/host/../impldevicebase.hpp:99
# [0]PETSC ERROR: #5 PetscDeviceCreate() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:104
# [0]PETSC ERROR: #6 PetscDeviceInitializeDefaultDevice_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:375
# [0]PETSC ERROR: #7 PetscDeviceInitializeTypeFromOptions_Private() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:499
# [0]PETSC ERROR: #8 PetscDeviceInitializeFromOptions_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:634
# [0]PETSC ERROR: #9 PetscInitialize_Common() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1001
# [0]PETSC ERROR: #10 PetscInitialize() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1267
# [0]PETSC ERROR: #11 main() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/tests/ex1.c:12
# [0]PETSC ERROR: PETSc Option Table entries:
# [0]PETSC ERROR: -default_device_type host
# [0]PETSC ERROR: -device_enable eager
# [0]PETSC ERROR: End of Error Message ---send entire error message to petsc-ma...@mcs.anl.gov--





solver code output:



[0]  PetscDetermineInitialFPTrap(): Floating point trapping is off by default 0
[0]  PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType host available, initializing
[0]  PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice host initialized, default device id 0, view FALSE, init type lazy
[0]  PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType cuda available, initializing
[0]  PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice cuda initialized, default device id 0, view FALSE, init type lazy
[0]  PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType hip not available
[0]  PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType sycl not available
[0]  PetscInitialize_Common(): PETSc successfully started: number of processors = 1
[0]  PetscGetHostName(): Rejecting domainname, likely is NIS lancer.(none)
[0]  PetscInitialize_Common(): Running on machine: lancer
# [Info] Petsc initialization complete.
# [Trace] Timing: Starting solver...
# [Info] RNG initial conditions have mean 0.04, renormalizing.
# [Trace] Timing: PetscTimeIntegrator initialization...
# [Trace] Timing: Allocating Petsc CUDA arrays...
[0]  PetscCommDuplicate(): Duplicating a communicator 2 3 max tags = 1
[0]  configure(): Configured device 0
[0]  PetscCommDuplicate(): Using internal PETSc