https://gitlab.com/petsc/petsc/-/merge_requests/4512 

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

> On Nov 1, 2021, at 11:00, Barry Smith <bsm...@petsc.dev> wrote:
> 
> 
>    PETSc could check for the environment variable setting
> CUDA_VISIBLE_DEVICES=-1, if that makes sense as a way to resolve the situation.
> 
> 
> 
>> On Nov 1, 2021, at 11:43 AM, Jacob Faibussowitsch <jacob....@gmail.com> wrote:
>> 
>> Looks like you are tripping up the following:
>> 
>> cerr = cupmGetDeviceCount(&ndev);
>> if (PetscUnlikely(cerr == cupmErrorStubLibrary)) {
>>   … // handle missing driver or stub library
>> } else {CHKERRCUPM(cerr);} // your error here
>> 
>> Is it an error if a user configures with CUDA (i.e. signals intent to use
>> CUDA) but disables all the devices? On the one hand, yes: it can be
>> considered an error if the user unknowingly disables the devices via this
>> environment variable. On the other hand, they should be able to set this
>> variable freely without PETSc crashing. Should we warn users? Handle this
>> silently?
>> 
>> Note that PETSc does provide the '-device_enable none' option to disable all
>> devices, or, if you only want to disable CUDA devices, '-device_enable_cuda
>> none', which should achieve the same effect as CUDA_VISIBLE_DEVICES=-1. But
>> maybe it is too obscure to ask users to know about and use these flags
>> instead of setting the CUDA environment variables. (Btw, can you test that
>> using '-device_enable_cuda none' does not crash when setting
>> CUDA_VISIBLE_DEVICES=-1?)
>> 
>> Best regards,
>> 
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>> 
>>> On Nov 1, 2021, at 10:09, Stefano Zampini <stefano.zamp...@gmail.com> wrote:
>>> 
>>> Just found out that if we configure with CUDA and then want to run on the CPU
>>> only using CUDA_VISIBLE_DEVICES=-1, PETSc errors out. Is this intended
>>> behavior? I assumed it should work.
>>> This is with main.
>>> 
>>> (ecrcml-cuda) zampins@qaysar:~/miniforge/Devel/petsc$ make check
>>> Running check examples to verify correct installation
>>> Using PETSC_DIR=/home/zampins/miniforge/Devel/petsc and 
>>> PETSC_ARCH=arch-ecrcml-cuda-double
>>> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
>>> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
>>> C/C++ example src/snes/tutorials/ex19 run successfully with cuda
>>> Completed test examples
>>> 
>>> (ecrcml-cuda) zampins@qaysar:~/miniforge/Devel/petsc$ make check 
>>> CUDA_VISIBLE_DEVICES=1
>>> Running check examples to verify correct installation
>>> Using PETSC_DIR=/home/zampins/miniforge/Devel/petsc and 
>>> PETSC_ARCH=arch-ecrcml-cuda-double
>>> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
>>> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
>>> C/C++ example src/snes/tutorials/ex19 run successfully with cuda
>>> Completed test examples
>>> 
>>> (ecrcml-cuda) zampins@qaysar:~/miniforge/Devel/petsc$ make check 
>>> CUDA_VISIBLE_DEVICES=-1
>>> Running check examples to verify correct installation
>>> Using PETSC_DIR=/home/zampins/miniforge/Devel/petsc and 
>>> PETSC_ARCH=arch-ecrcml-cuda-double
>>> Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process
>>> See http://www.mcs.anl.gov/petsc/documentation/faq.html
>>> [0]PETSC ERROR: --------------------- Error Message 
>>> --------------------------------------------------------------
>>> [0]PETSC ERROR: GPU error 
>>> [0]PETSC ERROR: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device 
>>> is detected
>>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.16.0-368-g72b201b202  
>>> GIT Date: 2021-10-29 14:48:19 +0300
>>> [0]PETSC ERROR: ./ex19 on a arch-ecrcml-cuda-double named
>>> qaysar.kaust.edu.sa by zampins Mon Nov  1 18:06:12 2021
>>> [0]PETSC ERROR: Configure options 
>>> --with-blaslapack-include=/home/zampins/miniforge/envs/ecrcml-cuda/include 
>>> --with-blaslapack-lib=/home/zampins/miniforge/envs/ecrcml-cuda/lib/libmkl_rt.so
>>>  --download-h2opus --with-cuda 
>>> --with-kblas-dir=/home/zampins/miniforge/envs/ecrcml-cuda 
>>> --with-magma-dir=/home/zampins/miniforge/envs/ecrcml-cuda 
>>> --LDFLAGS=/usr/lib/x86_64-linux-gnu/libcuda.so --with-debugging=1 
>>> --with-openmp --with-precision=double --with-fc=0 
>>> PETSC_ARCH=arch-ecrcml-cuda-double 
>>> PETSC_DIR=/home/zampins/miniforge/Devel/petsc
>>> [0]PETSC ERROR: #1 initialize() at 
>>> /home/zampins/miniforge/Devel/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:302
>>> [0]PETSC ERROR: #2 PetscDeviceInitializeTypeFromOptions_Private() at 
>>> /home/zampins/miniforge/Devel/petsc/src/sys/objects/device/interface/device.cxx:292
>>> [0]PETSC ERROR: #3 PetscDeviceInitializeFromOptions_Internal() at 
>>> /home/zampins/miniforge/Devel/petsc/src/sys/objects/device/interface/device.cxx:417
>>> [0]PETSC ERROR: #4 PetscInitialize_Common() at 
>>> /home/zampins/miniforge/Devel/petsc/src/sys/objects/pinit.c:956
>>> [0]PETSC ERROR: #5 PetscInitialize() at 
>>> /home/zampins/miniforge/Devel/petsc/src/sys/objects/pinit.c:1231
>>> --------------------------------------------------------------------------
>>> Primary job  terminated normally, but 1 process returned
>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> 
>>> [
>> 
> 
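
For reference, the environment-variable check Barry suggests above could look roughly like the sketch below. It is only an illustration: both helper names are hypothetical and not part of PETSc, and it assumes that the only setting of interest is the literal value "-1" discussed in this thread.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of PETSc: returns 1 when the given value of
 * CUDA_VISIBLE_DEVICES is exactly "-1" (the setting discussed in this thread)
 * and 0 otherwise, including when the variable is unset (value == NULL). */
static int visible_devices_is_minus_one(const char *value)
{
  return value != NULL && strcmp(value, "-1") == 0;
}

/* Hypothetical wrapper that reads the variable from the process environment. */
static int cuda_disabled_by_env(void)
{
  return visible_devices_is_minus_one(getenv("CUDA_VISIBLE_DEVICES"));
}
```

Device initialization could consult cuda_disabled_by_env() before querying the device count and treat a positive answer as "zero devices" rather than as an error. Whether that is the right policy (silent, or with a warning) is exactly the open question in this thread.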
