Hit send too early…

If you don’t want to comment anything out, you can also run with the 
"-device_enable lazy" option. Normally this is the default behavior, but if 
-log_view or -log_summary is provided the default becomes 
“-device_enable eager”. See src/sys/objects/device/interface/device.cxx:398
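
For the petsc4py test script in this thread, one way to pass the option without typing it on every run might look like the sketch below. The helper names with_lazy_device and init_petsc_lazy are made up for illustration and are not part of petsc4py; the only assumptions about PETSc are the "-device_enable lazy" option mentioned above and that petsc4py.init() accepts an argument list, as it does in the script.

```python
import sys


def with_lazy_device(argv):
    """Return a copy of argv that requests lazy device initialization.

    Hypothetical helper: if the caller already passed -device_enable,
    their choice is left untouched.
    """
    args = list(argv)
    if "-device_enable" not in args:
        args += ["-device_enable", "lazy"]
    return args


def init_petsc_lazy(argv=None):
    """Initialize petsc4py with -device_enable lazy added to the options."""
    # Imported here so with_lazy_device() can be used without PETSc installed.
    import petsc4py

    petsc4py.init(with_lazy_device(sys.argv if argv is None else argv))
```

In test.py this would replace the petsc4py.init(sys.argv) call with init_petsc_lazy(), so that "python3 test.py -log_view :0.txt" is initialized as if "-device_enable lazy" had also been given.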

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

> On Jan 7, 2022, at 11:29, Jacob Faibussowitsch <[email protected]> wrote:
> 
>> You need to go into the PetscInitialize() routine, find where it loads 
>> cuBLAS and cuSOLVER, comment out those lines, and then run with -log_view
> 
> Comment out
> 
> #if (PetscDefined(HAVE_CUDA) || PetscDefined(HAVE_HIP) || PetscDefined(HAVE_SYCL))
>   ierr = PetscDeviceInitializeFromOptions_Internal(PETSC_COMM_WORLD);CHKERRQ(ierr);
> #endif
> 
> At src/sys/objects/pinit.c:956
> 
> Best regards,
> 
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> 
>> On Jan 7, 2022, at 11:24, Barry Smith <[email protected]> wrote:
>> 
>> 
>> Without -log_view it does not load any cuBLAS/cuSOLVER immediately; with 
>> -log_view it loads all that stuff at startup. You need to go into the 
>> PetscInitialize() routine, find where it loads cuBLAS and cuSOLVER, 
>> comment out those lines, and then run with -log_view
>> 
>> 
>>> On Jan 7, 2022, at 11:14 AM, Zhang, Hong via petsc-dev <[email protected]> wrote:
>>> 
>>> When PETSc is initialized, it takes about 2GB of CUDA memory. This is way too 
>>> much for doing nothing. A test script is attached to reproduce the issue. 
>>> If I remove the first line "import torch", PETSc consumes about 0.73GB, 
>>> which is still significant. Does anyone have any idea about this behavior?
>>> 
>>> Thanks,
>>> Hong
>>> 
>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples 
>>> (caidao22/update-examples)$ python3 test.py
>>> CUDA memory before PETSc 0.000GB
>>> CUDA memory after PETSc 0.004GB
>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples 
>>> (caidao22/update-examples)$ python3 test.py -log_view :0.txt
>>> CUDA memory before PETSc 0.000GB
>>> CUDA memory after PETSc 1.936GB
>>> 
>>> import torch
>>> import sys
>>> import os
>>> 
>>> import nvidia_smi
>>> nvidia_smi.nvmlInit()
>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>> print('CUDA memory before PETSc %.3fGB' % (info.used/1e9))
>>> 
>>> petsc4py_path = os.path.join(os.environ['PETSC_DIR'],os.environ['PETSC_ARCH'],'lib')
>>> sys.path.append(petsc4py_path)
>>> import petsc4py
>>> petsc4py.init(sys.argv)
>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>> print('CUDA memory after PETSc %.3fGB' % (info.used/1e9))
>>> 
>> 
> 
