Are you sure the make dependencies are correct and the code got properly 
recompiled with those lines commented out in the header file? It is hard to 
believe they use no memory. 

  What else in the PETSc initialization of the GPU would use huge chunks of 
memory? 
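
  One thing worth ruling out: the CUDA primary context itself reserves a 
sizable chunk of device memory the moment it is created, independent of 
cuBLAS/cuSOLVER. Below is a minimal sketch to measure just that context 
cost, in the same style as Hong's script; the helper name is mine, it 
assumes the pynvml package and a visible GPU, and it returns None when 
either is missing.

```python
import ctypes

def context_only_overhead_gb(device_index=0):
    """Measure device memory taken by bare CUDA context creation.

    Creates the primary context via the classic cudaFree(0) idiom and
    reports the device-memory delta in GB. Returns None if pynvml, the
    CUDA runtime, or a GPU is unavailable, so the sketch degrades
    gracefully on machines without CUDA.
    """
    try:
        import pynvml
        cudart = ctypes.CDLL("libcudart.so")
        pynvml.nvmlInit()
    except Exception:
        return None
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    before = pynvml.nvmlDeviceGetMemoryInfo(handle).used
    cudart.cudaFree(None)  # cudaFree(0): forces primary context creation
    after = pynvml.nvmlDeviceGetMemoryInfo(handle).used
    return (after - before) / 1e9

print('context-only overhead:', context_only_overhead_gb())
```

Comparing that number against the ~2 GB seen with -log_view would tell us 
how much is the context versus the libraries.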


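
  Regarding the "-device_enable lazy" workaround Jacob mentions below, a 
minimal sketch of passing it alongside -log_view through petsc4py; the 
petsc4py import is guarded (it assumes a PETSc build with petsc4py), so 
the argument handling can be seen even without one.

```python
import sys

# -log_view normally flips device initialization to eager; appending
# "-device_enable lazy" keeps initialization lazy anyway.
args = list(sys.argv) + ["-log_view", ":0.txt", "-device_enable", "lazy"]

try:
    import petsc4py  # assumes a PETSc build with petsc4py available
    petsc4py.init(args)
except Exception:
    pass  # no PETSc here; args above still show the intended invocation
```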

> On Jan 7, 2022, at 11:58 AM, Jacob Faibussowitsch <jacob....@gmail.com> wrote:
> 
>> I don't think this is right. We want the device initialized by PETSc; we 
>> just don't want the cuBLAS and cuSOLVER stuff initialized, in order to see 
>> how much memory initializing the BLAS and solvers takes.
> 
> This is how it has always been; PetscDevice adopted the same initialization 
> strategy as the previous code. Eager initialization initializes 
> __everything__, since initializing cuBLAS and cuSOLVER takes (or at least 
> historically took) eons.
> 
>>   So I think you need to comment out things in cupminterface.hpp like 
>> cublasCreate and cusolverDnCreate.
>> 
>>   Urgh, I hate C++ where huge chunks of real code are in header files.
> 
> Not quite; you will need to comment out
> 
> ierr = __initialize(dctx->device->deviceId,dci);CHKERRQ(ierr);
> 
> In src/sys/objects/device/impls/cupm/cupmcontext.hpp:202
> 
> Best regards,
> 
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> 
>> On Jan 7, 2022, at 11:53, Barry Smith <bsm...@petsc.dev 
>> <mailto:bsm...@petsc.dev>> wrote:
>> 
>> 
>>   I don't think this is right. We want the device initialized by PETSc; we 
>> just don't want the cuBLAS and cuSOLVER stuff initialized, in order to see 
>> how much memory initializing the BLAS and solvers takes.
>> 
>>   So I think you need to comment out things in cupminterface.hpp like 
>> cublasCreate and cusolverDnCreate.
>> 
>>   Urgh, I hate C++ where huge chunks of real code are in header files.
>> 
>> 
>> 
>>> On Jan 7, 2022, at 11:34 AM, Jacob Faibussowitsch <jacob....@gmail.com 
>>> <mailto:jacob....@gmail.com>> wrote:
>>> 
>>> Hit send too early…
>>> 
>>> If you don’t want to comment anything out, you can also run with the 
>>> "-device_enable lazy" option. Normally lazy is the default behavior, but 
>>> if -log_view or -log_summary is provided the default becomes 
>>> "-device_enable eager". See src/sys/objects/device/interface/device.cxx:398
>>> 
>>> Best regards,
>>> 
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>> 
>>>> On Jan 7, 2022, at 11:29, Jacob Faibussowitsch <jacob....@gmail.com 
>>>> <mailto:jacob....@gmail.com>> wrote:
>>>> 
>>>>> You need to go into the PetscInitialize() routine, find where it loads 
>>>>> cuBLAS and cuSOLVER, comment out those lines, then run with -log_view
>>>> 
>>>> Comment out
>>>> 
>>>> #if (PetscDefined(HAVE_CUDA) || PetscDefined(HAVE_HIP) || 
>>>> PetscDefined(HAVE_SYCL))
>>>>   ierr = 
>>>> PetscDeviceInitializeFromOptions_Internal(PETSC_COMM_WORLD);CHKERRQ(ierr);
>>>> #endif
>>>> 
>>>> At src/sys/objects/pinit.c:956
>>>> 
>>>> Best regards,
>>>> 
>>>> Jacob Faibussowitsch
>>>> (Jacob Fai - booss - oh - vitch)
>>>> 
>>>>> On Jan 7, 2022, at 11:24, Barry Smith <bsm...@petsc.dev 
>>>>> <mailto:bsm...@petsc.dev>> wrote:
>>>>> 
>>>>> 
>>>>> Without -log_view it does not load any cuBLAS/cuSOLVER; with -log_view 
>>>>> it loads all that stuff at startup. You need to go into the 
>>>>> PetscInitialize() routine, find where it loads cuBLAS and cuSOLVER, 
>>>>> comment out those lines, then run with -log_view
>>>>> 
>>>>> 
>>>>>> On Jan 7, 2022, at 11:14 AM, Zhang, Hong via petsc-dev 
>>>>>> <petsc-dev@mcs.anl.gov <mailto:petsc-dev@mcs.anl.gov>> wrote:
>>>>>> 
>>>>>> When PETSc is initialized, it takes about 2 GB of CUDA memory. This is 
>>>>>> way too much for doing nothing. A test script is attached to reproduce 
>>>>>> the issue. If I remove the first line "import torch", PETSc consumes 
>>>>>> about 0.73 GB, which is still significant. Does anyone have any idea 
>>>>>> about this behavior?
>>>>>> 
>>>>>> Thanks,
>>>>>> Hong
>>>>>> 
>>>>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples 
>>>>>> (caidao22/update-examples)$ python3 test.py
>>>>>> CUDA memory before PETSc 0.000GB
>>>>>> CUDA memory after PETSc 0.004GB
>>>>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples 
>>>>>> (caidao22/update-examples)$ python3 test.py -log_view :0.txt
>>>>>> CUDA memory before PETSc 0.000GB
>>>>>> CUDA memory after PETSc 1.936GB
>>>>>> 
>>>>>> import torch
>>>>>> import sys
>>>>>> import os
>>>>>> 
>>>>>> import nvidia_smi
>>>>>> nvidia_smi.nvmlInit()
>>>>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>>>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>>>>> print('CUDA memory before PETSc %.3fGB' % (info.used/1e9))
>>>>>> 
>>>>>> petsc4py_path = 
>>>>>> os.path.join(os.environ['PETSC_DIR'],os.environ['PETSC_ARCH'],'lib')
>>>>>> sys.path.append(petsc4py_path)
>>>>>> import petsc4py
>>>>>> petsc4py.init(sys.argv)
>>>>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>>>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>>>>> print('CUDA memory after PETSc %.3fGB' % (info.used/1e9))
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
