> Initializing cublas and cusolver does not affect the memory usage. I did the 
> following to turn them off:

Ok, next things to try out, in order (a rough sketch of the edits follows the list):

1. src/sys/objects/device/impls/cupm/cupmcontext.hpp:178 [PetscFunctionBegin;]
   Put a PetscFunctionReturn(0); right after this

2. src/sys/objects/device/impls/cupm/cupmdevice.cxx:327 [ierr = _devices[_defaultDevice]->configure();CHKERRQ(ierr);]
   Comment this out

3. src/sys/objects/device/impls/cupm/cupmdevice.cxx:326 [ierr = _devices[_defaultDevice]->initialize();CHKERRQ(ierr);]
   Comment this out
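
A rough, untested sketch of what those three edits would look like (the exact surrounding code in your checkout may differ):

// 1. cupmcontext.hpp:178 -- return right after PetscFunctionBegin so the
//    rest of the function body is skipped
PetscFunctionBegin;
PetscFunctionReturn(0);

// 2. cupmdevice.cxx:327 -- skip configuring the default device
//ierr = _devices[_defaultDevice]->configure();CHKERRQ(ierr);

// 3. cupmdevice.cxx:326 -- skip initializing the default device
//ierr = _devices[_defaultDevice]->initialize();CHKERRQ(ierr);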

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

> On Jan 7, 2022, at 12:02, Zhang, Hong <hongzh...@anl.gov> wrote:
> 
> Initializing cublas and cusolver does not affect the memory usage. I did the 
> following to turn them off:
> 
> diff --git a/src/sys/objects/device/impls/cupm/cupmcontext.hpp b/src/sys/objects/device/impls/cupm/cupmcontext.hpp
> index 51fed809e4d..9a5f068323a 100644
> --- a/src/sys/objects/device/impls/cupm/cupmcontext.hpp
> +++ b/src/sys/objects/device/impls/cupm/cupmcontext.hpp
> @@ -199,7 +199,7 @@ inline PetscErrorCode CUPMContext<T>::setUp(PetscDeviceContext dctx) noexcept
>  #if PetscDefined(USE_DEBUG)
>    dci->timerInUse = PETSC_FALSE;
>  #endif
> -  ierr = __initialize(dctx->device->deviceId,dci);CHKERRQ(ierr);
> +  //ierr = __initialize(dctx->device->deviceId,dci);CHKERRQ(ierr);
>    PetscFunctionReturn(0);
>  }
> 
>> On Jan 7, 2022, at 10:53 AM, Barry Smith <bsm...@petsc.dev> wrote:
>> 
>> 
>>   I don't think this is right. We want the device initialized by PETSc; we 
>> just don't want the cublas and cusolver stuff initialized, in order to see 
>> how much memory initializing the blas and solvers takes.
>> 
>>   So I think you need to comment out things in cupminterface.hpp like 
>> cublasCreate and cusolverDnCreate.
>> 
>>   Urgh, I hate C++ where huge chunks of real code are in header files.
>> 
>> 
>> 
>>> On Jan 7, 2022, at 11:34 AM, Jacob Faibussowitsch <jacob....@gmail.com> wrote:
>>> 
>>> Hit send too early…
>>> 
>>> If you don't want to comment code out, you can also run with the 
>>> "-device_enable lazy" option. Normally this is the default behavior, but if 
>>> -log_view or -log_summary is provided the default becomes 
>>> "-device_enable eager". See src/sys/objects/device/interface/device.cxx:398
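>>> 
>>> For example (hypothetical invocation, combining the option with Hong's test script quoted below):
>>> 
>>>   python3 test.py -log_view :0.txt -device_enable lazy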
>>> 
>>> Best regards,
>>> 
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>> 
>>>> On Jan 7, 2022, at 11:29, Jacob Faibussowitsch <jacob....@gmail.com> wrote:
>>>> 
>>>>> You need to go into the PetscInitialize() routine, find where it loads the 
>>>>> cublas and cusolver, comment out those lines, and then run with -log_view.
>>>> 
>>>> Comment out
>>>> 
>>>> #if (PetscDefined(HAVE_CUDA) || PetscDefined(HAVE_HIP) || PetscDefined(HAVE_SYCL))
>>>>   ierr = PetscDeviceInitializeFromOptions_Internal(PETSC_COMM_WORLD);CHKERRQ(ierr);
>>>> #endif
>>>> 
>>>> At src/sys/objects/pinit.c:956
>>>> 
>>>> Best regards,
>>>> 
>>>> Jacob Faibussowitsch
>>>> (Jacob Fai - booss - oh - vitch)
>>>> 
>>>>> On Jan 7, 2022, at 11:24, Barry Smith <bsm...@petsc.dev> wrote:
>>>>> 
>>>>> 
>>>>> Without -log_view it does not load any cuBLAS/cuSOLVER immediately; with 
>>>>> -log_view it loads all that stuff at startup. You need to go into the 
>>>>> PetscInitialize() routine, find where it loads the cublas and cusolver, 
>>>>> comment out those lines, and then run with -log_view.
>>>>> 
>>>>> 
>>>>>> On Jan 7, 2022, at 11:14 AM, Zhang, Hong via petsc-dev <petsc-dev@mcs.anl.gov> wrote:
>>>>>> 
>>>>>> When PETSc is initialized, it takes about 2GB of CUDA memory. This is way 
>>>>>> too much for doing nothing. A test script is attached to reproduce the 
>>>>>> issue. If I remove the first line "import torch", PETSc consumes about 
>>>>>> 0.73GB, which is still significant. Does anyone have any idea about this 
>>>>>> behavior?
>>>>>> 
>>>>>> Thanks,
>>>>>> Hong
>>>>>> 
>>>>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py
>>>>>> CUDA memory before PETSc 0.000GB
>>>>>> CUDA memory after PETSc 0.004GB
>>>>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py -log_view :0.txt
>>>>>> CUDA memory before PETSc 0.000GB
>>>>>> CUDA memory after PETSc 1.936GB
>>>>>> 
>>>>>> import torch
>>>>>> import sys
>>>>>> import os
>>>>>> 
>>>>>> import nvidia_smi
>>>>>> nvidia_smi.nvmlInit()
>>>>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>>>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>>>>> print('CUDA memory before PETSc %.3fGB' % (info.used/1e9))
>>>>>> 
>>>>>> petsc4py_path = os.path.join(os.environ['PETSC_DIR'],os.environ['PETSC_ARCH'],'lib')
>>>>>> sys.path.append(petsc4py_path)
>>>>>> import petsc4py
>>>>>> petsc4py.init(sys.argv)
>>>>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>>>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>>>>> print('CUDA memory after PETSc %.3fGB' % (info.used/1e9))
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
