Re: [petsc-dev] PETSc init eats too much CUDA memory

Jacob Faibussowitsch Fri, 07 Jan 2022 09:21:55 -0800

> They had no influence to the memory usage. 
???????????????????????????????????????????????????????????????????????


Comment out the ierr = _devices[id]->initialize();CHKERRQ(ierr); on line 360 in 
cupmdevice.cxx as well.

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

> On Jan 7, 2022, at 12:18, Zhang, Hong <hongzh...@anl.gov> wrote:
> 
> I have tried all of these. They had no influence to the memory usage. 
> 
>> On Jan 7, 2022, at 11:15 AM, Jacob Faibussowitsch <jacob....@gmail.com 
>> <mailto:jacob....@gmail.com>> wrote:
>> 
>>> Initializing cutlass and cusolver does not affect the memory usage. I did 
>>> the following to turn them off:
>> 
>> Ok next things to try out in order:
>> 
>> 1. src/sys/objects/device/impls/cupm/cupmcontext.hpp:178 
>> [PetscFunctionBegin;] 
>> Put a PetscFunctionReturn(0); right after this
>> 
>> 2. src/sys/objects/device/impls/cupm/cupmdevice.cxx:327 [ierr = 
>> _devices[_defaultDevice]->configure();CHKERRQ(ierr);]
>> Comment this out
>> 
>> 3. src/sys/objects/device/impls/cupm/cupmdevice.cxx:326 [ierr = 
>> _devices[_defaultDevice]->initialize();CHKERRQ(ierr);]
>> Comment this out
>> 
>> Best regards,
>> 
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>> 
>>> On Jan 7, 2022, at 12:02, Zhang, Hong <hongzh...@anl.gov 
>>> <mailto:hongzh...@anl.gov>> wrote:
>>> 
>>> Initializing cutlass and cusolver does not affect the memory usage. I did 
>>> the following to turn them off:
>>> 
>>> diff --git a/src/sys/objects/device/impls/cupm/cupmcontext.hpp 
>>> b/src/sys/objects/device/impls/cupm/cupmcontext.hpp
>>> index 51fed809e4d..9a5f068323a 100644
>>> --- a/src/sys/objects/device/impls/cupm/cupmcontext.hpp
>>> +++ b/src/sys/objects/device/impls/cupm/cupmcontext.hpp
>>> @@ -199,7 +199,7 @@ inline PetscErrorCode 
>>> CUPMContext<T>::setUp(PetscDeviceContext dctx) noexcept
>>>  #if PetscDefined(USE_DEBUG)
>>>    dci->timerInUse = PETSC_FALSE;
>>>  #endif
>>> -  ierr = __initialize(dctx->device->deviceId,dci);CHKERRQ(ierr);
>>> +  //ierr = __initialize(dctx->device->deviceId,dci);CHKERRQ(ierr);
>>>    PetscFunctionReturn(0);
>>>  }
>>> 
>>>> On Jan 7, 2022, at 10:53 AM, Barry Smith <bsm...@petsc.dev 
>>>> <mailto:bsm...@petsc.dev>> wrote:
>>>> 
>>>> 
>>>>   I don't think this is right. We want the device initialized by PETSc , 
>>>> we just don't want the cublas and cusolve stuff initialized. In order to 
>>>> see how much memory initializing the blas and solvers takes.
>>>> 
>>>>   So I think you need to comment things in cupminterface.hpp like 
>>>> cublasCreate and cusolverDnCreate.
>>>> 
>>>>   Urgh, I hate C++ where huge chunks of real code are in header files.
>>>> 
>>>> 
>>>> 
>>>>> On Jan 7, 2022, at 11:34 AM, Jacob Faibussowitsch <jacob....@gmail.com 
>>>>> <mailto:jacob....@gmail.com>> wrote:
>>>>> 
>>>>> Hit send too early…
>>>>> 
>>>>> If you don’t want to comment out, you can also run with "-device_enable 
>>>>> lazy" option. Normally this is the default behavior but if -log_view or 
>>>>> -log_summary is provided this defaults to “-device_enable eager”. See 
>>>>> src/sys/objects/device/interface/device.cxx:398
>>>>> 
>>>>> Best regards,
>>>>> 
>>>>> Jacob Faibussowitsch
>>>>> (Jacob Fai - booss - oh - vitch)
>>>>> 
>>>>>> On Jan 7, 2022, at 11:29, Jacob Faibussowitsch <jacob....@gmail.com 
>>>>>> <mailto:jacob....@gmail.com>> wrote:
>>>>>> 
>>>>>>> You need to go into the PetscInitialize() routine find where it loads 
>>>>>>> the cublas and cusolve and comment out those lines then run with 
>>>>>>> -log_view
>>>>>> 
>>>>>> Comment out
>>>>>> 
>>>>>> #if (PetscDefined(HAVE_CUDA) || PetscDefined(HAVE_HIP) || 
>>>>>> PetscDefined(HAVE_SYCL))
>>>>>>   ierr = 
>>>>>> PetscDeviceInitializeFromOptions_Internal(PETSC_COMM_WORLD);CHKERRQ(ierr);
>>>>>> #endif
>>>>>> 
>>>>>> At src/sys/objects/pinit.c:956
>>>>>> 
>>>>>> Best regards,
>>>>>> 
>>>>>> Jacob Faibussowitsch
>>>>>> (Jacob Fai - booss - oh - vitch)
>>>>>> 
>>>>>>> On Jan 7, 2022, at 11:24, Barry Smith <bsm...@petsc.dev 
>>>>>>> <mailto:bsm...@petsc.dev>> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> Without log_view it does not load any cuBLAS/cuSolve immediately with 
>>>>>>> -log_view it loads all that stuff at startup. You need to go into the 
>>>>>>> PetscInitialize() routine find where it loads the cublas and cusolve 
>>>>>>> and comment out those lines then run with -log_view
>>>>>>> 
>>>>>>> 
>>>>>>>> On Jan 7, 2022, at 11:14 AM, Zhang, Hong via petsc-dev 
>>>>>>>> <petsc-dev@mcs.anl.gov <mailto:petsc-dev@mcs.anl.gov>> wrote:
>>>>>>>> 
>>>>>>>> When PETSc is initialized, it takes about 2GB CUDA memory. This is way 
>>>>>>>> too much for doing nothing. A test script is attached to reproduce the 
>>>>>>>> issue. If I remove the first line "import torch", PETSc consumes about 
>>>>>>>> 0.73GB, which is still significant. Does anyone have any idea about 
>>>>>>>> this behavior?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Hong
>>>>>>>> 
>>>>>>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples 
>>>>>>>> (caidao22/update-examples)$ python3 test.py
>>>>>>>> CUDA memory before PETSc 0.000GB
>>>>>>>> CUDA memory after PETSc 0.004GB
>>>>>>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples 
>>>>>>>> (caidao22/update-examples)$ python3 test.py -log_view :0.txt
>>>>>>>> CUDA memory before PETSc 0.000GB
>>>>>>>> CUDA memory after PETSc 1.936GB
>>>>>>>> 
>>>>>>>> import torch
>>>>>>>> import sys
>>>>>>>> import os
>>>>>>>> 
>>>>>>>> import nvidia_smi
>>>>>>>> nvidia_smi.nvmlInit()
>>>>>>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>>>>>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>>>>>>> print('CUDA memory before PETSc %.3fGB' % (info.used/1e9))
>>>>>>>> 
>>>>>>>> petsc4py_path = 
>>>>>>>> os.path.join(os.environ['PETSC_DIR'],os.environ['PETSC_ARCH'],'lib')
>>>>>>>> sys.path.append(petsc4py_path)
>>>>>>>> import petsc4py
>>>>>>>> petsc4py.init(sys.argv)
>>>>>>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>>>>>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>>>>>>> print('CUDA memory after PETSc %.3fGB' % (info.used/1e9))
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>

Re: [petsc-dev] PETSc init eats too much CUDA memory

Reply via email to