> They had no influence to the memory usage.
Comment out the ierr = _devices[id]->initialize();CHKERRQ(ierr); on line 360 in cupmdevice.cxx as well.

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

> On Jan 7, 2022, at 12:18, Zhang, Hong <hongzh...@anl.gov> wrote:
>
> I have tried all of these. They had no influence to the memory usage.
>
>> On Jan 7, 2022, at 11:15 AM, Jacob Faibussowitsch <jacob....@gmail.com> wrote:
>>
>>> Initializing cutlass and cusolver does not affect the memory usage. I did
>>> the following to turn them off:
>>
>> Ok next things to try out in order:
>>
>> 1. src/sys/objects/device/impls/cupm/cupmcontext.hpp:178 [PetscFunctionBegin;]
>>    Put a PetscFunctionReturn(0); right after this
>>
>> 2. src/sys/objects/device/impls/cupm/cupmdevice.cxx:327 [ierr = _devices[_defaultDevice]->configure();CHKERRQ(ierr);]
>>    Comment this out
>>
>> 3. src/sys/objects/device/impls/cupm/cupmdevice.cxx:326 [ierr = _devices[_defaultDevice]->initialize();CHKERRQ(ierr);]
>>    Comment this out
>>
>> Best regards,
>>
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>>
>>> On Jan 7, 2022, at 12:02, Zhang, Hong <hongzh...@anl.gov> wrote:
>>>
>>> Initializing cutlass and cusolver does not affect the memory usage.
>>> I did the following to turn them off:
>>>
>>> diff --git a/src/sys/objects/device/impls/cupm/cupmcontext.hpp b/src/sys/objects/device/impls/cupm/cupmcontext.hpp
>>> index 51fed809e4d..9a5f068323a 100644
>>> --- a/src/sys/objects/device/impls/cupm/cupmcontext.hpp
>>> +++ b/src/sys/objects/device/impls/cupm/cupmcontext.hpp
>>> @@ -199,7 +199,7 @@ inline PetscErrorCode CUPMContext<T>::setUp(PetscDeviceContext dctx) noexcept
>>>  #if PetscDefined(USE_DEBUG)
>>>    dci->timerInUse = PETSC_FALSE;
>>>  #endif
>>> -  ierr = __initialize(dctx->device->deviceId,dci);CHKERRQ(ierr);
>>> +  //ierr = __initialize(dctx->device->deviceId,dci);CHKERRQ(ierr);
>>>    PetscFunctionReturn(0);
>>>  }
>>>
>>>> On Jan 7, 2022, at 10:53 AM, Barry Smith <bsm...@petsc.dev> wrote:
>>>>
>>>> I don't think this is right. We want the device initialized by PETSc,
>>>> we just don't want the cublas and cusolve stuff initialized. In order to
>>>> see how much memory initializing the blas and solvers takes.
>>>>
>>>> So I think you need to comment things in cupminterface.hpp like
>>>> cublasCreate and cusolverDnCreate.
>>>>
>>>> Urgh, I hate C++ where huge chunks of real code are in header files.
>>>>
>>>>> On Jan 7, 2022, at 11:34 AM, Jacob Faibussowitsch <jacob....@gmail.com> wrote:
>>>>>
>>>>> Hit send too early…
>>>>>
>>>>> If you don't want to comment out, you can also run with "-device_enable
>>>>> lazy" option. Normally this is the default behavior but if -log_view or
>>>>> -log_summary is provided this defaults to "-device_enable eager".
>>>>> See src/sys/objects/device/interface/device.cxx:398
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Jacob Faibussowitsch
>>>>> (Jacob Fai - booss - oh - vitch)
>>>>>
>>>>>> On Jan 7, 2022, at 11:29, Jacob Faibussowitsch <jacob....@gmail.com> wrote:
>>>>>>
>>>>>>> You need to go into the PetscInitialize() routine find where it loads
>>>>>>> the cublas and cusolve and comment out those lines then run with
>>>>>>> -log_view
>>>>>>
>>>>>> Comment out
>>>>>>
>>>>>> #if (PetscDefined(HAVE_CUDA) || PetscDefined(HAVE_HIP) || PetscDefined(HAVE_SYCL))
>>>>>>   ierr = PetscDeviceInitializeFromOptions_Internal(PETSC_COMM_WORLD);CHKERRQ(ierr);
>>>>>> #endif
>>>>>>
>>>>>> At src/sys/objects/pinit.c:956
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Jacob Faibussowitsch
>>>>>> (Jacob Fai - booss - oh - vitch)
>>>>>>
>>>>>>> On Jan 7, 2022, at 11:24, Barry Smith <bsm...@petsc.dev> wrote:
>>>>>>>
>>>>>>> Without log_view it does not load any cuBLAS/cuSolve immediately with
>>>>>>> -log_view it loads all that stuff at startup. You need to go into the
>>>>>>> PetscInitialize() routine find where it loads the cublas and cusolve
>>>>>>> and comment out those lines then run with -log_view
>>>>>>>
>>>>>>>> On Jan 7, 2022, at 11:14 AM, Zhang, Hong via petsc-dev <petsc-dev@mcs.anl.gov> wrote:
>>>>>>>>
>>>>>>>> When PETSc is initialized, it takes about 2GB CUDA memory. This is way
>>>>>>>> too much for doing nothing. A test script is attached to reproduce the
>>>>>>>> issue. If I remove the first line "import torch", PETSc consumes about
>>>>>>>> 0.73GB, which is still significant. Does anyone have any idea about
>>>>>>>> this behavior?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Hong
>>>>>>>>
>>>>>>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py
>>>>>>>> CUDA memory before PETSc 0.000GB
>>>>>>>> CUDA memory after PETSc 0.004GB
>>>>>>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py -log_view :0.txt
>>>>>>>> CUDA memory before PETSc 0.000GB
>>>>>>>> CUDA memory after PETSc 1.936GB
>>>>>>>>
>>>>>>>> import torch
>>>>>>>> import sys
>>>>>>>> import os
>>>>>>>>
>>>>>>>> import nvidia_smi
>>>>>>>> nvidia_smi.nvmlInit()
>>>>>>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>>>>>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>>>>>>> print('CUDA memory before PETSc %.3fGB' % (info.used/1e9))
>>>>>>>>
>>>>>>>> petsc4py_path = os.path.join(os.environ['PETSC_DIR'],os.environ['PETSC_ARCH'],'lib')
>>>>>>>> sys.path.append(petsc4py_path)
>>>>>>>> import petsc4py
>>>>>>>> petsc4py.init(sys.argv)
>>>>>>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>>>>>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>>>>>>> print('CUDA memory after PETSc %.3fGB' % (info.used/1e9))
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
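For comparing the variants discussed above (lines commented out, or "-device_enable lazy" versus "-device_enable eager"), the before/after measurement from the test script could be wrapped in a small helper, roughly as sketched below. The names measure_delta_gb and nvml_used_bytes_reader are hypothetical and not part of PETSc or petsc4py; the NVML calls are the same ones the test script uses, and the nvidia_smi module is assumed to be installed as in that script.

```python
from typing import Callable


def measure_delta_gb(init: Callable[[], None],
                     read_used_bytes: Callable[[], int]) -> float:
    """Return the growth in used device memory (GB) across a call to init()."""
    before = read_used_bytes()
    init()
    after = read_used_bytes()
    return (after - before) / 1e9


def nvml_used_bytes_reader(device_index: int = 0) -> Callable[[], int]:
    """Build a reader for used device memory, using the same NVML calls
    as the test script.

    Assumption: the nvidia_smi module is importable, as in that script.
    It is imported lazily so measure_delta_gb can be exercised without a GPU.
    """
    import nvidia_smi
    nvidia_smi.nvmlInit()
    handle = nvidia_smi.nvmlDeviceGetHandleByIndex(device_index)
    return lambda: nvidia_smi.nvmlDeviceGetMemoryInfo(handle).used


# Intended use on a GPU node (not executed here):
#   import sys, petsc4py
#   delta = measure_delta_gb(lambda: petsc4py.init(sys.argv),
#                            nvml_used_bytes_reader(0))
#   print('PETSc init used %.3fGB' % delta)
```

Note that NVML reports memory in use on the whole device, so other processes running on the same GPU would show up in the reading as well.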