> Initializing cutlass and cusolver does not affect the memory usage. I did the > following to turn them off:
Ok next things to try out in order: 1. src/sys/objects/device/impls/cupm/cupmcontext.hpp:178 [PetscFunctionBegin;] Put a PetscFunctionReturn(0); right after this 2. src/sys/objects/device/impls/cupm/cupmdevice.cxx:327 [ierr = _devices[_defaultDevice]->configure();CHKERRQ(ierr);] Comment this out 3. src/sys/objects/device/impls/cupm/cupmdevice.cxx:326 [ierr = _devices[_defaultDevice]->initialize();CHKERRQ(ierr);] Comment this out Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) > On Jan 7, 2022, at 12:02, Zhang, Hong <hongzh...@anl.gov> wrote: > > Initializing cutlass and cusolver does not affect the memory usage. I did the > following to turn them off: > > diff --git a/src/sys/objects/device/impls/cupm/cupmcontext.hpp > b/src/sys/objects/device/impls/cupm/cupmcontext.hpp > index 51fed809e4d..9a5f068323a 100644 > --- a/src/sys/objects/device/impls/cupm/cupmcontext.hpp > +++ b/src/sys/objects/device/impls/cupm/cupmcontext.hpp > @@ -199,7 +199,7 @@ inline PetscErrorCode > CUPMContext<T>::setUp(PetscDeviceContext dctx) noexcept > #if PetscDefined(USE_DEBUG) > dci->timerInUse = PETSC_FALSE; > #endif > - ierr = __initialize(dctx->device->deviceId,dci);CHKERRQ(ierr); > + //ierr = __initialize(dctx->device->deviceId,dci);CHKERRQ(ierr); > PetscFunctionReturn(0); > } > >> On Jan 7, 2022, at 10:53 AM, Barry Smith <bsm...@petsc.dev >> <mailto:bsm...@petsc.dev>> wrote: >> >> >> I don't think this is right. We want the device initialized by PETSc , we >> just don't want the cublas and cusolve stuff initialized. In order to see >> how much memory initializing the blas and solvers takes. >> >> So I think you need to comment things in cupminterface.hpp like >> cublasCreate and cusolverDnCreate. >> >> Urgh, I hate C++ where huge chunks of real code are in header files. >> >> >> >>> On Jan 7, 2022, at 11:34 AM, Jacob Faibussowitsch <jacob....@gmail.com >>> <mailto:jacob....@gmail.com>> wrote: >>> >>> Hit send too early… >>> >>> If you don’t want to comment out, you can also run with "-device_enable >>> lazy" option. Normally this is the default behavior but if -log_view or >>> -log_summary is provided this defaults to “-device_enable eager”. See >>> src/sys/objects/device/interface/device.cxx:398 >>> >>> Best regards, >>> >>> Jacob Faibussowitsch >>> (Jacob Fai - booss - oh - vitch) >>> >>>> On Jan 7, 2022, at 11:29, Jacob Faibussowitsch <jacob....@gmail.com >>>> <mailto:jacob....@gmail.com>> wrote: >>>> >>>>> You need to go into the PetscInitialize() routine find where it loads the >>>>> cublas and cusolve and comment out those lines then run with -log_view >>>> >>>> Comment out >>>> >>>> #if (PetscDefined(HAVE_CUDA) || PetscDefined(HAVE_HIP) || >>>> PetscDefined(HAVE_SYCL)) >>>> ierr = >>>> PetscDeviceInitializeFromOptions_Internal(PETSC_COMM_WORLD);CHKERRQ(ierr); >>>> #endif >>>> >>>> At src/sys/objects/pinit.c:956 >>>> >>>> Best regards, >>>> >>>> Jacob Faibussowitsch >>>> (Jacob Fai - booss - oh - vitch) >>>> >>>>> On Jan 7, 2022, at 11:24, Barry Smith <bsm...@petsc.dev >>>>> <mailto:bsm...@petsc.dev>> wrote: >>>>> >>>>> >>>>> Without log_view it does not load any cuBLAS/cuSolve immediately with >>>>> -log_view it loads all that stuff at startup. You need to go into the >>>>> PetscInitialize() routine find where it loads the cublas and cusolve and >>>>> comment out those lines then run with -log_view >>>>> >>>>> >>>>>> On Jan 7, 2022, at 11:14 AM, Zhang, Hong via petsc-dev >>>>>> <petsc-dev@mcs.anl.gov <mailto:petsc-dev@mcs.anl.gov>> wrote: >>>>>> >>>>>> When PETSc is initialized, it takes about 2GB CUDA memory. This is way >>>>>> too much for doing nothing. A test script is attached to reproduce the >>>>>> issue. If I remove the first line "import torch", PETSc consumes about >>>>>> 0.73GB, which is still significant. Does anyone have any idea about this >>>>>> behavior? >>>>>> >>>>>> Thanks, >>>>>> Hong >>>>>> >>>>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples >>>>>> (caidao22/update-examples)$ python3 test.py >>>>>> CUDA memory before PETSc 0.000GB >>>>>> CUDA memory after PETSc 0.004GB >>>>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples >>>>>> (caidao22/update-examples)$ python3 test.py -log_view :0.txt >>>>>> CUDA memory before PETSc 0.000GB >>>>>> CUDA memory after PETSc 1.936GB >>>>>> >>>>>> import torch >>>>>> import sys >>>>>> import os >>>>>> >>>>>> import nvidia_smi >>>>>> nvidia_smi.nvmlInit() >>>>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0) >>>>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle) >>>>>> print('CUDA memory before PETSc %.3fGB' % (info.used/1e9)) >>>>>> >>>>>> petsc4py_path = >>>>>> os.path.join(os.environ['PETSC_DIR'],os.environ['PETSC_ARCH'],'lib') >>>>>> sys.path.append(petsc4py_path) >>>>>> import petsc4py >>>>>> petsc4py.init(sys.argv) >>>>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0) >>>>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle) >>>>>> print('CUDA memory after PETSc %.3fGB' % (info.used/1e9)) >>>>>> >>>>> >>>> >>> >> >