Hit send too early… If you don’t want to comment anything out, you can also run with the "-device_enable lazy" option. Normally this is the default behavior, but if -log_view or -log_summary is provided it defaults to "-device_enable eager". See src/sys/objects/device/interface/device.cxx:398
Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

> On Jan 7, 2022, at 11:29, Jacob Faibussowitsch <[email protected]> wrote:
>
>> You need to go into the PetscInitialize() routine, find where it loads
>> cuBLAS and cuSOLVER, and comment out those lines, then run with -log_view
>
> Comment out
>
> #if (PetscDefined(HAVE_CUDA) || PetscDefined(HAVE_HIP) || PetscDefined(HAVE_SYCL))
>   ierr = PetscDeviceInitializeFromOptions_Internal(PETSC_COMM_WORLD);CHKERRQ(ierr);
> #endif
>
> at src/sys/objects/pinit.c:956
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
>
>> On Jan 7, 2022, at 11:24, Barry Smith <[email protected]> wrote:
>>
>> Without -log_view it does not load any cuBLAS/cuSOLVER immediately; with
>> -log_view it loads all that stuff at startup. You need to go into the
>> PetscInitialize() routine, find where it loads cuBLAS and cuSOLVER, and
>> comment out those lines, then run with -log_view.
>>
>>> On Jan 7, 2022, at 11:14 AM, Zhang, Hong via petsc-dev <[email protected]> wrote:
>>>
>>> When PETSc is initialized, it takes about 2GB of CUDA memory. This is way
>>> too much for doing nothing. A test script is attached to reproduce the
>>> issue. If I remove the first line "import torch", PETSc consumes about
>>> 0.73GB, which is still significant. Does anyone have any idea about this
>>> behavior?
>>>
>>> Thanks,
>>> Hong
>>>
>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py
>>> CUDA memory before PETSc 0.000GB
>>> CUDA memory after PETSc 0.004GB
>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py -log_view :0.txt
>>> CUDA memory before PETSc 0.000GB
>>> CUDA memory after PETSc 1.936GB
>>>
>>> import torch
>>> import sys
>>> import os
>>>
>>> import nvidia_smi
>>> nvidia_smi.nvmlInit()
>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>> print('CUDA memory before PETSc %.3fGB' % (info.used/1e9))
>>>
>>> petsc4py_path = os.path.join(os.environ['PETSC_DIR'], os.environ['PETSC_ARCH'], 'lib')
>>> sys.path.append(petsc4py_path)
>>> import petsc4py
>>> petsc4py.init(sys.argv)
>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>> print('CUDA memory after PETSc %.3fGB' % (info.used/1e9))
>>
>
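[Editor's note] The lazy-versus-eager default described at the top of the thread can be sketched as a small option rule. This is a hypothetical helper for illustration only, not PETSc's actual implementation (which lives in device.cxx):

```python
def device_init_mode(argv):
    """Return the effective -device_enable mode for a given command line.

    Sketch of the behavior described in the thread (hypothetical helper,
    not PETSc's actual code).
    """
    # An explicit -device_enable option always wins.
    if "-device_enable" in argv:
        return argv[argv.index("-device_enable") + 1]
    # -log_view or -log_summary switches the default to eager device
    # initialization, which is what loads cuBLAS/cuSOLVER (and their
    # GPU memory) at startup.
    if any(opt in argv for opt in ("-log_view", "-log_summary")):
        return "eager"
    # Otherwise devices are initialized lazily, on first use.
    return "lazy"
```

So, per the suggestion above, running the reproduction script as `python3 test.py -log_view :0.txt -device_enable lazy` should keep -log_view while avoiding the eager device initialization.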
