It's twice the memory of the entire matrix (when stored on one process). I also just sent you the Valgrind results for both a serial run and a parallel run. The matrix I used is 20 GB on disk. In the serial run, Valgrind shows a peak memory usage of 21 GB, while in the parallel run (with 4 processes) each process shows a peak memory usage of 10.8 GB.
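These numbers can be compared with a simple model of the parallel load that Barry describes further down in this thread. In that model, rank 0 keeps its own block of rows for the whole load. For each other rank in turn, it reads that rank's block, sends it, and frees it. The sketch below is a hypothetical Python model of that scheme, not PETSc code, and the function names are made up for illustration. It assumes that memory is proportional to the bytes of matrix data and that the data is split evenly across ranks. It ignores MPI buffers, index arrays and allocator behaviour.

```python
# Hypothetical model of rank 0's memory during a read-and-forward parallel load.
# Assumption: memory ~ bytes of matrix data; MPI buffers and allocator effects ignored.


def even_partition(total, nranks):
    """Split `total` bytes into `nranks` near-equal blocks (like a row split)."""
    base, extra = divmod(total, nranks)
    return [base + (1 if r < extra else 0) for r in range(nranks)]


def rank0_peak(block_sizes):
    """Peak bytes held by rank 0 under the modelled scheme.

    Rank 0 keeps its own block for the whole load. For every other rank it
    reads that rank's block into a temporary buffer, sends it, and frees it
    before reading the next one, so at most one foreign block is resident.
    """
    own = block_sizes[0]
    peak = own
    for size in block_sizes[1:]:
        peak = max(peak, own + size)  # own block + the block being forwarded
    return peak


if __name__ == "__main__":
    GB = 10**9
    total = 20 * GB  # on-disk size reported above
    for nranks in (1, 4, 60):
        blocks = even_partition(total, nranks)
        print(f"{nranks} ranks: rank 0 peak ~ {rank0_peak(blocks) / GB:g} GB")
```

Take the 20 GB matrix and 4 processes reported above. The model gives rank 0 a peak of 10 GB: its own 5 GB plus one 5 GB block in transit. That is twice rank 0's final share, not twice the whole matrix. With one process it gives 20 GB. The model only covers rank 0.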
Best regards,
Michael

On 07.10.21 17:55, Barry Smith wrote:
>
>
>> On Oct 7, 2021, at 11:35 AM, Michael Werner wrote:
>>
>> Currently I'm using psutil to query every process for its memory
>> usage and sum it up. However, the spike was only visible in top (I
>> had a call to psutil right before and after A.load(viewer), and both
>> reported only 50 GB of RAM usage). That's why I thought it might be
>> directly tied to loading the matrix. However, I also had the problem
>> that the computation crashed due to running out of memory while
>> loading a matrix that should in theory fit into memory. In that case
>> I would expect the OS to free unused memory immediately, right?
>>
>> Concerning Barry's questions: the matrix is a sparse matrix and is
>> originally created sequentially as SEQAIJ. However, it is then loaded
>> as MPIAIJ, and if I look at the memory usage of the various
>> processes, they fill up one after another, just as described. Is the
>> origin of the matrix somehow preserved in the binary file? I was
>> under the impression that the binary format was agnostic to the
>> number of processes.
>
>   The file format is independent of the number of processes that
> created it.
>
>> I also varied the number of processes between 1 and 60; as soon as I
>> use more than one process I can observe the spike (and it's always
>> twice the memory, no matter how many processes I'm using).
>
>   Twice the size of the entire matrix (when stored on one process) or
> twice the size of the resulting matrix stored on the first rank? The
> latter is exactly as expected, since rank 0 has to load the part of
> the matrix destined for the next rank and hence for a short time
> contains its own part of the matrix and the part of one other rank.
>
>   Barry
>
>>
>> I also tried running Valgrind with the --tool=massif option. However,
>> I don't know what to look for. I can send you the output file
>> separately, if it helps.
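Suppose memory is only sampled before and after `A.load(viewer)`, as with psutil above. Such snapshots cannot see a spike that is allocated and freed inside the call. One way to capture it is the process's resident-set high-water mark. On Unix-like systems, the standard-library `resource` module exposes it as `ru_maxrss`. The helpers below are a hypothetical sketch, and the names `peak_rss_mib`, `measure_peak` and `transient_spike` are made up for illustration. They measure only the calling process, so in an MPI run each rank reports its own value.

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of this process so far, in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def measure_peak(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, peak_before, peak_after).

    The peak is a high-water mark, so a short-lived allocation inside fn
    still shows up in peak_after even if it was freed before fn returned,
    as long as it pushed the process above its earlier peak.
    """
    before = peak_rss_mib()
    result = fn(*args, **kwargs)
    after = peak_rss_mib()
    return result, before, after


def transient_spike(n_mib):
    """Hypothetical stand-in for a load: allocate and touch n_mib MiB, then free it."""
    buf = b"\x01" * (n_mib * 1024 * 1024)  # filling the buffer touches every page
    return len(buf)  # buf is released when the function returns


if __name__ == "__main__":
    _, before, after = measure_peak(transient_spike, 64)
    print(f"peak RSS before: {before:.0f} MiB, after: {after:.0f} MiB")
```

With petsc4py, calling `_, before, after = measure_peak(A.load, viewer)` on each rank would show whether the load raised that rank's peak. The high-water mark only moves if the spike exceeds the process's earlier peak. It is also unaffected by whether freed memory has already been returned to the OS.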
>>
>> Best regards,
>> Michael
>>
>> On 07.10.21 16:09, Matthew Knepley wrote:
>>> On Thu, Oct 7, 2021 at 10:03 AM Barry Smith wrote:
>>>
>>>
>>>        How many ranks are you using? Is it a sparse matrix with MPIAIJ?
>>>
>>>        The intention is that for parallel runs the first rank reads
>>>     in its own part of the matrix, then reads in the part of the
>>>     next rank and sends it, then reads the part of the third rank
>>>     and sends it, etc. So there should not be too much of a blip in
>>>     memory usage. You can run valgrind with the option for tracking
>>>     memory usage to see exactly where in the code the blip occurs;
>>>     it could be that a regression occurred in the code, making it
>>>     require more memory. But internal MPI buffers might explain some
>>>     blip.
>>>
>>>
>>> Is it possible that we free the memory, but the OS has just not
>>> given back that memory for use yet? How are you measuring memory usage?
>>>
>>>   Thanks,
>>>
>>>      Matt
>>>
>>>
>>>        Barry
>>>
>>>
>>>     > On Oct 7, 2021, at 9:50 AM, Michael Werner wrote:
>>>     >
>>>     > Hello,
>>>     >
>>>     > I noticed that there is a peak in memory consumption when I load
>>>     > an existing matrix into PETSc. The matrix is previously created
>>>     > by an external program and saved in the PETSc binary format.
>>>     > The code I'm using in petsc4py is simple:
>>>     >
>>>     > viewer = PETSc.Viewer().createBinary(<path/to/existing/matrix>, "r",
>>>     > comm=PETSc.COMM_WORLD)
>>>     > A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
>>>     > A.load(viewer)
>>>     >
>>>     > When I run this code in serial, the memory consumption of the
>>>     > process is about 50 GB of RAM, similar to the file size of the
>>>     > saved matrix. However, if I run the code in parallel, for a few
>>>     > seconds the memory consumption of the process doubles to around
>>>     > 100 GB of RAM, before dropping back down to around 50 GB of RAM.
>>>     > So it seems as if, for some reason, the matrix is copied after it
>>>     > is read into memory. Is there a way to avoid this behaviour?
>>>     > Currently, it is a clear bottleneck in my code.
>>>     >
>>>     > I tried setting the size of the matrix and explicitly
>>>     > preallocating the necessary NNZ (with A.setSizes(dim) and
>>>     > A.setPreallocationNNZ(nnz), respectively) before loading, but
>>>     > that didn't help.
>>>     >
>>>     > As mentioned above, I'm using petsc4py together with PETSc-3.16
>>>     > on a Linux workstation.
>>>     >
>>>     > Best regards,
>>>     > Michael Werner
>>>     >
>>>     > --
>>>     >
>>>     > ____________________________________________________
>>>     >
>>>     > Deutsches Zentrum für Luft- und Raumfahrt e.V. (DLR)
>>>     > Institut für Aerodynamik und Strömungstechnik | Bunsenstr. 10 | 37073 Göttingen
>>>     >
>>>     > Michael Werner
>>>     > Phone 0551 709-2627 | Fax 0551 709-2811
>>>     > DLR.de
>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which
>>> their experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>
>
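Michael asked whether the binary file records how the matrix was created. A PETSc binary Mat file, as documented for `MatLoad()`, contains the following, in order:

- a class id;
- the global row and column counts;
- the total number of nonzeros;
- the number of nonzeros in each row;
- all column indices;
- all values.

The Python sketch below writes and reads that layout using only `struct`. It assumes 32-bit indices and real double-precision scalars; PETSc builds with 64-bit indices or complex scalars store wider entries. The function names are hypothetical and not part of petsc4py. The header holds only global sizes, so any number of ranks can split the rows using the stored row lengths. This is consistent with Barry's answer that the format does not depend on the process count.

```python
import io
import struct

# Class id PETSc writes at the start of a binary Mat file (MAT_FILE_CLASSID).
MAT_FILE_CLASSID = 1211216


def write_aij(stream, nrows, ncols, row_ptr, col_idx, values):
    """Write a CSR matrix in the PETSc binary Mat layout.

    Assumes 32-bit indices and real float64 values, all big-endian.
    """
    nnz = len(col_idx)
    row_lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(nrows)]
    # Header: class id, global rows, global columns, total nonzeros.
    stream.write(struct.pack(">4i", MAT_FILE_CLASSID, nrows, ncols, nnz))
    stream.write(struct.pack(f">{nrows}i", *row_lengths))  # nonzeros per row
    stream.write(struct.pack(f">{nnz}i", *col_idx))  # column indices
    stream.write(struct.pack(f">{nnz}d", *values))  # values


def read_header(stream):
    """Return (nrows, ncols, nnz); there is no process count to read."""
    classid, nrows, ncols, nnz = struct.unpack(">4i", stream.read(16))
    if classid != MAT_FILE_CLASSID:
        raise ValueError("not a PETSc binary matrix")
    return nrows, ncols, nnz


def expected_size(nrows, nnz):
    """Bytes on disk: 16-byte header + 4 per row + (4 + 8) per nonzero."""
    return 16 + 4 * nrows + 12 * nnz


if __name__ == "__main__":
    # 2 x 3 example matrix [[1, 0, 2], [0, 3, 0]] in CSR form.
    buf = io.BytesIO()
    write_aij(buf, 2, 3, [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0])
    print(len(buf.getvalue()) == expected_size(2, 3))  # True
    buf.seek(0)
    print(read_header(buf))  # (2, 3, 3)
```

Under these assumptions, the file holds roughly 12 bytes per nonzero. That is about the same as the column-index and value arrays of an AIJ matrix in memory. This fits the serial observations in this thread that loading needed about as much memory as the file size.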
