Jed Brown wrote:
> On Mon, Nov 21, 2011 at 22:47, Andrej Mesaros <andrej.mesaros at bc.edu> wrote:
>
>> Dear all,
>>
>> I need guidance in finding the memory needed for matrix assembly.
>>
>> The job that fails when I reserve 3.5GB of memory per node gives me the
>> error output below. The job was run on 96 nodes, each storing its own
>> part of the matrix (around 60k rows each, ~100M non-zero complex
>> entries).
>>
>> The error occurs during assembly (similar numbers for every node):
>>
>> [25]PETSC ERROR: Out of memory. This could be due to allocating
>> [25]PETSC ERROR: too large an object or bleeding by not properly
>> [25]PETSC ERROR: destroying unneeded objects.
>> [25]PETSC ERROR: Memory allocated 4565256864 Memory used by process 3658739712
>> [25]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
>> [25]PETSC ERROR: Memory requested 980025524!
>> [25]PETSC ERROR: ------------------------------------------------------------------------
>> [25]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17 13:37:48 CDT 2011
>> [25]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [25]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [25]PETSC ERROR: See docs/index.html for manual pages.
>> [25]PETSC ERROR: ------------------------------------------------------------------------
>> [25]PETSC ERROR: Unknown Name on a linux-gnu named compute-5-54.local by mesaros Wed Oct 12 22:27:13 2011
>> [25]PETSC ERROR: Libraries linked from /home/mesaros/code/petsc-3.1-p8/linux-gnu-mpi-debug-complex/lib
>> [25]PETSC ERROR: Configure run at Thu Jun 30 12:30:13 2011
>> [25]PETSC ERROR: Configure options --with-scalar-type=complex --with-64-bit-indices=1 --download-f-blas-lapack=yes --download-mpich=1 --with-mpi-exec=/usr/publ$
>> [25]PETSC ERROR: ------------------------------------------------------------------------
>> [25]PETSC ERROR: PetscMallocAlign() line 49 in src/sys/memory/mal.c
>> [25]PETSC ERROR: PetscTrMallocDefault() line 192 in src/sys/memory/mtr.c
>> [25]PETSC ERROR: PetscPostIrecvInt() line 250 in src/sys/utils/mpimesg.c
>
> Looks like you are trying to send half a billion (--with-64-bit-indices)
> or a billion entries. How are you computing the nonzeros? Is it possible
> that many processes are computing entries that need to go to one process?
My code has a function which, given a fixed matrix row index, computes the
values of all non-zero matrix elements in that row one by one, also
returning the column index of each of these elements. So all I have to do
is give the 1st process a loop over row indices from 1 to 60k, the 2nd
process a loop from 60k+1 to 120k, and so on. Inside each loop the row
index is fixed, so the function finds the non-zero elements and their
column indices. (A sketch of this assembly loop is at the end of this
message.)

>> [25]PETSC ERROR: MatStashScatterBegin_Private() line 498 in src/mat/utils/matstash.c
>> [25]PETSC ERROR: MatAssemblyBegin_MPIAIJ() line 474 in src/mat/impls/aij/mpi/mpiaij.c
>> [25]PETSC ERROR: MatAssemblyBegin() line 4564 in src/mat/interface/matrix.c
>>
>> Now, how much memory would I need per node for this assembly to work?
>> Is it "Memory allocated" + "Memory requested", which is around 5.5GB?
>> And did it fail when "Memory used by process" reached ~3.5GB, which was
>> the limit for the job? Usually, breaking the limit on memory per node
>> kills the job, and PETSc then doesn't give the above "Out of memory"
>> output.
>>
>> Additionally, can I simply estimate the additional memory needed for
>> SLEPc to find ~100 lowest eigenvalues?
>
> Start with what is typically needed by PETSc (the matrix, the setup cost
> for your preconditioner, the vectors for the Krylov method) and add
> 100*n*sizeof(PetscScalar).

To clarify, is "n" the matrix dimension? So that's memory for 100 vectors
(the Krylov space) plus the memory already taken by PETSc when assembly is
done? Thanks a lot!
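My attempt at the arithmetic, in case it helps pin down what I am asking
(assuming complex scalars, so sizeof(PetscScalar) = 16 bytes): if n is the
local number of rows, ~60k here, the 100 vectors would need about
100 * 60000 * 16 B ≈ 96 MB per process. If n is instead the global
dimension (96 * 60k ≈ 5.76M rows), that is about 9.2 GB in total, which
spread over 96 processes is again ~96 MB each, if the Krylov vectors are
distributed the same way as the matrix rows.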

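For reference, here is roughly what the assembly loop described above
looks like in PETSc calls. This is only a sketch, not my literal code:
compute_row() stands in for my routine that produces the non-zero values
and column indices of a given row, the matrix A is assumed to be an
already created and preallocated MPIAIJ matrix, and maxnnz is an assumed
upper bound on the non-zeros per row.

#include <petscmat.h>

/* Placeholder for my routine: given a global row index, fill cols[] and
   vals[] with the non-zero entries of that row and set *nnz. */
extern void compute_row(PetscInt row, PetscInt *nnz, PetscInt cols[],
                        PetscScalar vals[]);

PetscErrorCode assemble_rows(Mat A, PetscInt maxnnz)
{
  PetscInt       rstart, rend, row, nnz;
  PetscInt      *cols;
  PetscScalar   *vals;
  PetscErrorCode ierr;

  ierr = PetscMalloc(maxnnz*sizeof(PetscInt), &cols);CHKERRQ(ierr);
  ierr = PetscMalloc(maxnnz*sizeof(PetscScalar), &vals);CHKERRQ(ierr);

  /* Loop only over the rows this process owns, so every MatSetValues()
     call is local and nothing accumulates in the communication stash. */
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (row = rstart; row < rend; row++) {
    compute_row(row, &nnz, cols, vals);
    ierr = MatSetValues(A, 1, &row, nnz, cols, vals, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = PetscFree(cols);CHKERRQ(ierr);
  ierr = PetscFree(vals);CHKERRQ(ierr);
  return 0;
}

If the ownership ranges from MatGetOwnershipRange() match the 1-60k,
60k+1-120k, ... blocks I described, every insertion is local and the
stash that overflowed in MatStashScatterBegin_Private() should stay
essentially empty; if they do not match, entries get buffered for other
processes and the stash can grow very large, which may be what is
happening here.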