Hi Barry,

I tried calling PetscMallocDump() after SNESDestroy() as you suggested. When MUMPS fails, PetscMallocDump() shows a number of mallocs that are absent when MUMPS succeeds. The largest is in MatConvertToTriples_mpiaij_mpiaij() (line 638 of petsc-3.6.0/src/mat/impls/aij/mpi/mumps/mumps.c). After SNESDestroy(), the total memory reported by PetscMallocDump() is substantially larger (>20x) when MUMPS fails than when it succeeds, and it grows by roughly the same amount with each MUMPS failure. So I think some MUMPS-related structures are not being deallocated by SNESDestroy() when MUMPS returns an error.

Thanks,
-Matt
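The symptoms above (extra allocations that survive the destroy call and grow by a fixed amount per failed attempt) fit an error path that returns before a buffer's ownership is recorded. The toy model below illustrates that pattern in plain C. It does not use PETSc, allocates no real memory (a small ledger only counts "live" bytes), and every name in it is hypothetical. It is not a claim about where the problem in mumps.c actually lies.

```c
#include <assert.h>
#include <stddef.h>

/* Toy allocation ledger (no real memory is allocated): it only records how
   many bytes are "live", loosely analogous to the total PetscMallocDump()
   reports. Everything here is hypothetical and is not PETSc code. */
#define MAX_BLOCKS 64
static size_t block_size[MAX_BLOCKS]; /* 0 means the slot is free */
static size_t live_bytes = 0;

/* Record an allocation of n > 0 bytes; returns a handle, or -1 if full. */
static int ledger_alloc(size_t n) {
  assert(n > 0);
  for (int h = 0; h < MAX_BLOCKS; h++) {
    if (block_size[h] == 0) {
      block_size[h] = n;
      live_bytes += n;
      return h;
    }
  }
  return -1;
}

/* Release a handle; -1 (no allocation) is ignored, like free(NULL). */
static void ledger_free(int h) {
  if (h < 0) return;
  live_bytes -= block_size[h];
  block_size[h] = 0;
}

/* Hypothetical factorization context owning a "triples" buffer. */
typedef struct {
  int triples; /* ledger handle, -1 if none */
} FactorCtx;

/* Leaky pattern: on failure it returns before the buffer is stored in ctx,
   so the later destroy step cannot release it. -9 mimics INFO(1) = -9. */
static int factor_leaky(FactorCtx *ctx, size_t nnz, int fail) {
  int buf = ledger_alloc(nnz * sizeof(int));
  if (fail) return -9; /* buf is now unreachable */
  ctx->triples = buf;
  return 0;
}

/* Fixed pattern: ownership is recorded first, so destroy always releases it. */
static int factor_fixed(FactorCtx *ctx, size_t nnz, int fail) {
  ctx->triples = ledger_alloc(nnz * sizeof(int));
  return fail ? -9 : 0;
}

static void factor_destroy(FactorCtx *ctx) {
  ledger_free(ctx->triples);
  ctx->triples = -1;
}

/* Run `failures` failed attempts followed by one success, destroying the
   context after each attempt (the role SNESDestroy() plays in the thread),
   and return how many bytes are still live afterwards. */
static size_t bytes_left_after(int (*factor)(FactorCtx *, size_t, int),
                               int failures, size_t nnz) {
  size_t before = live_bytes;
  for (int i = 0; i <= failures; i++) {
    FactorCtx ctx = {-1};
    (void)factor(&ctx, nnz, i < failures);
    factor_destroy(&ctx);
  }
  return live_bytes - before;
}
```

With the leaky variant, the leftover is exactly one buffer per failed attempt, which matches a total that rises uniformly with each failure. With the fixed variant, nothing is left once the context is destroyed.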
On Wed, Sep 30, 2015 at 2:16 PM, Barry Smith <[email protected]> wrote:
>
> > On Sep 30, 2015, at 1:06 PM, Matt Landreman <[email protected]> wrote:
> >
> > PETSc developers,
> >
> > I tried implementing a system for automatically increasing MUMPS ICNTL(14), along the lines described in this recent thread. If SNESSolve returns ierr .ne. 0 due to MUMPS error -9, I call SNESDestroy, re-initialize SNES, call MatMumpsSetIcntl with a larger value of ICNTL(14), call SNESSolve again, and repeat as needed. The procedure works, but the peak memory required (as measured by the HPC system) is 50%-100% higher if the MUMPS solve has to be repeated compared to when MUMPS works on the 1st try (by starting with a large ICNTL(14)), even though SNESDestroy is called in between the attempts. Are there some PETSc or MUMPS structures which would not be deallocated immediately by SNESDestroy? If so, how do I deallocate them?
>
> They should be all destroyed automatically for you. You can use PetscMallocDump() after the SNES is destroyed to verify that all that memory is not properly freed.
>
> My guess is that your new malloc() with the bigger workspace cannot "reuse" the space that was previously freed; so to the OS it looks like you are using a lot more space but in terms of physical memory you are not using more.
>
> Barry
>
> >
> > Thanks,
> > Matt Landreman
> >
> >
> > On Tue, Sep 15, 2015 at 7:47 AM, David Knezevic <[email protected]> wrote:
> > On Tue, Sep 15, 2015 at 7:29 PM, Matthew Knepley <[email protected]> wrote:
> > On Tue, Sep 15, 2015 at 4:30 AM, David Knezevic <[email protected]> wrote:
> > In some cases, I get MUMPS error -9, i.e.:
> > [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFO(1)=-9, INFO(2)=98927
> >
> > This is easily fixed by re-running the executable with -mat_mumps_icntl_14 on the commandline.
> >
> > However, I would like to update my code in order to do this automatically, i.e. detect the -9 error and re-run with the appropriate option. Is there a recommended way to do this? It seems to me that I could do this with a PETSc error handler (e.g. PetscPushErrorHandler) in order to call a function that sets the appropriate option and solves again, is that right? Are there any examples that illustrate this type of thing?
> >
> > I would not use the error handler. I would just check the ierr return code from the solver. I think you need the INFO output, for which you can use MatMumpsGetInfo().
> >
> >
> > OK, that sounds good (and much simpler than what I had in mind), thanks for the help!
> >
> > David
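The approach discussed in the thread can be sketched as a small retry policy: check the solver's return code instead of installing an error handler, and when INFO(1) is -9, enlarge ICNTL(14) and solve again. To keep the sketch compilable without PETSc, the solve is replaced by a hypothetical stub, solve_with_icntl14(), whose `required_pct` threshold is invented for illustration (MUMPS does not report such a number up front). The starting value, doubling factor, and attempt limit are also assumptions. In real code, the stub would correspond roughly to SNESSolve() followed by reading INFO(1) with MatMumpsGetInfo() on the factored matrix, and the update would correspond to MatMumpsSetIcntl(F, 14, icntl14).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for "SNESSolve() with MUMPS": returns 0 on success,
   or -9 (mimicking MUMPS INFO(1) = -9, workspace too small) when the
   ICNTL(14) percentage is below what this made-up problem needs. */
static int solve_with_icntl14(int icntl14, int required_pct) {
  return icntl14 >= required_pct ? 0 : -9;
}

/* Retry policy: on INFO(1) == -9, multiply ICNTL(14) by growth_factor and
   solve again. Returns the ICNTL(14) value that worked, or -1 if the attempt
   budget ran out or a different error code came back. */
static int solve_with_retries(int icntl14, int growth_factor,
                              int max_attempts, int required_pct) {
  assert(icntl14 > 0 && growth_factor > 1 && max_attempts > 0);
  for (int attempt = 1; attempt <= max_attempts; attempt++) {
    int info1 = solve_with_icntl14(icntl14, required_pct);
    if (info1 == 0) return icntl14;
    if (info1 != -9) return -1; /* not a workspace problem: give up */
    fprintf(stderr, "attempt %d failed: INFO(1)=-9 with ICNTL(14)=%d\n",
            attempt, icntl14);
    /* Real code would destroy and recreate the SNES here, then apply the
       new value, e.g. via MatMumpsSetIcntl(F, 14, icntl14). */
    icntl14 *= growth_factor;
  }
  return -1;
}
```

For example, starting at 20 with doubling, a problem that needs 70 fails at 20 and 40 and succeeds at 80. Barry's caveat still applies: each retry requests a larger workspace, so the peak memory the OS sees can exceed that of a single solve started with a large ICNTL(14).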
