Great, the error is fixed in barry/fix-mumps-destroy, merged into next for
testing, and it will go into master soon.

   Barry
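
For context while reading the quoted thread below, here is a minimal,
self-contained illustration of the leak pattern being fixed: cleanup code
guarded by a flag that is only set once factorization completes, so an early
failure skips the frees. This is a sketch with made-up names (Ctx,
Ctx_Destroy), not the actual PETSc source.

  #include <stdlib.h>

  typedef struct {
    int     CleanUpMUMPS;  /* set to 1 only after a successful factorization */
    double *triples;       /* allocated early, before factorization runs */
  } Ctx;

  void Ctx_Destroy(Ctx *c)
  {
    /* Buggy version: if (c->CleanUpMUMPS) { free(c->triples); }
       skips the free whenever factorization failed before the flag
       was set. The fix is to clean up unconditionally: */
    free(c->triples);
    free(c);
  }

  int main(void)
  {
    Ctx *c = calloc(1, sizeof(Ctx));
    c->triples = malloc(1000 * sizeof(double));
    /* ... suppose factorization fails here with MUMPS error -9,
       so c->CleanUpMUMPS never becomes 1 ... */
    Ctx_Destroy(c);  /* must free c->triples regardless of the flag */
    return 0;
  }
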
> On Oct 1, 2015, at 7:04 PM, Matt Landreman <[email protected]> wrote:
>
> Hi Barry,
> Your suggestion of removing the "if (mumps->CleanUpMUMPS)" test in
> mumps.c did resolve the problem for me.
> Thanks,
> -Matt
>
> On Wed, Sep 30, 2015 at 6:28 PM, Barry Smith <[email protected]> wrote:
>
>> Matt,
>>
>> Please try the following: edit
>>
>>   #undef __FUNCT__
>>   #define __FUNCT__ "MatDestroy_MUMPS"
>>   PetscErrorCode MatDestroy_MUMPS(Mat A)
>>   {
>>     Mat_MUMPS      *mumps = (Mat_MUMPS*)A->spptr;
>>     PetscErrorCode ierr;
>>
>>     PetscFunctionBegin;
>>     if (mumps->CleanUpMUMPS) {
>>
>> Remove this if () test and just always do the lines of cleanup code
>> after it. Let us know if this resolves the problem.
>>
>>   Thanks
>>
>>    Barry
>>
>> This CleanUpMUMPS flag has always been goofy and definitely needs to be
>> removed; the only question is whether some other changes are needed when
>> it is removed.
>>
>>> On Sep 30, 2015, at 4:59 PM, Barry Smith <[email protected]> wrote:
>>>
>>> Matt,
>>>
>>> Yes, you must be right. The MatDestroy() on the "partially factored"
>>> matrix should clean up everything properly, but it sounds like it is
>>> not. I'll look at it right now, but I only have a few minutes; if I
>>> can't resolve it really quickly it may take a day or two.
>>>
>>>   Barry
>>>
>>>> On Sep 30, 2015, at 4:10 PM, Matt Landreman <[email protected]>
>>>> wrote:
>>>>
>>>> Hi Barry,
>>>> I tried adding PetscMallocDump after SNESDestroy as you suggested.
>>>> When MUMPS fails, PetscMallocDump shows a number of mallocs which are
>>>> absent when MUMPS succeeds, the largest being
>>>> MatConvertToTriples_mpiaij_mpiaij() (line 638 in
>>>> petsc-3.6.0/src/mat/impls/aij/mpi/mumps/mumps.c). The total memory
>>>> reported by PetscMallocDump after SNESDestroy is substantially (>20x)
>>>> larger when MUMPS fails than when it succeeds, and this amount
>>>> increases uniformly with each MUMPS failure. So I think some of the
>>>> MUMPS-related structures are not being deallocated by SNESDestroy if
>>>> MUMPS generates an error.
>>>> Thanks,
>>>> -Matt
>>>>
>>>> On Wed, Sep 30, 2015 at 2:16 PM, Barry Smith <[email protected]> wrote:
>>>>
>>>>> On Sep 30, 2015, at 1:06 PM, Matt Landreman <[email protected]>
>>>>> wrote:
>>>>>
>>>>> PETSc developers,
>>>>>
>>>>> I tried implementing a system for automatically increasing MUMPS
>>>>> ICNTL(14), along the lines described in this recent thread. If
>>>>> SNESSolve returns ierr .ne. 0 due to MUMPS error -9, I call
>>>>> SNESDestroy, re-initialize SNES, call MatMumpsSetIcntl with a larger
>>>>> value of ICNTL(14), call SNESSolve again, and repeat as needed. The
>>>>> procedure works, but the peak memory required (as measured by the
>>>>> HPC system) is 50%-100% higher when the MUMPS solve has to be
>>>>> repeated than when MUMPS works on the first try (by starting with a
>>>>> large ICNTL(14)), even though SNESDestroy is called between the
>>>>> attempts. Are there some PETSc or MUMPS structures which would not
>>>>> be deallocated immediately by SNESDestroy? If so, how do I
>>>>> deallocate them?
>>>>
>>>> They should all be destroyed automatically for you. You can use
>>>> PetscMallocDump() after the SNES is destroyed to verify whether any
>>>> of that memory is not properly freed.
>>>>
>>>> My guess is that your new malloc() with the bigger workspace cannot
>>>> "reuse" the space that was previously freed, so to the OS it looks
>>>> like you are using a lot more space, but in terms of physical memory
>>>> you are not using more.
>>>>
>>>>   Barry
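
A minimal sketch of the check Barry describes above, assuming PETSc malloc
logging is active (the default in debug builds; otherwise run with -malloc).
The solver setup is elided; only the dump after SNESDestroy() is shown.

  #include <petscsnes.h>

  int main(int argc, char **argv)
  {
    SNES           snes;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
    ierr = SNESCreate(PETSC_COMM_WORLD, &snes);CHKERRQ(ierr);
    /* ... set up and run a solve in which MUMPS fails with INFO(1) = -9 ... */
    ierr = SNESDestroy(&snes);CHKERRQ(ierr);
    /* Anything reported here survived SNESDestroy(); with the
       CleanUpMUMPS bug, entries such as
       MatConvertToTriples_mpiaij_mpiaij() show up in the dump. */
    ierr = PetscMallocDump(PETSC_STDOUT);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }
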
>>>>> Thanks,
>>>>> Matt Landreman
>>>>>
>>>>> On Tue, Sep 15, 2015 at 7:47 AM, David Knezevic
>>>>> <[email protected]> wrote:
>>>>> On Tue, Sep 15, 2015 at 7:29 PM, Matthew Knepley <[email protected]>
>>>>> wrote:
>>>>> On Tue, Sep 15, 2015 at 4:30 AM, David Knezevic
>>>>> <[email protected]> wrote:
>>>>> In some cases, I get MUMPS error -9, i.e.:
>>>>> [2]PETSC ERROR: Error reported by MUMPS in numerical factorization
>>>>> phase: INFO(1)=-9, INFO(2)=98927
>>>>>
>>>>> This is easily fixed by re-running the executable with a larger
>>>>> -mat_mumps_icntl_14 on the command line.
>>>>>
>>>>> However, I would like to update my code to do this automatically,
>>>>> i.e. detect the -9 error and re-run with the appropriate option. Is
>>>>> there a recommended way to do this? It seems to me that I could do
>>>>> this with a PETSc error handler (e.g. PetscPushErrorHandler) in
>>>>> order to call a function that sets the appropriate option and solves
>>>>> again; is that right? Are there any examples that illustrate this
>>>>> type of thing?
>>>>>
>>>>> I would not use the error handler. I would just check the ierr
>>>>> return code from the solver. I think you need the INFO output, for
>>>>> which you can use MatMumpsGetInfo().
>>>>>
>>>>> OK, that sounds good (and much simpler than what I had in mind),
>>>>> thanks for the help!
>>>>>
>>>>> David
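
Putting Matthew Knepley's suggestion together with the retry scheme described
earlier in the thread, a rough sketch might look like the following. This is
not code from the thread: the function name SolveWithMumpsRetry and the
doubling policy are made up, error checking is abbreviated, and for
simplicity the same SNES is reused between attempts rather than destroyed
and re-created. MatMumpsGetInfo(), MatMumpsSetIcntl(), and
PCFactorGetMatrix() are the real PETSc calls involved.

  #include <petscsnes.h>

  /* Retry a SNES solve, growing MUMPS ICNTL(14) (the percentage increase
     in the estimated workspace) whenever factorization fails with
     INFO(1) = -9. */
  PetscErrorCode SolveWithMumpsRetry(SNES snes, Vec x, PetscInt maxAttempts)
  {
    KSP            ksp;
    PC             pc;
    Mat            F;
    PetscInt       attempt, info1, icntl14 = 20;  /* MUMPS default is 20% */
    PetscErrorCode ierr, solveIerr = 0;

    PetscFunctionBegin;
    for (attempt = 0; attempt < maxAttempts; attempt++) {
      /* Deliberately not wrapped in CHKERRQ: check the return code,
         as suggested above. */
      solveIerr = SNESSolve(snes, NULL, x);
      if (!solveIerr) PetscFunctionReturn(0);    /* solve succeeded */

      /* The factored matrix exists once factorization was attempted,
         so MUMPS's INFO values can be queried through it. */
      ierr = SNESGetKSP(snes, &ksp);CHKERRQ(ierr);
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCFactorGetMatrix(pc, &F);CHKERRQ(ierr);
      ierr = MatMumpsGetInfo(F, 1, &info1);CHKERRQ(ierr);
      if (info1 != -9) break;                    /* not a workspace failure */

      icntl14 *= 2;                              /* grow the workspace margin */
      ierr = MatMumpsSetIcntl(F, 14, icntl14);CHKERRQ(ierr);
    }
    PetscFunctionReturn(solveIerr);
  }

Since ICNTL(14) is a percentage increase over MUMPS's estimated working
space, doubling it from the default of 20 is a fairly gentle growth policy.
Note that destroying and re-creating the SNES between attempts, as done
earlier in the thread, should no longer leak memory once the
MatDestroy_MUMPS fix above is in place.
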
