On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu wrote:

> Hi Jed,
>
> In my case, I only have 2 hypre preconditioners at the same time, and
> they do not solve simultaneously, so it might not be case 1.
>
> I checked the stack for all the calls of MPI_Comm_dup/MPI_Comm_free on
> my own machine (with OpenMPI), and as far as I can tell, all the
> communicators are freed. I could not test it with Spectrum MPI on the
> clusters immediately because all the dependencies were built in release
> mode. However, as I mentioned, I haven't had this problem with OpenMPI
> before, so I'm not sure whether this is really an MPI implementation
> problem or just because Spectrum MPI has a lower limit on the number of
> communicators. It may also depend on how many MPI ranks are used, as
> only 2 out of 40 ranks reported the error.

You can add printf around the MPI_Comm_dup/MPI_Comm_free call sites on
those two ranks, e.g., if (myrank == 38) printf(...), to see whether the
dups and frees are paired.
> As a workaround, I replaced the MPI_Comm_dup() at
> petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy assignment, and also
> removed the MPI_Comm_free() in the hypre destroy routine. My code runs
> fine with Spectrum MPI now, but I don't think this is a long-term
> solution.
>
> Thanks!
>
> Feimi
>
> On 8/19/21 9:01 AM, Jed Brown wrote:
> > Junchao Zhang writes:
> >
> >> Hi, Feimi,
> >> I need to consult Jed (cc'ed).
> >> Jed, is this an example of
> >> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663
> >> ?
> >> If Feimi really cannot free matrices, then we just need to attach a
> >> hypre-comm to a petsc inner comm, and pass that to hypre.
> >
> > Are there a bunch of solves as in that case?
> >
> > My understanding is that one should be able to MPI_Comm_dup/MPI_Comm_free
> > as many times as you like, but the implementation has limits on how many
> > communicators can co-exist at any one time. The many-at-once case is what
> > we encountered in that 2018 thread.
> >
> > One way to check would be to use a debugger or tracer to examine the
> > stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called.
> >
> > case 1: we'll find lots of dups without frees (until the end) because
> > the user really wants lots of these existing at the same time.
> >
> > case 2: dups are unfreed because of reference-counting issues or
> > inessential references.
> >
> > In case 1, I think the solution is as outlined in the thread: PETSc can
> > create an inner-comm for Hypre. I think I'd prefer to attach it to the
> > outer comm instead of the PETSc inner comm, but perhaps a case could be
> > made either way.
