Feimi, if it is easy to reproduce, could you give instructions on how to reproduce that?
PS: Spectrum MPI is based on OpenMPI. I don't understand why it has the problem but OpenMPI does not. It could be a bug in PETSc or in the user's code. For reference counting on MPI_Comm, we already have the PETSc inner comm. I think we can reuse that.

--Junchao Zhang

On Fri, Aug 20, 2021 at 12:33 AM Barry Smith <[email protected]> wrote:

>
> It sounds like maybe the Spectrum MPI_Comm_free() is not returning the
> comm to the "pool" as available for future use; a very buggy MPI
> implementation. This can easily be checked in a tiny standalone MPI program
> that simply comm dups and frees thousands of times in a loop. Could even be
> a configure test (that requires running an MPI program). I do not remember
> if we ever tested this possibility; maybe and I forgot.
>
> If this is the problem we can provide a "work around" that attributes
> the new comm (to be passed to hypre) to the old comm with a reference count
> value also in the attribute. When the hypre matrix is created that count
> (with the new comm) is set to 1, when the hypre matrix is freed that count
> is set to zero (but the comm is not freed), in the next call to create the
> hypre matrix when the attribute is found, the count is zero so PETSc knows
> it can pass the same comm again to the new hypre matrix.
>
> This will only allow one simultaneous hypre matrix to be created from the
> original comm. To allow multiple simultaneous hypre matrices one could have
> multiple comms and counts in the attribute and just check them until one
> finds an available one to reuse (or creates yet another one if all the
> current ones are busy with hypre matrices). So it is the same model as
> DMGetXXVector() where vectors are checked out and then checked in to be
> available later. This would solve the currently reported problem (if it is
> a buggy MPI that does not properly free comms), but not solve the MOOSE
> problem where 10,000 comms are needed at the same time.
>
>    Barry
>
> On Aug 19, 2021, at 3:29 PM, Junchao Zhang <[email protected]> wrote:
>
> On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu <[email protected]> wrote:
>
>> Hi Jed,
>>
>> In my case, I only have 2 hypre preconditioners at the same time, and
>> they do not solve simultaneously, so it might not be case 1.
>>
>> I checked the stack for all the calls of MPI_Comm_dup/MPI_Comm_free on
>> my own machine (with OpenMPI), all the communicators are freed from my
>> observation. I could not test it with Spectrum MPI on the clusters
>> immediately because all the dependencies were built in release mode.
>> However, as I mentioned, I haven't had this problem with OpenMPI before,
>> so I'm not sure if this is really an MPI implementation problem, or just
>> because Spectrum MPI has a lower limit on the number of communicators,
>> and/or this also depends on how many MPI ranks are used, as only 2 out
>> of 40 ranks reported the error.
>>
> You can add printf around MPI_Comm_dup/MPI_Comm_free sites on the two
> ranks, e.g., if (myrank == 38) printf(...), to see if the dup/free are
> paired.
>
>> As a workaround, I replaced the MPI_Comm_dup() at
>> petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy assignment, and also
>> removed the MPI_Comm_free() in the hypre destroyer. My code runs fine
>> with Spectrum MPI now, but I don't think this is a long-term solution.
>>
>> Thanks!
>>
>> Feimi
>>
>> On 8/19/21 9:01 AM, Jed Brown wrote:
>> > Junchao Zhang <[email protected]> writes:
>> >
>> >> Hi, Feimi,
>> >> I need to consult Jed (cc'ed).
>> >> Jed, is this an example of
>> >> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663
>> >> ?
>> >> If Feimi really can not free matrices, then we just need to attach a
>> >> hypre-comm to a petsc inner comm, and pass that to hypre.
>> > Are there a bunch of solves as in that case?
>> >
>> > My understanding is that one should be able to
>> > MPI_Comm_dup/MPI_Comm_free as many times as you like, but the
>> > implementation has limits on how many communicators can co-exist at any one
>> > time. The many-at-once is what we encountered in that 2018 thread.
>> >
>> > One way to check would be to use a debugger or tracer to examine the
>> > stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called.
>> >
>> > case 1: we'll find lots of dups without frees (until the end) because
>> > the user really wants lots of these existing at the same time.
>> >
>> > case 2: dups are unfreed because of reference counting
>> > issue/inessential references
>> >
>> >
>> > In case 1, I think the solution is as outlined in the thread, PETSc can
>> > create an inner-comm for Hypre. I think I'd prefer to attach it to the
>> > outer comm instead of the PETSc inner comm, but perhaps a case could be
>> > made either way.
>> >
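
For reference, the check-out/check-in model Barry describes (several comms with busy counts, reused when idle, never freed on check-in) could look roughly like the sketch below. This is only an illustration under stated assumptions, not PETSc code: CommHandle is a hypothetical integer stand-in for MPI_Comm so the sketch compiles without MPI, and CommPool, pool_init, pool_checkout, pool_checkin and POOL_MAX are made-up names. The spot where a real implementation would call MPI_Comm_dup() is marked in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MPI_Comm so the sketch runs without MPI. */
typedef int CommHandle;

#define POOL_MAX 8 /* illustrative cap on comms kept per original comm */

typedef struct {
  CommHandle comm[POOL_MAX]; /* comms owned by the pool                     */
  int        busy[POOL_MAX]; /* 1 while a hypre matrix holds the comm       */
  int        n;              /* number of comms created so far              */
  int        next_id;        /* source of fresh stand-in handles            */
} CommPool;

void pool_init(CommPool *p)
{
  assert(p != NULL);
  p->n       = 0;
  p->next_id = 100;
}

/* Check out a comm: reuse an idle one if any, otherwise create another.
   Returns 0 on success, -1 if every slot is busy and the pool is full. */
int pool_checkout(CommPool *p, CommHandle *out)
{
  assert(p != NULL && out != NULL);
  for (int i = 0; i < p->n; i++) {
    if (!p->busy[i]) {
      p->busy[i] = 1;
      *out       = p->comm[i];
      return 0;
    }
  }
  if (p->n == POOL_MAX) return -1;
  p->comm[p->n] = p->next_id++; /* real code would call MPI_Comm_dup() here */
  p->busy[p->n] = 1;
  *out          = p->comm[p->n];
  p->n++;
  return 0;
}

/* Check in a comm: mark it idle but keep it for reuse (no free).
   Returns 0 on success, -1 if the comm is unknown or already idle. */
int pool_checkin(CommPool *p, CommHandle c)
{
  assert(p != NULL);
  for (int i = 0; i < p->n; i++) {
    if (p->comm[i] == c && p->busy[i]) {
      p->busy[i] = 0;
      return 0;
    }
  }
  return -1;
}
```

In an actual workaround the pool would presumably be cached on the original communicator as an MPI attribute (MPI_Comm_create_keyval / MPI_Comm_set_attr), checked out when the hypre matrix is created and checked in when it is destroyed. As Barry notes, this only helps if the MPI does not recycle freed comms; it does not help when thousands of comms are needed at the same time.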
