Hi Jed,

In my case, only 2 hypre preconditioners exist at any one time, and they do not solve simultaneously, so this might not be case 1.

I checked the stack for every call to MPI_Comm_dup/MPI_Comm_free on my own machine (with OpenMPI), and as far as I can tell, all the communicators are freed. I could not test this with Spectrum MPI on the clusters right away because all the dependencies there were built in release mode. However, as I mentioned, I have not had this problem with OpenMPI before, so I'm not sure whether this is really an MPI implementation problem or just that Spectrum MPI has a lower limit on the number of communicators. It may also depend on how many MPI ranks are used, since only 2 out of 40 ranks reported the error.

As a workaround, I replaced the MPI_Comm_dup() at petsc/src/mat/impls/hypre/mhypre.c:2120 with a plain copy assignment and removed the MPI_Comm_free() in the hypre destroy routine. My code now runs fine with Spectrum MPI, but I don't think this is a long-term solution.

Thanks!

Feimi

On 8/19/21 9:01 AM, Jed Brown wrote:
Junchao Zhang writes:

Hi, Feimi,
   I need to consult Jed (cc'ed).
   Jed, is this an example of
https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663?
If Feimi really cannot free matrices, then we just need to attach a
hypre comm to a PETSc inner comm and pass that to hypre.
Are there a bunch of solves as in that case?

My understanding is that one should be able to MPI_Comm_dup/MPI_Comm_free as
many times as one likes, but implementations limit how many communicators
can co-exist at any one time. The many-at-once case is what we encountered
in that 2018 thread.

One way to check would be to use a debugger or tracer to examine the stack 
every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called.

case 1: we'll find lots of dups without frees (until the end) because the user 
really wants lots of these existing at the same time.

case 2: dups are left unfreed because of reference-counting issues or
inessential references.
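Once a tracer has logged the calls, the trace can be summarized mechanically. The sketch below assumes a hypothetical trace format, a list of `(op, handle, call_site)` tuples where op is `"dup"` or `"free"`, and reports the peak number of co-existing dups plus the dups still unfreed, grouped by call site. The function name and site names are made up. Roughly, many unfreed dups from a site whose objects are still in use point toward case 1, while unfreed dups whose owning objects were already destroyed point toward case 2.

```python
from collections import Counter


def analyze_trace(events):
    """Summarize a sequence of (op, handle, call_site) records.

    Returns (peak number of live dups, Counter of unfreed dups per call site).
    The record format is a hypothetical tracer output, not a real tool's.
    """
    live = {}  # handle -> call site that created it
    peak = 0
    for op, handle, site in events:
        if op == "dup":
            live[handle] = site
            peak = max(peak, len(live))
        elif op == "free":
            # Ignore frees of handles we never saw dup'ed.
            live.pop(handle, None)
        else:
            raise ValueError(f"unknown op: {op!r}")
    return peak, Counter(live.values())


trace = [
    ("dup", 1, "site_a"),
    ("dup", 2, "site_a"),
    ("free", 1, "site_b"),
    ("dup", 3, "site_b"),
]
peak, unfreed = analyze_trace(trace)
print(peak)           # 2
print(dict(unfreed))  # {'site_a': 1, 'site_b': 1}
```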


In case 1, I think the solution is as outlined in the thread: PETSc can create
an inner comm for hypre. I think I'd prefer to attach it to the outer comm
instead of the PETSc inner comm, but perhaps a case could be made either way.
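For case 1, the idea is to create one extra communicator per outer communicator and share it among all hypre objects on that communicator, rather than one dup per object. In MPI this would typically rely on attribute caching (MPI_Comm_create_keyval, MPI_Comm_set_attr, MPI_Comm_get_attr). The Python sketch below is only a toy model of that idea with a reference count; `CommCache` and the `dup_fn`/`free_fn` hooks are hypothetical stand-ins for the MPI calls, not PETSc's implementation.

```python
class CommCache:
    """Toy model: at most one inner comm per outer comm, shared and
    reference-counted. dup_fn/free_fn stand in for MPI_Comm_dup/free."""

    def __init__(self, dup_fn, free_fn):
        self.dup_fn = dup_fn
        self.free_fn = free_fn
        self.cache = {}  # outer comm -> [inner comm, reference count]

    def get(self, outer):
        entry = self.cache.get(outer)
        if entry is None:
            # First user on this outer comm: create the single inner comm.
            entry = [self.dup_fn(outer), 0]
            self.cache[outer] = entry
        entry[1] += 1
        return entry[0]

    def release(self, outer):
        entry = self.cache[outer]
        entry[1] -= 1
        if entry[1] == 0:
            # Last user gone: free the inner comm and drop the cache entry.
            self.free_fn(entry[0])
            del self.cache[outer]


# Usage: two hypre objects on the same outer comm cost a single dup.
made, freed = [], []

def fake_dup(outer):
    handle = ("inner", outer, len(made))
    made.append(handle)
    return handle

def fake_free(handle):
    freed.append(handle)

cache = CommCache(fake_dup, fake_free)
a = cache.get("world")
b = cache.get("world")
print(a == b, len(made))  # True 1
cache.release("world")
cache.release("world")
print(freed == [a])       # True
```

With this pattern the number of live extra communicators scales with the number of distinct outer communicators, not with the number of hypre matrices or preconditioners.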
