Hi Jed,
In my case, I only have 2 hypre preconditioners at the same time, and
they do not solve simultaneously, so it might not be case 1.
I checked the stack for every call to MPI_Comm_dup/MPI_Comm_free on
my own machine (with OpenMPI), and as far as I can tell all the
communicators are freed. I could not test it with Spectrum MPI on the
clusters immediately because all the dependencies were built in release
mode.
However, as I mentioned, I haven't had this problem with OpenMPI before,
so I'm not sure whether this is really an MPI implementation problem, or
just that Spectrum MPI has a lower limit on the number of communicators.
It may also depend on how many MPI ranks are used, as only 2 out of 40
ranks reported the error.
As a workaround, I replaced the MPI_Comm_dup() at
petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy assignment, and also
removed the MPI_Comm_free() in the hypre destroyer. My code runs fine
with Spectrum MPI now, but I don't think this is a long-term solution.
Thanks!
Feimi
On 8/19/21 9:01 AM, Jed Brown wrote:
Junchao Zhang writes:
Hi, Feimi,
I need to consult Jed (cc'ed).
Jed, is this an example of
https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663?
If Feimi really cannot free matrices, then we just need to attach a
hypre-comm to a petsc inner comm, and pass that to hypre.
Are there a bunch of solves as in that case?
My understanding is that one should be able to MPI_Comm_dup/MPI_Comm_free as
many times as you like, but the implementation has limits on how many
communicators can co-exist at any one time. The many-at-once is what we
encountered in that 2018 thread.
One way to check would be to use a debugger or tracer to examine the stack
every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called.
case 1: we'll find lots of dups without frees (until the end) because the user
really wants lots of these existing at the same time.
case 2: dups are unfreed because of a reference-counting issue or
inessential references
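
As a rough illustration of the bookkeeping such a tracer would do, here
is a minimal sketch in plain C. It is a toy ledger, not MPI code: the
integer comm_id stands in for an MPI_Comm handle, and ledger_dup,
ledger_free, MAX_LIVE, and the call-site strings are all hypothetical
names. In a real run one would presumably call these from wrappers
around MPI_Comm_dup and MPI_Comm_free (for example through the PMPI
profiling interface) and record an actual stack trace instead of a
string.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical capacity of this toy ledger; NOT an MPI limit. */
#define MAX_LIVE 64

typedef struct {
    int  comm_id;   /* stand-in for an MPI_Comm handle */
    char site[64];  /* label for the call site that made the duplicate */
} LiveDup;

static LiveDup live[MAX_LIVE];
static int     n_live    = 0;  /* duplicates currently not freed */
static int     peak_live = 0;  /* most duplicates alive at one time */

/* Record a duplicate. Returns 0 on success, -1 if the ledger is full. */
int ledger_dup(int comm_id, const char *site)
{
    assert(site != NULL);
    if (n_live == MAX_LIVE) return -1;
    live[n_live].comm_id = comm_id;
    snprintf(live[n_live].site, sizeof live[n_live].site, "%s", site);
    n_live++;
    if (n_live > peak_live) peak_live = n_live;
    return 0;
}

/* Record a free. Returns 0 on success, -1 if comm_id is not live. */
int ledger_free(int comm_id)
{
    for (int i = 0; i < n_live; i++) {
        if (live[i].comm_id == comm_id) {
            live[i] = live[n_live - 1];  /* fill the hole with the last entry */
            n_live--;
            return 0;
        }
    }
    return -1;
}

int ledger_live(void) { return n_live; }
int ledger_peak(void) { return peak_live; }

/* Print the counts and every duplicate that is still unfreed. */
void ledger_report(FILE *out)
{
    fprintf(out, "live=%d peak=%d\n", n_live, peak_live);
    for (int i = 0; i < n_live; i++)
        fprintf(out, "  unfreed comm %d from %s\n",
                live[i].comm_id, live[i].site);
}
```

For example, calling ledger_dup(1, "site_A"), ledger_dup(2, "site_B"),
and then ledger_free(1) leaves one live entry with a peak of two. A peak
that grows with the number of hypre objects, with everything freed at
the end, would suggest case 1; entries from the same call site that are
never freed would point toward case 2.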
In case 1, I think the solution is as outlined in the thread: PETSc can
create an inner-comm for Hypre. I think I'd prefer to attach it to the
outer comm instead of the PETSc inner comm, but perhaps a case could be
made either way.