Hi Barry and Junchao,

Actually I did a simple MPI "dup and free" test before with Spectrum MPI, but that one did not have any problem. I'm not a PETSc programmer as I mainly use deal.ii's PETSc wrappers, but I managed to write a minimal program based on petsc/src/mat/tests/ex98.c to reproduce my problem. This piece of code creates and destroys 10,000 instances of Hypre Parasail preconditioners (for my own code, it uses Euclid, but I don't think it matters). It runs fine with OpenMPI but reports the out of communicator error with Sepctrum MPI. The code is attached in the email. In case the attachment is not available, I also uploaded a copy on my google drive:

https://drive.google.com/drive/folders/1DCf7lNlks8GjazvoP7c211ojNHLwFKL6?usp=sharing

Thanks!

Feimi

On 8/20/21 9:58 AM, Junchao Zhang wrote:
Feimi, if it is easy to reproduce, could you give instructions on how to reproduce that?

PS: Spectrum MPI is based on OpenMPI.  I don't understand why it has the problem but OpenMPI does not.  It could be a bug in petsc or user's code.  For reference counting on MPI_Comm, we already have petsc inner comm. I think we can reuse that.

--Junchao Zhang


On Fri, Aug 20, 2021 at 12:33 AM Barry Smith <[email protected] <mailto:[email protected]>> wrote:


      It sounds like maybe the Spectrum MPI_Comm_free() is not
    returning the comm to the "pool" as available for future use; a
    very buggy MPI implementation. This can easily be checked in a
    tiny standalone MPI program that simply comm dups and frees
    thousands of times in a loop. Could even be a configure test (that
    requires running an MPI program). I do not remember if we ever
    tested this possibility; maybe and I forgot.

      If this is the problem we can provide a "work around" that
    attributes the new comm (to be passed to hypre) to the old comm
    with a reference count value also in the attribute. When the hypre
    matrix is created that count is (with the new comm) is set to 1,
    when the hypre matrix is freed that count is set to zero (but the
    comm is not freed), in the next call to create the hypre matrix
    when the attribute is found, the count is zero so PETSc knows it
    can pass the same comm again to the new hypre matrix.

    This will only allow one simultaneous hypre matrix to be created
    from the original comm. To allow multiply simultaneous hypre
    matrix one could have multiple comms and counts in the attribute
    and just check them until one finds an available one to reuse (or
    creates yet another one if all the current ones are busy with
    hypre matrices). So it is the same model as DMGetXXVector() where
    vectors are checked out and then checked in to be available later.
    This would solve the currently reported problem (if it is a buggy
    MPI that does not properly free comms), but not solve the MOOSE
    problem where 10,000 comms are needed at the same time.

      Barry





    On Aug 19, 2021, at 3:29 PM, Junchao Zhang
    <[email protected] <mailto:[email protected]>> wrote:




    On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu <[email protected]
    <mailto:[email protected]>> wrote:

        Hi Jed,

        In my case, I only have 2 hypre preconditioners at the same
        time, and
        they do not solve simultaneously, so it might not be case 1.

        I checked the stack for all the calls of
        MPI_Comm_dup/MPI_Comm_free on
        my own machine (with OpenMPI), all the communicators are
        freed from my
        observation. I could not test it with Spectrum MPI on the
        clusters
        immediately because all the dependencies were built in
        release mode.
        However, as I mentioned, I haven't had this problem with
        OpenMPI before,
        so I'm not sure if this is really an MPI implementation
        problem, or just
        because Spectrum MPI has less limit for the number of
        communicators,
        and/or this also depends on how many MPI ranks are used, as
        only 2 out
        of 40 ranks reported the error.

    You can add printf around MPI_Comm_dup/MPI_Comm_free sites on the
    two ranks, e.g., if (myrank == 38) printf(...), to see if the
    dup/free are paired.
     As a workaround, I replaced the MPI_Comm_dup() at

        petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy
        assignment, and also
        removed the MPI_Comm_free() in the hypre destroyer. My code
        runs fine
        with Spectrum MPI now, but I don't think this is a long-term
        solution.

        Thanks!

        Feimi

        On 8/19/21 9:01 AM, Jed Brown wrote:
        > Junchao Zhang <[email protected]
        <mailto:[email protected]>> writes:
        >
        >> Hi, Feimi,
        >>    I need to consult Jed (cc'ed).
        >>    Jed, is this an example of
        >>
        
https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663
        
<https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663>?
        >> If Feimi really can not free matrices, then we just need
        to attach a
        >> hypre-comm to a petsc inner comm, and pass that to hypre.
        > Are there a bunch of solves as in that case?
        >
        > My understanding is that one should be able to
        MPI_Comm_dup/MPI_Comm_free as many times as you like, but the
        implementation has limits on how many communicators can
        co-exist at any one time. The many-at-once is what we
        encountered in that 2018 thread.
        >
        > One way to check would be to use a debugger or tracer to
        examine the stack every time (P)MPI_Comm_dup and
        (P)MPI_Comm_free are called.
        >
        > case 1: we'll find lots of dups without frees (until the
        end) because the user really wants lots of these existing at
        the same time.
        >
        > case 2: dups are unfreed because of reference counting
        issue/inessential references
        >
        >
        > In case 1, I think the solution is as outlined in the
        thread, PETSc can create an inner-comm for Hypre. I think I'd
        prefer to attach it to the outer comm instead of the PETSc
        inner comm, but perhaps a case could be made either way.


#include <petscsys.h>
#include <petscconf.h>
#include <petscmat.h>
#include <petscpc.h>

#include <iostream>

namespace
{
  inline void set_option_value(const std::string &name, const std::string &value)
  {
    const PetscErrorCode ierr =
        PetscOptionsSetValue(nullptr, name.c_str(), value.c_str());
  }
}

class my_hypre_precon
{
public:
  my_hypre_precon() : pc(nullptr){};
  void clear();
  void initialize(Mat matrix);
  ~my_hypre_precon() { clear(); };

private:
  PC pc;
};

void my_hypre_precon::clear()
{
  if (pc != nullptr)
  {
    PetscErrorCode ierr = PCDestroy(&pc);
    pc = nullptr;
  }
}

void my_hypre_precon::initialize(Mat matrix)
{
  clear();

  MPI_Comm comm;
  PetscErrorCode ierr =
      PetscObjectGetComm(PetscObject(matrix), &comm);

  ierr = PCCreate(comm, &pc);

  ierr = PCSetOperators(pc, matrix, matrix);

  ierr = PCSetType(pc, const_cast<char *>(PCHYPRE));

  ierr = PCHYPRESetType(pc, "parasails");

  ::set_option_value("-pc_hypre_parasails_sym", "nonsymmetric");

  ::set_option_value("-pc_hypre_parasails_nlevels", "1");

  ::set_option_value("-pc_hypre_parasails_thresh", "0.1");

  ::set_option_value("-pc_hypre_parasails_filter", "0.05");

  ierr = PCSetFromOptions(pc);

  ierr = PCSetUp(pc);
}

int main(int argc, char **argv)
{
  Mat A;
  PetscErrorCode ierr;
  PetscInt *ia, *ja;
  PetscMPIInt rank, size;

  ierr = PetscInitialize(&argc, &argv, 0, 0);
  if (ierr)
  {
    return ierr;
  }

  ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size);
  CHKERRMPI(ierr);
  if (size != 4)
  {
    SETERRQ(PETSC_COMM_WORLD, PETSC_ERR_WRONG_MPI_SIZE, "Must run with 4 processors");
  }
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
  CHKERRMPI(ierr);

  ierr = PetscMalloc1(5, &ia);
  CHKERRQ(ierr);
  ierr = PetscMalloc1(16, &ja);
  CHKERRQ(ierr);
  if (!rank)
  {
    ja[0] = 1;
    ja[1] = 4;
    ja[2] = 0;
    ja[3] = 2;
    ja[4] = 5;
    ja[5] = 1;
    ja[6] = 3;
    ja[7] = 6;
    ja[8] = 2;
    ja[9] = 7;
    ia[0] = 0;
    ia[1] = 2;
    ia[2] = 5;
    ia[3] = 8;
    ia[4] = 10;
  }
  else if (rank == 1)
  {
    ja[0] = 0;
    ja[1] = 5;
    ja[2] = 8;
    ja[3] = 1;
    ja[4] = 4;
    ja[5] = 6;
    ja[6] = 9;
    ja[7] = 2;
    ja[8] = 5;
    ja[9] = 7;
    ja[10] = 10;
    ja[11] = 3;
    ja[12] = 6;
    ja[13] = 11;
    ia[0] = 0;
    ia[1] = 3;
    ia[2] = 7;
    ia[3] = 11;
    ia[4] = 14;
  }
  else if (rank == 2)
  {
    ja[0] = 4;
    ja[1] = 9;
    ja[2] = 12;
    ja[3] = 5;
    ja[4] = 8;
    ja[5] = 10;
    ja[6] = 13;
    ja[7] = 6;
    ja[8] = 9;
    ja[9] = 11;
    ja[10] = 14;
    ja[11] = 7;
    ja[12] = 10;
    ja[13] = 15;
    ia[0] = 0;
    ia[1] = 3;
    ia[2] = 7;
    ia[3] = 11;
    ia[4] = 14;
  }
  else
  {
    ja[0] = 8;
    ja[1] = 13;
    ja[2] = 9;
    ja[3] = 12;
    ja[4] = 14;
    ja[5] = 10;
    ja[6] = 13;
    ja[7] = 15;
    ja[8] = 11;
    ja[9] = 14;
    ia[0] = 0;
    ia[1] = 2;
    ia[2] = 5;
    ia[3] = 8;
    ia[4] = 10;
  }

  ierr = MatCreate(PETSC_COMM_WORLD, &A);
  CHKERRQ(ierr);
  ierr = MatSetSizes(A, 4, 4, 16, 16);
  CHKERRQ(ierr);
  ierr = MatSetType(A, MATMPIAIJ);
  CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocationCSR(A, ia, ja, NULL);
  CHKERRQ(ierr);
  for (unsigned i = 0; i < 100000; ++i)
  {
    std::cout << "Starting " << i
              << "th preconditioner initialization" << std::endl;
    my_hypre_precon precon;
    precon.initialize(A);
  }
  ierr = PetscFinalize();
  return ierr;
}

Reply via email to