Are we sure this is a PETSc comm issue and not a hypre comm duplication issue?
   frame #6: 0x00000001061345d9 libpetsc.3.07.dylib`hypre_GenerateSubComm(comm=-1006627852, participate=<unavailable>, new_comm_ptr=<unavailable>) + 409 at gen_redcs_mat.c:531 [opt]

Looks like hypre needs to generate subcomms; perhaps it generates too many?

   Barry

> On Apr 2, 2018, at 7:07 PM, Derek Gaston <fried...@gmail.com> wrote:
>
> I’m working with Fande on this and I would like to add a bit more. There are
> many circumstances where we aren’t working on COMM_WORLD at all (e.g. working
> on a sub-communicator), but PETSc was initialized using MPI_COMM_WORLD (think
> multi-level solves)… and we need to create arbitrarily many PETSc
> vecs/mats/solvers/preconditioners and solve. We definitely can’t rely on
> using PETSC_COMM_WORLD to avoid triggering duplication.
>
> Can you explain why PETSc needs to duplicate the communicator so much?
>
> Thanks for your help in tracking this down!
>
> Derek
>
> On Mon, Apr 2, 2018 at 5:44 PM Kong, Fande <fande.k...@inl.gov> wrote:
> Why do we not use user-level MPI communicators directly? What are the
> potential risks here?
>
> Fande,
>
> On Mon, Apr 2, 2018 at 5:08 PM, Satish Balay <ba...@mcs.anl.gov> wrote:
> PETSC_COMM_WORLD [via PetscCommDuplicate()] attempts to minimize calls to
> MPI_Comm_dup() - thus potentially avoiding such errors
>
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscCommDuplicate.html
>
> Satish
>
> On Mon, 2 Apr 2018, Kong, Fande wrote:
>
> > On Mon, Apr 2, 2018 at 4:23 PM, Satish Balay <ba...@mcs.anl.gov> wrote:
> >
> > > Does this 'standard test' use MPI_COMM_WORLD to create PETSc objects?
> > >
> > > If so - you could try changing to PETSC_COMM_WORLD
> >
> > I do not think we are using PETSC_COMM_WORLD when creating PETSc objects.
> > Why can we not use MPI_COMM_WORLD?
> >
> > Fande,
> >
> > > Satish
> > >
> > > On Mon, 2 Apr 2018, Kong, Fande wrote:
> > >
> > > > Hi All,
> > > >
> > > > I am trying to upgrade PETSc from 3.7.6 to 3.8.3 for MOOSE and its
> > > > applications.
> > > > I have an error message for a standard test:
> > > >
> > > > preconditioners/pbp.lots_of_variables: MPI had an error
> > > > preconditioners/pbp.lots_of_variables: ------------------------------------------------
> > > > preconditioners/pbp.lots_of_variables: Other MPI error, error stack:
> > > > preconditioners/pbp.lots_of_variables: PMPI_Comm_dup(177)..................: MPI_Comm_dup(comm=0x84000001, new_comm=0x97d1068) failed
> > > > preconditioners/pbp.lots_of_variables: PMPI_Comm_dup(162)..................:
> > > > preconditioners/pbp.lots_of_variables: MPIR_Comm_dup_impl(57)..............:
> > > > preconditioners/pbp.lots_of_variables: MPIR_Comm_copy(739).................:
> > > > preconditioners/pbp.lots_of_variables: MPIR_Get_contextid_sparse_group(614): Too many communicators (0/2048 free on this process; ignore_id=0)
> > > >
> > > > I did "git bisect", and the following commit introduces this issue:
> > > >
> > > > commit 49a781f5cee36db85e8d5b951eec29f10ac13593
> > > > Author: Stefano Zampini <stefano.zamp...@gmail.com>
> > > > Date:   Sat Nov 5 20:15:19 2016 +0300
> > > >
> > > >     PCHYPRE: use internal Mat of type MatHYPRE
> > > >
> > > >     hpmat already stores two HYPRE vectors
> > > >
> > > > Before I debug line by line, does anyone have a clue on this?
> > > >
> > > > Fande,
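
For context on the failure mode itself: MPICH allocates communicator context IDs from a fixed per-process pool (2048 in the build quoted above), and MPI_Comm_dup() fails with exactly this error stack once the pool is empty. The following standalone sketch, not taken from the thread and assuming an MPICH-based MPI, reproduces the exhaustion by duplicating MPI_COMM_WORLD without ever freeing the duplicates:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm dups[4096];
    int i, err;

    MPI_Init(&argc, &argv);
    /* Return errors instead of aborting so we can see where the pool runs out. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    for (i = 0; i < 4096; i++) {
        /* Each duplicate consumes one context ID and is deliberately never freed. */
        err = MPI_Comm_dup(MPI_COMM_WORLD, &dups[i]);
        if (err != MPI_SUCCESS) {
            printf("MPI_Comm_dup failed after %d duplicates\n", i);
            break;
        }
    }

    MPI_Finalize();
    return 0;
}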
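
Satish's PetscCommDuplicate() point is why object count alone should not normally exhaust that pool: the man page suggests PETSc caches its inner communicator on the user communicator and reuses it, so many objects created on one communicator should cost a single duplicate, whereas handing PETSc a freshly duplicated communicator per object costs at least one context ID each. A minimal sketch of the two patterns, with loop counts and vector sizes that are purely illustrative and not taken from the failing test:

#include <petscvec.h>

int main(int argc, char **argv)
{
    PetscErrorCode ierr;
    PetscInt       i;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

    /* Pattern A: many objects on the SAME communicator.
     * PetscCommDuplicate() should attach and reuse one inner communicator. */
    for (i = 0; i < 1000; i++) {
        Vec v;
        ierr = VecCreate(PETSC_COMM_WORLD, &v);CHKERRQ(ierr);
        ierr = VecSetSizes(v, 10, PETSC_DECIDE);CHKERRQ(ierr);
        ierr = VecSetFromOptions(v);CHKERRQ(ierr);
        ierr = VecDestroy(&v);CHKERRQ(ierr);
    }

    /* Pattern B: a fresh user-level MPI_Comm_dup per object.
     * Each iteration consumes context IDs; leaking these communicators
     * is one way to hit the "Too many communicators" limit. */
    for (i = 0; i < 1000; i++) {
        MPI_Comm c;
        Vec      v;
        MPI_Comm_dup(MPI_COMM_WORLD, &c);
        ierr = VecCreate(c, &v);CHKERRQ(ierr);
        ierr = VecSetSizes(v, 10, PETSC_DECIDE);CHKERRQ(ierr);
        ierr = VecSetFromOptions(v);CHKERRQ(ierr);
        ierr = VecDestroy(&v);CHKERRQ(ierr);
        MPI_Comm_free(&c);
    }

    ierr = PetscFinalize();
    return ierr;
}

In the failing case above the extra duplications appear to come from inside PCHYPRE/MatHYPRE rather than from user code, but the accounting against the context-ID pool is the same.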