Discussed with Mr. Hong. The following two new lines fix the problem. Either
line alone also works, but I think we should have both. The same problem occurs
in dmsnes and dmts, but I need feedback from DM experts before changing them.
Thanks.
diff --git a/src/ksp/ksp/interface/dmksp.c b/src/ksp/ksp/interface/dmksp.c
index 9ce75090..0ab69574 100644
--- a/src/ksp/ksp/interface/dmksp.c
+++ b/src/ksp/ksp/interface/dmksp.c
@@ -80,6 +80,7 @@ PetscErrorCode DMKSPCopy(DMKSP kdm,DMKSP nkdm)
nkdm->rhsctx = kdm->rhsctx;
nkdm->initialguessctx = kdm->initialguessctx;
nkdm->data = kdm->data;
+ nkdm->originaldm = kdm->originaldm;
nkdm->fortran_func_pointers[0] = kdm->fortran_func_pointers[0];
nkdm->fortran_func_pointers[1] = kdm->fortran_func_pointers[1];
@@ -156,6 +157,7 @@ PetscErrorCode DMGetDMKSPWrite(DM dm,DMKSP *kspdm)
ierr = DMKSPCopy(oldkdm,kdm);CHKERRQ(ierr);
ierr = DMKSPDestroy((DMKSP*)&dm->dmksp);CHKERRQ(ierr);
dm->dmksp = (PetscObject)kdm;
+ kdm->originaldm = dm;
}
*kspdm = kdm;
PetscFunctionReturn(0);
--Junchao Zhang
On Fri, Jun 14, 2019 at 1:07 PM Junchao Zhang <[email protected]> wrote:
On Fri, Jun 14, 2019 at 1:01 PM Lawrence Mitchell <[email protected]> wrote:
> On 14 Jun 2019, at 18:44, Zhang, Junchao via petsc-dev <[email protected]> wrote:
>
> Hello,
> I am investigating PETSc issue 306. The problem can be reproduced with
> src/snes/examples/tutorials/ex9.c by running
> mpirun -n 3 ./ex9 -snes_grid_sequence 3 -snes_converged_reason -pc_type mg
> The program may run to completion, crash, or hang. The error appears
> either in PetscGatherMessageLengths() or in PetscCommBuildTwoSided(). From
> my debugging, the following routine is suspicious. It is documented as Not
> Collective, but it may call DMKSPCreate(), which can indirectly call the
> collective MPI_Comm_dup().
I would have thought that DMKSPCreate could only call MPI_Comm_dup (via
PetscCommDuplicate) if the incoming dm has a communicator that is /not/ a
PETSc communicator. Given that all PETSc objects must (?) return a PETSc
communicator from PetscObjectComm, this function is presumably only
incidentally not collective (although I would have thought it is logically
collective). Oh, and with PETSC_USE_DEBUG, there's a barrier even in the
"return immediately with a PETSc comm" case.
Yes, PetscCommDuplicate does not always call MPI_Comm_dup, but it does
decrement the MPI tag counter. If not every rank makes the call, the tags get
out of step across ranks, messages no longer match, and the code crashes. If I
turn on PETSC_USE_DEBUG, the code sometimes (but not always) hangs in
PetscCommDuplicate.
FWIW, the call site looks to be collective (PCSetUp_MG).
Lawrence
> Can someone familiar with the code explain it? Thanks.
>
> /*@C
> DMGetDMKSPWrite - get write access to private DMKSP context from a DM
>
> Not Collective
>
> Input Argument:
> . dm - DM to be used with KSP
>
> Output Argument:
> . kspdm - private DMKSP context
>
> Level: developer
>
> .seealso: DMGetDMKSP()
> @*/
> PetscErrorCode DMGetDMKSPWrite(DM dm,DMKSP *kspdm)
> {
>   PetscErrorCode ierr;
>   DMKSP          kdm;
>
>   PetscFunctionBegin;
>   PetscValidHeaderSpecific(dm,DM_CLASSID,1);
>   ierr = DMGetDMKSP(dm,&kdm);CHKERRQ(ierr);
>   if (!kdm->originaldm) kdm->originaldm = dm;
>   if (kdm->originaldm != dm) { /* Copy on write */
>     DMKSP oldkdm = kdm;
>     ierr = PetscInfo(dm,"Copying DMKSP due to write\n");CHKERRQ(ierr);
>     ierr = DMKSPCreate(PetscObjectComm((PetscObject)dm),&kdm);CHKERRQ(ierr);
>     ierr = DMKSPCopy(oldkdm,kdm);CHKERRQ(ierr);
>     ierr = DMKSPDestroy((DMKSP*)&dm->dmksp);CHKERRQ(ierr);
>     dm->dmksp = (PetscObject)kdm;
>   }
>   *kspdm = kdm;
>   PetscFunctionReturn(0);
> }
>
> The call stack is:
> ...
> #20 0x00007fcf3e3d23e2 in PetscCommDuplicate (comm_in=comm_in@entry=-2080374782, comm_out=comm_out@entry=0x557a18304db0, first_tag=first_tag@entry=0x557a18304de4) at /home/jczhang/petsc/src/sys/objects/tagm.c:162
> #21 0x00007fcf3e3d7730 in PetscHeaderCreate_Private (h=0x557a18304d70, classid=<optimized out>, class_name=class_name@entry=0x7fcf3f7f762a "DMKSP", descr=descr@entry=0x7fcf3f7f762a "DMKSP", mansec=mansec@entry=0x7fcf3f7f762a "DMKSP", comm=comm@entry=-2080374782, destroy=0x7fcf3f350570 <DMKSPDestroy>, view=0x0) at /home/jczhang/petsc/src/sys/objects/inherit.c:64
> #22 0x00007fcf3f3504c9 in DMKSPCreate (comm=-2080374782, kdm=kdm@entry=0x7ffc1d4d00f8) at /home/jczhang/petsc/src/ksp/ksp/interface/dmksp.c:24
> #23 0x00007fcf3f35150f in DMGetDMKSPWrite (dm=0x557a18541a10, kspdm=kspdm@entry=0x7ffc1d4d01a8) at /home/jczhang/petsc/src/ksp/ksp/interface/dmksp.c:155
> #24 0x00007fcf3f1bb20e in PCSetUp_MG (pc=<optimized out>) at /home/jczhang/petsc/src/ksp/pc/impls/mg/mg.c:682
> #25 0x00007fcf3f204bea in PCSetUp (pc=0x557a17dc1860) at /home/jczhang/petsc/src/ksp/pc/interface/precon.c:894
> #26 0x00007fcf3f32ba4b in KSPSetUp (ksp=0x557a17d73500) at /home/jczhang/petsc/src/ksp/ksp/interface/itfunc.c:377
> #27 0x00007fcf3f41e43e in SNESSolve_VINEWTONRSLS (snes=0x557a17bff210) at /home/jczhang/petsc/src/snes/impls/vi/rs/virs.c:502
> #28 0x00007fcf3f3fa191 in SNESSolve (snes=0x557a17bff210, b=0x0, x=<optimized out>) at /home/jczhang/petsc/src/snes/interface/snes.c:4433
> #29 0x0000557a16432095 in main (argc=<optimized out>, argv=<optimized out>) at /home/jczhang/petsc/src/snes/examples/tutorials/ex9.c:105
>
> --Junchao Zhang