Manav,

   Can you add an MPI_Barrier before

      ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);

   Also, in order to assess where the issue is, we need to see the values (per rank) of

      ((Mat_SeqAIJ*)aij->B->data)->nonew
      mat->was_assembled
      aij->donotstash
      mat->nooffprocentries

   Another question: is this the first matrix assembly of the code? If you change the
   preconditioner to -pc_type none, do you get the same issue?
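For concreteness, the instrumentation being asked for could look like the following
patch inside MatAssemblyEnd_MPIAIJ, placed just before the MatAssemblyBegin(aij->A,mode)
call. This is a hypothetical debugging sketch, not PETSc source; PetscPrintf on
PETSC_COMM_SELF is used rather than a synchronized print because synchronized printing
is collective and could itself hang while some ranks are stuck:

   /* hypothetical debugging patch, not part of PETSc: print the per-rank
      flags requested above, then barrier before assembling the diagonal
      block */
   PetscMPIInt rank;
   ierr = MPI_Comm_rank(PetscObjectComm((PetscObject)mat),&rank);CHKERRQ(ierr);
   ierr = PetscPrintf(PETSC_COMM_SELF,
            "[%d] B nonew %D  was_assembled %d  donotstash %d  nooffprocentries %d\n",
            (int)rank,((Mat_SeqAIJ*)aij->B->data)->nonew,
            (int)mat->was_assembled,(int)aij->donotstash,
            (int)mat->nooffprocentries);CHKERRQ(ierr);
   /* the barrier requested above: no rank proceeds to assemble aij->A
      until every rank has drained its stash and reached this point */
   ierr = MPI_Barrier(PetscObjectComm((PetscObject)mat));CHKERRQ(ierr);
   ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);

If every rank prints its flags and passes the barrier, the stash phase completed and
the hang lies further down in the function.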
> On Aug 20, 2020, at 3:10 PM, Manav Bhatia <bhatiama...@gmail.com> wrote:
>
>> On Aug 19, 2020, at 9:39 PM, Matthew Knepley <knep...@gmail.com> wrote:
>>
>> Jed is more knowledgeable about the communication, but I have a simple
>> question about the FEM method. Normally, the way we divide unknowns is
>> that the only unknowns which might have entries computed off-process
>> are those on the partition boundary. However, it sounds like you have a
>> huge number of communicated values. Is it possible that the division of
>> rows in your matrix does not match the division of the cells you
>> compute element matrices for?
>
> I hope that is not the case. I am using libMesh to manage the mesh and
> the creation of the sparsity pattern, which uses ParMETIS to create the
> partitions. libMesh ensures that off-process entries are only at the
> partition boundary (unless an extra set of DoFs is marked for coupling).
>
> I also printed and looked at the n_nz and n_oz values on each rank, and
> nothing there raises any flags.
>
> I will try to dig in a bit further to make sure everything checks out.
>
> Looking at the screenshots I had shared yesterday, all processes are in
> this function:
>
> PetscErrorCode MatAssemblyEnd_MPIAIJ(Mat mat,MatAssemblyType mode)
> {
>   Mat_MPIAIJ     *aij = (Mat_MPIAIJ*)mat->data;
>   Mat_SeqAIJ     *a   = (Mat_SeqAIJ*)aij->A->data;
>   PetscErrorCode ierr;
>   PetscMPIInt    n;
>   PetscInt       i,j,rstart,ncols,flg;
>   PetscInt       *row,*col;
>   PetscBool      other_disassembled;
>   PetscScalar    *val;
>
>   /* do not use 'b = (Mat_SeqAIJ*)aij->B->data' as B can be reset in disassembly */
>
>   PetscFunctionBegin;
>   if (!aij->donotstash && !mat->nooffprocentries) {
>     while (1) {
>       ierr = MatStashScatterGetMesg_Private(&mat->stash,&n,&row,&col,&val,&flg);CHKERRQ(ierr);
>       if (!flg) break;
>
>       for (i=0; i<n; ) {
>         /* Now identify the consecutive vals belonging to the same row */
>         for (j=i,rstart=row[j]; j<n; j++) {
>           if (row[j] != rstart) break;
>         }
>         if (j < n) ncols = j-i;
>         else       ncols = n-i;
>         /* Now assemble all these values with a single function call */
>         ierr = MatSetValues_MPIAIJ(mat,1,row+i,ncols,col+i,val+i,mat->insertmode);CHKERRQ(ierr);
>         i = j;
>       }
>     }
>     ierr = MatStashScatterEnd_Private(&mat->stash);CHKERRQ(ierr);
>   }
>   ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);
>   ierr = MatAssemblyEnd(aij->A,mode);CHKERRQ(ierr);
>
>   /* determine if any processor has disassembled, if so we must
>      also disassemble ourselfs, in order that we may reassemble. */
>   /*
>      if nonzero structure of submatrix B cannot change then we know that
>      no processor disassembled thus we can skip this stuff
>   */
>   if (!((Mat_SeqAIJ*)aij->B->data)->nonew) {
>     ierr = MPIU_Allreduce(&mat->was_assembled,&other_disassembled,1,MPIU_BOOL,MPI_PROD,PetscObjectComm((PetscObject)mat));CHKERRQ(ierr);
>     if (mat->was_assembled && !other_disassembled) {
>       ierr = MatDisAssemble_MPIAIJ(mat);CHKERRQ(ierr);
>     }
>   }
>   if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) {
>     ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr);
>   }
>   ierr = MatSetOption(aij->B,MAT_USE_INODES,PETSC_FALSE);CHKERRQ(ierr);
>   ierr = MatAssemblyBegin(aij->B,mode);CHKERRQ(ierr);
>   ierr = MatAssemblyEnd(aij->B,mode);CHKERRQ(ierr);
>
>   ierr = PetscFree2(aij->rowvalues,aij->rowindices);CHKERRQ(ierr);
>
>   aij->rowvalues = 0;
>
>   ierr = VecDestroy(&aij->diag);CHKERRQ(ierr);
>   if (a->inode.size) mat->ops->multdiagonalblock = MatMultDiagonalBlock_MPIAIJ;
>
>   /* if no new nonzero locations are allowed in matrix then only set the
>      matrix state the first time through */
>   if ((!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) || !((Mat_SeqAIJ*)(aij->A->data))->nonew) {
>     PetscObjectState state = aij->A->nonzerostate + aij->B->nonzerostate;
>     ierr = MPIU_Allreduce(&state,&mat->nonzerostate,1,MPIU_INT64,MPI_SUM,PetscObjectComm((PetscObject)mat));CHKERRQ(ierr);
>   }
>   PetscFunctionReturn(0);
> }
>
> I noticed that of the 8 MPI processes, two were stuck at
>
>   ierr = MatStashScatterGetMesg_Private(&mat->stash,&n,&row,&col,&val,&flg);CHKERRQ(ierr);
>
> another two were stuck at
>
>   ierr = MatStashScatterEnd_Private(&mat->stash);CHKERRQ(ierr);
>
> and the remaining four were inside
>
>   ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr);
>
> Is it expected for processes to be at different stages in this function?
>
> -Manav
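For readers following the thread: the stash traffic under discussion can be exercised
with a minimal standalone program. This is an illustrative sketch, not the application
code from this thread; each rank inserts one value into a row owned by the next rank,
so the value must travel through the MatStash and be received inside MatAssemblyEnd,
i.e. the code path quoted above.

   /* sketch: minimal standalone exercise of the MatStash path discussed
      above -- not the application code from this thread */
   #include <petscmat.h>

   int main(int argc,char **argv)
   {
     Mat            A;
     PetscMPIInt    rank,size;
     PetscInt       row;
     PetscScalar    v = 1.0;
     PetscErrorCode ierr;

     ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
     ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
     ierr = MPI_Comm_size(PETSC_COMM_WORLD,&size);CHKERRQ(ierr);

     ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
     ierr = MatSetSizes(A,1,1,PETSC_DECIDE,PETSC_DECIDE);CHKERRQ(ierr); /* one row per rank */
     ierr = MatSetType(A,MATMPIAIJ);CHKERRQ(ierr);
     ierr = MatSetUp(A);CHKERRQ(ierr);

     /* off-process entry: rank r writes into the row owned by rank
        (r+1)%size, forcing stash communication during assembly */
     row  = (rank + 1) % size;
     ierr = MatSetValue(A,row,row,v,ADD_VALUES);CHKERRQ(ierr);

     /* collective on the matrix's communicator: every rank must make this
        pair of calls, or the ranks wait on each other indefinitely */
     ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
     ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

     ierr = MatDestroy(&A);CHKERRQ(ierr);
     ierr = PetscFinalize();
     return ierr;
   }

As to whether ranks at different statements are expected: there is no barrier between
the stash-receive loop and the later collective calls, so ranks that have already
received all of their stash messages legitimately run ahead. Note, however, that the
first MPIU_Allreduce in the function quoted above is guarded by the per-rank nonew
flag; if that flag differs across ranks, some ranks enter the reduction while others
skip it, the collectives on the communicator no longer match up, and the job can hang
much as observed. That is one reason the per-rank values requested in the reply above
matter.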