Manav,

Can you add an MPI_Barrier before

ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);
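
A minimal sketch of the instrumented spot, assuming the barrier goes inside MatAssemblyEnd_MPIAIJ directly above that call (PetscObjectComm((PetscObject)mat) gives the matrix communicator):

    /* diagnostic only: if the hang moves to the barrier, the ranks
       diverge before this point */
    ierr = MPI_Barrier(PetscObjectComm((PetscObject)mat));CHKERRQ(ierr);
    ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);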

Also, in order to assess where the issue is, we need to see the values (per 
rank) of 

((Mat_SeqAIJ*)aij->B->data)->nonew
mat->was_assembled
aij->donotstash
mat->nooffprocentries
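
A minimal sketch of one way to get these per rank, assuming a synchronized print is added at the top of MatAssemblyEnd_MPIAIJ (the fields are private to the MPIAIJ/SeqAIJ implementations, so the print has to live where those headers are visible):

    PetscMPIInt rank;
    ierr = MPI_Comm_rank(PetscObjectComm((PetscObject)mat),&rank);CHKERRQ(ierr);
    /* print the four fields above, one line per rank, in rank order */
    ierr = PetscSynchronizedPrintf(PetscObjectComm((PetscObject)mat),
             "[%d] nonew %D was_assembled %d donotstash %d nooffprocentries %d\n",
             rank,((Mat_SeqAIJ*)aij->B->data)->nonew,
             (int)mat->was_assembled,(int)aij->donotstash,
             (int)mat->nooffprocentries);CHKERRQ(ierr);
    ierr = PetscSynchronizedFlush(PetscObjectComm((PetscObject)mat),PETSC_STDOUT);CHKERRQ(ierr);

Note that PetscSynchronizedFlush is collective, so if the ranks are already deadlocking, a plain printf per rank is safer.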

Another question: is this the first matrix assembly in the code?
If you change to -pc_type none, do you get the same issue?
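(That is, run with something like: mpiexec -n 8 ./your-app -pc_type none, where ./your-app is a placeholder for your executable and usual options.)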

> On Aug 20, 2020, at 3:10 PM, Manav Bhatia <bhatiama...@gmail.com> wrote:
> 
> 
> 
>> On Aug 19, 2020, at 9:39 PM, Matthew Knepley <knep...@gmail.com> wrote:
>> 
>> Jed is more knowledgeable about the communication, but I have a simple 
>> question about the FEM method. Normally, the way
>> we divide unknowns is that the only unknowns which might have entries 
>> computed off-process are those on the partition boundary.
>> However, it sounds like you have a huge number of communicated values. Is it 
>> possible that the division of rows in your matrix does
>> not match the division of the cells you compute element matrices for?
> 
> 
> I hope that is not the case. I am using libMesh to manage the mesh and the 
> creation of the sparsity pattern, and it uses ParMETIS to create the partitions. 
> libMesh ensures that off-process entries occur only at the partition boundary 
> (unless an extra set of DoFs is marked for coupling). 
> 
> I also printed and inspected the n_nz and n_oz values on each rank, and they 
> do not raise any flags.  
> 
> I will try to dig in a bit further to make sure everything checks out. 
> 
> Looking at the screenshots I had shared yesterday, all processes are in this 
> function: 
> 
> PetscErrorCode MatAssemblyEnd_MPIAIJ(Mat mat,MatAssemblyType mode)
> {
>   Mat_MPIAIJ     *aij = (Mat_MPIAIJ*)mat->data;
>   Mat_SeqAIJ     *a   = (Mat_SeqAIJ*)aij->A->data;
>   PetscErrorCode ierr;
>   PetscMPIInt    n;
>   PetscInt       i,j,rstart,ncols,flg;
>   PetscInt       *row,*col;
>   PetscBool      other_disassembled;
>   PetscScalar    *val;
> 
>   /* do not use 'b = (Mat_SeqAIJ*)aij->B->data' as B can be reset in disassembly */
> 
>   PetscFunctionBegin;
>   if (!aij->donotstash && !mat->nooffprocentries) {
>     while (1) {
>       ierr = MatStashScatterGetMesg_Private(&mat->stash,&n,&row,&col,&val,&flg);CHKERRQ(ierr);
>       if (!flg) break;
> 
>       for (i=0; i<n; ) {
>         /* Now identify the consecutive vals belonging to the same row */
>         for (j=i,rstart=row[j]; j<n; j++) {
>           if (row[j] != rstart) break;
>         }
>         if (j < n) ncols = j-i;
>         else       ncols = n-i;
>         /* Now assemble all these values with a single function call */
>         ierr = MatSetValues_MPIAIJ(mat,1,row+i,ncols,col+i,val+i,mat->insertmode);CHKERRQ(ierr);
> 
>         i = j;
>       }
>     }
>     ierr = MatStashScatterEnd_Private(&mat->stash);CHKERRQ(ierr);
>   }
>   ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);
>   ierr = MatAssemblyEnd(aij->A,mode);CHKERRQ(ierr);
> 
>   /* determine if any processor has disassembled, if so we must
>      also disassemble ourselves, in order that we may reassemble. */
>   /*
>      if nonzero structure of submatrix B cannot change then we know that
>      no processor disassembled thus we can skip this stuff
>   */
>   if (!((Mat_SeqAIJ*)aij->B->data)->nonew) {
>     ierr = MPIU_Allreduce(&mat->was_assembled,&other_disassembled,1,MPIU_BOOL,MPI_PROD,PetscObjectComm((PetscObject)mat));CHKERRQ(ierr);
>     if (mat->was_assembled && !other_disassembled) {
>       ierr = MatDisAssemble_MPIAIJ(mat);CHKERRQ(ierr);
>     }
>   }
>   if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) {
>     ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr);
>   }
>   ierr = MatSetOption(aij->B,MAT_USE_INODES,PETSC_FALSE);CHKERRQ(ierr);
>   ierr = MatAssemblyBegin(aij->B,mode);CHKERRQ(ierr);
>   ierr = MatAssemblyEnd(aij->B,mode);CHKERRQ(ierr);
> 
>   ierr = PetscFree2(aij->rowvalues,aij->rowindices);CHKERRQ(ierr);
> 
>   aij->rowvalues = 0;
> 
>   ierr = VecDestroy(&aij->diag);CHKERRQ(ierr);
>   if (a->inode.size) mat->ops->multdiagonalblock = MatMultDiagonalBlock_MPIAIJ;
> 
>   /* if no new nonzero locations are allowed in matrix then only set the matrix state the first time through */
>   if ((!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) || !((Mat_SeqAIJ*)(aij->A->data))->nonew) {
>     PetscObjectState state = aij->A->nonzerostate + aij->B->nonzerostate;
>     ierr = MPIU_Allreduce(&state,&mat->nonzerostate,1,MPIU_INT64,MPI_SUM,PetscObjectComm((PetscObject)mat));CHKERRQ(ierr);
>   }
>   PetscFunctionReturn(0);
> }
> 
> 
> I noticed that of the 8 MPI processes, two were stuck at
>      ierr = MatStashScatterGetMesg_Private(&mat->stash,&n,&row,&col,&val,&flg);CHKERRQ(ierr);
> 
> another two were stuck at
>      ierr = MatStashScatterEnd_Private(&mat->stash);CHKERRQ(ierr);
> 
> and the remaining four were inside
>      ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr);
> 
> Is it expected for processes to be at different stages in this function? 
> 
> -Manav
> 
> 
