I have created a standalone test that demonstrates the problem at my end. I have stored the indices, etc. from my problem in a text file for each rank, which I use to initialize the matrix. Please note that the test is specifically for 8 ranks.
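For reference, the general pattern of such a test is roughly the sketch below. The file name indices_<rank>.txt, its format (one "row col" pair per line), and the global matrix size are hypothetical placeholders, not the exact contents of the tarball:

```c
#include <petscmat.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  Mat            A;
  PetscMPIInt    rank;
  char           fname[PETSC_MAX_PATH_LEN];
  FILE          *fp;
  int            row, col;
  PetscInt       nglobal = 1000;           /* hypothetical global size */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank); CHKERRQ(ierr);

  ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, nglobal, nglobal); CHKERRQ(ierr);
  ierr = MatSetType(A, MATMPIAIJ); CHKERRQ(ierr);
  ierr = MatSetUp(A); CHKERRQ(ierr);

  /* Hypothetical per-rank index file: one "row col" pair per line. */
  ierr = PetscSNPrintf(fname, sizeof(fname), "indices_%d.txt", rank); CHKERRQ(ierr);
  fp = fopen(fname, "r");
  if (!fp) SETERRQ1(PETSC_COMM_SELF, PETSC_ERR_FILE_OPEN, "Cannot open %s", fname);
  while (fscanf(fp, "%d %d", &row, &col) == 2) {
    /* Insert 1.0 everywhere so only the sparsity pattern and the
       assembly communication are exercised, not the actual values. */
    ierr = MatSetValue(A, row, col, 1.0, ADD_VALUES); CHKERRQ(ierr);
  }
  fclose(fp);

  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

  ierr = MatDestroy(&A); CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}
```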
The .tgz file is on my google drive: https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing

This contains a README file with instructions on running. Please note that the work directory needs the index files.

Please let me know if I can provide any further information. Thank you all for your help.

Regards,
Manav

> On Aug 20, 2020, at 12:54 PM, Jed Brown <j...@jedbrown.org> wrote:
>
> Matthew Knepley <knep...@gmail.com> writes:
>
>> On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia <bhatiama...@gmail.com> wrote:
>>
>>> On Aug 20, 2020, at 8:31 AM, Stefano Zampini <stefano.zamp...@gmail.com> wrote:
>>>
>>> Can you add a MPI_Barrier before
>>>
>>>     ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);
>>>
>>> With a MPI_Barrier before this function call:
>>> - three of the processes have already hit this barrier,
>>> - the other 5 are inside MatStashScatterGetMesg_Private ->
>>>   MatStashScatterGetMesg_BTS -> MPI_Waitsome (2 processes) / MPI_Waitall (3 processes)
>
> This is not itself evidence of inconsistent state. You can use
>
>   -build_twosided allreduce
>
> to avoid the nonblocking sparse algorithm.
>
>> Okay, you should run this with -matstash_legacy just to make sure it is not
>> a bug in your MPI implementation. But it looks like there is inconsistency
>> in the parallel state. This can happen because we have a bug, or it could be
>> that you called a collective operation on a subset of the processes. Is
>> there any way you could cut down the example (say, put all 1s in the matrix,
>> etc.) so that you could give it to us to run?
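For reference, the barrier-based check suggested above can also be done at the application level, just before the user-code call to MatAssemblyBegin(), to confirm that every rank actually reaches assembly. A minimal sketch, with placeholder names (A is assumed to be the MPIAIJ matrix being assembled):

```c
/* Diagnostic sketch: verify every rank reaches assembly.  If all ranks pass
   the barrier but MatAssemblyBegin/End still hangs, the stall is inside the
   stash exchange rather than a rank never reaching this point. */
PetscMPIInt rank;
ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank); CHKERRQ(ierr);
ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD, "[%d] entering assembly\n", rank); CHKERRQ(ierr);
ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT); CHKERRQ(ierr);
ierr = MPI_Barrier(PETSC_COMM_WORLD); CHKERRQ(ierr);

ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
```

The -build_twosided allreduce and -matstash_legacy options mentioned in the quoted messages are ordinary runtime options and can simply be appended to the run command without recompiling.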