I think that the test
if ((diag != 0.0) && (mat->A->rmap->N == mat->A->cmap->N)) {
at line 836 in src/mat/impls/aij/mpi/mpiaij.c (master) is a bug, either for
rectangular matrices or even for square matrices with non-trivial layouts.
It can happen that some of the processes does not satisfy the test and some
others do; in this case, MatAssemblyBegin/End is not collectively called.
I don’t see any trivial fix; my solution involves computing a flag during
MatSetSizes() based on some test on the ranges of the row and column layouts.
I will go on and fix it in a branch unless someone has a simpler solution.