Hi John,

I had to go digging into the PETSc source a little to figure out what was going wrong. Here's a brief summary:
Serial matrices in PETSc are represented as a single matrix, with data stored in mat->data. A parallel matrix A in PETSc is represented as two matrices: a matrix a comprised of the locally owned rows of A and the columns corresponding to the owned entries of the input vector x, and a matrix b comprised of the locally owned rows and the remaining columns of A, corresponding to the non-local entries of x:

    A = a + b
    y = A*x = a*x_l + b*x_u

    x_l = local values of x
    x_u = unowned values of x

This is enabled by a VecScatter that runs concurrently with the a*x_l operation, which sends the required unowned values of x into a (potentially large!) buffer, A->lvec.

If you check out the function MatMultCDCR_MPIAIJ (MatMult_MPIAIJ in src/mat/impls/aij/mpi/mpiaij.c if you're following along in a standard PETSc distro), you'll get a view of how this works. I'm using dynamic PETSc queries for now to pick these up, but we could also just hardcode the calls to our code here.

Notice that we need a sequential MatMultAdd routine even if we're only doing parallel MatMults. I have patched in the standard PETSc sequential aij MatMultAdd routine for now, but since this is potentially a large portion of the parallel matmult, we will need to tune it.

Barry sat down and explained this to me on my visit to Argonne, so I'm an idiot for not getting this right the first time :)

Cheers,
Aron
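P.S. For reference, here's roughly what the standard MatMult_MPIAIJ does, paraphrased from memory with the error checking stripped out, so treat it as a sketch rather than the exact source (argument orders vary a bit between PETSc versions). The point is the overlap: the scatter into a->lvec is started, the locally owned block is multiplied while the messages are in flight, and then the off-process contribution is folded in with a sequential MatMultAdd:

    PetscErrorCode MatMult_MPIAIJ(Mat A, Vec xx, Vec yy)
    {
      /* a->A is the local (owned-column) block, a->B the off-process block */
      Mat_MPIAIJ *a = (Mat_MPIAIJ *)A->data;

      /* start gathering the unowned entries of xx into the buffer a->lvec */
      VecScatterBegin(a->Mvctx, xx, a->lvec, INSERT_VALUES, SCATTER_FORWARD);
      /* overlap: yy = a * x_l while the scatter is in flight */
      (*a->A->ops->mult)(a->A, xx, yy);
      /* wait for the unowned values to arrive */
      VecScatterEnd(a->Mvctx, xx, a->lvec, INSERT_VALUES, SCATTER_FORWARD);
      /* fold in the off-process part: yy += b * x_u */
      (*a->B->ops->multadd)(a->B, a->lvec, yy, yy);
      return 0;
    }

The two inner calls through the ops table are the spots I mean when I talk about hardcoding the calls to our code instead of picking them up with dynamic queries.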

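P.P.S. Since the sequential aij MatMultAdd is the piece we'll eventually want to tune, here's the shape of the kernel in question: a plain CSR mult-add, z = y + b*x_u. The names below are made up for illustration (the real routine is MatMultAdd_SeqAIJ and works on PETSc's internal aij data structures), but the loop structure is the thing we'd be optimizing:

    /* z = y + B*x for a CSR matrix with m rows.
       ai = row pointers, aj = column indices, aa = nonzero values. */
    void csr_multadd(int m, const int *ai, const int *aj, const double *aa,
                     const double *x, const double *y, double *z)
    {
      for (int i = 0; i < m; i++) {
        double sum = y[i];
        for (int k = ai[i]; k < ai[i + 1]; k++)
          sum += aa[k] * x[aj[k]];
        z[i] = sum;
      }
    }

Note that in the b*x_u case, x here is the packed A->lvec buffer, so the column indices of b are already compacted to index into that buffer rather than into the global vector.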