On Tue, 19 May 2009, tribur at vision.ee.ethz.ch wrote:

> Distinguished PETSc experts,
>
> Assume processor k has defined N entries of a parallel matrix using
> MatSetValues. Half of the entries are in matrix rows belonging to this
> processor, but the other half are situated in rows owned by other
> processors.
>
> My question:
>
> When does MatAssemblyBegin+MatAssemblyEnd take longer: if the rows
> holding the second half of the entries all belong to one single other
> processor, e.g. processor k+1, or if these rows are distributed across
> several, let's say 4, other processors? Is there a significant
> difference?
Obviously there will be a difference, but it will depend upon the
network/MPI behavior: a single large one-to-one message vs. multiple
small all-to-all messages.

On the PETSc side, you might have to make sure enough memory is
allocated for the stash buffers that hold the off-processor entries. If
the default is too small, then there could be multiple malloc/copies
that slow things down. Run with '-info' and look for "stash". The
number of mallocs reported there should be 0 for efficient matrix
assembly. [The stash size can be changed with the command line option
-matstash_initial_size]

Satish
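
As a rough illustration (not from the original thread), below is a
minimal sketch of an MPIAIJ-style assembly in which each rank
deliberately sets half of its entries into rows owned by the next rank,
so those values travel through the stash during
MatAssemblyBegin/MatAssemblyEnd. The local row count (1000) and the
stash size (10000) are arbitrary assumptions, and the calls assume a
reasonably recent PETSc.

  /* Sketch: half the entries from each rank go to off-processor rows.
     Sizes are arbitrary example values. */
  #include <petscmat.h>

  int main(int argc, char **argv)
  {
    Mat            A;
    PetscInt       i, rstart, rend, n = 1000; /* local rows per rank (assumed) */
    PetscMPIInt    rank, size;
    PetscScalar    v = 1.0;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
    ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank); CHKERRQ(ierr);
    ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size); CHKERRQ(ierr);

    ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
    ierr = MatSetSizes(A, n, n, PETSC_DETERMINE, PETSC_DETERMINE); CHKERRQ(ierr);
    ierr = MatSetFromOptions(A); CHKERRQ(ierr);
    ierr = MatSetUp(A); CHKERRQ(ierr);

    /* Pre-size the stash from code instead of the command line;
       10000 is an arbitrary example value. */
    ierr = MatStashSetInitialSize(A, 10000, 0); CHKERRQ(ierr);

    ierr = MatGetOwnershipRange(A, &rstart, &rend); CHKERRQ(ierr);
    for (i = rstart; i < rend; i++) {
      /* local entry: row i is owned by this rank */
      ierr = MatSetValue(A, i, i, v, ADD_VALUES); CHKERRQ(ierr);
      /* off-processor entry: row shifted into the next rank's range,
         held in the stash until assembly communicates it */
      if (size > 1) {
        PetscInt remote = (i + n) % (n * size);
        ierr = MatSetValue(A, remote, i, v, ADD_VALUES); CHKERRQ(ierr);
      }
    }

    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

    ierr = MatDestroy(&A); CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }

Running something like 'mpiexec -n 4 ./ex -info | grep -i stash' should
show how many stash entries and mallocs occurred during assembly;
passing '-matstash_initial_size 10000' on the command line is the
equivalent of the MatStashSetInitialSize() call above.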
