On Tue, 19 May 2009, Satish Balay wrote:

> On Tue, 19 May 2009, tribur at vision.ee.ethz.ch wrote:
>
> > Distinguished PETSc experts,
> >
> > Assume processor k has defined N entries of a parallel matrix using
> > MatSetValues. Half of the entries are in matrix rows owned by this
> > processor, but the other half are located in rows owned by other
> > processors.
> >
> > My question:
> >
> > Does MatAssemblyBegin+MatAssemblyEnd take longer if the rows containing
> > the second half of the entries all belong to a single other processor,
> > e.g. processor k+1, or if these rows are distributed across several
> > (say 4) other processors? Is there a significant difference?
>
> Obviously there will be a difference, but it will depend upon the
> network/MPI behavior:
>
> a single large one-to-one message vs. multiple small all-to-all messages.
>
> Wrt the PETSc part - you might have to make sure enough memory is
> allocated for these communication buffers. If the default is too small,
> there could be multiple malloc/copy cycles that slow things down.
>
> Run with '-info' and look for "stash". The number of mallocs reported
> there should be 0 for efficient matrix assembly. [The stash size can be
> changed with the command line option -matstash_initial_size.]
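
For illustration, here is a minimal sketch (untested; the local size, the
values, and the use of ADD_VALUES are just assumptions) where each process
sets one entry in a row it owns and one entry in a row owned by the next
process, written against a recent PETSc API:

  #include <petscmat.h>

  int main(int argc, char **argv)
  {
    Mat            A;
    PetscMPIInt    rank, size;
    PetscInt       n = 10, row, col;
    PetscScalar    v = 1.0;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);
    MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
    MPI_Comm_size(PETSC_COMM_WORLD, &size);

    /* square matrix, n rows owned by each process */
    ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
    ierr = MatSetSizes(A, n, n, PETSC_DECIDE, PETSC_DECIDE);CHKERRQ(ierr);
    ierr = MatSetFromOptions(A);CHKERRQ(ierr);
    ierr = MatSetUp(A);CHKERRQ(ierr);

    /* one entry in a locally owned row */
    row = rank * n; col = row;
    ierr = MatSetValues(A, 1, &row, 1, &col, &v, ADD_VALUES);CHKERRQ(ierr);

    /* one entry in a row owned by the next process - this goes into the stash */
    row = ((rank + 1) % size) * n; col = row;
    ierr = MatSetValues(A, 1, &row, 1, &col, &v, ADD_VALUES);CHKERRQ(ierr);

    /* the stashed off-process entries are communicated here */
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }

Running something like this with 'mpiexec -n 4 ./ex -info' and grepping the
output for "stash" shows the number of mallocs the stash needed; if it is
nonzero, try increasing it with '-matstash_initial_size <size>'.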
Another note: If you have a lot of data movement during matrix assembly, you
can do a MatAssemblyBegin/End(MAT_FLUSH_ASSEMBLY) to flush out the currently
accumulated off-process data and then continue with more MatSetValues() calls
(a rough sketch is below). It might help on some network/MPI types [we don't
know for sure].

Satish
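
Roughly, such a flush loop could look like the following fragment [nbatches
and set_my_batch_of_values() are placeholders for your own assembly code, not
PETSc routines; A is an already created Mat]:

  PetscInt       b;
  PetscErrorCode ierr;

  for (b = 0; b < nbatches; b++) {
    /* many MatSetValues() calls, some of them hitting off-process rows */
    set_my_batch_of_values(A, b);
    /* ship the currently stashed off-process data now instead of all at the end */
    ierr = MatAssemblyBegin(A, MAT_FLUSH_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FLUSH_ASSEMBLY);CHKERRQ(ierr);
  }
  /* the last assembly before using the matrix must be MAT_FINAL_ASSEMBLY */
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);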
