You can do 2 things.

1. Allocate sufficient stash space to avoid mallocs. You can do this with
the following runtime command line options:

  -vecstash_initial_size
  -matstash_initial_size

2. Flush stashed values in stages instead of doing a single large
communication at the end.

<add values to matrix>
MatAssemblyBegin/End(MAT_FLUSH_ASSEMBLY)
<add values to matrix>
MatAssemblyBegin/End(MAT_FLUSH_ASSEMBLY)
...
...
<add values to matrix>
MatAssemblyBegin/End(MAT_FINAL_ASSEMBLY)

Satish

On Wed, 18 Jan 2012, Wen Jiang wrote:

> Hi,
>
> I am working on FEM codes with spline-based element type. For 3D case, one
> element has 64 nodes and every two neighboring elements share 48 nodes.
> Thus regardless how I partition a mesh, there are still very large number
> of entries that have to write on the 'wrong' processor. And my code is
> running on clusters, the processes are sending between 550 and 620 Million
> packets per second across the network. My code seems IO-bound at this
> moment and just get stuck at the matrix assembly stage. A -info file is
> attached. Do I have other options to optimize my codes to be less
> io-intensive?
>
> Thanks in advance.
>
> [0] VecAssemblyBegin_MPI(): Stash has 210720 entries, uses 12 mallocs.
> [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.
> [5] MatAssemblyBegin_MPIAIJ(): Stash has 4806656 entries, uses 8 mallocs.
> [6] MatAssemblyBegin_MPIAIJ(): Stash has 5727744 entries, uses 9 mallocs.
> [4] MatAssemblyBegin_MPIAIJ(): Stash has 5964288 entries, uses 9 mallocs.
> [7] MatAssemblyBegin_MPIAIJ(): Stash has 7408128 entries, uses 9 mallocs.
> [3] MatAssemblyBegin_MPIAIJ(): Stash has 8123904 entries, uses 9 mallocs.
> [2] MatAssemblyBegin_MPIAIJ(): Stash has 11544576 entries, uses 10 mallocs.
> [0] MatStashScatterBegin_Private(): No of messages: 1
> [0] MatStashScatterBegin_Private(): Mesg_to: 1: size: 107888648
> [0] MatAssemblyBegin_MPIAIJ(): Stash has 13486080 entries, uses 10 mallocs.
> [1] MatAssemblyBegin_MPIAIJ(): Stash has 16386048 entries, uses 10 mallocs.
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0
> unneeded,2514537 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> [0] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode
> routines
> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 11390 X 11390; storage space: 0
> unneeded,2525390 used
> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> [5] Mat_CheckInode(): Found 11390 nodes out of 11390 rows. Not using Inode
> routines
> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0
> unneeded,2500281 used
> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> [3] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode
> routines
> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0
> unneeded,2500281 used
> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> [1] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode
> routines
> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0
> unneeded,2500281 used
> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> [4] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode
> routines
> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0
> unneeded,2525733 used
> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> [2] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode
> routines
>
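
A note on the first suggestion in the reply (preallocating the stash): the quoted -info log already reports how many entries each rank stashed, so the largest of those counts is a reasonable starting value for -matstash_initial_size. The helper below is only a sketch for pulling that number out of the log. It is not part of PETSc, the name max_mat_stash_entries is made up for illustration, and it assumes the option is counted in stash entries, as the "Stash has N entries" wording suggests.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a PETSc function): scan lines of a PETSc -info
 * log and return the largest "Stash has N entries" count reported by a
 * MatAssemblyBegin line. Block-Stash lines are skipped. Returns 0 if no
 * line matches. */
long max_mat_stash_entries(const char *const lines[], int nlines)
{
    long best = 0;
    for (int i = 0; i < nlines; i++) {
        const char *line = lines[i];
        if (!strstr(line, "MatAssemblyBegin")) continue; /* matrix stash only */
        if (strstr(line, "Block-Stash")) continue;       /* ignore block stash */
        const char *p = strstr(line, "Stash has ");
        long n;
        if (p && sscanf(p, "Stash has %ld entries", &n) == 1 && n > best)
            best = n;
    }
    return best;
}
```

Fed the MatAssemblyBegin_MPIAIJ lines quoted above, this picks out the count from rank 1 (16386048 entries), which could then be tried as the argument to -matstash_initial_size. Whether preallocating that much stash per process is affordable depends on the memory available, so treat the value as a starting point.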

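
A note on the second suggestion in the reply (flushing in stages): MatAssemblyBegin/End are collective, so every rank has to make the same number of flush calls even when ranks own different numbers of elements. One way to arrange that is to fix the number of stages globally and let each rank split its own elements across them. The sketch below shows only that bookkeeping in plain C so it compiles without PETSc. The PETSc calls appear as comments, and stage_range and assemble_in_stages are illustrative names, not PETSc API.

```c
#include <assert.h>

/* Half-open range [*begin, *end) of local element indices handled in stage s
 * when nlocal elements are split as evenly as possible into nstages stages. */
void stage_range(long nlocal, int nstages, int s, long *begin, long *end)
{
    assert(nlocal >= 0 && nstages > 0 && s >= 0 && s < nstages);
    *begin = nlocal * s / nstages;
    *end   = nlocal * (s + 1) / nstages;
}

/* Walk the stages in order. Every rank uses the same nstages, so every rank
 * reaches the same number of assembly points regardless of nlocal.
 * Returns the number of elements visited (should equal nlocal). */
long assemble_in_stages(long nlocal, int nstages)
{
    long visited = 0;
    for (int s = 0; s < nstages; s++) {
        long b, e;
        stage_range(nlocal, nstages, s, &b, &e);
        for (long i = b; i < e; i++) {
            /* PETSc (not compiled here): MatSetValues(A, ..., ADD_VALUES)
             * with the element matrix of local element i. */
            visited++;
        }
        /* PETSc (not compiled here): MatAssemblyBegin(A, type) followed by
         * MatAssemblyEnd(A, type), where type is MAT_FLUSH_ASSEMBLY for
         * s < nstages - 1 and MAT_FINAL_ASSEMBLY for the last stage. */
    }
    return visited;
}
```

With 10 local elements and 3 stages, the stages cover indices 0-2, 3-5 and 6-9. A rank with no elements still passes through all three assembly points, which is what keeps the collective calls matched. More stages mean smaller stashes and messages per flush, at the cost of more synchronization points.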