On Tue, May 29, 2012 at 3:52 PM, Benjamin Sanderse <B.Sanderse at cwi.nl> wrote:
> Hello all,
>
> I have a simple question about using MatMult (or MatMultAdd) in parallel.
>
> I am performing the matrix-vector multiplication
>
> z = A*x + y
>
> in my code by using
>
> call MatMultAdd(A,x,y,z,ierr); CHKERRQ(ierr)
>
> A is a sparse matrix, type MPIAIJ, and x, y, and z have been obtained using
>
> call MatGetVecs(A,x,y,ierr); CHKERRQ(ierr)
> call MatGetVecs(A,PETSC_NULL_OBJECT,z,ierr); CHKERRQ(ierr)
>
> x, y, and z are vecs of type mpi.
>
> The problem is that in the sequential case the MatMultAdd is MUCH faster
> than in the parallel case (at least a factor 100 difference).
>

With any performance question, always always always send the output of
-log_summary to petsc-maint at mcs.anl.gov.

   Matt

> As an example, here is the output with some properties of A when using
> -mat_view_info and -info:
>
> 2 processors:
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374781
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780
> [0] MatStashScatterBegin_Private(): No of messages: 0
> [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 500 X 450; storage space: 100 unneeded, 900 used
> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 500 X 450; storage space: 100 unneeded, 900 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 2
> [0] Mat_CheckInode(): Found 500 nodes out of 500 rows. Not using Inode routines
> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 2
> [1] Mat_CheckInode(): Found 500 nodes out of 500 rows. Not using Inode routines
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782
> [0] MatSetUpMultiply_MPIAIJ(): Using block index set to define scatter
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782
> [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter
> [0] VecScatterCreate(): General case: MPI to Seq
> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 500 X 0; storage space: 0 unneeded, 0 used
> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 500 X 0; storage space: 0 unneeded, 0 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0
> Matrix Object: 2 MPI processes
>   type: mpiaij
>   rows=1000, cols=900
>   total: nonzeros=1800, allocated nonzeros=2000
>   total number of mallocs used during MatSetValues calls =0
>     not using I-node (on process 0) routines
>
> 1 processor:
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374783
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1000 X 900; storage space: 200 unneeded, 1800 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 2
> [0] Mat_CheckInode(): Found 1000 nodes out of 1000 rows. Not using Inode routines
> Matrix Object: 1 MPI processes
>   type: seqaij
>   rows=1000, cols=900
>   total: nonzeros=1800, allocated nonzeros=2000
>   total number of mallocs used during MatSetValues calls =0
>     not using I-node routines
>
> When I look at the partitioning of the vectors, I have the following for
> the parallel case:
> x:
> 0 450
> 450 900
> y:
> 0 500
> 500 1000
> z:
> 0 500
> 500 1000
>
> This seems OK to me.
>
> Certainly I am missing something in performing this matrix-vector
> multiplication efficiently. Any ideas?
>
> Best regards,
>
> Benjamin

--
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener
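
For readers following along, here is a minimal sketch of the setup described above, written against PETSc's C interface rather than the Fortran interface the poster is using. The matrix dimensions (1000 x 900, at most two nonzeros per row) are taken from the -info output, but the nonzero pattern, the values, the 1000-call loop, and the executable name below are illustrative assumptions, not the poster's actual code.

/* Minimal sketch in C (not the poster's Fortran code): a 1000 x 900 AIJ matrix
 * with two nonzeros per row, vectors from MatGetVecs(), and repeated
 * MatMultAdd() calls so the event shows up in -log_summary.  The nonzero
 * pattern and values are purely illustrative. */
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, y, z;
  PetscInt       i, rstart, rend, M = 1000, N = 900;
  PetscScalar    vals[2] = {1.0, -1.0};
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);

  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, M, N);CHKERRQ(ierr);
  ierr = MatSetType(A, MATAIJ);CHKERRQ(ierr);          /* seqaij on 1 process, mpiaij otherwise */
  ierr = MatSeqAIJSetPreallocation(A, 2, NULL);CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocation(A, 2, NULL, 2, NULL);CHKERRQ(ierr);

  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {                    /* two nonzeros per local row */
    PetscInt cols[2];
    cols[0] = i % N;
    cols[1] = (i + 1) % N;
    ierr = MatSetValues(A, 1, &i, 2, cols, vals, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* As in the post: x is laid out like the columns, y and z like the rows */
  ierr = MatGetVecs(A, &x, &y);CHKERRQ(ierr);
  ierr = MatGetVecs(A, NULL, &z);CHKERRQ(ierr);
  ierr = VecSet(x, 1.0);CHKERRQ(ierr);
  ierr = VecSet(y, 2.0);CHKERRQ(ierr);

  for (i = 0; i < 1000; i++) {                         /* z = A*x + y, repeated so the timing is measurable */
    ierr = MatMultAdd(A, x, y, z);CHKERRQ(ierr);
  }

  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&y);CHKERRQ(ierr);
  ierr = VecDestroy(&z);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

Run as, say, mpiexec -n 2 ./matmultadd_sketch -log_summary: the summary table then reports the time, flop rate, and MPI message counts for the MatMultAdd event at each process count, which is exactly the information Matt is asking for to diagnose the factor-100 slowdown.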
