http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#64-bit-indices
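
(Presumably the relevant arithmetic, from the -info output quoted below: each of the 4 processes assembles roughly 202,300,000 diagonal-block plus 606,900,000 off-diagonal-block nonzeros, so the global matrix holds about 4 * (202,300,000 + 606,900,000) = 3,236,800,000 entries, which is more than the 2,147,483,647 that a 32-bit PetscInt can count. As the FAQ entry above describes, PETSc has to be configured with --with-64-bit-indices to index a matrix with that many nonzeros.)
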
On Apr 23, 2011, at 12:36 PM, S V N Vishwanathan wrote:

> Barry,
>
>> It has not assembled the matrix in an hour; it is working all night to
>> assemble the matrix. The problem is that you are not preallocating the
>> nonzeros per row with MatMPIAIJSetPreallocation(). When preallocation is
>> correct it will always print 0 for Number of mallocs. The actual writing
>> of the parallel matrix to the binary file will take at most minutes.
>
> You were absolutely right! I had not set the preallocation properly and
> hence the code was painfully slow. I fixed that issue (see attached code)
> and now it runs much faster. However, I am having a different problem now.
> When I run the code for smaller matrices (less than a million rows)
> everything works well. However, when working with large matrices
> (e.g. 2.8 million rows x 1157 columns) writing the matrix to file dies
> with the following message:
>
> Fatal error in MPI_Recv: Other MPI error
>
> Any hints on how to solve this problem are deeply appreciated.
>
> vishy
>
> The output of running the code with the -info flag is as follows:
>
> [0] PetscInitialize(): PETSc successfully started: number of processors = 4
> [0] PetscInitialize(): Running on machine: rossmann-b001.rcac.purdue.edu
> [3] PetscInitialize(): PETSc successfully started: number of processors = 4
> [3] PetscInitialize(): Running on machine: rossmann-b004.rcac.purdue.edu
> No libsvm test file specified!
>
> Reading libsvm train file at /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
> [2] PetscInitialize(): PETSc successfully started: number of processors = 4
> [2] PetscInitialize(): Running on machine: rossmann-b003.rcac.purdue.edu
> [3] PetscFOpen(): Opening file /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
> [2] PetscFOpen(): Opening file /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
> [0] PetscFOpen(): Opening file /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
> [1] PetscInitialize(): PETSc successfully started: number of processors = 4
> [1] PetscInitialize(): Running on machine: rossmann-b002.rcac.purdue.edu
> [1] PetscFOpen(): Opening file /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
> m=100000
> m=200000
> m=300000
> m=400000
> m=500000
> m=600000
> m=700000
> m=800000
> m=900000
> m=1000000
> m=1100000
> m=1200000
> m=1300000
> m=1400000
> m=1500000
> m=1600000
> m=1700000
> m=1800000
> m=1900000
> m=2000000
> m=2100000
> m=2200000
> m=2300000
> m=2400000
> m=2500000
> m=2600000
> m=2700000
> m=2800000
> user.dim=1157 user.m=2800000 user.maxnnz=1156 user.maxlen=32768 user.flg=1
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
> [0] PetscCommDuplicate(): returning tag 2147483647
> user.dim=1157 user.m=2800000 user.maxnnz=1156 user.maxlen=32768 user.flg=1
> user.dim=1157 user.m=2800000 user.maxnnz=1156 user.maxlen=32768 user.flg=1
> user.dim=1157 user.m=2800000 user.maxnnz=1156 user.maxlen=32768 user.flg=1
> [0] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
> [0] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
> [0] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784
> [0] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
> [2] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
> [2] PetscCommDuplicate(): returning tag 2147483647
> [2] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
> [2] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
> [2] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784
> [2] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
> [3] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
> [3] PetscCommDuplicate(): returning tag 2147483647
> [3] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
> [3] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
> [3] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784
> [3] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
> [0] PetscCommDuplicate(): returning tag 2147483647
> [1] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
> [1] PetscCommDuplicate(): returning tag 2147483647
> [1] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
> [1] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
> [1] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784
> [1] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
> [1] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
> [1] PetscCommDuplicate(): returning tag 2147483647
> [2] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
> [2] PetscCommDuplicate(): returning tag 2147483647
> [3] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
> [3] PetscCommDuplicate(): returning tag 2147483647
> [3] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [3] PetscCommDuplicate(): returning tag 2147483642
> [2] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [2] PetscCommDuplicate(): returning tag 2147483642
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [1] PetscCommDuplicate(): returning tag 2147483642
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): returning tag 2147483642
> [0] MatSetUpPreallocation(): Warning not preallocating matrix storage
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374783 max tags = 2147483647
> [0] PetscCommDuplicate(): returning tag 2147483647
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [0] PetscCommDuplicate(): returning tag 2147483646
> [2] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374783 max tags = 2147483647
> [2] PetscCommDuplicate(): returning tag 2147483647
> [1] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374783 max tags = 2147483647
> [1] PetscCommDuplicate(): returning tag 2147483647
> [2] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [2] PetscCommDuplicate(): returning tag 2147483646
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [1] PetscCommDuplicate(): returning tag 2147483646
> [3] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374783 max tags = 2147483647
> [3] PetscCommDuplicate(): returning tag 2147483647
> [3] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [3] PetscCommDuplicate(): returning tag 2147483646
> [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs.
> [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.
> [0] MatStashScatterBegin_Private(): No of messages: 0
> [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 289; storage space: 0 unneeded,202300000 used
> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 289
> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 289; storage space: 0 unneeded,202300000 used
> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 289
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 290; storage space: 0 unneeded,202300000 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 289
> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 289; storage space: 0 unneeded,202300000 used
> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 289
> [1] Mat_CheckInode(): Found 140000 nodes of 700000. Limit used: 5. Using Inode routines
> [3] Mat_CheckInode(): Found 140000 nodes of 700000. Limit used: 5. Using Inode routines
> [2] Mat_CheckInode(): Found 140000 nodes of 700000. Limit used: 5. Using Inode routines
> [0] Mat_CheckInode(): Found 140000 nodes of 700000. Limit used: 5. Using Inode routines
> [3] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [3] PetscCommDuplicate(): returning tag 2147483645
> [3] PetscCommDuplicate(): returning tag 2147483638
> [2] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [2] PetscCommDuplicate(): returning tag 2147483645
> [2] PetscCommDuplicate(): returning tag 2147483638
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [0] PetscCommDuplicate(): returning tag 2147483645
> [0] MatSetUpMultiply_MPIAIJ(): Using block index set to define scatter
> [0] PetscCommDuplicate(): returning tag 2147483638
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [1] PetscCommDuplicate(): returning tag 2147483645
> [1] PetscCommDuplicate(): returning tag 2147483638
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [0] PetscCommDuplicate(): returning tag 2147483644
> [0] PetscCommDuplicate(): returning tag 2147483637
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [1] PetscCommDuplicate(): returning tag 2147483644
> [3] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [3] PetscCommDuplicate(): returning tag 2147483644
> [3] PetscCommDuplicate(): returning tag 2147483637
> [1] PetscCommDuplicate(): returning tag 2147483637
> [0] PetscCommDuplicate(): returning tag 2147483632
> [2] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
> [2] PetscCommDuplicate(): returning tag 2147483644
> [2] PetscCommDuplicate(): returning tag 2147483637
> [1] PetscCommDuplicate(): returning tag 2147483632
> [2] PetscCommDuplicate(): returning tag 2147483632
> [3] PetscCommDuplicate(): returning tag 2147483632
> [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter
> [0] VecScatterCreate(): General case: MPI to Seq
> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 867; storage space: 0 unneeded,606900000 used
> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 867
> [2] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [2] PetscCommDuplicate(): returning tag 2147483628
> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 867; storage space: 0 unneeded,606900000 used
> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 867
> [3] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [3] PetscCommDuplicate(): returning tag 2147483628
> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 867; storage space: 0 unneeded,606900000 used
> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 867
> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [1] PetscCommDuplicate(): returning tag 2147483628
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 867; storage space: 0 unneeded,606900000 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 867
>
> Writing data in binary format to /scratch/lustreA/v/vishy/biclass/ocr.train.x
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): returning tag 2147483628
> APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
>
> <libsvm-to-binary.cpp>
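
For reference, a minimal sketch of the preallocate / assemble / binary-write pattern discussed above. This is not the attached libsvm-to-binary.cpp; the global size, the toy tridiagonal pattern, and the output file name "matrix.dat" are made up for illustration. The point is that MatMPIAIJSetPreallocation() is called before any MatSetValues() (so -info reports "Number of mallocs during MatSetValues() is 0"), and the binary dump itself is just MatView() on a binary viewer:

  #include <petscmat.h>

  int main(int argc, char **argv)
  {
    Mat         A;
    PetscViewer viewer;
    PetscInt    n = 1000;              /* global size, illustrative only */
    PetscInt    rstart, rend, i;

    PetscInitialize(&argc, &argv, NULL, NULL);

    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
    MatSetType(A, MATAIJ);             /* SeqAIJ on 1 process, MPIAIJ otherwise */

    /* Preallocate before inserting values: at most 3 nonzeros per row in the
       diagonal block and 2 in the off-diagonal block for this toy pattern.
       For real data, pass per-row counts via the d_nnz/o_nnz array arguments. */
    MatMPIAIJSetPreallocation(A, 3, NULL, 2, NULL);
    MatSeqAIJSetPreallocation(A, 3, NULL);

    MatGetOwnershipRange(A, &rstart, &rend);
    for (i = rstart; i < rend; i++) {
      PetscInt    cols[3], ncols = 0;
      PetscScalar vals[3];
      if (i > 0)     { cols[ncols] = i - 1; vals[ncols] = -1.0; ncols++; }
      cols[ncols] = i; vals[ncols] = 2.0; ncols++;
      if (i < n - 1) { cols[ncols] = i + 1; vals[ncols] = -1.0; ncols++; }
      MatSetValues(A, 1, &i, ncols, cols, vals, INSERT_VALUES);
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    /* Write the assembled matrix in PETSc binary format. */
    PetscViewerBinaryOpen(PETSC_COMM_WORLD, "matrix.dat", FILE_MODE_WRITE, &viewer);
    MatView(A, viewer);
    PetscViewerDestroy(&viewer);

    MatDestroy(&A);
    PetscFinalize();
    return 0;
  }

(Error-code checking is omitted for brevity; in real code each PETSc call would be wrapped with CHKERRQ.)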
