Barry,

> It has not assembled the matrix in an hour; it is working all night
> to assemble the matrix. The problem is that you are not
> preallocating the nonzeros per row with MatMPIAIJSetPreallocation().
> When preallocation is correct it will always print 0 for "Number of
> mallocs". The actual writing of the parallel matrix to the binary
> file will take at most minutes.
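For reference, the preallocation Barry refers to looks roughly like the sketch below; the helper name and the per-row bounds are placeholders, not code or values taken from the attached file:

    #include <petscmat.h>

    /* Minimal sketch (placeholder names/values, not the attached code):
       create an MPIAIJ matrix and preallocate it before any MatSetValues(). */
    PetscErrorCode CreatePreallocatedMatrix(MPI_Comm comm, PetscInt mlocal, PetscInt N,
                                            PetscInt d_nz, PetscInt o_nz, Mat *A)
    {
      PetscErrorCode ierr;

      ierr = MatCreate(comm, A);CHKERRQ(ierr);
      ierr = MatSetSizes(*A, mlocal, PETSC_DECIDE, PETSC_DETERMINE, N);CHKERRQ(ierr);
      ierr = MatSetType(*A, MATMPIAIJ);CHKERRQ(ierr);
      /* d_nz/o_nz are upper bounds on nonzeros per row in the diagonal and
         off-diagonal blocks; passing per-row count arrays instead of NULL
         gives exact counts.  With correct bounds, -info reports 0 mallocs. */
      ierr = MatMPIAIJSetPreallocation(*A, d_nz, NULL, o_nz, NULL);CHKERRQ(ierr);
      return 0;
    }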
You were absolutely right! I had not set the preallocation properly, and hence the code was painfully slow. I fixed that issue (see attached code) and now it runs much faster.

However, I am having a different problem now. When I run the code for smaller matrices (less than a million rows) everything works well. However, when working with large matrices (e.g. 2.8 million rows x 1157 columns), writing the matrix to file dies with the following message:

    Fatal error in MPI_Recv: Other MPI error

Any hints on how to solve this problem are deeply appreciated.

vishy
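For reference, the failing write step presumably follows the standard PETSc binary-viewer pattern sketched below (written against the current PETSc calling sequence); the function name, matrix, and path are placeholders, not code from the attachment:

    #include <petscmat.h>

    /* Minimal sketch (placeholder names, not the attached code): write an
       assembled parallel matrix to a PETSc binary file. */
    PetscErrorCode WriteMatrixBinary(Mat X, const char *path)
    {
      PetscErrorCode ierr;
      PetscViewer    viewer;

      ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, path, FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
      /* For MPIAIJ matrices the binary viewer funnels the data through rank 0,
         which receives each process's rows before writing them to disk. */
      ierr = MatView(X, viewer);CHKERRQ(ierr);
      ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
      return 0;
    }

Since the data is gathered on rank 0, the MPI_Recv in the error message is most likely posted there during this MatView() call.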
The output of running the code with the -info flag is as follows:

[0] PetscInitialize(): PETSc successfully started: number of processors = 4
[0] PetscInitialize(): Running on machine: rossmann-b001.rcac.purdue.edu
[3] PetscInitialize(): PETSc successfully started: number of processors = 4
[3] PetscInitialize(): Running on machine: rossmann-b004.rcac.purdue.edu
No libsvm test file specified!
Reading libsvm train file at /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
[2] PetscInitialize(): PETSc successfully started: number of processors = 4
[2] PetscInitialize(): Running on machine: rossmann-b003.rcac.purdue.edu
[3] PetscFOpen(): Opening file /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
[2] PetscFOpen(): Opening file /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
[0] PetscFOpen(): Opening file /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
[1] PetscInitialize(): PETSc successfully started: number of processors = 4
[1] PetscInitialize(): Running on machine: rossmann-b002.rcac.purdue.edu
[1] PetscFOpen(): Opening file /scratch/lustreA/v/vishy/LibSVM/biclass/ocr/ocr.train.txt
m=100000 m=200000 m=300000 m=400000 m=500000 m=600000 m=700000
m=800000 m=900000 m=1000000 m=1100000 m=1200000 m=1300000 m=1400000
m=1500000 m=1600000 m=1700000 m=1800000 m=1900000 m=2000000 m=2100000
m=2200000 m=2300000 m=2400000 m=2500000 m=2600000 m=2700000 m=2800000
user.dim=1157 user.m=2800000 user.maxnnz=1156 user.maxlen=32768 user.flg=1
[0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[0] PetscCommDuplicate(): returning tag 2147483647
user.dim=1157 user.m=2800000 user.maxnnz=1156 user.maxlen=32768 user.flg=1
user.dim=1157 user.m=2800000 user.maxnnz=1156 user.maxlen=32768 user.flg=1
user.dim=1157 user.m=2800000 user.maxnnz=1156 user.maxlen=32768 user.flg=1
[0] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
[0] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
[0] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784
[0] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
[2] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[2] PetscCommDuplicate(): returning tag 2147483647
[2] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
[2] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
[2] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784
[2] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
[3] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[3] PetscCommDuplicate(): returning tag 2147483647
[3] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
[3] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
[3] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784
[3] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
[0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[0] PetscCommDuplicate(): returning tag 2147483647
[1] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[1] PetscCommDuplicate(): returning tag 2147483647
[1] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
[1] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
[1] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784
[1] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
[1] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[1] PetscCommDuplicate(): returning tag 2147483647
[2] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[2] PetscCommDuplicate(): returning tag 2147483647
[3] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
[3] PetscCommDuplicate(): returning tag 2147483647
[3] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[3] PetscCommDuplicate(): returning tag 2147483642
[2] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[2] PetscCommDuplicate(): returning tag 2147483642
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[1] PetscCommDuplicate(): returning tag 2147483642
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): returning tag 2147483642
[0] MatSetUpPreallocation(): Warning not preallocating matrix storage
[0] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374783 max tags = 2147483647
[0] PetscCommDuplicate(): returning tag 2147483647
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[0] PetscCommDuplicate(): returning tag 2147483646
[2] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374783 max tags = 2147483647
[2] PetscCommDuplicate(): returning tag 2147483647
[1] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374783 max tags = 2147483647
[1] PetscCommDuplicate(): returning tag 2147483647
[2] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[2] PetscCommDuplicate(): returning tag 2147483646
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[1] PetscCommDuplicate(): returning tag 2147483646
[3] PetscCommDuplicate(): Duplicating a communicator 1140850689 -2080374783 max tags = 2147483647
[3] PetscCommDuplicate(): returning tag 2147483647
[3] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[3] PetscCommDuplicate(): returning tag 2147483646
[0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs.
[0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.
[0] MatStashScatterBegin_Private(): No of messages: 0
[0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
[1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
[3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
[2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
[1] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 289; storage space: 0 unneeded,202300000 used
[1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 289
[3] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 289; storage space: 0 unneeded,202300000 used
[3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 289
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 290; storage space: 0 unneeded,202300000 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 289
[2] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 289; storage space: 0 unneeded,202300000 used
[2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 289
[1] Mat_CheckInode(): Found 140000 nodes of 700000. Limit used: 5. Using Inode routines
[3] Mat_CheckInode(): Found 140000 nodes of 700000. Limit used: 5. Using Inode routines
[2] Mat_CheckInode(): Found 140000 nodes of 700000. Limit used: 5. Using Inode routines
[0] Mat_CheckInode(): Found 140000 nodes of 700000. Limit used: 5. Using Inode routines
[3] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[3] PetscCommDuplicate(): returning tag 2147483645
[3] PetscCommDuplicate(): returning tag 2147483638
[2] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[2] PetscCommDuplicate(): returning tag 2147483645
[2] PetscCommDuplicate(): returning tag 2147483638
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[0] PetscCommDuplicate(): returning tag 2147483645
[0] MatSetUpMultiply_MPIAIJ(): Using block index set to define scatter
[0] PetscCommDuplicate(): returning tag 2147483638
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[1] PetscCommDuplicate(): returning tag 2147483645
[1] PetscCommDuplicate(): returning tag 2147483638
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[0] PetscCommDuplicate(): returning tag 2147483644
[0] PetscCommDuplicate(): returning tag 2147483637
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[1] PetscCommDuplicate(): returning tag 2147483644
[3] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[3] PetscCommDuplicate(): returning tag 2147483644
[3] PetscCommDuplicate(): returning tag 2147483637
[1] PetscCommDuplicate(): returning tag 2147483637
[0] PetscCommDuplicate(): returning tag 2147483632
[2] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783
[2] PetscCommDuplicate(): returning tag 2147483644
[2] PetscCommDuplicate(): returning tag 2147483637
[1] PetscCommDuplicate(): returning tag 2147483632
[2] PetscCommDuplicate(): returning tag 2147483632
[3] PetscCommDuplicate(): returning tag 2147483632
[0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter
[0] VecScatterCreate(): General case: MPI to Seq
[2] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 867; storage space: 0 unneeded,606900000 used
[2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 867
[2] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[2] PetscCommDuplicate(): returning tag 2147483628
[3] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 867; storage space: 0 unneeded,606900000 used
[3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 867
[3] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[3] PetscCommDuplicate(): returning tag 2147483628
[1] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 867; storage space: 0 unneeded,606900000 used
[1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 867
[1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[1] PetscCommDuplicate(): returning tag 2147483628
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 700000 X 867; storage space: 0 unneeded,606900000 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 867
Writing data in binary format to /scratch/lustreA/v/vishy/biclass/ocr.train.x
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): returning tag 2147483628
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: libsvm-to-binary.cpp
Type: application/octet-stream
Size: 15449 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20110423/a4769bfd/attachment-0001.obj>
