http://www.mcs.anl.gov/petsc/documentation/faq.html#efficient-assembly



> On Jun 14, 2017, at 3:18 PM, Jed Brown <j...@jedbrown.org> wrote:
> 
> "Kannan, Ramakrishnan" <kann...@ornl.gov> writes:
> 
>> I am running NHEP across 16 MPI processors over 16 nodes in a matrix of 
>> global size of 1,000,000x1,000,000 with approximately global 16,000,000 
>> non-zeros. Each node has the 1D  row distribution of the matrix with exactly 
>> 62500 rows and 1 million columns with 1million non-zeros as CSR/COO matrix.
>> 
>> I am generating this graph as follows. It takes approximately 12 seconds to 
>> insert 25000 NNZ into petsc matrix with MatSetValues which means it is 
>> taking closer to 10 minutes to 1million NNZ’s in every processes. It takes 
>> 12 seconds for assembly. Is these times normal? Is there a faster way of 
>> doing it? I am unable to construct matrices of 1 billion global nnz’s in 
>> which each process has closer to 100 million entries.
>> 
>>  Generate_petsc_matrix(int n_rows, int n_cols, int n_nnz,
>>                PetscInt *row_idx, PetscInt *col_idx, PetscScalar *val,
>>                const MPICommunicator& communicator) {
>>    int *start_row = new int[communicator.size()];
>>    MPI_Allgather(&n_rows, 1, MPI_INT, all_proc_rows, 1, MPI_INT, 
>> MPI_COMM_WORLD);
>>    start_row[0] = 0;
>>    for (int i = 0; i < communicator.size(); i++) {
>>      if (i > 0) {
>>        start_row[i] = start_row[i - 1] + all_proc_rows[i];
>>      }
>>    }
>>    MatCreate(PETSC_COMM_WORLD, &A);
>>    MatSetType(A, MATMPIAIJ);
>>    MatSetSizes(A, n_rows, PETSC_DECIDE, global_rows, n_cols);
>>    MatMPIAIJSetPreallocation(A, PETSC_DEFAULT, PETSC_NULL, PETSC_DEFAULT, 
>> PETSC_NULL);
> 
> This preallocation is not sufficient.  Either put in the maximum number
> of entries any any row or provide the arrays.  This will make your
> matrix assembly orders of magnitude faster.
> 
>>    PetscInt local_row_idx;
>>    PetscInt local_col_idx;
>>    PetscScalar local_val;
>>    int my_start_row = start_row[MPI_RANK];
>>    int my_start_col = 0;
>> double petsc_insert_time=0.0;
>>    for (int i = 0; i < n_nnz; i++) {
>>      local_row_idx = my_start_row + row_idx[i];
>>      local_col_idx = my_start_col + col_idx[i];
>>      local_val = val[i];
>> tic();
>>      ierr = MatSetValues(A, 1, &local_row_idx, 1, &local_col_idx, 
>> &local_val, INSERT_VALUES);
>>      petsc_insert_time += toc();
>>      if (i % 25000 == 0){
>>        PRINTROOT("25000 time::" << petsc_insert_time);
>>        petsc_insert_time=0;
>>      }
>>      CHKERRV(ierr);
>>    }
>>    PRINTROOT("100000 time::" << petsc_insert_time);
>>    tic();
>>    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
>>    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
>>    petsc_insert_time = toc();
>>    PRINTROOT("calling assembly to end::took::" << petsc_insert_time);
>> }
>> --
>> Regards,
>> Ramki

Reply via email to