Ok, I only see one call to KSPSolve.

On Sat, Jul 13, 2019 at 2:08 PM Mohammed Mostafa <mo7ammedmost...@gmail.com> wrote:
> This log is for 100 time-steps, not a single time step.
>
> On Sun, Jul 14, 2019 at 3:01 AM Mark Adams <mfad...@lbl.gov> wrote:
>
>> You call the assembly stuff a lot (200 times). BuildTwoSidedF is a global
>> operation and is taking a lot of time. You should just call these once per
>> time step (it looks like you are just doing one time step).
>>
>> --- Event Stage 1: Matrix Construction
>>
>> BuildTwoSidedF    400 1.0 6.5222e-01  2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2 0  0 0 0   5 0   0   0   0    0
>> VecSet              1 1.0 2.8610e-06  1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0  0 0 0   0 0   0   0   0    0
>> VecAssemblyBegin  200 1.0 6.2633e-01  1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2 0  0 0 0   5 0   0   0   0    0
>> VecAssemblyEnd    200 1.0 6.7163e-04  1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0  0 0 0   0 0   0   0   0    0
>> VecScatterBegin   200 1.0 5.9373e-03  2.2 0.00e+00 0.0 3.6e+03 2.1e+03 0.0e+00  0 0 79 2 0   0 0  99 100   0    0
>> VecScatterEnd     200 1.0 2.7236e-02 23.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0  0 0 0   0 0   0   0   0    0
>> MatAssemblyBegin  200 1.0 3.2747e-02  5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0  0 0 0   0 0   0   0   0    0
>> MatAssemblyEnd    200 1.0 9.0972e-01  1.0 0.00e+00 0.0 3.6e+01 5.3e+02 8.0e+00  4 0  1 0 6   9 0   1   0 100    0
>> AssembleMats      200 1.0 1.5568e+00  1.2 0.00e+00 0.0 3.6e+03 2.1e+03 8.0e+00  6 0 79 2 6  14 0 100 100 100    0
>> myMatSetValues    200 1.0 2.5367e+00  1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11 0  0 0 0  25 0   0   0   0    0
>> setNativeMat      100 1.0 2.8223e+00  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0  0 0 0  28 0   0   0   0    0
>> setNativeMatII    100 1.0 3.2174e+00  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14 0  0 0 0  31 0   0   0   0    0
>> callScheme        100 1.0 2.0700e-01  1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1 0  0 0 0   2 0   0   0   0    0
>>
>> On Fri, Jul 12, 2019 at 11:56 PM Mohammed Mostafa via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>
>>> Hello Matt,
>>> Attached is the entire log output dumped using -log_view and -info.
>>> Thanks,
>>> Kamra
>>>
>>> On Fri, Jul 12, 2019 at 9:23 PM Matthew Knepley <knep...@gmail.com> wrote:
>>>
>>>> On Fri, Jul 12, 2019 at 5:19 AM Mohammed Mostafa via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>>>
>>>>> Hello all,
>>>>> I have a few questions regarding PETSc.
>>>>
>>>> Please send the entire output of a run with all the logging turned on, using -log_view and -info.
>>>>
>>>> Thanks,
>>>>
>>>> Matt
>>>>
>>>>> Question 1:
>>>>> For the profiling, is it possible to show only the user-defined log events in the breakdown of each stage in -log_view?
>>>>> I tried deactivating all the class IDs (MAT, VEC, KSP, PC):
>>>>> PetscLogEventExcludeClass(MAT_CLASSID);
>>>>> PetscLogEventExcludeClass(VEC_CLASSID);
>>>>> PetscLogEventExcludeClass(KSP_CLASSID);
>>>>> PetscLogEventExcludeClass(PC_CLASSID);
>>>>> which should "deactivate event logging for a PETSc object class in every stage" according to the manual.
>>>>> However, I still see them in the stage breakdown:
>>>>>
>>>>> --- Event Stage 1: Matrix Construction
>>>>>
>>>>> BuildTwoSidedF      4 1.0 2.7364e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0  18 0   0   0   0    0
>>>>> VecSet              1 1.0 4.5300e-06 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0   0   0   0    0
>>>>> VecAssemblyBegin    2 1.0 2.7344e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0  18 0   0   0   0    0
>>>>> VecAssemblyEnd      2 1.0 8.3447e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0   0   0   0    0
>>>>> VecScatterBegin     2 1.0 7.5102e-05 1.7 0.00e+00 0.0 3.6e+01 2.1e+03 0.0e+00  0 0 3 0 0   0 0  50  80   0    0
>>>>> VecScatterEnd       2 1.0 3.5286e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0   0   0   0    0
>>>>> MatAssemblyBegin    2 1.0 8.8930e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0   0   0   0    0
>>>>> MatAssemblyEnd      2 1.0 1.3566e-02 1.1 0.00e+00 0.0 3.6e+01 5.3e+02 8.0e+00  0 0 3 0 6  10 0  50  20 100    0
>>>>> AssembleMats        2 1.0 3.9774e-02 1.7 0.00e+00 0.0 7.2e+01 1.3e+03 8.0e+00  0 0 7 0 6  28 0 100 100 100    0  # USER EVENT
>>>>> myMatSetValues      2 1.0 2.6931e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0  19 0   0   0   0    0  # USER EVENT
>>>>> setNativeMat        1 1.0 3.5613e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0  24 0   0   0   0    0  # USER EVENT
>>>>> setNativeMatII      1 1.0 4.7023e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0  28 0   0   0   0    0  # USER EVENT
>>>>> callScheme          1 1.0 2.2333e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   2 0   0   0   0    0  # USER EVENT
>>>>>
>>>>> Also, is it possible to clear the logs so that I can write a separate profiling output file for each time step? (Since I am solving a transient problem, I want to know how the performance changes as time goes by.)
>>>>>
>>>>> --------------------------------------------------------------------------------
>>>>> Question 2:
>>>>> Regarding MatSetValues:
>>>>> Right now I am writing a finite volume code. Due to an algorithm requirement I have to write the matrix into a local native format (an array of arrays) and then loop through the rows, using MatSetValues to set the elements in "Mat A":
>>>>> MatSetValues(A, 1, &row, nj, j_index, coefvalues, INSERT_VALUES);
>>>>> but it is very slow and is killing my performance, even though the matrix was properly preallocated with
>>>>> MatCreateAIJ(PETSC_COMM_WORLD, this->local_size, this->local_size, PETSC_DETERMINE, PETSC_DETERMINE, -1, d_nnz, -1, o_nnz, &A);
>>>>> with d_nnz and o_nnz properly assigned, so no mallocs occur during MatSetValues, and all inserted values are local, so there are no off-processor values.
>>>>> So my question is: is it possible to set multiple rows at once (hopefully all of them)? I checked the manual, and MatSetValues can only set a dense matrix block; setting the rows one by one seems to be expensive.
>>>>> Or perhaps is it possible to copy all rows directly to the underlying matrix data? As I mentioned, all values are local and there are no off-processor values (the stash is 0):
>>>>>
>>>>> [0] VecAssemblyBegin_MPI_BTS(): Stash has 0 entries, uses 0 mallocs.
>>>>> [0] VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0 mallocs.
>>>>> [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>> [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>> [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>> [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>> [4] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>> [5] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 186064; storage space: 0 unneeded,743028 used
>>>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage space: 0 unneeded,742972 used
>>>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186064) < 0.6. Do not use CompressedRow routines.
>>>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage space: 0 unneeded,743093 used
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage space: 0 unneeded,743036 used
>>>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage space: 0 unneeded,742938 used
>>>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage space: 0 unneeded,743049 used
>>>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 685; storage space: 0 unneeded,685 used
>>>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 649; storage space: 0 unneeded,649 used
>>>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows 185414)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>>>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows 185379)/(num_localrows 186064) > 0.6. Use CompressedRow routines.
>>>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1011; storage space: 0 unneeded,1011 used
>>>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1137; storage space: 0 unneeded,1137 used
>>>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows 184925)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 658; storage space: 0 unneeded,658 used
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 648; storage space: 0 unneeded,648 used
>>>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows 185051)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 185414)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows 185405)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>>>>>
>>>>> --------------------------------------------------------------------------------
>>>>> Question 3:
>>>>> If all the inserted matrix and vector data are local, which part of the vec/mat assembly is consuming the time? MatSetValues and the matrix assembly consume more time than my matrix builder routines, and not only for the first MAT_FINAL_ASSEMBLY.
>>>>>
>>>>> For context, the matrix above is nearly 1M x 1M, partitioned over six processes, and it was NOT built using DM.
>>>>>
>>>>> Finally, the configure options are:
>>>>>
>>>>> Configure options: PETSC_ARCH=release3 --with-debugging=0 COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native" --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-metis --download-hypre
>>>>>
>>>>> Sorry for such a long question, and thanks in advance.
>>>>> Thanks,
>>>>> M. Kamra
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
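The per-time-step profiling asked about in Question 1 does not require clearing the log: PETSc can attribute work to a separate logging stage for each time step, so -log_view reports every step in its own section. A minimal sketch of that pattern, assuming a recent PETSc (the function name RunSteps, the class name "User", and the event name "AssembleMats" are illustrative, not part of the original code):

```c
#include <petscsys.h>

/* Push one logging stage per time step so that -log_view breaks the run
 * down step by step, plus a user-defined event for the assembly phase. */
PetscErrorCode RunSteps(PetscInt nsteps)
{
  PetscClassId  userclass;
  PetscLogEvent assembleEvent;

  PetscFunctionBeginUser;
  /* Registering the event under a user class id keeps it out of the
   * built-in MAT/VEC/KSP/PC classes excluded via PetscLogEventExcludeClass(). */
  PetscCall(PetscClassIdRegister("User", &userclass));
  PetscCall(PetscLogEventRegister("AssembleMats", userclass, &assembleEvent));
  for (PetscInt step = 0; step < nsteps; ++step) {
    PetscLogStage stage;
    char          stagename[64];

    PetscCall(PetscSNPrintf(stagename, sizeof(stagename), "TimeStep %" PetscInt_FMT, step));
    PetscCall(PetscLogStageRegister(stagename, &stage));
    PetscCall(PetscLogStagePush(stage));
    PetscCall(PetscLogEventBegin(assembleEvent, 0, 0, 0, 0));
    /* ... build the matrix and vectors here ... */
    PetscCall(PetscLogEventEnd(assembleEvent, 0, 0, 0, 0));
    /* ... KSPSolve etc. ... */
    PetscCall(PetscLogStagePop());
  }
  PetscFunctionReturn(PETSC_SUCCESS);
}
```

Everything inside the push/pop pair is charged to that step's stage, which gives the change in performance over time that the transient run needs, without writing one output file per step.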
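For Question 2, when the coefficients already live in a local CSR-like native format, one alternative to a row-by-row MatSetValues loop is to hand PETSc the CSR arrays in a single call. A sketch under that assumption (the function name BuildFromCSR and the argument names rowptr/colind/vals are illustrative; MatCreateMPIAIJWithArrays is a real PETSc routine that copies the data):

```c
#include <petscmat.h>

/* Build the parallel AIJ matrix in one call from the locally owned rows
 * in CSR form: rowptr has length local_size+1, colind holds GLOBAL
 * column indices, and vals holds the coefficients. */
PetscErrorCode BuildFromCSR(MPI_Comm comm, PetscInt local_size,
                            const PetscInt rowptr[], const PetscInt colind[],
                            const PetscScalar vals[], Mat *A)
{
  PetscFunctionBeginUser;
  /* Copies the CSR data into the matrix; since every row is locally
   * owned there is no stash traffic and no per-row call overhead. */
  PetscCall(MatCreateMPIAIJWithArrays(comm, local_size, local_size,
                                      PETSC_DETERMINE, PETSC_DETERMINE,
                                      rowptr, colind, vals, A));
  PetscFunctionReturn(PETSC_SUCCESS);
}
```

Because the arrays are copied on creation, a transient run with a fixed sparsity pattern may prefer to keep the Mat and refresh only the values each step (recent PETSc versions provide MatUpdateMPIAIJWithArrays for that); whether either variant actually beats the well-preallocated MatSetValues loop is worth measuring with the same user events as above.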