> On Jan 14, 2016, at 2:24 PM, Barry Smith <[email protected]> wrote:
>
>    Matt is right, there is a lot of time "missing" from the output. Please send the output from -ksp_view so we can see exactly what solver is being used.
>
>    From the output we have:
>
>    Nonlinear solver: 78% of the time (so your "setup code" outside of PETSc is taking about 22% of the time)
>    Linear solver: 77% of the time (this is reasonable; pretty much the entire cost of the nonlinear solve is the linear solve)
>    Time to set up the preconditioner: 19% (10 + 9)
>    Time of the iterations in KSP: 35% (this is the sum of the vector operations plus MatMult() and MatSolve())
>
>    So 77 - (19 + 35) = 23% of the time inside the linear solver is unexplained (custom preconditioner???)
>
>    Also, getting the results with Instruments or HPCToolkit would be useful (so long as we don't need to install HPCToolkit ourselves to see the results).
Thanks, Barry (& Matt & Dave) ---

This is a solver that mixes matrix-based operators implemented using PETSc with matrix-free operators built on top of SAMRAI. Amneet and I should take a look at the performance off-list first.

-- Boyce

>
>   Barry
>
>> On Jan 14, 2016, at 1:26 AM, Bhalla, Amneet Pal S <[email protected]> wrote:
>>
>>> On Jan 13, 2016, at 9:17 PM, Griffith, Boyce Eugene <[email protected]> wrote:
>>>
>>> I see one hot spot:
>>
>> Here is with the opt build:
>>
>> ************************************************************************************************************************
>> ***            WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document             ***
>> ************************************************************************************************************************
>>
>> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>>
>> ./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 02:24:43 2016
>> Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2  GIT Date: 2016-01-13 21:30:26 -0600
>>
>>                          Max       Max/Min        Avg      Total
>> Time (sec):           1.018e+00      1.00000   1.018e+00
>> Objects:              2.935e+03      1.00000   2.935e+03
>> Flops:                4.957e+08      1.00000   4.957e+08  4.957e+08
>> Flops/sec:            4.868e+08      1.00000   4.868e+08  4.868e+08
>> MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
>> MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
>> MPI Reductions:       0.000e+00      0.00000
>>
>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>>                             e.g., VecAXPY() for real vectors of length N --> 2N flops
>>                             and VecAXPY() for complex vectors of length N --> 8N flops
>>
>> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>>  0:      Main Stage: 1.0183e+00 100.0%  4.9570e+08 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> See the 'Profiling' chapter of the users' manual for details on interpreting output.
>> Phase summary info:
>>    Count: number of times phase was executed
>>    Time and Flops: Max - maximum over all processors
>>                    Ratio - ratio of maximum to minimum over all processors
>>    Mess: number of messages sent
>>    Avg. len: average message length (bytes)
>>    Reduct: number of global reductions
>>    Global: entire computation
>>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>>       %T - percent time in this phase         %F - percent flops in this phase
>>       %M - percent messages in this phase     %L - percent message lengths in this phase
>>       %R - percent reductions in this phase
>>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
>> ------------------------------------------------------------------------------------------------------------------------
>> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> VecDot                 4 1.0 2.9564e-05 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1120
>> VecDotNorm2          272 1.0 1.4565e-03 1.0 4.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  2920
>> VecMDot              624 1.0 8.4300e-03 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0   627
>> VecNorm              565 1.0 3.8033e-03 1.0 4.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1151
>> VecScale              86 1.0 5.5480e-04 1.0 1.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   279
>> VecCopy               28 1.0 5.2261e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecSet             14567 1.0 1.2443e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> VecAXPY              903 1.0 4.2996e-03 1.0 6.66e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1550
>> VecAYPX              225 1.0 1.2550e-03 1.0 8.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   681
>> VecAXPBYCZ            42 1.0 1.7118e-04 1.0 3.45e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2014
>> VecWAXPY              70 1.0 1.9503e-04 1.0 2.98e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1528
>> VecMAXPY             641 1.0 1.1136e-02 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0   475
>> VecSwap              135 1.0 4.5896e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecAssemblyBegin     745 1.0 4.9477e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecAssemblyEnd       745 1.0 9.2411e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecScatterBegin    40831 1.0 3.4502e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
>> BuildTwoSidedF       738 1.0 2.6712e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatMult              513 1.0 9.1235e-02 1.0 7.75e+07 1.0 0.0e+00 0.0e+00 0.0e+00  9 16  0  0  0   9 16  0  0  0   849
>> MatSolve           13568 1.0 2.3605e-01 1.0 3.45e+08 1.0 0.0e+00 0.0e+00 0.0e+00 23 70  0  0  0  23 70  0  0  0  1460
>> MatLUFactorSym        84 1.0 3.7430e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
>> MatLUFactorNum        85 1.0 3.9623e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  4  8  0  0  0   4  8  0  0  0  1058
>> MatILUFactorSym        1 1.0 3.3617e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatScale               4 1.0 2.5511e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   984
>> MatAssemblyBegin     108 1.0 6.3658e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatAssemblyEnd       108 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatGetRow          33120 1.0 2.0157e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>> MatGetRowIJ           85 1.0 1.2145e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatGetSubMatrice       4 1.0 8.4379e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> MatGetOrdering        85 1.0 7.7887e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> MatAXPY                4 1.0 4.9596e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   5  0  0  0  0     0
>> MatPtAP                4 1.0 4.4426e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00  4  1  0  0  0   4  1  0  0  0   112
>> MatPtAPSymbolic        4 1.0 2.7664e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
>> MatPtAPNumeric         4 1.0 1.6732e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   298
>> MatGetSymTrans         4 1.0 3.6621e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPGMRESOrthog        16 1.0 9.7778e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> KSPSetUp              90 1.0 5.7650e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPSolve               1 1.0 7.8831e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 77 99  0  0  0  77 99  0  0  0   622
>> PCSetUp               90 1.0 9.9725e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10  8  0  0  0  10  8  0  0  0   420
>> PCSetUpOnBlocks      112 1.0 8.7547e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  9  8  0  0  0   9  8  0  0  0   479
>> PCApply               16 1.0 7.1952e-01 1.0 4.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 71 99  0  0  0  71 99  0  0  0   680
>> SNESSolve              1 1.0 7.9225e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 78 99  0  0  0  78 99  0  0  0   619
>> SNESFunctionEval       2 1.0 3.2940e-03 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    14
>> SNESJacobianEval       1 1.0 4.7255e-04 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     9
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> Memory usage is given in bytes:
>>
>> Object Type          Creations   Destructions     Memory  Descendants' Mem.
>> Reports information only for process 0.
>>
>> --- Event Stage 0: Main Stage
>>
>>                       Vector   971            839     15573352     0.
>>               Vector Scatter   290            289       189584     0.
>>                    Index Set  1171            823       951928     0.
>>            IS L to G Mapping   110            109      2156656     0.
>>            Application Order     6              6        99952     0.
>>                      MatMFFD     1              1          776     0.
>>                       Matrix   189            189     24083332     0.
>>            Matrix Null Space     4              4         2432     0.
>>                Krylov Solver    90             90       122720     0.
>>              DMKSP interface     1              1          648     0.
>>               Preconditioner    90             90        89872     0.
>>                         SNES     1              1         1328     0.
>>               SNESLineSearch     1              1          984     0.
>>                       DMSNES     1              1          664     0.
>>             Distributed Mesh     2              2         9168     0.
>>  Star Forest Bipartite Graph     4              4         3168     0.
>>              Discrete System     2              2         1712     0.
>>                       Viewer     1              0            0     0.
>> ========================================================================================================================
>> Average time to get PetscTime(): 9.53674e-07
>> #PETSc Option Table entries:
>> -ib_ksp_converged_reason
>> -ib_ksp_monitor_true_residual
>> -ib_snes_type ksponly
>> -log_summary
>> -stokes_ib_pc_level_0_sub_pc_factor_nonzeros_along_diagonal
>> -stokes_ib_pc_level_0_sub_pc_type ilu
>> -stokes_ib_pc_level_ksp_richardson_self_scale
>> -stokes_ib_pc_level_ksp_type richardson
>> -stokes_ib_pc_level_pc_asm_local_type additive
>> -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal
>> -stokes_ib_pc_level_sub_pc_type lu
>> #End of PETSc Option Table entries
>> Compiled without FORTRAN kernels
>> Compiled with full precision matrices (default)
>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>> Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --with-default-arch=0 --PETSC_ARCH=linux-opt --with-debugging=0
>>   --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3
>> -----------------------------------------
>> Libraries compiled on Thu Jan 14 01:29:56 2016 on aorta
>> Machine characteristics: Linux-3.13.0-63-generic-x86_64-with-Ubuntu-14.04-trusty
>> Using PETSc directory: /not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc
>> Using PETSc arch: linux-opt
>> -----------------------------------------
>>
>> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Qunused-arguments -O3 ${COPTFLAGS} ${CFLAGS}
>> Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 ${FOPTFLAGS} ${FFLAGS}
>> -----------------------------------------
>>
>> Using include paths: -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include
>>   -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include
>>   -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include
>>   -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include
>>   -I/not_backed_up/softwares/MPICH/include
>> -----------------------------------------
>>
>> Using C linker: mpicc
>> Using Fortran linker: mpif90
>> Using libraries:
>>   -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib
>>   -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lpetsc
>>   -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib
>>   -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lHYPRE
>>   -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib
>>   -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8
>>   -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu
>>   -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu
>>   -lmpicxx -lstdc++ -llapack -lblas -lpthread -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lX11 -lm
>>   -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++
>>   -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib
>>   -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8
>>   -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu
>>   -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu
>>   -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl
>>   -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -lmpi -lgcc_s -ldl
>> -----------------------------------------
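
For reference, Barry's request above amounts to rerunning the same case with the solver's prefixed -ksp_view option so that the full KSP/PC hierarchy is printed alongside the profile. Below is a minimal sketch of such a run: it reuses the option table from the log, and the "ib_" prefix on -ksp_view is an assumption based on the existing -ib_ksp_* entries (any input-file arguments that ./main2d normally takes are omitted).

    # Sketch only: same options as the profiled run, plus the prefixed -ksp_view.
    # The "ib_" prefix is assumed from the -ib_ksp_* options already in use.
    ./main2d \
      -ib_snes_type ksponly \
      -ib_ksp_converged_reason \
      -ib_ksp_monitor_true_residual \
      -ib_ksp_view \
      -stokes_ib_pc_level_ksp_type richardson \
      -stokes_ib_pc_level_ksp_richardson_self_scale \
      -stokes_ib_pc_level_pc_asm_local_type additive \
      -stokes_ib_pc_level_sub_pc_type lu \
      -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal \
      -stokes_ib_pc_level_0_sub_pc_type ilu \
      -stokes_ib_pc_level_0_sub_pc_factor_nonzeros_along_diagonal \
      -log_summary

Profiling the same run with Instruments or HPCToolkit, as Barry suggests, would then help attribute the roughly 23% of linear-solve time that the PETSc log leaves unexplained.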
