On Jan 14, 2016, at 8:44 AM, Matthew Knepley <[email protected]> wrote:
> On Thu, Jan 14, 2016 at 7:37 AM, Dave May <[email protected]> wrote:
>
>> On 14 January 2016 at 14:24, Matthew Knepley <[email protected]> wrote:
>>
>>> On Wed, Jan 13, 2016 at 11:12 PM, Bhalla, Amneet Pal S <[email protected]> wrote:
>>>
>>>> On Jan 13, 2016, at 6:22 PM, Matthew Knepley <[email protected]> wrote:
>>>>
>>>>> Can you mail us a -log_summary for a rough cut? Sometimes it's hard to
>>>>> interpret the data avalanche from one of those tools without a simple map.
>>>>
>>>> Does this indicate some hot spots?
>>>
>>> 1) There is a misspelled option: -stokes_ib_pc_level_ksp_richardson_self_scae
>>>    You can try to avoid this by giving -options_left.
>>>
>>> 2) Are you using any custom code during the solve? There is a gaping hole in
>>>    the timing: it takes 9s to do PCApply(), but something like a collective 1s
>>>    to do everything we time under that.
>>
>> You are looking at the timing from a debug build. The results from the
>> optimized build don't have such a gaping hole.
>
> It still looks like 50% of the runtime to me.

Amneet, on OS X, I would echo Barry and suggest starting out with the Time
Profiler instrument (accessible through the Instruments app).

-- Boyce

> Since this is serial, we can use something like kcachegrind to look at
> performance as well, which should at least tell us what is sucking up this
> time so we can put a PETSc event on it [see the sketch after the quoted log
> below].
>
>   Thanks,
>
>     Matt

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./main2d on a darwin-dbg named Amneets-MBP.attlocal.net with 1 processor, by Taylor Wed Jan 13 21:07:43 2016
Using Petsc Development GIT revision: v3.6.1-2556-g6721a46  GIT Date: 2015-11-16 13:07:08 -0600

                         Max       Max/Min        Avg      Total
Time (sec):           1.039e+01      1.00000   1.039e+01
Objects:              2.834e+03      1.00000   2.834e+03
Flops:                3.552e+08      1.00000   3.552e+08  3.552e+08
Flops/sec:            3.418e+07      1.00000   3.418e+07  3.418e+07
Memory:               3.949e+07      1.00000              3.949e+07
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00      0.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 1.0391e+01 100.0%  3.5520e+08 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------


      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was compiled with a debugging option,      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################


Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot                 4 1.0 9.0525e-04 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    37
VecMDot              533 1.0 1.5936e-02 1.0 5.97e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0   375
VecNorm              412 1.0 9.2107e-03 1.0 3.57e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   388
VecScale             331 1.0 5.8195e-01 1.0 1.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00  6  0  0  0  0   6  0  0  0  0     2
VecCopy              116 1.0 1.9983e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet             18362 1.0 1.5249e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY              254 1.0 4.3961e-01 1.0 1.95e+06 1.0 0.0e+00 0.0e+00 0.0e+00  4  1  0  0  0   4  1  0  0  0     4
VecAYPX               92 1.0 2.5167e-03 1.0 2.66e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   106
VecAXPBYCZ            36 1.0 8.6242e-04 1.0 2.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   341
VecWAXPY              58 1.0 1.2539e-03 1.0 2.47e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   197
VecMAXPY             638 1.0 2.3439e-02 1.0 7.68e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0   328
VecSwap              111 1.0 1.9721e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecAssemblyBegin     607 1.0 3.8150e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd       607 1.0 8.3705e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin    26434 1.0 3.0096e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
VecNormalize         260 1.0 4.9754e-01 1.0 3.84e+06 1.0 0.0e+00 0.0e+00 0.0e+00  5  1  0  0  0   5  1  0  0  0     8
BuildTwoSidedF       600 1.0 1.8942e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult              365 1.0 6.0306e-01 1.0 6.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00  6 18  0  0  0   6 18  0  0  0   104
MatSolve            8775 1.0 6.8506e-01 1.0 2.25e+08 1.0 0.0e+00 0.0e+00 0.0e+00  7 63  0  0  0   7 63  0  0  0   328
MatLUFactorSym        85 1.0 1.0664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatLUFactorNum        85 1.0 1.2066e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1 12  0  0  0   1 12  0  0  0   350
MatScale               4 1.0 4.0145e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   625
MatAssemblyBegin     108 1.0 4.8849e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd       108 1.0 9.8455e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRow          33120 1.0 1.4157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatGetRowIJ           85 1.0 2.6060e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice       4 1.0 4.2922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering        85 1.0 3.1230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAXPY                4 1.0 4.0459e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
MatPtAP                4 1.0 1.1362e-01 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0    44
MatPtAPSymbolic        4 1.0 6.4973e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatPtAPNumeric         4 1.0 4.8521e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   103
MatGetSymTrans         4 1.0 5.9780e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPGMRESOrthog       182 1.0 2.0538e-02 1.0 5.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   249
KSPSetUp              90 1.0 2.1210e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 9.5567e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 0.0e+00 92 98  0  0  0  92 98  0  0  0    37
PCSetUp               90 1.0 4.0597e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00  4 12  0  0  0   4 12  0  0  0   104
PCSetUpOnBlocks       91 1.0 2.9886e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00  3 12  0  0  0   3 12  0  0  0   141
PCApply               13 1.0 9.0558e+00 1.0 3.49e+08 1.0 0.0e+00 0.0e+00 0.0e+00 87 98  0  0  0  87 98  0  0  0    39
SNESSolve              1 1.0 9.5729e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 0.0e+00 92 98  0  0  0  92 98  0  0  0    37
SNESFunctionEval       2 1.0 1.3347e-02 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     4
SNESJacobianEval       1 1.0 2.4613e-03 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     2
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector   870            762     13314200     0.
      Vector Scatter   290            289       189584     0.
           Index Set  1171            823       951096     0.
   IS L to G Mapping   110            109      2156656     0.
   Application Order     6              6        99952     0.
             MatMFFD     1              1          776     0.
              Matrix   189            189     24202324     0.
   Matrix Null Space     4              4         2432     0.
       Krylov Solver    90             90       190080     0.
     DMKSP interface     1              1          648     0.
      Preconditioner    90             90        89128     0.
                SNES     1              1         1328     0.
      SNESLineSearch     1              1          856     0.
              DMSNES     1              1          664     0.
    Distributed Mesh     2              2         9024     0.
Star Forest Bipartite Graph     4        4         3168     0.
     Discrete System     2              2         1696     0.
              Viewer     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 4.74e-08
#PETSc Option Table entries:
-ib_ksp_converged_reason
-ib_ksp_monitor_true_residual
-ib_snes_type ksponly
-log_summary
-stokes_ib_pc_level_ksp_richardson_self_scae
-stokes_ib_pc_level_ksp_type gmres
-stokes_ib_pc_level_pc_asm_local_type additive
-stokes_ib_pc_level_pc_asm_type interpolate
-stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal
-stokes_ib_pc_level_sub_pc_type lu
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg --with-debugging=1 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --with-hdf5-dir=/Users/Taylor/Documents/SOFTWARES/HDF5/
-----------------------------------------
Libraries compiled on Mon Nov 16 15:11:21 2015 on d209.math.ucdavis.edu
Machine characteristics: Darwin-14.5.0-x86_64-i386-64bit
Using PETSc directory: /Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc
Using PETSc arch: darwin-dbg
-----------------------------------------
Using C compiler: mpicc -g ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -g ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include -I/opt/X11/include -I/Users/Taylor/Documents/SOFTWARES/HDF5/include -I/opt/local/include -I/Users/Taylor/Documents/SOFTWARES/MPICH/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -lpetsc -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -lHYPRE -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin -lclang_rt.osx -lmpicxx -lc++ -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -lclang_rt.osx -llapack -lblas -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -lX11 -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/HDF5/lib -L/Users/Taylor/Documents/SOFTWARES/HDF5/lib -lhdf5_hl -lhdf5 -lssl -lcrypto -lmpifort -lgfortran -Wl,-rpath,/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 -L/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 -Wl,-rpath,/opt/local/lib/gcc49 -L/opt/local/lib/gcc49 -lgfortran -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpicxx -lc++ -lclang_rt.osx -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib
-L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -ldl -lmpi -lpmpi -lSystem -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -lclang_rt.osx -ldl
-----------------------------------------

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
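
[Sketch referenced in Matt's message above.] In the debug log, PCApply() shows 9.0558 s (87% of the run) while the operations timed beneath it account for only a small fraction of that, so the missing time is presumably in untimed user code. A minimal sketch of registering a custom PETSc log event around such code so it gets its own line in -log_summary; the routine spread_forces() and the names "UserCode"/"UserSpread" are made up for illustration, while PetscClassIdRegister(), PetscLogEventRegister(), and PetscLogEventBegin()/PetscLogEventEnd() are the standard PETSc logging API:

  #include <petscsys.h>

  /* Hypothetical user routine standing in for the application code
     called during the solve (e.g., inside a PCApply() callback). */
  static void spread_forces(void) { /* ... application work ... */ }

  static PetscLogEvent USER_SPREAD_EVENT;

  int main(int argc, char **argv)
  {
    PetscErrorCode ierr;
    PetscClassId   classid;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

    /* Register a class and an event once, e.g. at startup. */
    ierr = PetscClassIdRegister("UserCode", &classid);CHKERRQ(ierr);
    ierr = PetscLogEventRegister("UserSpread", classid, &USER_SPREAD_EVENT);CHKERRQ(ierr);

    /* Bracket the otherwise-untimed user code; it now appears as a
       "UserSpread" row in the -log_summary event table. */
    ierr = PetscLogEventBegin(USER_SPREAD_EVENT, 0, 0, 0, 0);CHKERRQ(ierr);
    spread_forces();
    ierr = PetscLogEventEnd(USER_SPREAD_EVENT, 0, 0, 0, 0);CHKERRQ(ierr);

    ierr = PetscFinalize();
    return ierr;
  }

Running with -options_left additionally makes PETSc report any options that were set but never queried, which is how a misspelling like -stokes_ib_pc_level_ksp_richardson_self_scae would be caught.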
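In the same spirit, the legend's note "Set stages with PetscLogStagePush() and PetscLogStagePop()" refers to grouping related phases under their own stage of the summary. A minimal sketch, with an illustrative stage name:

  #include <petscsys.h>

  int main(int argc, char **argv)
  {
    PetscErrorCode ierr;
    PetscLogStage  stage;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

    /* Everything logged between Push and Pop is reported under this
       stage instead of "Main Stage" in -log_summary. */
    ierr = PetscLogStageRegister("IB Solve", &stage);CHKERRQ(ierr);
    ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
    /* ... e.g. the SNESSolve()/KSPSolve() calls timed above ... */
    ierr = PetscLogStagePop();CHKERRQ(ierr);

    ierr = PetscFinalize();
    return ierr;
  }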
