Thanks, Barry, for the reply. When I run with the option -log_view, I get the PETSc API function profiling posted below. But I wonder: is there any way to get the performance/execution profile of my own functions in the code, as in Xcode's profiler, so that I know which parts need special optimization? I would really appreciate your guidance. Thank you very much in advance!
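By the way, the output below mentions PetscLogStagePush() and PetscLogStagePop(). Is wrapping my own routines in user-defined stages and events, roughly as in the sketch below, the intended way to get per-function timings? (This is only my rough understanding of the logging API; the names "Assembly", "User", and "MyRHSFunction" are placeholders I made up, not anything from my actual code.)

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscLogStage stage;
  PetscLogEvent rhs_event;
  PetscClassId  classid;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Register a stage and an event once, near program start */
  PetscLogStageRegister("Assembly", &stage);
  PetscClassIdRegister("User", &classid);
  PetscLogEventRegister("MyRHSFunction", classid, &rhs_event);

  /* Wrap a whole phase of the computation in a stage ... */
  PetscLogStagePush(stage);
  /* ... code I want reported under its own stage would go here ... */
  PetscLogStagePop();

  /* ... and wrap an individual routine in an event */
  PetscLogEventBegin(rhs_event, 0, 0, 0, 0);
  /* ... body of the routine I want timed ... */
  PetscLogEventEnd(rhs_event, 0, 0, 0, 0);

  PetscFinalize();
  return 0;
}

If I read the "Summary of Stages" and the event table correctly, each registered stage and event would then appear as its own row in the -log_view output, with the per-process maximum time and the max/min ratio.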
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./pStreamer on a arch-darwin-c-opt named rcc-dhcp-249-233.sr.unh.edu with 4 processors, by GPL Wed Nov 23 18:49:59 2016
Using Petsc Release Version 3.5.4, May, 23, 2015

                         Max       Max/Min        Avg      Total
Time (sec):           1.026e+02      1.00004   1.026e+02
Objects:              3.420e+02      1.00000   3.420e+02
Flops:                2.509e+10      1.00200   2.506e+10  1.002e+11
Flops/sec:            2.445e+08      1.00202   2.441e+08  9.764e+08
MPI Messages:         1.387e+04      1.99971   1.041e+04  4.163e+04
MPI Message Lengths:  2.783e+07      1.99971   2.006e+03  8.350e+07
MPI Reductions:       1.425e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 1.0264e+02 100.0%  1.0022e+11 100.0%  4.163e+04 100.0%  2.006e+03      100.0%  1.425e+04 100.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot              6778 1.0 5.0633e+00 1.1 1.70e+09 1.0 0.0e+00 0.0e+00 6.8e+03   5   7   0   0  48    5   7   0   0  48  1345
VecDotNorm2         3389 1.0 3.1751e+00 1.3 1.70e+09 1.0 0.0e+00 0.0e+00 3.4e+03   3   7   0   0  24    3   7   0   0  24  2144
VecNorm             3571 1.0 2.1366e+00 1.1 8.98e+08 1.0 0.0e+00 0.0e+00 3.6e+03   2   4   0   0  25    2   4   0   0  25  1679
VecScale              49 1.0 1.2606e-02 1.2 6.16e+06 1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0  1952
VecCopy              157 1.0 4.5638e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
VecSet              7140 1.0 5.1525e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
VecAXPY              146 1.0 7.0036e-02 1.2 3.97e+07 1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0  2266
VecAXPBYCZ          6778 1.0 4.9426e+00 1.2 3.41e+09 1.0 0.0e+00 0.0e+00 0.0e+00   4  14   0   0   0    4  14   0   0   0  2755
VecWAXPY            6785 1.0 3.9687e+00 1.0 1.71e+09 1.0 0.0e+00 0.0e+00 0.0e+00   4   7   0   0   0    4   7   0   0   0  1716
VecScatterBegin     6918 1.0 3.4263e-01 1.7 0.00e+00 0.0 4.2e+04 2.0e+03 0.0e+00   0   0 100 100   0    0   0 100 100   0     0
VecScatterEnd       6918 1.0 8.6652e+00 29.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4   0   0   0   0    4   0   0   0   0     0
MatMult             6869 1.0 2.6045e+01 1.5 7.76e+09 1.0 4.1e+04 2.0e+03 0.0e+00  21  31  99  99   0   21  31  99  99   0  1190
MatSolve            6960 1.0 1.7961e+01 1.0 7.86e+09 1.0 0.0e+00 0.0e+00 0.0e+00  17  31   0   0   0   17  31   0   0   0  1747
MatLUFactorNum         7 1.0 5.3501e-02 1.1 9.62e+06 1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0   719
MatILUFactorSym        7 1.0 5.1423e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
MatAssemblyBegin      14 1.0 3.7722e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 2.8e+01   0   0   0   0   0    0   0   0   0   0     0
MatAssemblyEnd        14 1.0 6.4600e-02 1.0 0.00e+00 0.0 8.4e+01 5.0e+02 5.6e+01   0   0   0   0   0    0   0   0   0   0     0
MatGetRowIJ            7 1.0 3.4150e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
MatGetOrdering         7 1.0 5.8947e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
KSPSetUp              14 1.0 3.9437e+00 132.6 0.00e+00 0.0 0.0e+00 0.0e+00 8.4e+01  2   0   0   0   1    2   0   0   0   1     0
KSPSolve              91 1.0 6.6894e+01 1.2 2.51e+10 1.0 4.1e+04 2.0e+03 1.4e+04  59 100  99  99  98   59 100  99  99  98  1497
PCSetUp               14 1.0 1.1119e-01 1.1 9.62e+06 1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0   346
PCSetUpOnBlocks       91 1.0 1.1087e-01 1.1 9.62e+06 1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0   347
PCApply             6960 1.0 1.8604e+01 1.0 7.86e+09 1.0 0.0e+00 0.0e+00 0.0e+00  18  31   0   0   0   18  31   0   0   0  1687
TSStep                 6 1.0 7.0114e+01 1.0 2.01e+10 1.0 3.3e+04 2.0e+03 1.1e+04  68  80  80  80  78   68  80  80  80  78  1146
TSFunctionEval         6 1.0 7.0102e+01 1.0 2.01e+10 1.0 3.3e+04 2.0e+03 1.1e+04  68  80  80  80  78   68  80  80  80  78  1146
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

                     Vector   212            207    180438384     0
             Vector Scatter     9              8      1014488     0
                     Matrix    28             28    147732508     0
           Distributed Mesh     4              3        13776     0
Star Forest Bipartite Graph     8              6         4800     0
            Discrete System     4              3         2376     0
                  Index Set    39             39      8087676     0
          IS L to G Mapping     2              1       504596     0
              Krylov Solver    15             15        17456     0
             Preconditioner    15             15        14208     0
                    TSAdapt     1              1         1208     0
                         TS     1              1         1168     0
                       DMTS     1              1          712     0
                       SNES     1              1         1336     0
                     DMSNES     1              1          672     0
                     Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 3.82e-08
Average time for MPI_Barrier(): 6.2797e-05
Average time for zero size MPI_Send(): 1.98967e-05
#PETSc Option Table entries:
-log_view
#End of PETSc Option Table entries

On Wed, Oct 19, 2016 at 8:33 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:

>    To start you should just run the code with the ./configure option
> --with-debugging=0 with the option -log_view this will give you the high
> view of where it is spending the time. Feel free to email back the output.
> From that you can focus on what parts are both taking a lot of time AND
> running slowly and that gives a good idea of what needs to be optimized.
>
>    Barry
>
> > On Oct 19, 2016, at 3:15 PM, Sharp Stone <thron...@gmail.com> wrote:
> >
> > Dear all,
> >
> > Now I'm using a Petsc code which needs to be optimized. But after
> > trying, I still don't know how to get the profiling for each of the
> > functions for each process. I mean, for each process, how should I know
> > the execution time for each function?
> >
> > Thanks!
> >
> > --
> > Best regards,
> >
> > Feng

--
Best regards,

Feng