profiling PETSc code

Matthew Knepley Tue, 1 Aug 2006 17:36:55 -0500

On 8/1/06, Matt Funk <mafunk at nmsu.edu> wrote:
> Hi,
>
> well, now i do get  summary:
> ...
>      #                          WARNING!!!                    #
>       #                                                        #
>       #   This code was run without the PreLoadBegin()         #
>       #   macros. To get timing results we always recommend    #
>       #   preloading. otherwise timing numbers may be          #
>       #   meaningless.                                         #
>       ##########################################################
>
>
> Event                Count      Time (sec)     Flops/sec
> --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len
> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecNorm              200 1.0 5.6217e-03 1.0 2.07e+08 1.0 0.0e+00 0.0e+00
> 1.0e+02  0 36  0  0  7   0 36  0  0 31   207
> VecCopy              200 1.0 4.2303e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet                 1 1.0 8.1062e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAYPX              100 1.0 3.2036e-03 1.0 1.82e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0 18  0  0  0   0 18  0  0  0   182
> MatMult              100 1.0 8.3530e-03 1.0 3.49e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00  1  9  0  0  0   1  9  0  0  0    35
> MatSolve             200 1.0 2.5591e-02 1.0 2.28e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00  2 18  0  0  0   2 18  0  0  0    23
> MatSolveTranspos     100 1.0 2.1357e-02 1.0 1.36e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00  2  9  0  0  0   2  9  0  0  0    14
> MatLUFactorNum       100 1.0 4.6215e-02 1.0 6.30e+06 1.0 0.0e+00 0.0e+00
> 0.0e+00  3  9  0  0  0   3  9  0  0  0     6
> MatILUFactorSym        1 1.0 4.4894e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 2.0e+00  0  0  0  0  0   0  0  0  0  1     0
> MatAssemblyBegin       1 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd       100 1.0 1.1220e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> MatGetOrdering         1 1.0 2.5296e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 2.0e+00  0  0  0  0  0   0  0  0  0  1     0
> KSPSetup             100 1.0 5.3692e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.4e+01  0  0  0  0  1   0  0  0  0  4     0
> KSPSolve             100 1.0 9.0056e-02 1.0 3.23e+07 1.0 0.0e+00 0.0e+00
> 3.0e+02  7 91  0  0 21   7 91  0  0 93    32
> PCSetUp              100 1.0 4.9087e-02 1.0 5.93e+06 1.0 0.0e+00 0.0e+00
> 4.0e+00  4  9  0  0  0   4  9  0  0  1     6
> PCApply              300 1.0 4.9106e-02 1.0 1.78e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00  4 27  0  0  0   4 27  0  0  0    18
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions   Memory  Descendants' Mem.
>
> --- Event Stage 0: Main Stage
>
>            Index Set     3              3      35976     0
>                  Vec   109            109    2458360     0
>               Matrix     2              2      23304     0
>        Krylov Solver     1              1          0     0
>       Preconditioner     1              1        168     0
> ========================================================================================================================
> Average time to get PetscTime(): 9.53674e-08
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
>
> ...
>
> am i using the push and pop calls in an manner they are not intended to be
> used?

Not exactly. You need to register a stage first before pushing it.

http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Profiling/PetscLogStageRegister.html

> plus, how can i see what's going on with respect to why it takes so much
> longer to solve the system in parallel than in serial without being able to
> specify the stages (i.e single out the KSPSolve call)?

There are 100 calls to KSPSolve() which collectively take .1s. Your
problem is most
likely in matrix setup. I would bet that you have not preallocated the
space correctly.
Therefore, a malloc() is called every time you insert a value. You can
check the number of mallocs using -info.

   Matt

> mat
>
>
>
>
>
> On Tuesday 01 August 2006 15:57, Matthew Knepley wrote:
> > ke out your stage push/pop for the moment, and the log_summary
> > call. Just run with -log_summary and send the output as a test.
> >
> > Thanks,
> >
> > Matt
>
>

-- 
"Failure has a thousand explanations. Success doesn't need one" -- Sir
Alec Guiness

profiling PETSc code

Reply via email to