Hi Mark and Barry,
I am sorry for my late reply: it was a busy week!
I ran a test case for a larger problem with as many levels of MG as I
could (i.e. 5) and GAMG as the PC at the coarse level. I attached the output
of -info (after grepping for "gamg"), ksp_view and log_summary.
The solve takes about 2 seconds on 8192 cores, which is way too much.
The number of iterations to convergence is 24.
I hope there is a way to speed it up.
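
For reference, the multigrid-related options for this run were (they also
appear in the option table at the end of log_summary):

-ksp_type cg
-ksp_rtol 1e-9
-ksp_norm_type unpreconditioned
-ksp_initial_guess_nonzero yes
-pc_type mg
-pc_mg_levels 5
-pc_mg_galerkin
-mg_levels_ksp_type richardson
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type gamg
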
Thanks,
Michele
On Fri, 2015-07-17 at 09:38 -0400, Mark Adams wrote:
>
>
>
>
> On Thu, Jul 16, 2015 at 8:18 PM, Michele Rosso <[email protected]> wrote:
>
> Barry,
>
> thank you very much for the detailed answer. I tried what you
> suggested and it works.
> So far I have tried it only on a small system, but the final goal is to use
> it for very large runs. How does PCGAMG compare to PCMG as
> far as performance and scalability are concerned?
> Also, could you help me tune the GAMG part (my current
> setup is in the attached ksp_view.txt file)?
>
>
>
> I am going to add this to the documentation today, but you can run with
> -info. This is very noisy, so you might want to do this as a separate run.
> Then grep for GAMG; that will be about 20 lines. Send that
> to us and we can go from there.
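
As a rough illustration (the launcher, executable name, and <usual options>
below are placeholders, not taken from this thread):

  mpiexec -n 8192 ./your_app <usual options> -info 2>&1 | tee info.log
  grep GAMG info.log
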
>
>
> Mark
>
>
>
>
> I also tried to use superlu_dist for the LU factorization on the coarse
> blocks (prefix mg_coarse_mg_coarse_sub_):
> -mg_coarse_mg_coarse_sub_pc_type lu
> -mg_coarse_mg_coarse_sub_pc_factor_mat_solver_package superlu_dist
>
> but I got an error:
>
> ****** Error in MC64A/AD. INFO(1) = -2
> ****** Error in MC64A/AD. INFO(1) = -2
> ****** Error in MC64A/AD. INFO(1) = -2
> ****** Error in MC64A/AD. INFO(1) = -2
> ****** Error in MC64A/AD. INFO(1) = -2
> ****** Error in MC64A/AD. INFO(1) = -2
> ****** Error in MC64A/AD. INFO(1) = -2
> symbfact() error returns 0
> symbfact() error returns 0
> symbfact() error returns 0
> symbfact() error returns 0
> symbfact() error returns 0
> symbfact() error returns 0
> symbfact() error returns 0
>
>
> Thank you,
> Michele
>
>
> On Thu, 2015-07-16 at 18:07 -0500, Barry Smith wrote:
>
> >
> > > On Jul 16, 2015, at 5:42 PM, Michele Rosso <[email protected]> wrote:
> > >
> > > Barry,
> > >
> > > thanks for your reply. So if I want it fixed, I will have to use
> the master branch, correct?
> >
> > Yes, or edit mg.c and remove the offending lines of code (easy
> enough).
> > >
> > > On a side note, what I am trying to achieve is to be able to use
> as many levels of MG as I want, despite the limitation imposed by the local
> number of grid nodes.
> >
> > I assume you are talking about DMDA? There is no generic
> limitation in PETSc's multigrid; it is only the way the DMDA code
> figures out the interpolation that causes a restriction.
> >
> > > So far I am using a borrowed code that implements a PC that
> creates a sub-communicator and performs MG on it.
> > > While reading the documentation I found out that PCMGSetLevels
> takes an optional array of communicators. How does this work?
> >
> > It doesn't work. It was an idea that never got pursued.
> >
> > > Can I simply define my matrix and rhs on the fine grid as I
> would do normally (I do not use kspsetoperators and kspsetrhs), and would KSP
> take care of it by using the correct communicator for each level?
> >
> > No.
> >
> > You can use the PCMG geometric multigrid with DMDA for as many
> levels as it works and then use PCGAMG as the coarse grid solver. PCGAMG
> automatically uses fewer processes for the coarse level matrices and vectors.
> You could do this all from the command line without writing code.
> >
> > For example, if your code uses a DMDA and calls KSPSetDM(), use
> something like -da_refine 3 -pc_type mg -pc_mg_galerkin -mg_coarse_pc_type gamg
> -ksp_view
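
A minimal sketch of that setup, modeled on the PETSc 3.6-era KSP/DMDA
tutorials (the 7-point Poisson-type ComputeMatrix/ComputeRHS below are
illustrative stand-ins for the application's own discretization, not code
from this thread):

#include <petscksp.h>
#include <petscdmda.h>

static PetscErrorCode ComputeRHS(KSP ksp,Vec b,void *ctx)
{
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  ierr = VecSet(b,1.0);CHKERRQ(ierr);                       /* placeholder right-hand side */
  PetscFunctionReturn(0);
}

static PetscErrorCode ComputeMatrix(KSP ksp,Mat J,Mat A,void *ctx)
{
  PetscErrorCode ierr;
  DM             da;
  PetscInt       i,j,k,xs,ys,zs,xm,ym,zm,mx,my,mz,n;
  MatStencil     row,col[7];
  PetscScalar    v[7];

  PetscFunctionBeginUser;
  ierr = KSPGetDM(ksp,&da);CHKERRQ(ierr);
  ierr = DMDAGetInfo(da,0,&mx,&my,&mz,0,0,0,0,0,0,0,0,0);CHKERRQ(ierr);
  ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr);
  /* 7-point Laplacian-type stencil on the locally owned part of the grid,
     with a simple Dirichlet-style treatment of the boundary rows */
  for (k=zs; k<zs+zm; k++) {
    for (j=ys; j<ys+ym; j++) {
      for (i=xs; i<xs+xm; i++) {
        row.i = i; row.j = j; row.k = k;
        n = 0;
        v[n] = 6.0;  col[n].i = i;   col[n].j = j;   col[n].k = k;   n++;
        if (i>0)    {v[n] = -1.0; col[n].i = i-1; col[n].j = j;   col[n].k = k;   n++;}
        if (i<mx-1) {v[n] = -1.0; col[n].i = i+1; col[n].j = j;   col[n].k = k;   n++;}
        if (j>0)    {v[n] = -1.0; col[n].i = i;   col[n].j = j-1; col[n].k = k;   n++;}
        if (j<my-1) {v[n] = -1.0; col[n].i = i;   col[n].j = j+1; col[n].k = k;   n++;}
        if (k>0)    {v[n] = -1.0; col[n].i = i;   col[n].j = j;   col[n].k = k-1; n++;}
        if (k<mz-1) {v[n] = -1.0; col[n].i = i;   col[n].j = j;   col[n].k = k+1; n++;}
        ierr = MatSetValuesStencil(A,1,&row,n,col,v,INSERT_VALUES);CHKERRQ(ierr);
      }
    }
  }
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

int main(int argc,char **argv)
{
  KSP            ksp;
  DM             da;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  /* Coarsest DMDA; negative sizes mean the defaults can be changed with
     -da_grid_x/y/z, and -da_refine N refines the grid N times (3.6-era convention) */
  ierr = DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,
                      DMDA_STENCIL_STAR,-9,-9,-9,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,
                      1,1,NULL,NULL,NULL,&da);CHKERRQ(ierr);
  ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
  ierr = KSPSetDM(ksp,da);CHKERRQ(ierr);                    /* lets PCMG build the level hierarchy */
  ierr = KSPSetComputeRHS(ksp,ComputeRHS,NULL);CHKERRQ(ierr);
  ierr = KSPSetComputeOperators(ksp,ComputeMatrix,NULL);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);              /* picks up -pc_type mg, -mg_coarse_pc_type gamg, ... */
  ierr = KSPSolve(ksp,NULL,NULL);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

With something like this the whole solver stack is chosen at run time, e.g.
the options above plus -da_grid_x/-da_grid_y/-da_grid_z for the coarse grid
size.
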
> >
> >
> >
> > Barry
> >
> >
> > >
> > > Thanks,
> > > Michele
> > >
> > >
> > >
> > >
> > > On Thu, 2015-07-16 at 17:30 -0500, Barry Smith wrote:
> > >> Michele,
> > >>
> > >> This is a very annoying feature that has been fixed in
> master
> > >> http://www.mcs.anl.gov/petsc/developers/index.html
> > >> I would like to have changed it in maint but Jed would have a
> shit-fit :-) since it changes behavior.
> > >>
> > >> Barry
> > >>
> > >>
> > >> > On Jul 16, 2015, at 4:53 PM, Michele Rosso <[email protected]>
> wrote:
> > >> >
> > >> > Hi,
> > >> >
> > >> > I am performing a series of solves inside a loop. The matrix
> for each solve changes, but not enough to justify a rebuild of the PC at each
> solve.
> > >> > Therefore I am using KSPSetReusePreconditioner to avoid
> rebuilding unless necessary. The solver is CG + MG with a custom PC at the
> coarse level.
> > >> > If KSP is not updated each time, everything works as it is
> supposed to.
> > >> > When instead I allow the default PETSc behavior, i.e.
> updating the PC every time the matrix changes, the coarse level KSP, initially
> set to PREONLY, is changed into GMRES
> > >> > after the first solve. I am not sure where the problem lies
> (my PC or PETSc), so I would like to have your opinion on this.
> > >> > I attached the ksp_view for the two successive solves and the
> options stack.
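
A minimal sketch of the reuse pattern described above (AppUpdateMatrix and
AppPCNeedsRebuild are hypothetical application routines; the KSP, matrix and
vectors are assumed to be created and configured elsewhere):

#include <petscksp.h>

/* Hypothetical application routines, assumed to exist elsewhere. */
extern PetscErrorCode AppUpdateMatrix(Mat,PetscInt);
extern PetscBool      AppPCNeedsRebuild(PetscInt);

/* Solve a sequence of slightly different systems, rebuilding the
   preconditioner only when the application decides it is necessary. */
PetscErrorCode SolveSequence(KSP ksp,Mat A,Vec b,Vec x,PetscInt nsteps)
{
  PetscErrorCode ierr;
  PetscInt       step;

  PetscFunctionBeginUser;
  for (step = 0; step < nsteps; step++) {
    ierr = AppUpdateMatrix(A,step);CHKERRQ(ierr);      /* matrix values change slightly      */
    ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);     /* tell KSP the operator was modified */
    /* PETSC_TRUE keeps the current preconditioner; PETSC_FALSE restores the
       default behavior of rebuilding the PC whenever the operator changes. */
    ierr = KSPSetReusePreconditioner(ksp,AppPCNeedsRebuild(step) ? PETSC_FALSE : PETSC_TRUE);CHKERRQ(ierr);
    ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}
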
> > >> >
> > >> > Thanks for your help,
> > >> > Michele
> > >> >
> > >> >
> > >> >
> > >> > <ksp_view.txt><petsc_options.txt>
> > >>
> > >>
> > >>
> > >
> >
>
>
>
>
>
>
[0] PCSetUp_GAMG(): level 0) N=8192, n data rows=1, n data cols=1, nnz/row
(ave)=7, np=8192
[0] PCGAMGFilterGraph(): 100% nnz after filtering, with threshold 0, 4
nnz ave. (N=8192)
[0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
[0] PCGAMGProlongator_AGG(): New grid 1005 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.876420e+00
min=1.105137e-01 PC=jacobi
[0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 0 with simple aggregation
[0] PCSetUp_GAMG(): 1) N=1005, n data cols=1, nnz/row (ave)=27, 16 active pes
[0] PCGAMGFilterGraph(): 100% nnz after filtering, with threshold 0,
20.5645 nnz ave. (N=1005)
[0] PCGAMGProlongator_AGG(): New grid 103 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.461408e+00
min=1.226917e-03 PC=jacobi
[0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 8 with simple
aggregation
[0] PCSetUp_GAMG(): 2) N=103, n data cols=1, nnz/row (ave)=55, 2 active pes
[0] PCGAMGFilterGraph(): 100% nnz after filtering, with threshold 0, 55.5049 nnz ave. (N=103)
[0] PCGAMGProlongator_AGG(): New grid 6 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.697064e+00
min=2.669349e-04 PC=jacobi
[0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 6 with simple
aggregation
[0] PCSetUp_GAMG(): 3) N=6, n data cols=1, nnz/row (ave)=6, 1 active pes
[0] PCSetUp_GAMG(): 4 levels, grid complexity = 1.60036
type: gamg
GAMG specific options
[0] PCSetUp_GAMG(): level 0) N=8192, n data rows=1, n data cols=1, nnz/row
(ave)=7, np=8192
[0] PCGAMGFilterGraph(): 100% nnz after filtering, with threshold 0, 4 nnz ave. (N=8192)
[0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
[0] PCGAMGProlongator_AGG(): New grid 1005 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.876420e+00
min=1.105137e-01 PC=jacobi
[0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 0 with simple
aggregation
[0] PCSetUp_GAMG(): 1) N=1005, n data cols=1, nnz/row (ave)=27, 16 active pes
[0] PCGAMGFilterGraph(): 100% nnz after filtering, with threshold 0,
20.5645 nnz ave. (N=1005)
[0] PCGAMGProlongator_AGG(): New grid 103 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.461408e+00
min=1.226917e-03 PC=jacobi
[0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 8 with simple aggregation
[0] PCSetUp_GAMG(): 2) N=103, n data cols=1, nnz/row (ave)=55, 2 active pes
[0] PCGAMGFilterGraph(): 100% nnz after filtering, with threshold 0,
55.5049 nnz ave. (N=103)
[0] PCGAMGProlongator_AGG(): New grid 6 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.697064e+00
min=2.669349e-04 PC=jacobi
[0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 6 with simple
aggregation
[0] PCSetUp_GAMG(): 3) N=6, n data cols=1, nnz/row (ave)=6, 1 active pes
[0] PCSetUp_GAMG(): 4 levels, grid complexity = 1.60036
type: gamg
GAMG specific options
type: gamg
GAMG specific options
type: gamg
GAMG specific options
KSP Object: 8192 MPI processes
type: cg
maximum iterations=10000
tolerances: relative=1e-09, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 8192 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 8192 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 8192 MPI processes
type: gamg
MG: type is MULTIPLICATIVE, levels=4 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
GAMG specific options
Threshold for dropping small values from graph 0
AGG specific options
Symmetric graph false
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_mg_coarse_) 8192 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_mg_coarse_) 8192 MPI processes
type: bjacobi
block Jacobi: number of blocks = 8192
Local solve is same for all blocks, in the following KSP and PC
objects:
KSP Object: (mg_coarse_mg_coarse_sub_) 1 MPI
processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_mg_coarse_sub_) 1 MPI
processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5, needed 1
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=6, cols=6
package used to perform factorization: petsc
total: nonzeros=36, allocated nonzeros=36
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 2 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=6, cols=6
total: nonzeros=36, allocated nonzeros=36
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 2 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=6, cols=6
total: nonzeros=36, allocated nonzeros=36
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 2 nodes, limit used
is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_coarse_mg_levels_1_) 8192 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0.0995252, max = 1.09478
Chebyshev: eigenvalues estimated using gmres with translations [0
0.1; 0 1.1]
KSP Object: (mg_coarse_mg_levels_1_esteig_)
8192 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_mg_levels_1_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=103, cols=103
total: nonzeros=5717, allocated nonzeros=5717
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_coarse_mg_levels_2_) 8192 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0.15748, max = 1.73228
Chebyshev: eigenvalues estimated using gmres with translations [0
0.1; 0 1.1]
KSP Object: (mg_coarse_mg_levels_2_esteig_)
8192 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_mg_levels_2_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=1005, cols=1005
total: nonzeros=27137, allocated nonzeros=27137
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_coarse_mg_levels_3_) 8192 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0.191092, max = 2.10202
Chebyshev: eigenvalues estimated using gmres with translations [0
0.1; 0 1.1]
KSP Object: (mg_coarse_mg_levels_3_esteig_)
8192 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_mg_levels_3_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=8192, cols=8192
total: nonzeros=54784, allocated nonzeros=54784
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=8192, cols=8192
total: nonzeros=54784, allocated nonzeros=54784
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=65536, cols=65536
total: nonzeros=448512, allocated nonzeros=448512
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=524288, cols=524288
total: nonzeros=3.62906e+06, allocated nonzeros=3.62906e+06
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=4194304, cols=4194304
total: nonzeros=2.91963e+07, allocated nonzeros=2.91963e+07
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=33554432, cols=33554432
total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
total number of mallocs used during MatSetValues calls =0
has attached null space
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=33554432, cols=33554432
total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
total number of mallocs used during MatSetValues calls =0
has attached null space
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
-fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary:
----------------------------------------------
/u/sciteam/mrosso/mrosso-repo/build/bin/test_droplet_box.exe on a gnu-opt-32idx
named p���� with 8192 processors, by mrosso Fri Jul 24 13:09:23 2015
Using Petsc Development GIT revision: v3.6-233-g4936542 GIT Date: 2015-07-17
10:15:47 -0500
Max Max/Min Avg Total
Time (sec): 1.130e+02 1.00023 1.130e+02
Objects: 1.587e+03 1.00253 1.583e+03
Flops: 8.042e+07 1.28093 6.371e+07 5.219e+11
Flops/sec: 7.115e+05 1.28065 5.639e+05 4.619e+09
MPI Messages: 1.267e+05 13.76755 1.879e+04 1.539e+08
MPI Message Lengths: 8.176e+06 2.12933 3.881e+02 5.972e+10
MPI Reductions: 2.493e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N
flops
and VecAXPY() for complex vectors of length N -->
8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- --
Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total
Avg %Total counts %Total
0: Main Stage: 1.1300e+02 100.0% 5.2195e+11 100.0% 1.539e+08 100.0%
3.881e+02 100.0% 2.492e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in
this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all
processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct
%T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMDot 120 1.0 2.4560e-01 1.6 7.24e+04329.0 0.0e+00 0.0e+00
1.2e+02 0 0 0 0 5 0 0 0 0 5 9
VecTDot 194 1.0 2.6155e-01 1.3 1.59e+06 1.0 0.0e+00 0.0e+00
1.9e+02 0 2 0 0 8 0 2 0 0 8 49771
VecNorm 236 1.0 4.8733e-01 1.4 8.67e+05 1.0 0.0e+00 0.0e+00
2.4e+02 0 1 0 0 9 0 1 0 0 9 14323
VecScale 1009 1.0 1.1008e-03 1.5 1.63e+05 1.8 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 1156928
VecCopy 405 1.0 3.2604e-03 3.8 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 2648 1.0 9.1252e-03 6.7 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 594 1.0 1.5715e-02 3.8 4.77e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 7 0 0 0 0 7 0 0 0 2485378
VecAYPX 3103 1.0 8.7631e-03 2.4 2.58e+06 1.1 0.0e+00 0.0e+00
0.0e+00 0 4 0 0 0 0 4 0 0 0 2263359
VecAXPBYCZ 1164 1.0 3.5439e-0313.2 3.22e+05166.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 5091
VecMAXPY 132 1.0 3.7217e-04 6.8 8.63e+04166.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 12994
VecAssemblyBegin 36 1.0 3.7324e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
9.0e+01 0 0 0 0 4 0 0 0 0 4 0
VecAssemblyEnd 36 1.0 1.5278e-0364.7 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 66 1.0 2.8777e-0426.8 3.65e+03166.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 711
VecScatterBegin 4914 1.0 3.9190e-0128.1 0.00e+00 0.0 1.1e+08 5.3e+02
0.0e+00 0 0 72 99 0 0 0 72 99 0 0
VecScatterEnd 4914 1.0 8.6240e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 6 0 0 0 0 6 0 0 0 0 0
VecSetRandom 6 1.0 1.0859e-022168.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 132 1.0 3.4323e-01 1.5 2.19e+04166.0 0.0e+00 0.0e+00
1.3e+02 0 0 0 0 5 0 0 0 0 5 4
MatMult 2645 1.0 3.6311e+0032.2 3.45e+07 1.3 6.5e+07 7.6e+02
0.0e+00 1 42 42 83 0 1 42 42 83 0 60126
MatMultAdd 679 1.0 4.8583e+0031.6 1.08e+06 1.2 1.6e+06 1.4e+01
0.0e+00 4 1 1 0 0 4 1 1 0 0 1532
MatMultTranspose 683 1.0 4.2303e+00667.0 1.09e+06 1.2 1.6e+06 1.4e+01
0.0e+00 0 1 1 0 0 0 1 1 0 0 1778
MatSolve 97 0.0 2.9469e-04 0.0 6.40e+03 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 22
MatSOR 2782 1.0 3.6662e+0035.7 3.29e+07 1.3 4.1e+07 2.2e+02
0.0e+00 1 40 26 15 0 1 40 26 15 0 56576
MatLUFactorSym 2 1.0 1.5128e-02358.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 2 1.0 6.1989e-0513.0 2.58e+02 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 4
MatConvert 6 1.0 9.1314e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 18 1.0 1.3737e-02219.9 3.16e+041579.8 9.3e+04 8.6e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 36
MatResidual 679 1.0 3.0610e-0110.0 7.51e+06 1.2 2.3e+07 5.5e+02
0.0e+00 0 10 15 21 0 0 10 15 21 0 169592
MatAssemblyBegin 119 1.0 1.7048e+01 2.8 0.00e+00 0.0 4.3e+04 7.2e+00
1.3e+02 10 0 0 0 5 10 0 0 0 5 0
MatAssemblyEnd 119 1.0 4.1777e+01 1.4 0.00e+00 0.0 1.7e+06 4.1e+01
3.8e+02 31 0 1 0 15 31 0 1 0 15 0
MatGetRow 1328166.0 3.9291e-0454.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 2 0.0 1.5020e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrix 12 1.0 2.5183e+01 1.0 0.00e+00 0.0 7.6e+04 1.6e+01
1.9e+02 22 0 0 0 8 22 0 0 0 8 0
MatGetOrdering 2 0.0 9.5510e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 6 1.0 1.5275e+00 6.0 0.00e+00 0.0 3.8e+07 4.0e+00
2.4e+02 1 0 25 0 10 1 0 25 0 10 0
MatView 60 1.2 1.9943e+0026.6 0.00e+00 0.0 0.0e+00 0.0e+00
5.0e+01 1 0 0 0 2 1 0 0 0 2 0
MatAXPY 6 1.0 4.5854e+00326.5 0.00e+00 0.0 0.0e+00 0.0e+00
1.2e+01 0 0 0 0 0 0 0 0 0 0 0
MatMatMult 6 1.0 1.7226e+01 1.3 2.80e+041749.0 4.8e+05 5.9e+00
9.6e+01 12 0 0 0 4 12 0 0 0 4 0
MatMatMultSym 6 1.0 1.3093e+01 1.0 0.00e+00 0.0 3.9e+05 5.3e+00
8.4e+01 12 0 0 0 3 12 0 0 0 3 0
MatMatMultNum 6 1.0 4.1413e+0087.5 2.80e+041749.0 9.3e+04 8.6e+00
1.2e+01 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 14 1.0 2.3092e+01 1.2 3.94e+05 2.0 1.7e+06 2.4e+02
1.8e+02 20 0 1 1 7 20 0 1 1 7 73
MatPtAPSymbolic 10 1.0 1.6246e+01 1.7 0.00e+00 0.0 1.0e+06 2.6e+02
7.0e+01 12 0 1 0 3 12 0 1 0 3 0
MatPtAPNumeric 14 1.0 9.1005e+00 1.3 3.94e+05 2.0 7.2e+05 2.1e+02
1.1e+02 8 0 0 0 4 8 0 0 0 4 185
MatTrnMatMult 2 1.0 5.6152e+00 1.0 3.64e+02 2.9 1.1e+06 1.2e+01
3.8e+01 5 0 1 0 2 5 0 1 0 2 0
MatTrnMatMultSym 2 1.0 5.5943e+00 1.0 0.00e+00 0.0 1.0e+06 7.6e+00
3.4e+01 5 0 1 0 1 5 0 1 0 1 0
MatTrnMatMultNum 2 1.0 2.8538e-02 4.6 3.64e+02 2.9 9.3e+04 5.4e+01
4.0e+00 0 0 0 0 0 0 0 0 0 0 96
MatGetLocalMat 30 1.0 4.4808e+001435.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 26 1.0 4.5314e+00292.0 0.00e+00 0.0 1.4e+06 2.8e+02
0.0e+00 0 0 1 1 0 0 0 1 1 0 0
MatGetSymTrans 20 1.0 3.3071e-0347.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetGraph 6 1.0 7.4315e-0443.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFBcastBegin 252 1.0 1.3895e+0014.8 0.00e+00 0.0 3.8e+07 4.0e+00
0.0e+00 1 0 25 0 0 1 0 25 0 0 0
SFBcastEnd 252 1.0 7.1653e-0218.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 120 1.0 2.4587e-01 1.6 1.45e+05220.3 0.0e+00 0.0e+00
1.2e+02 0 0 0 0 5 0 0 0 0 5 26
KSPSetUp 36 1.0 2.2627e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
2.6e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 4 1.0 1.0763e+02 1.0 8.04e+07 1.3 1.5e+08 3.9e+02
2.4e+03 95100100100 96 95100100100 96 4848
PCGAMGGraph_AGG 6 1.0 4.2442e+00 1.0 2.80e+041749.0 2.8e+05 5.7e+00
7.2e+01 4 0 0 0 3 4 0 0 0 3 0
PCGAMGCoarse_AGG 6 1.0 7.2416e+00 1.0 3.64e+02 2.9 4.1e+07 4.4e+00
3.1e+02 6 0 26 0 12 6 0 26 0 12 0
PCGAMGProl_AGG 6 1.0 1.0400e+01 1.0 0.00e+00 0.0 7.9e+05 8.0e+00
1.4e+02 9 0 1 0 6 9 0 1 0 6 0
PCGAMGPOpt_AGG 6 1.0 2.3133e+01 1.2 4.03e+05647.5 1.4e+06 7.7e+00
3.0e+02 17 0 1 0 12 17 0 1 0 12 0
GAMG: createProl 6 1.0 4.4975e+01 1.1 4.31e+05565.4 4.3e+07 4.6e+00
8.3e+02 36 0 28 0 33 36 0 28 0 33 0
Graph 12 1.0 4.2435e+00 1.0 2.80e+041749.0 2.8e+05 5.7e+00
7.2e+01 4 0 0 0 3 4 0 0 0 3 0
MIS/Agg 6 1.0 1.5276e+00 6.0 0.00e+00 0.0 3.8e+07 4.0e+00
2.4e+02 1 0 25 0 10 1 0 25 0 10 0
SA: col data 6 1.0 6.9602e+00 1.6 0.00e+00 0.0 7.2e+05 8.2e+00
6.0e+01 6 0 0 0 2 6 0 0 0 2 0
SA: frmProl0 6 1.0 3.3989e+00 1.0 0.00e+00 0.0 7.2e+04 5.9e+00
6.0e+01 3 0 0 0 2 3 0 0 0 2 0
SA: smooth 6 1.0 2.3133e+01 1.2 4.03e+05647.5 1.4e+06 7.7e+00
3.0e+02 17 0 1 0 12 17 0 1 0 12 0
GAMG: partLevel 6 1.0 4.3421e+01 1.1 1.97e+053512.3 6.9e+05 2.0e+01
4.1e+02 38 0 0 0 16 38 0 0 0 16 0
repartition 6 1.0 3.0290e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
3.6e+01 0 0 0 0 1 0 0 0 0 1 0
Invert-Sort 6 1.0 2.2100e+00 7.9 0.00e+00 0.0 0.0e+00 0.0e+00
2.4e+01 2 0 0 0 1 2 0 0 0 1 0
Move A 6 1.0 1.3769e+01 1.0 0.00e+00 0.0 1.1e+04 8.4e+01
1.0e+02 12 0 0 0 4 12 0 0 0 4 0
Move P 6 1.0 1.1463e+01 1.0 0.00e+00 0.0 6.5e+04 5.5e+00
1.0e+02 10 0 0 0 4 10 0 0 0 4 0
PCSetUp 6 1.0 9.5437e+01 1.0 8.96e+05 3.3 4.5e+07 1.4e+01
1.5e+03 84 0 29 1 59 84 0 29 1 59 24
PCSetUpOnBlocks 97 1.0 1.7256e-0221.5 2.58e+02 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 97 1.0 1.1589e+01 1.0 6.95e+07 1.3 1.0e+08 4.8e+02
5.1e+02 10 84 67 83 21 10 84 67 83 21 37682
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 1032 1032 3077992 0
Vector Scatter 64 63 71936 0
Matrix 211 211 2308880 0
Matrix Coarsen 6 6 3720 0
Matrix Null Space 1 1 584 0
Distributed Mesh 5 4 19808 0
Star Forest Bipartite Graph 16 14 11760 0
Discrete System 5 4 3360 0
Index Set 180 180 169588 0
IS L to G Mapping 5 4 6020 0
Krylov Solver 22 22 374160 0
DMKSP interface 4 4 2560 0
Preconditioner 22 22 20924 0
PetscRandom 6 6 3696 0
Viewer 8 6 4512 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 4.57764e-05
Average time for zero size MPI_Send(): 1.04982e-05
#PETSc Option Table entries:
-finput input.txt
-info
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-9
-ksp_type cg
-ksp_view
-log_summary log_gamg.txt
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type gamg
-mg_levels_ksp_type richardson
-pc_dmdarepart_log
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=16384
--known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4
--known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2
--known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8
--known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8
--known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4
--known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1
--known-mpi-c-double-complex=1 --known-sdot-returns-double=0
--known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 "
--known-mpi-shared-libraries=0 --known-memcmp-ok
--with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a
--COPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
-O3 -march=native -mtune=native" --FOPTFLAGS="-Wall -Wno-unused-variable
-ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native
-mtune=native" --CXXOPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing
-Wno-unknown-pragmas -O3 -march=native -mtune=native" --with-x="0 "
--with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 "
--with-fortranlib-autodetect="0 " --with-shared-libraries="0 "
--with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn "
--download-hypre=1 --download-blacs="1 " --download-scalapack="1 "
--download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 "
PETSC_ARCH=gnu-opt-32idx
-----------------------------------------
Libraries compiled on Sat Jul 18 19:48:51 2015 on h2ologin1
Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /u/sciteam/mrosso/LIBS/petsc
Using PETSc arch: gnu-opt-32idx
-----------------------------------------
Using C compiler: cc -Wall -Wwrite-strings -Wno-strict-aliasing
-Wno-unknown-pragmas -O3 -march=native -mtune=native ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn -Wall -Wno-unused-variable -ffree-line-length-0
-Wno-unused-dummy-argument -O3 -march=native -mtune=native ${FOPTFLAGS}
${FFLAGS}
-----------------------------------------
Using include paths: -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/include
-I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lpetsc
-Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lsuperlu_dist_4.0 -lHYPRE
-lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib
-L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl
-----------------------------------------