Barry,

I attached ksp_view and log_summary for two different setups:

1) Plain MG on 5 levels + LU at the coarse level (files ending in mg5)
2) Plain MG on 5 levels + custom PC + LU at the coarse level (files
ending in mg7)

The custom PC works on a subset of processes, thus allowing the use of two
more levels of MG, for a total of 7.
Case 1) is extremely slow (~20 sec per solve) and converges in 21
iterations.
Case 2) is much faster (~0.25 sec per solve) and converges in 29
iterations.
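
For quick reference, here is how the coarse-level setup differs between the
two runs (condensed from the option tables at the end of the attached
log_summary files; the full lists are below):

Case 1 (mg5):
  -pc_type mg -pc_mg_levels 5 -pc_mg_galerkin
  -mg_coarse_pc_type lu
  -mg_coarse_pc_factor_mat_solver_package superlu_dist

Case 2 (mg7):
  -pc_type mg -pc_mg_levels 5 -pc_mg_galerkin
  -mg_coarse_pc_type dmdarepart
  -mg_coarse_pc_dmdarepart_factor 64
  -mg_coarse_dmdarepart_pc_type mg
  -mg_coarse_dmdarepart_pc_mg_levels 2
  -mg_coarse_dmdarepart_pc_mg_galerkin
  -mg_coarse_dmdarepart_mg_coarse_pc_type lu
  -mg_coarse_dmdarepart_mg_coarse_pc_factor_mat_solver_package superlu_dist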

Thanks for your help!

Michele


On Fri, 2015-07-24 at 13:56 -0500, Barry Smith wrote:

>   The coarse problem for the PCMG (geometric multigrid) is 
> 
> Mat Object:       8192 MPI processes
>         type: mpiaij
>         rows=8192, cols=8192
> 
> then it tries to solve it with algebraic multigrid on 8192 processes (which 
> is completely insane). A lot of the time is spent in setting up the algebraic 
> multigrid (not surprisingly).
> 
> 8192 is kind of small to parallelize.  Please run the same code but with the 
> default coarse grid problem instead of PCGAMG and send us the -log_summary 
> again
> 
>   Barry
> 
> > On Jul 24, 2015, at 1:35 PM, Michele Rosso <[email protected]> wrote:
> > 
> > Hi Mark and Barry,
> > 
> > I am sorry for my late reply: it was a busy week!
> > I ran a test case for a larger problem with as many levels of MG (i.e. 5) as 
> > I could and GAMG as the PC at the coarse level. I attached the output of -info 
> > (after grepping for "gamg"), ksp_view and log_summary.
> > The solve takes about 2 seconds on 8192 cores, which is way too much. The 
> > number of iterations to convergence is 24.
> > I hope there is a way to speed it up.
> > 
> > Thanks,
> > Michele
> > 
> > 
> > On Fri, 2015-07-17 at 09:38 -0400, Mark Adams wrote:
> >> 
> >> 
> >> On Thu, Jul 16, 2015 at 8:18 PM, Michele Rosso <[email protected]> wrote:
> >> Barry,
> >> 
> >> thank you very much for the detailed answer.  I tried what you suggested 
> >> and it works.
> >> So far I have tried it only on a small system, but the final goal is to use 
> >> it for very large runs. How does PCGAMG compare to PCMG as far as performance 
> >> and scalability are concerned?
> >> Also, could you help me tune the GAMG part (my current setup is in the 
> >> attached ksp_view.txt file)? 
> >> 
> >> 
> >> 
> >> I am going to add this to the document today but you can run with -info.  
> >> This is very noisy so you might want to do the next step at run time.  
> >> Then grep on GAMG.  This will be about 20 lines.  Send that to us and we 
> >> can go from there.
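> >> 
> >> For instance, something along these lines (just a sketch of the kind of 
> >> command meant here; the launcher and executable name are placeholders):
> >> 
> >>   mpiexec -n 8192 ./your_app -info 2>&1 | grep GAMG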
> >> 
> >> 
> >> Mark
> >> 
> >> 
> >>  
> >> 
> >> I also tried to use superlu_dist for the LU decomposition on 
> >> mg_coarse_mg_sub_
> >> -mg_coarse_mg_coarse_sub_pc_type lu
> >> -mg_coarse_mg_coarse_sub_pc_factor_mat_solver_package superlu_dist
> >> 
> >> but I got an error:
> >> 
> >> ****** Error in MC64A/AD. INFO(1) = -2 
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> symbfact() error returns 0
> >> symbfact() error returns 0
> >> symbfact() error returns 0
> >> symbfact() error returns 0
> >> symbfact() error returns 0
> >> symbfact() error returns 0
> >> symbfact() error returns 0
> >> 
> >> 
> >> Thank you,
> >> Michele
> >> 
> >> 
> >> On Thu, 2015-07-16 at 18:07 -0500, Barry Smith wrote:
> >>> 
> >>> > On Jul 16, 2015, at 5:42 PM, Michele Rosso <[email protected]> wrote:
> >>> > 
> >>> > Barry,
> >>> > 
> >>> > thanks for your reply. So if I want it fixed, I will have to use the 
> >>> > master branch, correct?
> >>> 
> >>> 
> >>>   Yes, or edit mg.c and remove the offending lines of code (easy enough). 
> >>> 
> >>> > 
> >>> > On a side note, what I am trying to achieve is to be able to use as many 
> >>> > levels of MG as I want, despite the limitation imposed by the local 
> >>> > number of grid nodes.
> >>> 
> >>> 
> >>>    I assume you are talking about DMDA? There is no generic 
> >>> limitation in PETSc's multigrid; it is only the way the DMDA code 
> >>> figures out the interpolation that imposes a restriction.
> >>> 
> >>> 
> >>> > So far I am using a borrowed code that implements a PC that creates a 
> >>> > subcommunicator and performs MG on it.
> >>> > While reading the documentation I found out that PCMGSetLevels takes in 
> >>> > an optional array of communicators. How does this work?
> >>> 
> >>> 
> >>>    It doesn't work. It was an idea that never got pursued.
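> >>> 
> >>>    (For reference, a sketch of the call in question is 
> >>>    PCMGSetLevels(PC pc, PetscInt levels, MPI_Comm *comms); the comms 
> >>>    array is the optional argument asked about above.)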
> >>> 
> >>> 
> >>> > Can I simply define my matrix and rhs on the fine grid as I normally 
> >>> > would (I do not use kspsetoperators and kspsetrhs), and would KSP 
> >>> > take care of it by using the correct communicator for each level?
> >>> 
> >>> 
> >>>    No.
> >>> 
> >>>    You can use the PCMG geometric multigrid with DMDA for as many levels 
> >>> as it works and then use PCGAMG as the coarse grid solver. PCGAMG 
> >>> automatically uses fewer processes for the coarse level matrices and 
> >>> vectors. You could do this all from the command line without writing 
> >>> code. 
> >>> 
> >>>    For example if your code uses a DMDA and calls KSPSetDM() use for 
> >>> example -da_refine 3 -pc_type mg -pc_mg_galerkin -mg_coarse_pc_type gamg  
> >>> -ksp_view 
> >>> 
> >>> 
> >>> 
> >>>   Barry
> >>> 
> >>> 
> >>> 
> >>> > 
> >>> > Thanks,
> >>> > Michele
> >>> > 
> >>> > 
> >>> > 
> >>> > 
> >>> > On Thu, 2015-07-16 at 17:30 -0500, Barry Smith wrote:
> >>> >>    Michel,
> >>> >> 
> >>> >>     This is a very annoying feature that has been fixed in master 
> >>> >> (http://www.mcs.anl.gov/petsc/developers/index.html).
> >>> >>   I would like to have changed it in maint, but Jed would have a 
> >>> >> shit-fit :-) since it changes behavior.
> >>> >> 
> >>> >>   Barry
> >>> >> 
> >>> >> 
> >>> >> > On Jul 16, 2015, at 4:53 PM, Michele Rosso <[email protected]> wrote:
> >>> >> > 
> >>> >> > Hi,
> >>> >> > 
> >>> >> > I am performing a series of solves inside a loop. The matrix for 
> >>> >> > each solve changes, but not enough to justify a rebuild of the PC at 
> >>> >> > each solve.
> >>> >> > Therefore I am using KSPSetReusePreconditioner to avoid rebuilding 
> >>> >> > unless necessary. The solver is CG + MG with a custom PC at the 
> >>> >> > coarse level.
> >>> >> > If the KSP is not updated each time, everything works as it is supposed 
> >>> >> > to.
> >>> >> > When instead I allow the default PETSc behavior, i.e. updating the PC 
> >>> >> > every time the matrix changes, the coarse-level KSP, initially set 
> >>> >> > to PREONLY, is changed into GMRES 
> >>> >> > after the first solve. I am not sure where the problem lies (my PC 
> >>> >> > or PETSc), so I would like to have your opinion on this.
> >>> >> > I attached the ksp_view for the two successive solves and the options 
> >>> >> > stack.
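> >>> >> > 
> >>> >> > In pseudo-code the loop looks roughly like this (just a sketch; 
> >>> >> > UpdateMatrixValues is a placeholder for my actual matrix update):
> >>> >> > 
> >>> >> >   KSPSetReusePreconditioner(ksp, PETSC_TRUE); /* keep the PC built at the first solve */
> >>> >> >   for (step = 0; step < nsteps; step++) {
> >>> >> >     UpdateMatrixValues(A);  /* matrix entries change slightly each step */
> >>> >> >     KSPSolve(ksp, b, x);    /* with reuse off, the PC would be rebuilt here */
> >>> >> >   }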
> >>> >> > 
> >>> >> > Thanks for your help,
> >>> >> > Michel
> >>> >> > 
> >>> >> > 
> >>> >> > 
> >>> >> > <ksp_view.txt><petsc_options.txt>
> >>> >> 
> >>> >> 
> >>> >> 
> >>> > 
> >>> 
> >>> 
> >>> 
> >> 
> >> 
> >> 
> >> 
> > 
> > <info.txt><ksp_view.txt><log_gamg.txt>
> 


KSP Object: 8192 MPI processes
  type: cg
  maximum iterations=10000
  tolerances:  relative=1e-09, absolute=1e-50, divergence=10000
  left preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8192 MPI processes
  type: mg
    MG: type is MULTIPLICATIVE, levels=5 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object:    (mg_coarse_)     8192 MPI processes
      type: preonly
      maximum iterations=1, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_coarse_)     8192 MPI processes
      type: lu
        LU: out-of-place factorization
        tolerance for zero pivot 2.22045e-14
        matrix ordering: natural
        factor fill ratio given 0, needed 0
          Factored matrix follows:
            Mat Object:             8192 MPI processes
              type: mpiaij
              rows=8192, cols=8192
              package used to perform factorization: superlu_dist
              total: nonzeros=0, allocated nonzeros=0
              total number of mallocs used during MatSetValues calls =0
                SuperLU_DIST run parameters:
                  Process grid nprow 128 x npcol 64 
                  Equilibrate matrix TRUE 
                  Matrix input mode 1 
                  Replace tiny pivots TRUE 
                  Use iterative refinement FALSE 
                  Processors in row 128 col partition 64 
                  Row permutation LargeDiag 
                  Column permutation METIS_AT_PLUS_A
                  Parallel symbolic factorization FALSE 
                  Repeated factorization SamePattern_SameRowPerm
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=8192, cols=8192
        total: nonzeros=54784, allocated nonzeros=54784
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object:    (mg_levels_1_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_1_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=65536, cols=65536
        total: nonzeros=448512, allocated nonzeros=448512
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object:    (mg_levels_2_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_2_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=524288, cols=524288
        total: nonzeros=3.62906e+06, allocated nonzeros=3.62906e+06
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object:    (mg_levels_3_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_3_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=4194304, cols=4194304
        total: nonzeros=2.91963e+07, allocated nonzeros=2.91963e+07
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 4 -------------------------------
    KSP Object:    (mg_levels_4_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_4_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=33554432, cols=33554432
        total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
        total number of mallocs used during MatSetValues calls =0
          has attached null space
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object:   8192 MPI processes
    type: mpiaij
    rows=33554432, cols=33554432
    total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
    total number of mallocs used during MatSetValues calls =0
      has attached null space
KSP Object: 8192 MPI processes
  type: cg
  maximum iterations=10000
  tolerances:  relative=1e-09, absolute=1e-50, divergence=10000
  left preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8192 MPI processes
  type: mg
    MG: type is MULTIPLICATIVE, levels=5 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object:    (mg_coarse_)     8192 MPI processes
      type: preonly
      maximum iterations=1, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_coarse_)     8192 MPI processes
      type: dmdarepart
        DMDARepart: parent comm size reduction factor = 64
        DMDARepart: subcomm_size = 128
      KSP Object:      (mg_coarse_dmdarepart_)       128 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
        left preconditioning
        using NONE norm type for convergence test
      PC Object:      (mg_coarse_dmdarepart_)       128 MPI processes
        type: mg
          MG: type is MULTIPLICATIVE, levels=2 cycles=v
            Cycles per PCApply=1
            Using Galerkin computed coarse grid matrices
        Coarse grid solver -- level -------------------------------
          KSP Object:          (mg_coarse_dmdarepart_mg_coarse_)           128 
MPI processes
            type: preonly
            maximum iterations=1, initial guess is zero
            tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
            left preconditioning
            using NONE norm type for convergence test
          PC Object:          (mg_coarse_dmdarepart_mg_coarse_)           128 
MPI processes
            type: lu
              LU: out-of-place factorization
              tolerance for zero pivot 2.22045e-14
              matrix ordering: natural
              factor fill ratio given 0, needed 0
                Factored matrix follows:
                  Mat Object:                   128 MPI processes
                    type: mpiaij
                    rows=1024, cols=1024
                    package used to perform factorization: superlu_dist
                    total: nonzeros=0, allocated nonzeros=0
                    total number of mallocs used during MatSetValues calls =0
                      SuperLU_DIST run parameters:
                        Process grid nprow 16 x npcol 8 
                        Equilibrate matrix TRUE 
                        Matrix input mode 1 
                        Replace tiny pivots TRUE 
                        Use iterative refinement FALSE 
                        Processors in row 16 col partition 8 
                        Row permutation LargeDiag 
                        Column permutation METIS_AT_PLUS_A
                        Parallel symbolic factorization FALSE 
                        Repeated factorization SamePattern_SameRowPerm
            linear system matrix = precond matrix:
            Mat Object:             128 MPI processes
              type: mpiaij
              rows=1024, cols=1024
              total: nonzeros=6528, allocated nonzeros=6528
              total number of mallocs used during MatSetValues calls =0
                not using I-node (on process 0) routines
        Down solver (pre-smoother) on level 1 -------------------------------
          KSP Object:          (mg_coarse_dmdarepart_mg_levels_1_)           
128 MPI processes
            type: richardson
              Richardson: damping factor=1
            maximum iterations=2
            tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
            left preconditioning
            using nonzero initial guess
            using NONE norm type for convergence test
          PC Object:          (mg_coarse_dmdarepart_mg_levels_1_)           128 
MPI processes
            type: sor
              SOR: type = local_symmetric, iterations = 1, local iterations = 
1, omega = 1
            linear system matrix = precond matrix:
            Mat Object:             128 MPI processes
              type: mpiaij
              rows=8192, cols=8192
              total: nonzeros=54784, allocated nonzeros=54784
              total number of mallocs used during MatSetValues calls =0
                not using I-node (on process 0) routines
        Up solver (post-smoother) same as down solver (pre-smoother)
        linear system matrix = precond matrix:
        Mat Object:         128 MPI processes
          type: mpiaij
          rows=8192, cols=8192
          total: nonzeros=54784, allocated nonzeros=54784
          total number of mallocs used during MatSetValues calls =0
            not using I-node (on process 0) routines
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=8192, cols=8192
        total: nonzeros=54784, allocated nonzeros=54784
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object:    (mg_levels_1_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_1_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=65536, cols=65536
        total: nonzeros=448512, allocated nonzeros=448512
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object:    (mg_levels_2_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_2_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=524288, cols=524288
        total: nonzeros=3.62906e+06, allocated nonzeros=3.62906e+06
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object:    (mg_levels_3_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_3_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=4194304, cols=4194304
        total: nonzeros=2.91963e+07, allocated nonzeros=2.91963e+07
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 4 -------------------------------
    KSP Object:    (mg_levels_4_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_4_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=33554432, cols=33554432
        total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
        total number of mallocs used during MatSetValues calls =0
          has attached null space
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object:   8192 MPI processes
    type: mpiaij
    rows=33554432, cols=33554432
    total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
    total number of mallocs used during MatSetValues calls =0
      has attached null space
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r 
-fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: 
----------------------------------------------

/u/sciteam/mrosso/mrosso-repo/build/bin/test_droplet_box.exe on a gnu-opt-32idx 
named p���� with 8192 processors, by mrosso Fri Jul 24 14:11:55 2015
Using Petsc Development GIT revision: v3.6-233-g4936542  GIT Date: 2015-07-17 
10:15:47 -0500

                         Max       Max/Min        Avg      Total 
Time (sec):           7.565e+01      1.00002   7.565e+01
Objects:              7.230e+02      1.00000   7.230e+02
Flops:                5.717e+07      1.01632   5.707e+07  4.675e+11
Flops/sec:            7.557e+05      1.01634   7.544e+05  6.180e+09
MPI Messages:         9.084e+03      2.00000   8.611e+03  7.054e+07
MPI Message Lengths:  6.835e+06      2.00000   7.524e+02  5.307e+10
MPI Reductions:       1.000e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type 
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N 
flops
                            and VecAXPY() for complex vectors of length N --> 
8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- 
Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     
Avg         %Total   counts   %Total 
 0:      Main Stage: 7.5651e+01 100.0%  4.6755e+11 100.0%  7.054e+07 100.0%  
7.524e+02      100.0%  9.990e+02  99.9% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting 
output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and 
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in 
this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all 
processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                            
 --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct 
 %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecTDot              174 1.0 1.8118e-01 1.9 1.43e+06 1.0 0.0e+00 0.0e+00 
1.7e+02  0  2  0  0 17   0  2  0  0 17 64440
VecNorm               94 1.0 6.4223e-02 2.1 7.70e+05 1.0 0.0e+00 0.0e+00 
9.4e+01  0  1  0  0  9   0  1  0  0  9 98224
VecScale             787 1.0 1.0910e-03 1.6 1.48e+05 1.8 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 1059301
VecCopy              179 1.0 1.0858e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              1240 1.0 1.4889e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              522 1.0 5.7485e-03 1.2 4.28e+06 1.0 0.0e+00 0.0e+00 
0.0e+00  0  7  0  0  0   0  7  0  0  0 6093896
VecAYPX              695 1.0 5.3260e-03 1.4 2.17e+06 1.0 0.0e+00 0.0e+00 
0.0e+00  0  4  0  0  0   0  4  0  0  0 3335289
VecAssemblyBegin       4 1.0 1.3018e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
1.2e+01  0  0  0  0  1   0  0  0  0  1     0
VecAssemblyEnd         4 1.0 1.6499e-0428.8 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin     2182 1.0 2.2002e-02 2.1 0.00e+00 0.0 6.9e+07 7.6e+02 
0.0e+00  0  0 98 99  0   0  0 98 99  0     0
VecScatterEnd       2182 1.0 5.0710e+0074.8 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  4  0  0  0  0   4  0  0  0  0     0
MatMult              699 1.0 2.3855e+0031.0 2.40e+07 1.0 3.3e+07 1.4e+03 
0.0e+00  0 42 46 84  0   0 42 46 84  0 82105
MatMultAdd           348 1.0 5.8677e-03 1.6 8.14e+05 1.0 0.0e+00 0.0e+00 
0.0e+00  0  1  0  0  0   0  1  0  0  0 1136883
MatMultTranspose     352 1.0 5.7197e-03 1.2 8.24e+05 1.0 0.0e+00 0.0e+00 
0.0e+00  0  1  0  0  0   0  1  0  0  0 1179718
MatSolve              87 1.0 5.8730e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 72  0  0  0  0  72  0  0  0  0     0
MatSOR               870 1.0 5.0801e+0055.5 2.27e+07 1.0 3.6e+07 2.2e+02 
0.0e+00  4 40 52 15  0   4 40 52 15  0 36617
MatLUFactorSym         1 1.0 9.5398e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 1.4040e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 19  0  0  0  0  19  0  0  0  0     0
MatResidual          348 1.0 4.1076e-02 1.8 5.70e+06 1.0 1.6e+07 6.8e+02 
0.0e+00  0 10 23 21  0   0 10 23 21  0 1133130
MatAssemblyBegin      21 1.0 2.5973e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
2.6e+01  0  0  0  0  3   0  0  0  0  3     0
MatAssemblyEnd        21 1.0 5.4194e-02 2.0 0.00e+00 0.0 4.7e+05 1.4e+02 
7.2e+01  0  0  1  0  7   0  0  1  0  7     0
MatGetRowIJ            1 1.0 5.6028e-0558.8 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.2708e-04 8.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView               35 1.0 4.3098e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 
3.5e+01  0  0  0  0  4   0  0  0  0  4     0
MatPtAP                4 1.0 6.8662e-02 1.0 1.03e+05 1.0 9.3e+05 2.9e+02 
6.8e+01  0  0  1  1  7   0  0  1  1  7 12233
MatPtAPSymbolic        4 1.0 5.3361e-02 1.0 0.00e+00 0.0 5.6e+05 4.5e+02 
2.8e+01  0  0  1  0  3   0  0  1  0  3     0
MatPtAPNumeric         4 1.0 1.6402e-02 1.1 1.03e+05 1.0 3.7e+05 4.4e+01 
4.0e+01  0  0  1  0  4   0  0  1  0  4 51212
MatGetLocalMat         4 1.0 2.6742e-0269.8 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          4 1.0 1.5030e-03 2.6 0.00e+00 0.0 5.6e+05 4.5e+02 
0.0e+00  0  0  1  0  0   0  0  1  0  0     0
MatGetSymTrans         8 1.0 1.9407e-04 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               9 1.0 5.1131e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
1.4e+01  0  0  0  0  1   0  0  0  0  1     0
KSPSolve               4 1.0 7.3904e+01 1.0 5.72e+07 1.0 7.0e+07 7.5e+02 
9.1e+02 98100100100 91  98100100100 91  6325
PCSetUp                4 1.0 1.4206e+01 1.0 1.73e+05 1.0 1.3e+06 2.2e+02 
2.0e+02 19  0  2  1 20  19  0  2  1 20   100
PCApply               87 1.0 5.9362e+01 1.0 4.79e+07 1.0 6.5e+07 6.8e+02 
3.5e+02 78 84 92 83 35  78 84 92 83 35  6596
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector   592            592      2160472     0
      Vector Scatter    14             13        18512     0
              Matrix    38             38       976248     0
   Matrix Null Space     1              1          584     0
    Distributed Mesh     5              4        19808     0
Star Forest Bipartite Graph    10              8         6720     0
     Discrete System     5              4         3360     0
           Index Set    32             32        51488     0
   IS L to G Mapping     5              4         6020     0
       Krylov Solver     7              7         8608     0
     DMKSP interface     4              4         2560     0
      Preconditioner     7              7         6968     0
              Viewer     3              1          752     0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 4.22001e-05
Average time for zero size MPI_Send(): 1.56337e-06
#PETSc Option Table entries:
-finput input.txt
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-9
-ksp_type cg
-ksp_view
-log_summary log_gamg.txt
-mg_coarse_ksp_type preonly
-mg_coarse_pc_factor_mat_solver_package superlu_dist
-mg_coarse_pc_type lu
-mg_levels_ksp_type richardson
-pc_dmdarepart_log
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=16384 
--known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 
--known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 
--known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 
--known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 
--known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4 
--known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 
--known-mpi-c-double-complex=1 --known-sdot-returns-double=0 
--known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 " 
--known-mpi-shared-libraries=0 --known-memcmp-ok  
--with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a 
--COPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas 
-O3 -march=native -mtune=native" --FOPTFLAGS="-Wall -Wno-unused-variable 
-ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native 
-mtune=native" --CXXOPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing 
-Wno-unknown-pragmas -O3 -march=native -mtune=native" --with-x="0 " 
--with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " 
--with-fortranlib-autodetect="0 " --with-shared-libraries="0 " 
--with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " 
--download-hypre=1 --download-blacs="1 " --download-scalapack="1 " 
--download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " 
PETSC_ARCH=gnu-opt-32idx
-----------------------------------------
Libraries compiled on Sat Jul 18 19:48:51 2015 on h2ologin1 
Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /u/sciteam/mrosso/LIBS/petsc
Using PETSc arch: gnu-opt-32idx
-----------------------------------------

Using C compiler: cc  -Wall -Wwrite-strings -Wno-strict-aliasing 
-Wno-unknown-pragmas -O3 -march=native -mtune=native  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn  -Wall -Wno-unused-variable -ffree-line-length-0 
-Wno-unused-dummy-argument -O3 -march=native -mtune=native   ${FOPTFLAGS} 
${FFLAGS} 
-----------------------------------------

Using include paths: -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include 
-I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/include 
-I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-----------------------------------------

Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib 
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lpetsc 
-Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib 
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lsuperlu_dist_4.0 -lHYPRE 
-lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib 
-L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl 
-----------------------------------------

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r 
-fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: 
----------------------------------------------

/u/sciteam/mrosso/mrosso-repo/build/bin/test_droplet_box.exe on a gnu-opt-32idx 
named p���� with 8192 processors, by mrosso Fri Jul 24 14:33:06 2015
Using Petsc Development GIT revision: v3.6-233-g4936542  GIT Date: 2015-07-17 
10:15:47 -0500

                         Max       Max/Min        Avg      Total 
Time (sec):           3.447e+00      1.00038   3.446e+00
Objects:              1.368e+03      1.28935   1.066e+03
Flops:                7.647e+07      1.02006   7.608e+07  6.232e+11
Flops/sec:            2.219e+07      1.02020   2.207e+07  1.808e+11
MPI Messages:         2.096e+04      3.38688   1.201e+04  9.840e+07
MPI Message Lengths:  9.104e+06      2.00024   7.189e+02  7.074e+10
MPI Reductions:       1.416e+03      1.08506

Flop counting convention: 1 flop = 1 real number operation of type 
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N 
flops
                            and VecAXPY() for complex vectors of length N --> 
8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- 
Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     
Avg         %Total   counts   %Total 
 0:      Main Stage: 3.1206e+00  90.5%  6.2314e+11 100.0%  9.376e+07  95.3%  
7.181e+02       99.9%  1.261e+03  89.0% 
 1: PCRprt_SetUpMat: 2.5313e-02   0.7%  6.5418e+05   0.0%  6.123e+05   0.6%  
5.931e-02        0.0%  4.425e+01   3.1% 
 2:    PCRprt_Apply: 3.0039e-01   8.7%  8.8424e+07   0.0%  4.029e+06   4.1%  
6.738e-01        0.1%  9.062e-01   0.1% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting 
output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and 
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in 
this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all 
processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                            
 --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct 
 %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecTDot              232 1.0 4.3392e-02 2.6 1.90e+06 1.0 0.0e+00 0.0e+00 
2.3e+02  1  2  0  0 16   1  2  0  0 18 358757
VecNorm              123 1.0 1.6137e-02 2.0 1.01e+06 1.0 0.0e+00 0.0e+00 
1.2e+02  0  1  0  0  9   0  1  0  0 10 511516
VecScale            1048 1.0 1.1351e-03 1.5 1.92e+05 1.8 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 1318105
VecCopy              121 1.0 1.2727e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              1647 1.0 1.6043e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              696 1.0 7.1111e-03 1.4 5.70e+06 1.0 0.0e+00 0.0e+00 
0.0e+00  0  7  0  0  0   0  7  0  0  0 6568316
VecAYPX              927 1.0 4.7853e-03 1.4 2.90e+06 1.0 0.0e+00 0.0e+00 
0.0e+00  0  4  0  0  0   0  4  0  0  0 4961251
VecAssemblyBegin       4 1.0 1.2280e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
1.2e+01  0  0  0  0  1   0  0  0  0  1     0
VecAssemblyEnd         4 1.0 1.6284e-0434.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin     2907 1.0 2.7515e-02 2.1 0.00e+00 0.0 9.2e+07 7.6e+02 
0.0e+00  1  0 94 99  0   1  0 98 99  0     0
VecScatterEnd       2907 1.0 1.5621e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  3  0  0  0  0   4  0  0  0  0     0
MatMult              931 1.0 2.1213e-01 2.2 3.19e+07 1.0 4.3e+07 1.4e+03 
0.0e+00  5 42 44 84  0   5 42 46 84  0 1228981
MatMultAdd           464 1.0 4.5297e-03 1.1 1.09e+06 1.0 0.0e+00 0.0e+00 
0.0e+00  0  1  0  0  0   0  1  0  0  0 1963600
MatMultTranspose     468 1.0 7.2241e-03 1.2 1.10e+06 1.0 0.0e+00 0.0e+00 
0.0e+00  0  1  0  0  0   0  1  0  0  0 1241849
MatSOR              1160 1.0 1.4814e-01 1.2 3.03e+07 1.0 4.9e+07 2.2e+02 
0.0e+00  4 40 49 15  0   4 40 52 15  0 1673981
MatResidual          464 1.0 5.4564e-02 1.8 7.60e+06 1.0 2.2e+07 6.8e+02 
0.0e+00  1 10 22 21  0   1 10 23 21  0 1137383
MatAssemblyBegin      26 1.0 2.9964e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
3.6e+01  1  0  0  0  3   1  0  0  0  3     0
MatAssemblyEnd        26 1.0 3.6304e-02 1.0 0.00e+00 0.0 4.8e+05 1.3e+02 
8.0e+01  1  0  0  0  6   1  0  1  0  6     0
MatView               50 1.7 5.7154e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 
3.0e+01  2  0  0  0  2   2  0  0  0  2     0
MatPtAP                8 1.0 4.8214e-02 1.0 2.06e+05 1.0 1.1e+06 3.5e+02 
7.6e+01  1  0  1  1  5   2  0  1  1  6 34843
MatPtAPSymbolic        4 1.0 2.7914e-02 1.1 0.00e+00 0.0 5.6e+05 4.5e+02 
2.8e+01  1  0  1  0  2   1  0  1  0  2     0
MatPtAPNumeric         8 1.0 2.1734e-02 1.1 2.06e+05 1.0 5.6e+05 2.6e+02 
4.8e+01  1  0  1  0  3   1  0  1  0  4 77294
MatGetLocalMat         8 1.0 6.5875e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          8 1.0 1.9593e-03 2.6 0.00e+00 0.0 7.5e+05 5.1e+02 
0.0e+00  0  0  1  1  0   0  0  1  1  0     0
MatGetSymTrans         8 1.0 1.4830e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp              14 1.0 6.4659e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
1.4e+01  0  0  0  0  1   0  0  0  0  1     0
KSPSolve               4 1.0 9.5956e-01 1.0 7.65e+07 1.0 9.8e+07 7.2e+02 
1.2e+03 28100100100 86  31100105100 97 649356
PCSetUp                4 1.0 1.7332e-01 1.0 2.76e+05 1.0 2.2e+06 1.9e+02 
2.8e+02  5  0  2  1 20   5  0  2  1 22 13014
PCApply              116 1.0 7.0218e-01 1.0 6.42e+07 1.0 9.1e+07 6.5e+02 
4.6e+02 20 84 92 83 33  22 84 97 83 37 743519

--- Event Stage 1: PCRprt_SetUpMat

VecSet                 3 1.5 1.0014e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin      10 1.2 4.3280e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 
4.1e+00  0  0  0  0  0   8  0  0  0  9     0
MatAssemblyEnd        10 1.2 8.4145e-03 1.1 0.00e+00 0.0 1.9e+05 4.2e+00 
1.6e+01  0  0  0  0  1  30  0 31 13 36     0
MatGetRow            192 0.0 4.4584e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice       2 1.0 1.0426e-02 2.3 0.00e+00 0.0 8.1e+04 2.3e+01 
6.0e+00  0  0  0  0  0  23  0 13 32 14     0
MatZeroEntries         1 0.0 6.9141e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatPtAP                2 1.0 1.8841e-02 1.0 8.40e+01 2.6 5.3e+05 7.4e+00 
3.4e+01  1  0  1  0  2  74100 87 67 77    35
MatPtAPSymbolic        2 1.0 9.2332e-03 1.1 0.00e+00 0.0 3.3e+05 7.0e+00 
1.4e+01  0  0  0  0  1  35  0 54 40 32     0
MatPtAPNumeric         2 1.0 1.0050e-02 1.1 8.40e+01 2.6 2.0e+05 7.9e+00 
2.0e+01  0  0  0  0  1  39100 33 28 45    65
MatGetLocalMat         2 1.0 5.9128e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          2 1.0 5.0616e-04 3.8 0.00e+00 0.0 2.8e+05 5.3e+00 
0.0e+00  0  0  0  0  0   1  0 46 26  0     0
MatGetSymTrans         4 1.0 1.0729e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 2: PCRprt_Apply

VecScale             348 0.0 2.4199e-04 0.0 3.34e+04 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  4  0  0  0 13989
VecCopy              116 0.0 6.5565e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              1049 3.0 3.4976e-04 6.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAYPX              116 0.0 8.7500e-05 0.0 7.42e+03 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  1  0  0  0 10860
VecScatterBegin     1161 2.5 1.2123e-0240.8 0.00e+00 0.0 4.0e+06 1.6e+01 
0.0e+00  0  0  4  0  0   0  0100100  0     0
VecScatterEnd       1161 2.5 3.0874e-0110.7 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  9  0  0  0  0  98  0  0  0  0     0
MatMult              232 2.0 9.2895e-0368.7 9.67e+04834.0 1.0e+06 1.6e+01 
0.0e+00  0  0  1  0  0   1 15 25 25  0  1469
MatMultAdd           116 0.0 3.1829e-04 0.0 1.48e+04 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  2  0  0  0  5971
MatMultTranspose     233 2.0 1.1170e-0233.1 1.52e+0465.6 9.4e+05 8.0e+00 
0.0e+00  0  0  1  0  0   1  4 23 11  0   342
MatSolve             116 0.0 1.6799e-01 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   1  0  0  0  0     0
MatSOR               232 0.0 1.7143e-02 0.0 5.50e+05 0.0 2.1e+05 1.3e+02 
0.0e+00  0  0  0  0  0   0 77  5 41  0  3947
MatLUFactorSym         1 0.0 4.6492e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         2 0.0 6.0585e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatResidual          116 0.0 4.7536e-03 0.0 1.04e+05 0.0 7.1e+04 1.3e+02 
0.0e+00  0  0  0  0  0   0 14  2 14  0  2674
MatAssemblyBegin       5 0.0 4.3392e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
9.4e-02  0  0  0  0  0   0  0  0  0 10     0
MatAssemblyEnd         5 0.0 8.8215e-04 0.0 0.00e+00 0.0 1.2e+03 1.0e+01 
2.5e-01  0  0  0  0  0   0  0  0  0 28     0
MatGetRowIJ            1 0.0 4.0531e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 0.0 2.7895e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatPtAP                2 0.0 1.5361e-03 0.0 2.82e+03 0.0 3.6e+03 6.7e+01 
3.0e-01  0  0  0  0  0   0  0  0  0 33   221
MatPtAPSymbolic        1 0.0 6.6018e-04 0.0 0.00e+00 0.0 1.8e+03 8.5e+01 
1.1e-01  0  0  0  0  0   0  0  0  0 12     0
MatPtAPNumeric         2 0.0 8.8406e-04 0.0 2.82e+03 0.0 1.8e+03 4.9e+01 
1.9e-01  0  0  0  0  0   0  0  0  0 21   385
MatGetLocalMat         2 0.0 3.2187e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          2 0.0 1.9097e-04 0.0 0.00e+00 0.0 2.4e+03 9.6e+01 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSymTrans         2 0.0 4.0531e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               6 0.0 1.2183e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
3.1e-02  0  0  0  0  0   0  0  0  0  3     0
KSPSolve             116 0.0 2.7114e-01 0.0 6.87e+05 0.0 2.9e+05 1.3e+02 
9.1e-01  0  0  0  0  0   1 96  7 55100   312
PCSetUp                2 0.0 6.5762e-02 0.0 3.78e+03 0.0 4.9e+03 5.3e+01 
9.1e-01  0  0  0  0  0   0  1  0  0100     7
PCApply              116 0.0 2.0491e-01 0.0 6.83e+05 0.0 2.8e+05 1.3e+02 
0.0e+00  0  0  0  0  0   1 95  7 54  0   411
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector   778            787      2743704     0
      Vector Scatter    18             21        27616     0
              Matrix    38             52      1034136     0
   Matrix Null Space     1              1          584     0
    Distributed Mesh     7              7        34664     0
Star Forest Bipartite Graph    14             14        11760     0
     Discrete System     7              7         5880     0
           Index Set    36             38        56544     0
   IS L to G Mapping     7              7         8480     0
       Krylov Solver    11             10        12240     0
     DMKSP interface     4              5         3200     0
      Preconditioner    11             10        10056     0
              Viewer     8              6         4512     0

--- Event Stage 1: PCRprt_SetUpMat

              Vector     6              5         7840     0
      Vector Scatter     3              2         2128     0
              Matrix    15             12        43656     0
           Index Set    10             10         7896     0

--- Event Stage 2: PCRprt_Apply

              Vector   364            356       685152     0
      Vector Scatter     3              0            0     0
              Matrix    11              0            0     0
    Distributed Mesh     1              0            0     0
Star Forest Bipartite Graph     2              0            0     0
     Discrete System     1              0            0     0
           Index Set    10              8         6304     0
   IS L to G Mapping     1              0            0     0
       Krylov Solver     0              1         1136     0
     DMKSP interface     1              0            0     0
      Preconditioner     0              1          984     0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 5.24044e-05
Average time for zero size MPI_Send(): 2.16223e-05
#PETSc Option Table entries:
-finput input.txt
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-9
-ksp_type cg
-ksp_view
-log_summary log_mg7.txt
-mg_coarse_dmdarepart_ksp_constant_null_space
-mg_coarse_dmdarepart_ksp_type preonly
-mg_coarse_dmdarepart_mg_coarse_ksp_type preonly
-mg_coarse_dmdarepart_mg_coarse_pc_factor_mat_solver_package superlu_dist
-mg_coarse_dmdarepart_mg_coarse_pc_type lu
-mg_coarse_dmdarepart_mg_levels_ksp_type richardson
-mg_coarse_dmdarepart_pc_mg_galerkin
-mg_coarse_dmdarepart_pc_mg_levels 2
-mg_coarse_dmdarepart_pc_type mg
-mg_coarse_ksp_type preonly
-mg_coarse_pc_dmdarepart_factor 64
-mg_coarse_pc_type dmdarepart
-mg_levels_ksp_type richardson
-options_left
-pc_dmdarepart_log
-pc_dmdarepart_monitor
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=16384 
--known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 
--known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 
--known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 
--known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 
--known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4 
--known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 
--known-mpi-c-double-complex=1 --known-sdot-returns-double=0 
--known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 " 
--known-mpi-shared-libraries=0 --known-memcmp-ok  
--with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a 
--COPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas 
-O3 -march=native -mtune=native" --FOPTFLAGS="-Wall -Wno-unused-variable 
-ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native 
-mtune=native" --CXXOPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing 
-Wno-unknown-pragmas -O3 -march=native -mtune=native" --with-x="0 " 
--with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " 
--with-fortranlib-autodetect="0 " --with-shared-libraries="0 " 
--with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " 
--download-hypre=1 --download-blacs="1 " --download-scalapack="1 " 
--download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " 
PETSC_ARCH=gnu-opt-32idx
-----------------------------------------
Libraries compiled on Sat Jul 18 19:48:51 2015 on h2ologin1 
Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /u/sciteam/mrosso/LIBS/petsc
Using PETSc arch: gnu-opt-32idx
-----------------------------------------

Using C compiler: cc  -Wall -Wwrite-strings -Wno-strict-aliasing 
-Wno-unknown-pragmas -O3 -march=native -mtune=native  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn  -Wall -Wno-unused-variable -ffree-line-length-0 
-Wno-unused-dummy-argument -O3 -march=native -mtune=native   ${FOPTFLAGS} 
${FFLAGS} 
-----------------------------------------

Using include paths: -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include 
-I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/include 
-I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-----------------------------------------

Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib 
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lpetsc 
-Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib 
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lsuperlu_dist_4.0 -lHYPRE 
-lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib 
-L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl 
-----------------------------------------
