Hi Barry,

I tried what you suggested:

1) 5 levels of MG + defaults at the coarse level (PCREDUNDANT)
2) 5 levels of MG + 2 levels of MG via DMDAREPART +  defaults at the
coarse level (PCREDUNDANT)
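
In terms of options, the two cases are roughly the following (case 2 is the full
stack reported in the ksp_view/options table below; case 1 is the same run with
the coarse PC left at its default, so take this as a sketch rather than the
exact command lines):

Case 1:
-pc_type mg -pc_mg_galerkin -pc_mg_levels 5
-mg_levels_ksp_type richardson -mg_coarse_ksp_type preonly

Case 2:
-pc_type mg -pc_mg_galerkin -pc_mg_levels 5
-mg_levels_ksp_type richardson
-mg_coarse_pc_type dmdarepart -mg_coarse_pc_dmdarepart_factor 64
-mg_coarse_dmdarepart_pc_type mg -mg_coarse_dmdarepart_pc_mg_levels 2
-mg_coarse_dmdarepart_mg_coarse_pc_type redundant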

I attached ksp_view and log_summary for both cases.
The use of PCREDUNDANT halves the time for case 1 (from ~20 sec per solve to
~10 sec per solve), while it seems to have little effect on case 2.
Any thoughts on this?

Thanks,
Michele


On Sat, 2015-07-25 at 22:18 -0500, Barry Smith wrote:

>   This dmdarepart business, which I am guessing is running PCMG on smaller 
> sets of processes with a DMDA on that smaller set of processes for a coarse 
> problem, is a fine idea, but you should keep in mind the rule of thumb that 
> parallel iterative (and even more so direct) solvers don't do well when there 
> are roughly 10,000 or fewer degrees of freedom per processor.  So you should 
> definitely not be using SuperLU_DIST in parallel to solve a problem with 1048 
> degrees of freedom on 128 processes; just use PCREDUNDANT and its default 
> (sequential) LU. That should be faster.
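> 
>   For example, at the repartitioned coarse level that would be something 
> along these lines (just a sketch; adjust the prefix to whatever your setup 
> actually uses):
> 
>    -mg_coarse_dmdarepart_mg_coarse_pc_type redundant
>    -mg_coarse_dmdarepart_mg_coarse_redundant_pc_type lu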
> 
>   Barry
> 
> > On Jul 25, 2015, at 10:09 PM, Barry Smith <[email protected]> wrote:
> > 
> > 
> >  Don't use 
> > 
> > -mg_coarse_pc_factor_mat_solver_package superlu_dist
> > -mg_coarse_pc_type lu
> > 
> >  with 8000+ processes and 1 degree of freedom per process; SuperLU_DIST will 
> > be terrible. Just leave the defaults for this and send the -log_summary.
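> > 
> >  That is, keep the rest of your options and run with something like (sketch):
> > 
> >    -pc_type mg -pc_mg_galerkin -pc_mg_levels 5 -mg_levels_ksp_type richardson
> >    -ksp_type cg -ksp_view -log_summary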
> > 
> >  Barry
> > 
> >> On Jul 24, 2015, at 2:44 PM, Michele Rosso <[email protected]> wrote:
> >> 
> >> Barry,
> >> 
> >> I attached ksp_view and log_summary for two different setups:
> >> 
> >> 1) Plain MG on 5 levels + LU at the coarse level (files ending in mg5)
> >> 2) Plain MG on 5 levels + custom PC + LU at the coarse level (files ending 
> >> in mg7)
> >> 
> >> The custom PC works on a subset of processes, thus allowing the use of two 
> >> more levels of MG, for a total of 7.
> >> Case 1) is extremely slow (~20 sec per solve) and converges in 21 iterations.
> >> Case 2) is way faster (~0.25 sec per solve) and converges in 29 iterations.
> >> 
> >> Thanks for your help!
> >> 
> >> Michele
> >> 
> >> 
> >> On Fri, 2015-07-24 at 13:56 -0500, Barry Smith wrote:
> >>>  The coarse problem for the PCMG (geometric multigrid) is 
> >>> 
> >>> Mat Object:       8192 MPI processes
> >>>        type: mpiaij
> >>>        rows=8192, cols=8192
> >>> 
> >>> then it tries to solve it with algebraic multigrid on 8192 processes 
> >>> (which is completely insane). A lot of the time is spent in setting up 
> >>> the algebraic multigrid (not surprisingly).
> >>> 
> >>> 8192 is kind of small to parallelize.  Please run the same code but with 
> >>> the default coarse grid problem instead of PCGAMG and send us the 
> >>> -log_summary again
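> >>> 
> >>>  (Concretely: if you set it with -mg_coarse_pc_type gamg, just drop that 
> >>> option so the coarse grid solve falls back to the PCMG default, and keep 
> >>> everything else the same.)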
> >>> 
> >>>  Barry
> >>> 
> >>> 
> >>>> On Jul 24, 2015, at 1:35 PM, Michele Rosso <[email protected]> wrote:
> >>>> 
> >>>> Hi Mark and Barry,
> >>>> 
> >>>> I am sorry for my late reply: it was a busy week!
> >>>> I ran a test case for a larger problem with as many levels of MG as I 
> >>>> could (i.e. 5) and GAMG as the PC at the coarse level. I attached the 
> >>>> output of -info (after grep for "gamg"), ksp_view and log_summary.
> >>>> The solve takes about 2 seconds on 8192 cores, which is way too much. 
> >>>> The number of iterations to convergence is 24.
> >>>> I hope there is a way to speed it up.
> >>>> 
> >>>> Thanks,
> >>>> Michele
> >>>> 
> >>>> 
> >>>> On Fri, 2015-07-17 at 09:38 -0400, Mark Adams wrote:
> >>>>> 
> >>>>> 
> >>>>> On Thu, Jul 16, 2015 at 8:18 PM, Michele Rosso <[email protected]> wrote:
> >>>>> Barry,
> >>>>> 
> >>>>> thank you very much for the detailed answer.  I tried what you 
> >>>>> suggested and it works.
> >>>>> So far I have tried it on a small system, but the final goal is to use 
> >>>>> it for very large runs.  How does PCGAMG compare to PCMG as far as 
> >>>>> performance and scalability are concerned?
> >>>>> Also, could you help me tune the GAMG part (my current setup is in the 
> >>>>> attached ksp_view.txt file)?
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> I am going to add this to the document today but you can run with 
> >>>>> -info.  This is very noisy so you might want to do the next step at run 
> >>>>> time.  Then grep on GAMG.  This will be about 20 lines.  Send that to 
> >>>>> us and we can go from there.
> >>>>> 
> >>>>> 
> >>>>> Mark
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> I also tried to use superlu_dist for the LU decomposition on 
> >>>>> mg_coarse_mg_sub_
> >>>>> -mg_coarse_mg_coarse_sub_pc_type lu
> >>>>> -mg_coarse_mg_coarse_sub_pc_factor_mat_solver_package superlu_dist
> >>>>> 
> >>>>> but I got an error:
> >>>>> 
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2 
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> 
> >>>>> 
> >>>>> Thank you,
> >>>>> Michele
> >>>>> 
> >>>>> 
> >>>>> On Thu, 2015-07-16 at 18:07 -0500, Barry Smith wrote:
> >>>>>> 
> >>>>>>> On Jul 16, 2015, at 5:42 PM, Michele Rosso <[email protected]> wrote:
> >>>>>>> 
> >>>>>>> Barry,
> >>>>>>> 
> >>>>>>> thanks for your reply. So if I want it fixed, I will have to use the 
> >>>>>>> master branch, correct?
> >>>>>> 
> >>>>>> 
> >>>>>>  Yes, or edit mg.c and remove the offending lines of code (easy 
> >>>>>> enough). 
> >>>>>> 
> >>>>>>> 
> >>>>>>> On a side note, what I am trying to achieve is to be able to use as 
> >>>>>>> many levels of MG as I want, despite the limitation imposed by the 
> >>>>>>> local number of grid nodes.
> >>>>>> 
> >>>>>> 
> >>>>>>   I assume you are talking about with DMDA? There is no generic 
> >>>>>> limitation for PETSc's multigrid; it is only the way the DMDA code 
> >>>>>> figures out the interpolation that causes a restriction.
> >>>>>> 
> >>>>>> 
> >>>>>>> So far I am using a borrowed code that implements a PC that creates a 
> >>>>>>> sub-communicator and performs MG on it.
> >>>>>>> While reading the documentation I found out that PCMGSetLevels takes 
> >>>>>>> in an optional array of communicators. How does this work?
> >>>>>> 
> >>>>>> 
> >>>>>>   It doesn't work. It was an idea that never got pursued.
> >>>>>> 
> >>>>>> 
> >>>>>>> Can I simply define my matrix and rhs on the fine grid as I normally 
> >>>>>>> would (I do not use kspsetoperators and kspsetrhs) and KSP would take 
> >>>>>>> care of it by using the correct communicator for each level?
> >>>>>> 
> >>>>>> 
> >>>>>>   No.
> >>>>>> 
> >>>>>>   You can use the PCMG geometric multigrid with DMDA for as many 
> >>>>>> levels as it works and then use PCGAMG as the coarse grid solver. 
> >>>>>> PCGAMG automatically uses fewer processes for the coarse level 
> >>>>>> matrices and vectors. You could do this all from the command line 
> >>>>>> without writing code. 
> >>>>>> 
> >>>>>>   For example, if your code uses a DMDA and calls KSPSetDM(), use 
> >>>>>> something like -da_refine 3 -pc_type mg -pc_mg_galerkin 
> >>>>>> -mg_coarse_pc_type gamg -ksp_view
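> >>>>>> 
> >>>>>>   In skeleton form (an untested sketch along the lines of the KSP 
> >>>>>> tutorial examples; the two callbacks are placeholders you replace with 
> >>>>>> your own assembly, and error checking is omitted) that looks like:
> >>>>>> 
> >>>>>> #include <petscksp.h>
> >>>>>> #include <petscdmda.h>
> >>>>>> 
> >>>>>> /* Placeholder operator: identity. A real code assembles its stencil here. */
> >>>>>> static PetscErrorCode ComputeMatrix(KSP ksp,Mat J,Mat Jpre,void *ctx)
> >>>>>> {
> >>>>>>   MatZeroEntries(Jpre);
> >>>>>>   MatShift(Jpre,1.0);
> >>>>>>   return 0;
> >>>>>> }
> >>>>>> 
> >>>>>> /* Placeholder right-hand side: b = 1. */
> >>>>>> static PetscErrorCode ComputeRHS(KSP ksp,Vec b,void *ctx)
> >>>>>> {
> >>>>>>   VecSet(b,1.0);
> >>>>>>   return 0;
> >>>>>> }
> >>>>>> 
> >>>>>> int main(int argc,char **argv)
> >>>>>> {
> >>>>>>   DM  da;
> >>>>>>   KSP ksp;
> >>>>>> 
> >>>>>>   PetscInitialize(&argc,&argv,NULL,NULL);
> >>>>>>   KSPCreate(PETSC_COMM_WORLD,&ksp);
> >>>>>>   /* initial (coarse) grid; -da_refine refines it to the solve grid */
> >>>>>>   DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,
> >>>>>>                DM_BOUNDARY_NONE,DMDA_STENCIL_STAR,9,9,9,PETSC_DECIDE,
> >>>>>>                PETSC_DECIDE,PETSC_DECIDE,1,1,NULL,NULL,NULL,&da);
> >>>>>>   KSPSetDM(ksp,da);            /* KSP builds the MG hierarchy from the DMDA */
> >>>>>>   KSPSetComputeOperators(ksp,ComputeMatrix,NULL);
> >>>>>>   KSPSetComputeRHS(ksp,ComputeRHS,NULL);
> >>>>>>   KSPSetFromOptions(ksp);      /* picks up -pc_type mg -pc_mg_galerkin ... */
> >>>>>>   KSPSolve(ksp,NULL,NULL);
> >>>>>>   KSPDestroy(&ksp);
> >>>>>>   DMDestroy(&da);
> >>>>>>   PetscFinalize();
> >>>>>>   return 0;
> >>>>>> }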
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>>  Barry
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>>> 
> >>>>>>> Thanks,
> >>>>>>> Michele
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> On Thu, 2015-07-16 at 17:30 -0500, Barry Smith wrote:
> >>>>>>>>   Michel,
> >>>>>>>> 
> >>>>>>>>    This is a very annoying feature that has been fixed in master 
> >>>>>>>> http://www.mcs.anl.gov/petsc/developers/index.html
> >>>>>>>>  I would like to have changed it in maint but Jed would have a 
> >>>>>>>> shit-fit :-) since it changes behavior.
> >>>>>>>> 
> >>>>>>>>  Barry
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>>> On Jul 16, 2015, at 4:53 PM, Michele Rosso <[email protected]> wrote:
> >>>>>>>>> 
> >>>>>>>>> Hi,
> >>>>>>>>> 
> >>>>>>>>> I am performing a series of solves inside a loop. The matrix for 
> >>>>>>>>> each solve changes, but not enough to justify a rebuild of the PC at 
> >>>>>>>>> each solve.
> >>>>>>>>> Therefore I am using KSPSetReusePreconditioner to avoid rebuilding 
> >>>>>>>>> unless necessary. The solver is CG + MG with a custom PC at the 
> >>>>>>>>> coarse level.
> >>>>>>>>> If the KSP is not updated each time, everything works as it is 
> >>>>>>>>> supposed to.
> >>>>>>>>> When instead I allow the default PETSc behavior, i.e. updating the 
> >>>>>>>>> PC every time the matrix changes, the coarse-level KSP, initially 
> >>>>>>>>> set to PREONLY, is changed into GMRES after the first solve. I am 
> >>>>>>>>> not sure where the problem lies (my PC or PETSc), so I would like 
> >>>>>>>>> to have your opinion on this.
> >>>>>>>>> I attached the ksp_view for the 2 successive solves and the options 
> >>>>>>>>> stack.
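> >>>>>>>>> 
> >>>>>>>>> Schematically (not my actual code; names are placeholders) the loop 
> >>>>>>>>> looks like this:
> >>>>>>>>> 
> >>>>>>>>>   KSPSetOperators(ksp,A,A);
> >>>>>>>>>   KSPSetReusePreconditioner(ksp,PETSC_TRUE); /* keep the PC from the first solve */
> >>>>>>>>>   for (step = 0; step < nsteps; step++) {
> >>>>>>>>>     UpdateMatrixValues(A);      /* coefficients change slightly each step */
> >>>>>>>>>     KSPSetOperators(ksp,A,A);   /* flag the operator as changed */
> >>>>>>>>>     KSPSolve(ksp,b,x);
> >>>>>>>>>     /* with reuse turned off (the default) the PC is rebuilt here, and
> >>>>>>>>>        that is when the coarse KSP switches from PREONLY to GMRES */
> >>>>>>>>>   }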
> >>>>>>>>> 
> >>>>>>>>> Thanks for your help,
> >>>>>>>>> Michel
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> <ksp_view.txt><petsc_options.txt>
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>> 
> >>>> <info.txt><ksp_view.txt><log_gamg.txt>
> >>> 
> >>> 
> >>> 
> >> 
> >> <ksp_view_mg5.txt><ksp_view_mg7.txt><log_mg5.txt><log_mg7.txt>
> > 
> 


KSP Object: 8192 MPI processes
  type: cg
  maximum iterations=10000
  tolerances:  relative=1e-09, absolute=1e-50, divergence=10000
  left preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8192 MPI processes
  type: mg
    MG: type is MULTIPLICATIVE, levels=5 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object:    (mg_coarse_)     8192 MPI processes
      type: preonly
      maximum iterations=1, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_coarse_)     8192 MPI processes
      type: dmdarepart
        DMDARepart: parent comm size reduction factor = 64
        DMDARepart: subcomm_size = 128
      KSP Object:      (mg_coarse_dmdarepart_)       128 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
        left preconditioning
        using NONE norm type for convergence test
      PC Object:      (mg_coarse_dmdarepart_)       128 MPI processes
        type: mg
          MG: type is MULTIPLICATIVE, levels=2 cycles=v
            Cycles per PCApply=1
            Using Galerkin computed coarse grid matrices
        Coarse grid solver -- level -------------------------------
          KSP Object:          (mg_coarse_dmdarepart_mg_coarse_)           128 MPI processes
            type: preonly
            maximum iterations=1, initial guess is zero
            tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
            left preconditioning
            using NONE norm type for convergence test
          PC Object:          (mg_coarse_dmdarepart_mg_coarse_)           128 MPI processes
            type: redundant
              Redundant preconditioner: First (color=0) of 128 PCs follows
            KSP Object:            (mg_coarse_dmdarepart_mg_coarse_redundant_)             1 MPI processes
              type: preonly
              maximum iterations=10000, initial guess is zero
              tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
              left preconditioning
              using NONE norm type for convergence test
            PC Object:            (mg_coarse_dmdarepart_mg_coarse_redundant_)             1 MPI processes
              type: lu
                LU: out-of-place factorization
                tolerance for zero pivot 2.22045e-14
                using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
                matrix ordering: nd
                factor fill ratio given 5, needed 9.76317
                  Factored matrix follows:
                    Mat Object:                     1 MPI processes
                      type: seqaij
                      rows=1024, cols=1024
                      package used to perform factorization: petsc
                      total: nonzeros=63734, allocated nonzeros=63734
                      total number of mallocs used during MatSetValues calls =0
                        not using I-node routines
              linear system matrix = precond matrix:
              Mat Object:               1 MPI processes
                type: seqaij
                rows=1024, cols=1024
                total: nonzeros=6528, allocated nonzeros=6528
                total number of mallocs used during MatSetValues calls =0
                  not using I-node routines
            linear system matrix = precond matrix:
            Mat Object:             128 MPI processes
              type: mpiaij
              rows=1024, cols=1024
              total: nonzeros=6528, allocated nonzeros=6528
              total number of mallocs used during MatSetValues calls =0
                not using I-node (on process 0) routines
        Down solver (pre-smoother) on level 1 -------------------------------
          KSP Object:          (mg_coarse_dmdarepart_mg_levels_1_)           128 MPI processes
            type: richardson
              Richardson: damping factor=1
            maximum iterations=2
            tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
            left preconditioning
            using nonzero initial guess
            using NONE norm type for convergence test
          PC Object:          (mg_coarse_dmdarepart_mg_levels_1_)           128 MPI processes
            type: sor
              SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
            linear system matrix = precond matrix:
            Mat Object:             128 MPI processes
              type: mpiaij
              rows=8192, cols=8192
              total: nonzeros=54784, allocated nonzeros=54784
              total number of mallocs used during MatSetValues calls =0
                not using I-node (on process 0) routines
        Up solver (post-smoother) same as down solver (pre-smoother)
        linear system matrix = precond matrix:
        Mat Object:         128 MPI processes
          type: mpiaij
          rows=8192, cols=8192
          total: nonzeros=54784, allocated nonzeros=54784
          total number of mallocs used during MatSetValues calls =0
            not using I-node (on process 0) routines
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=8192, cols=8192
        total: nonzeros=54784, allocated nonzeros=54784
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object:    (mg_levels_1_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_1_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=65536, cols=65536
        total: nonzeros=448512, allocated nonzeros=448512
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object:    (mg_levels_2_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_2_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=524288, cols=524288
        total: nonzeros=3.62906e+06, allocated nonzeros=3.62906e+06
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object:    (mg_levels_3_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_3_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=4194304, cols=4194304
        total: nonzeros=2.91963e+07, allocated nonzeros=2.91963e+07
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 4 -------------------------------
    KSP Object:    (mg_levels_4_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_4_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=33554432, cols=33554432
        total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
        total number of mallocs used during MatSetValues calls =0
          has attached null space
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object:   8192 MPI processes
    type: mpiaij
    rows=33554432, cols=33554432
    total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
    total number of mallocs used during MatSetValues calls =0
      has attached null space

#PETSc Option Table entries:
-finput input.txt
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-9
-ksp_type cg
-ksp_view
-log_summary log_mg7.txt
-mg_coarse_dmdarepart_ksp_constant_null_space
-mg_coarse_dmdarepart_ksp_type preonly
-mg_coarse_dmdarepart_mg_coarse_ksp_type preonly
-mg_coarse_dmdarepart_mg_coarse_pc_type redundant
-mg_coarse_dmdarepart_mg_levels_ksp_type richardson
-mg_coarse_dmdarepart_pc_mg_galerkin
-mg_coarse_dmdarepart_pc_mg_levels 2
-mg_coarse_dmdarepart_pc_type mg
-mg_coarse_ksp_type preonly
-mg_coarse_pc_dmdarepart_factor 64
-mg_coarse_pc_type dmdarepart
-mg_levels_ksp_type richardson
-options_left
-pc_dmdarepart_log
-pc_dmdarepart_monitor
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
There are 3 unused database options. They are:
Option left: name:-finput value: input.txt
Option left: name:-mg_coarse_dmdarepart_ksp_constant_null_space (no value)
Option left: name:-pc_dmdarepart_monitor (no value)
Application 25736695 resources: utime ~29149s, stime ~48455s, Rss ~64608, inblocks ~6174814, outblocks ~18104253
KSP Object: 8192 MPI processes
  type: cg
  maximum iterations=10000
  tolerances:  relative=1e-09, absolute=1e-50, divergence=10000
  left preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8192 MPI processes
  type: mg
    MG: type is MULTIPLICATIVE, levels=5 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object:    (mg_coarse_)     8192 MPI processes
      type: preonly
      maximum iterations=1, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_coarse_)     8192 MPI processes
      type: redundant
        Redundant preconditioner: First (color=0) of 8192 PCs follows
      KSP Object:      (mg_coarse_redundant_)       1 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
        left preconditioning
        using NONE norm type for convergence test
      PC Object:      (mg_coarse_redundant_)       1 MPI processes
        type: lu
          LU: out-of-place factorization
          tolerance for zero pivot 2.22045e-14
          using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
          matrix ordering: nd
          factor fill ratio given 5, needed 23.9038
            Factored matrix follows:
              Mat Object:               1 MPI processes
                type: seqaij
                rows=8192, cols=8192
                package used to perform factorization: petsc
                total: nonzeros=1.30955e+06, allocated nonzeros=1.30955e+06
                total number of mallocs used during MatSetValues calls =0
                  not using I-node routines
        linear system matrix = precond matrix:
        Mat Object:         1 MPI processes
          type: seqaij
          rows=8192, cols=8192
          total: nonzeros=54784, allocated nonzeros=54784
          total number of mallocs used during MatSetValues calls =0
            not using I-node routines
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=8192, cols=8192
        total: nonzeros=54784, allocated nonzeros=54784
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object:    (mg_levels_1_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_1_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=65536, cols=65536
        total: nonzeros=448512, allocated nonzeros=448512
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object:    (mg_levels_2_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_2_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=524288, cols=524288
        total: nonzeros=3.62906e+06, allocated nonzeros=3.62906e+06
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object:    (mg_levels_3_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_3_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=4194304, cols=4194304
        total: nonzeros=2.91963e+07, allocated nonzeros=2.91963e+07
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 4 -------------------------------
    KSP Object:    (mg_levels_4_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_4_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=33554432, cols=33554432
        total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
        total number of mallocs used during MatSetValues calls =0
          has attached null space
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object:   8192 MPI processes
    type: mpiaij
    rows=33554432, cols=33554432
    total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
    total number of mallocs used during MatSetValues calls =0
      has attached null space
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r 
-fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: 
----------------------------------------------

/u/sciteam/mrosso/mrosso-repo/build/bin/test_droplet_box.exe on a gnu-opt-32idx 
named p���� with 8192 processors, by mrosso Tue Jul 28 16:20:21 2015
Using Petsc Development GIT revision: v3.6-233-g4936542  GIT Date: 2015-07-17 
10:15:47 -0500

                         Max       Max/Min        Avg      Total 
Time (sec):           7.498e+00      1.01676   7.375e+00
Objects:              1.385e+03      1.30537   1.066e+03
Flops:                9.815e+07      1.30922   7.642e+07  6.260e+11
Flops/sec:            1.331e+07      1.30928   1.036e+07  8.488e+10
MPI Messages:         3.595e+04      5.80931   1.225e+04  1.003e+08
MPI Message Lengths:  9.104e+06      2.00024   7.063e+02  7.086e+10
MPI Reductions:       1.427e+03      1.09349

Flop counting convention: 1 flop = 1 real number operation of type 
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N 
flops
                            and VecAXPY() for complex vectors of length N --> 
8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- 
Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     
Avg         %Total   counts   %Total 
 0:      Main Stage: 7.0526e+00  95.6%  6.2314e+11  99.5%  9.376e+07  93.5%  
7.044e+02       99.7%  1.260e+03  88.3% 
 1: PCRprt_SetUpMat: 2.7279e-02   0.4%  6.5418e+05   0.0%  6.123e+05   0.6%  
5.817e-02        0.0%  4.425e+01   3.1% 
 2:    PCRprt_Apply: 2.9504e-01   4.0%  2.8632e+09   0.5%  5.947e+06   5.9%  
1.880e+00        0.3%  1.156e+00   0.1% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting 
output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and 
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in 
this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all 
processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                            
 --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct 
 %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecTDot              232 1.0 3.9837e-02 2.6 1.90e+06 1.0 0.0e+00 0.0e+00 
2.3e+02  0  2  0  0 16   0  2  0  0 18 390775
VecNorm              123 1.0 1.7174e-02 1.9 1.01e+06 1.0 0.0e+00 0.0e+00 
1.2e+02  0  1  0  0  9   0  1  0  0 10 480626
VecScale            1048 1.0 1.5078e-0218.8 1.92e+05 1.8 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 99231
VecCopy              121 1.0 1.2872e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              1647 1.0 1.6298e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              696 1.0 6.7093e-03 1.4 5.70e+06 1.0 0.0e+00 0.0e+00 
0.0e+00  0  7  0  0  0   0  7  0  0  0 6961607
VecAYPX              927 1.0 4.6690e-03 1.4 2.90e+06 1.0 0.0e+00 0.0e+00 
0.0e+00  0  4  0  0  0   0  4  0  0  0 5084883
VecAssemblyBegin       4 1.0 1.3000e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
1.2e+01  0  0  0  0  1   0  0  0  0  1     0
VecAssemblyEnd         4 1.0 1.4210e-0429.8 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin     2907 1.0 2.7453e-02 2.1 0.00e+00 0.0 9.2e+07 7.6e+02 
0.0e+00  0  0 92 99  0   0  0 98 99  0     0
VecScatterEnd       2907 1.0 1.8748e-01 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  2  0  0  0  0   2  0  0  0  0     0
MatMult              931 1.0 2.3768e-01 2.6 3.19e+07 1.0 4.3e+07 1.4e+03 
0.0e+00  2 42 43 84  0   2 42 46 84  0 1096892
MatMultAdd           464 1.0 4.9362e-03 1.2 1.09e+06 1.0 0.0e+00 0.0e+00 
0.0e+00  0  1  0  0  0   0  1  0  0  0 1801895
MatMultTranspose     468 1.0 1.6587e-02 2.6 1.10e+06 1.0 0.0e+00 0.0e+00 
0.0e+00  0  1  0  0  0   0  1  0  0  0 540858
MatSOR              1160 1.0 1.8799e-01 1.6 3.03e+07 1.0 4.9e+07 2.2e+02 
0.0e+00  2 40 48 15  0   2 40 52 15  0 1319153
MatResidual          464 1.0 7.4724e-02 2.5 7.60e+06 1.0 2.2e+07 6.8e+02 
0.0e+00  1 10 22 21  0   1 10 23 21  0 830522
MatAssemblyBegin      26 1.0 3.0778e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
3.6e+01  0  0  0  0  3   0  0  0  0  3     0
MatAssemblyEnd        26 1.0 3.6265e-02 1.0 0.00e+00 0.0 4.8e+05 1.3e+02 
8.0e+01  0  0  0  0  6   0  0  1  0  6     0
MatView               55 1.8 3.3602e-01 9.8 0.00e+00 0.0 0.0e+00 0.0e+00 
3.0e+01  4  0  0  0  2   5  0  0  0  2     0
MatPtAP                8 1.0 4.7572e-02 1.0 2.06e+05 1.0 1.1e+06 3.5e+02 
7.6e+01  1  0  1  1  5   1  0  1  1  6 35313
MatPtAPSymbolic        4 1.0 2.7729e-02 1.1 0.00e+00 0.0 5.6e+05 4.5e+02 
2.8e+01  0  0  1  0  2   0  0  1  0  2     0
MatPtAPNumeric         8 1.0 2.1160e-02 1.1 2.06e+05 1.0 5.6e+05 2.6e+02 
4.8e+01  0  0  1  0  3   0  0  1  0  4 79392
MatGetLocalMat         8 1.0 6.5184e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          8 1.0 1.9581e-03 2.4 0.00e+00 0.0 7.5e+05 5.1e+02 
0.0e+00  0  0  1  1  0   0  0  1  1  0     0
MatGetSymTrans         8 1.0 1.2302e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp              14 1.0 6.8645e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
1.4e+01  0  0  0  0  1   0  0  0  0  1     0
KSPSolve               4 1.0 1.0214e+00 1.0 9.81e+07 1.3 1.0e+08 7.1e+02 
1.2e+03 14100100100 86  14100107100 97 612784
PCSetUp                4 1.0 1.7279e-01 1.0 2.76e+05 1.0 2.2e+06 1.9e+02 
2.8e+02  2  0  2  1 20   2  0  2  1 22 13054
PCApply              116 1.0 7.6665e-01 1.0 8.58e+07 1.4 9.2e+07 6.4e+02 
4.7e+02 10 84 92 83 33  11 84 99 83 37 684611

--- Event Stage 1: PCRprt_SetUpMat

VecSet                 3 1.5 1.3113e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin      10 1.2 5.4898e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 
4.1e+00  0  0  0  0  0  11  0  0  0  9     0
MatAssemblyEnd        10 1.2 9.6285e-03 1.1 0.00e+00 0.0 1.9e+05 4.2e+00 
1.6e+01  0  0  0  0  1  33  0 31 13 36     0
MatGetRow            192 0.0 4.2677e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice       2 1.0 1.0698e-02 2.3 0.00e+00 0.0 8.1e+04 2.3e+01 
6.0e+00  0  0  0  0  0  22  0 13 32 14     0
MatZeroEntries         1 0.0 3.0994e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatPtAP                2 1.0 2.0634e-02 1.0 8.40e+01 2.6 5.3e+05 7.4e+00 
3.4e+01  0  0  1  0  2  75100 87 67 77    32
MatPtAPSymbolic        2 1.0 8.6851e-03 1.1 0.00e+00 0.0 3.3e+05 7.0e+00 
1.4e+01  0  0  0  0  1  31  0 54 40 32     0
MatPtAPNumeric         2 1.0 1.2376e-02 1.0 8.40e+01 2.6 2.0e+05 7.9e+00 
2.0e+01  0  0  0  0  1  44100 33 28 45    53
MatGetLocalMat         2 1.0 6.1274e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          2 1.0 4.8995e-04 3.7 0.00e+00 0.0 2.8e+05 5.3e+00 
0.0e+00  0  0  0  0  0   1  0 46 26  0     0
MatGetSymTrans         4 1.0 2.0742e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 2: PCRprt_Apply

VecScale             348 0.0 2.3985e-04 0.0 3.34e+04 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 14114
VecSet              1167 3.4 5.2118e-04 9.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAYPX              116 0.0 7.3195e-05 0.0 7.42e+03 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 12983
VecScatterBegin     1393 3.0 3.2119e-02112.6 0.00e+00 0.0 5.9e+06 3.2e+01 
0.0e+00  0  0  6  0  0   0  0 99 99  0     0
VecScatterEnd       1393 3.0 3.2946e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  4  0  0  0  0  99  0  0  0  0     0
MatMult              232 2.0 4.5841e-02336.1 9.67e+04834.0 1.0e+06 1.6e+01 
0.0e+00  0  0  1  0  0   1  0 17  9  0   298
MatMultAdd           116 0.0 2.9373e-04 0.0 1.48e+04 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0  6470
MatMultTranspose     233 2.0 3.0067e-0290.1 1.52e+0465.6 9.4e+05 8.0e+00 
0.0e+00  0  0  1  0  0   1  0 16  4  0   127
MatSolve             116 0.0 2.3469e-02 0.0 1.47e+07 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0 66  0  0  0 79995
MatSOR               232 0.0 4.8394e-02 0.0 5.50e+05 0.0 2.1e+05 1.3e+02 
0.0e+00  0  0  0  0  0   0  2  4 14  0  1398
MatLUFactorSym         1 0.0 2.5880e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         2 0.0 1.0722e-02 0.0 7.01e+06 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0 31  0  0  0 83692
MatCopy                1 0.0 3.0041e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatConvert             1 0.0 7.4148e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatResidual          116 0.0 4.5305e-02 0.0 1.04e+05 0.0 7.1e+04 1.3e+02 
0.0e+00  0  0  0  0  0   0  0  1  5  0   281
MatAssemblyBegin       6 0.0 4.5967e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
9.4e-02  0  0  0  0  0   0  0  0  0  8     0
MatAssemblyEnd         6 0.0 9.6583e-04 0.0 0.00e+00 0.0 1.2e+03 1.0e+01 
2.5e-01  0  0  0  0  0   0  0  0  0 22     0
MatGetRowIJ            1 0.0 9.5844e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice       2 0.0 2.3339e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
9.4e-02  0  0  0  0  0   0  0  0  0  8     0
MatGetOrdering         1 0.0 8.8000e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatPtAP                2 0.0 1.5650e-03 0.0 2.82e+03 0.0 3.6e+03 6.7e+01 
3.0e-01  0  0  0  0  0   0  0  0  0 26   217
MatPtAPSymbolic        1 0.0 6.5613e-04 0.0 0.00e+00 0.0 1.8e+03 8.5e+01 
1.1e-01  0  0  0  0  0   0  0  0  0  9     0
MatPtAPNumeric         2 0.0 9.1791e-04 0.0 2.82e+03 0.0 1.8e+03 4.9e+01 
1.9e-01  0  0  0  0  0   0  0  0  0 16   370
MatRedundantMat        2 0.0 2.4142e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
9.4e-02  0  0  0  0  0   0  0  0  0  8     0
MatGetLocalMat         2 0.0 3.7909e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          2 0.0 2.0623e-04 0.0 0.00e+00 0.0 2.4e+03 9.6e+01 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSymTrans         2 0.0 4.0531e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               8 0.0 1.2207e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 
3.1e-02  0  0  0  0  0   0  0  0  0  3     0
KSPSolve             116 0.0 2.6315e-01 0.0 2.24e+07 0.0 2.2e+06 7.2e+01 
1.2e+00  0  0  2  0  0   1100 37 84100 10866
PCSetUp                2 0.0 4.0980e-02 0.0 7.01e+06 0.0 3.8e+04 5.0e+01 
1.2e+00  0  0  0  0  0   0 31  1  1100 21909
PCApply              116 0.0 2.2205e-01 0.0 1.54e+07 0.0 2.2e+06 7.2e+01 
0.0e+00  0  0  2  0  0   1 69 36 83  0  8834
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector   778            791      2774488     0
      Vector Scatter    18             23        29872     0
              Matrix    38             52      1988092     0
   Matrix Null Space     1              1          584     0
    Distributed Mesh     7              7        34664     0
Star Forest Bipartite Graph    14             14        11760     0
     Discrete System     7              7         5880     0
           Index Set    36             41        67040     0
   IS L to G Mapping     7              7         8480     0
       Krylov Solver    11             11        13376     0
     DMKSP interface     4              5         3200     0
      Preconditioner    11             11        10864     0
              Viewer    13             11         8272     0

--- Event Stage 1: PCRprt_SetUpMat

              Vector     6              5         7840     0
      Vector Scatter     3              2         2128     0
              Matrix    15             12        43656     0
           Index Set    10             10         7896     0

--- Event Stage 2: PCRprt_Apply

              Vector   369            357       686800     0
      Vector Scatter     5              0            0     0
              Matrix    11              0            0     0
    Distributed Mesh     1              0            0     0
Star Forest Bipartite Graph     2              0            0     0
     Discrete System     1              0            0     0
           Index Set    15             10        16000     0
   IS L to G Mapping     1              0            0     0
     DMKSP interface     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
Average time for MPI_Barrier(): 5.19753e-05
Average time for zero size MPI_Send(): 2.16846e-05
#PETSc Option Table entries:
-finput input.txt
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-9
-ksp_type cg
-ksp_view
-log_summary log_mg7.txt
-mg_coarse_dmdarepart_ksp_constant_null_space
-mg_coarse_dmdarepart_ksp_type preonly
-mg_coarse_dmdarepart_mg_coarse_ksp_type preonly
-mg_coarse_dmdarepart_mg_coarse_pc_type redundant
-mg_coarse_dmdarepart_mg_levels_ksp_type richardson
-mg_coarse_dmdarepart_pc_mg_galerkin
-mg_coarse_dmdarepart_pc_mg_levels 2
-mg_coarse_dmdarepart_pc_type mg
-mg_coarse_ksp_type preonly
-mg_coarse_pc_dmdarepart_factor 64
-mg_coarse_pc_type dmdarepart
-mg_levels_ksp_type richardson
-options_left
-pc_dmdarepart_log
-pc_dmdarepart_monitor
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=16384 
--known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 
--known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 
--known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 
--known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 
--known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4 
--known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 
--known-mpi-c-double-complex=1 --known-sdot-returns-double=0 
--known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 " 
--known-mpi-shared-libraries=0 --known-memcmp-ok  
--with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a 
--COPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas 
-O3 -march=native -mtune=native" --FOPTFLAGS="-Wall -Wno-unused-variable 
-ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native 
-mtune=native" --CXXOPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing 
-Wno-unknown-pragmas -O3 -march=native -mtune=native" --with-x="0 " 
--with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " 
--with-fortranlib-autodetect="0 " --with-shared-libraries="0 " 
--with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " 
--download-hypre=1 --download-blacs="1 " --download-scalapack="1 " 
--download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " 
PETSC_ARCH=gnu-opt-32idx
-----------------------------------------
Libraries compiled on Sat Jul 18 19:48:51 2015 on h2ologin1 
Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /u/sciteam/mrosso/LIBS/petsc
Using PETSc arch: gnu-opt-32idx
-----------------------------------------

Using C compiler: cc  -Wall -Wwrite-strings -Wno-strict-aliasing 
-Wno-unknown-pragmas -O3 -march=native -mtune=native  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn  -Wall -Wno-unused-variable -ffree-line-length-0 
-Wno-unused-dummy-argument -O3 -march=native -mtune=native   ${FOPTFLAGS} 
${FFLAGS} 
-----------------------------------------

Using include paths: -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include 
-I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/include 
-I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-----------------------------------------

Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib 
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lpetsc 
-Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib 
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lsuperlu_dist_4.0 -lHYPRE 
-lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib 
-L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl 
-----------------------------------------

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r 
-fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: 
----------------------------------------------

/u/sciteam/mrosso/mrosso-repo/build/bin/test_droplet_box.exe on a gnu-opt-32idx 
named p���� with 8192 processors, by mrosso Tue Jul 28 15:28:29 2015
Using Petsc Development GIT revision: v3.6-233-g4936542  GIT Date: 2015-07-17 
10:15:47 -0500

                         Max       Max/Min        Avg      Total 
Time (sec):           5.098e+02      1.00007   5.098e+02
Objects:              7.400e+02      1.00000   7.400e+02
Flops:                5.499e+08      1.00167   5.498e+08  4.504e+12
Flops/sec:            1.079e+06      1.00174   1.078e+06  8.834e+09
MPI Messages:         7.381e+05      1.00619   7.376e+05  6.043e+09
MPI Message Lengths:  1.267e+07      1.36946   1.669e+01  1.008e+11
MPI Reductions:       1.009e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type 
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N 
flops
                            and VecAXPY() for complex vectors of length N --> 
8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- 
Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     
Avg         %Total   counts   %Total 
 0:      Main Stage: 5.0982e+02 100.0%  4.5037e+12 100.0%  6.043e+09 100.0%  
1.669e+01      100.0%  1.008e+03  99.9% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting 
output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and 
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in 
this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all 
processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                            
 --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct 
 %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecTDot              174 1.0 1.5646e-01 1.5 1.43e+06 1.0 0.0e+00 0.0e+00 
1.7e+02  0  0  0  0 17   0  0  0  0 17 74621
VecNorm               94 1.0 5.5188e-02 2.5 7.70e+05 1.0 0.0e+00 0.0e+00 
9.4e+01  0  0  0  0  9   0  0  0  0  9 114305
VecScale             787 1.0 1.4017e-03 1.9 1.48e+05 1.8 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 824521
VecCopy               92 1.0 1.0190e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              1329 1.0 3.7305e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              522 1.0 5.5845e-03 1.3 4.28e+06 1.0 0.0e+00 0.0e+00 
0.0e+00  0  1  0  0  0   0  1  0  0  0 6272892
VecAYPX              695 1.0 3.0615e-02 9.2 2.17e+06 1.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 580237
VecAssemblyBegin       4 1.0 1.3102e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
1.2e+01  0  0  0  0  1   0  0  0  0  1     0
VecAssemblyEnd         4 1.0 1.8620e-0432.5 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin     2356 1.0 1.6390e+01 4.7 0.00e+00 0.0 5.9e+09 1.7e+01 
0.0e+00  2  0 98 99  0   2  0 98 99  0     0
VecScatterEnd       2356 1.0 4.1647e+02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 69  0  0  0  0  69  0  0  0  0     0
MatMult              699 1.0 5.2895e+01643.0 2.40e+07 1.0 3.3e+07 1.4e+03 
0.0e+00  1  4  1 44  0   1  4  1 44  0  3703
MatMultAdd           348 1.0 5.8870e-03 1.5 8.14e+05 1.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 1133153
MatMultTranspose     352 1.0 6.3620e-03 1.3 8.24e+05 1.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 1060614
MatSolve              87 1.0 3.9927e-01 1.3 2.27e+08 1.0 0.0e+00 0.0e+00 
0.0e+00  0 41  0  0  0   0 41  0  0  0 4660544
MatSOR               870 1.0 1.1567e+02523.3 2.27e+07 1.0 3.6e+07 2.2e+02 
0.0e+00  7  4  1  8  0   7  4  1  8  0  1608
MatLUFactorSym         1 1.0 5.9881e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 5.9217e-01 1.1 2.66e+08 1.0 0.0e+00 0.0e+00 
0.0e+00  0 48  0  0  0   0 48  0  0  0 3673552
MatConvert             1 1.0 1.0331e-03 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatResidual          348 1.0 3.3047e-0113.8 5.70e+06 1.0 1.6e+07 6.8e+02 
0.0e+00  0  1  0 11  0   0  1  0 11  0 140845
MatAssemblyBegin      22 1.0 2.4983e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 
2.6e+01  0  0  0  0  3   0  0  0  0  3     0
MatAssemblyEnd        22 1.0 3.3268e-02 1.1 0.00e+00 0.0 4.7e+05 1.4e+02 
7.2e+01  0  0  0  0  7   0  0  0  0  7     0
MatGetRowIJ            1 1.0 5.8293e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice       1 1.0 2.2252e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 
4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 9.7980e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView               40 1.3 3.3014e-01 3.5 0.00e+00 0.0 0.0e+00 0.0e+00 
3.0e+01  0  0  0  0  3   0  0  0  0  3     0
MatPtAP                4 1.0 4.4705e-02 1.0 1.03e+05 1.0 9.3e+05 2.9e+02 
6.8e+01  0  0  0  0  7   0  0  0  0  7 18789
MatPtAPSymbolic        4 1.0 2.9025e-02 1.0 0.00e+00 0.0 5.6e+05 4.5e+02 
2.8e+01  0  0  0  0  3   0  0  0  0  3     0
MatPtAPNumeric         4 1.0 1.6840e-02 1.1 1.03e+05 1.0 3.7e+05 4.4e+01 
4.0e+01  0  0  0  0  4   0  0  0  0  4 49879
MatRedundantMat        1 1.0 2.3107e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 
4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetLocalMat         4 1.0 6.1631e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          4 1.0 1.4648e-03 2.8 0.00e+00 0.0 5.6e+05 4.5e+02 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSymTrans         8 1.0 1.4162e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp              10 1.0 4.6747e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
1.4e+01  0  0  0  0  1   0  0  0  0  1     0
KSPSolve               4 1.0 5.0087e+02 1.0 5.50e+08 1.0 6.0e+09 1.7e+01 
9.2e+02 98100100100 91  98100100100 92  8992
PCSetUp                4 1.0 6.8538e+01 1.0 2.66e+08 1.0 1.4e+08 1.0e+01 
2.1e+02 13 48  2  1 21  13 48  2  1 21 31760
PCApply               87 1.0 4.3206e+02 1.0 2.75e+08 1.0 5.9e+09 1.5e+01 
3.5e+02 85 50 98 90 34  85 50 98 90 35  5213
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector   597            597      2364880     0
      Vector Scatter    16             15        20656     0
              Matrix    38             38     18267636     0
   Matrix Null Space     1              1          584     0
    Distributed Mesh     5              4        19808     0
Star Forest Bipartite Graph    10              8         6720     0
     Discrete System     5              4         3360     0
           Index Set    37             37       186396     0
   IS L to G Mapping     5              4         6020     0
       Krylov Solver     7              7         8608     0
     DMKSP interface     4              4         2560     0
      Preconditioner     7              7         6792     0
              Viewer     8              6         4512     0
========================================================================================================================
Average time to get PetscTime(): 2.14577e-07
Average time for MPI_Barrier(): 7.26223e-05
Average time for zero size MPI_Send(): 1.60854e-06
#PETSc Option Table entries:
-finput input.txt
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-9
-ksp_type cg
-ksp_view
-log_summary log_mg_defaults.txt
-mg_coarse_ksp_type preonly
-mg_levels_ksp_type richardson
-pc_dmdarepart_log
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=16384 
--known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 
--known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 
--known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 
--known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 
--known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4 
--known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 
--known-mpi-c-double-complex=1 --known-sdot-returns-double=0 
--known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 " 
--known-mpi-shared-libraries=0 --known-memcmp-ok  
--with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a 
--COPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas 
-O3 -march=native -mtune=native" --FOPTFLAGS="-Wall -Wno-unused-variable 
-ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native 
-mtune=native" --CXXOPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing 
-Wno-unknown-pragmas -O3 -march=native -mtune=native" --with-x="0 " 
--with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " 
--with-fortranlib-autodetect="0 " --with-shared-libraries="0 " 
--with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " 
--download-hypre=1 --download-blacs="1 " --download-scalapack="1 " 
--download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " 
PETSC_ARCH=gnu-opt-32idx
-----------------------------------------
Libraries compiled on Sat Jul 18 19:48:51 2015 on h2ologin1 
Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /u/sciteam/mrosso/LIBS/petsc
Using PETSc arch: gnu-opt-32idx
-----------------------------------------

Using C compiler: cc  -Wall -Wwrite-strings -Wno-strict-aliasing 
-Wno-unknown-pragmas -O3 -march=native -mtune=native  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn  -Wall -Wno-unused-variable -ffree-line-length-0 
-Wno-unused-dummy-argument -O3 -march=native -mtune=native   ${FOPTFLAGS} 
${FFLAGS} 
-----------------------------------------

Using include paths: -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include 
-I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/include 
-I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-----------------------------------------

Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib 
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lpetsc 
-Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib 
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lsuperlu_dist_4.0 -lHYPRE 
-lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib 
-L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl 
-----------------------------------------
