Barry,
I attached ksp_view and log_summary for two different setups:
1) Plain MG on 5 levels + LU at the coarse level (files ending in mg5)
2) Plain MG on 5 levels + custom PC + LU at the coarse level (files
ending in mg7)
The custom PC works on a subset of processes, thus allowing the use of two
more levels of MG, for a total of 7.
Case 1) is extremely slow (~20 sec per solve) and converges in 21 iterations.
Case 2) is much faster (~0.25 sec per solve) and converges in 29 iterations.
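
For quick reference, the options that differ between the two setups, condensed
from the option tables in the attached logs:

  Setup 1) (mg5):
    -pc_type mg -pc_mg_levels 5 -pc_mg_galerkin
    -mg_coarse_ksp_type preonly
    -mg_coarse_pc_type lu
    -mg_coarse_pc_factor_mat_solver_package superlu_dist

  Setup 2) (mg7):
    -pc_type mg -pc_mg_levels 5 -pc_mg_galerkin
    -mg_coarse_ksp_type preonly
    -mg_coarse_pc_type dmdarepart
    -mg_coarse_pc_dmdarepart_factor 64
    -mg_coarse_dmdarepart_pc_type mg
    -mg_coarse_dmdarepart_pc_mg_levels 2
    -mg_coarse_dmdarepart_pc_mg_galerkin
    -mg_coarse_dmdarepart_mg_coarse_pc_type lu
    -mg_coarse_dmdarepart_mg_coarse_pc_factor_mat_solver_package superlu_dist
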
Thanks for your help!
Michele
On Fri, 2015-07-24 at 13:56 -0500, Barry Smith wrote:
> The coarse problem for the PCMG (geometric multigrid) is
>
> Mat Object: 8192 MPI processes
> type: mpiaij
> rows=8192, cols=8192
>
> then it tries to solve it with algebraic multigrid on 8192 processes (which
> is completely insane). A lot of the time is spent in setting up the algebraic
> multigrid (not surprisingly).
>
> 8192 is kind of small to parallelize. Please run the same code but with the
> default coarse grid solver instead of PCGAMG and send us the -log_summary
> again.
>
> Barry
>
> > On Jul 24, 2015, at 1:35 PM, Michele Rosso <[email protected]> wrote:
> >
> > Hi Mark and Barry,
> >
> > I am sorry for my late reply: it was a busy week!
> > I ran a test case for a larger problem with as many levels of MG as I
> > could (i.e. 5) and GAMG as the PC at the coarse level. I attached the
> > output of -info (after grepping for "gamg"), ksp_view and log_summary.
> > The solve takes about 2 seconds on 8192 cores, which is way too much. The
> > number of iterations to convergence is 24.
> > I hope there is a way to speed it up.
> >
> > Thanks,
> > Michele
> >
> >
> > On Fri, 2015-07-17 at 09:38 -0400, Mark Adams wrote:
> >>
> >>
> >> On Thu, Jul 16, 2015 at 8:18 PM, Michele Rosso <[email protected]> wrote:
> >> Barry,
> >>
> >> thank you very much for the detailed answer. I tried what you suggested
> >> and it works.
> >> So far I have only tried it on a small system, but the final goal is to
> >> use it for very large runs. How does PCGAMG compare to PCMG as far as
> >> performance and scalability are concerned?
> >> Also, could you help me tune the GAMG part (my current setup is in the
> >> attached ksp_view.txt file)?
> >>
> >>
> >>
> >> I am going to add this to the document today but you can run with -info.
> >> This is very noisy so you might want to do the next step at run time.
> >> Then grep on GAMG. This will be about 20 lines. Send that to us and we
> >> can go from there.
> >>
> >>
> >> Mark
> >>
> >>
> >>
> >>
> >> I also tried to use superlu_dist for the LU decomposition on
> >> mg_coarse_mg_sub_
> >> -mg_coarse_mg_coarse_sub_pc_type lu
> >> -mg_coarse_mg_coarse_sub_pc_factor_mat_solver_package superlu_dist
> >>
> >> but I got an error:
> >>
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> symbfact() error returns 0
> >> symbfact() error returns 0
> >> symbfact() error returns 0
> >> symbfact() error returns 0
> >> symbfact() error returns 0
> >> symbfact() error returns 0
> >> symbfact() error returns 0
> >>
> >>
> >> Thank you,
> >> Michele
> >>
> >>
> >> On Thu, 2015-07-16 at 18:07 -0500, Barry Smith wrote:
> >>>
> >>> > On Jul 16, 2015, at 5:42 PM, Michele Rosso <[email protected]> wrote:
> >>> >
> >>> > Barry,
> >>> >
> >>> > thanks for your reply. So if I want it fixed, I will have to use the
> >>> > master branch, correct?
> >>>
> >>>
> >>> Yes, or edit mg.c and remove the offending lines of code (easy enough).
> >>>
> >>> >
> >>> > On a side note, what I am trying to achieve is to be able to use as
> >>> > many levels of MG as I want, despite the limitation imposed by the local
> >>> > number of grid nodes.
> >>>
> >>>
> >>> I assume you are talking about use with DMDA? There is no generic
> >>> limitation in PETSc's multigrid; it is only the way the DMDA code
> >>> figures out the interpolation that imposes a restriction.
> >>>
> >>>
> >>> > So far I am using a borrowed code that implements a PC that creates a
> >>> > subcommunicator and performs MG on it.
> >>> > While reading the documentation I found out that PCMGSetLevels takes in
> >>> > an optional array of communicators. How does this work?
> >>>
> >>>
> >>> It doesn't work. It was an idea that never got pursued.
> >>>
> >>>
> >>> > Can I simply define my matrix and rhs on the fine grid as I would
> >>> > do normally (I do not use kspsetoperators and kspsetrhs), and would
> >>> > KSP take care of it by using the correct communicator for each level?
> >>>
> >>>
> >>> No.
> >>>
> >>> You can use the PCMG geometric multigrid with DMDA for as many levels
> >>> as it works and then use PCGAMG as the coarse grid solver. PCGAMG
> >>> automatically uses fewer processes for the coarse level matrices and
> >>> vectors. You could do this all from the command line without writing
> >>> code.
> >>>
> >>> For example, if your code uses a DMDA and calls KSPSetDM(), use
> >>> -da_refine 3 -pc_type mg -pc_mg_galerkin -mg_coarse_pc_type gamg
> >>> -ksp_view
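
For reference, a minimal sketch of the kind of DMDA + KSPSetDM() driver those
options assume (not code from this thread; the grid size and the
ComputeRHS/ComputeMatrix callbacks are placeholders modeled on the PETSc DMDA
examples):

  #include <petscksp.h>
  #include <petscdmda.h>

  /* User callbacks (placeholders): assemble the RHS and the operator on the
     DM that KSP passes in; with a DM attached, PCMG builds the level
     operators itself. */
  extern PetscErrorCode ComputeRHS(KSP,Vec,void*);
  extern PetscErrorCode ComputeMatrix(KSP,Mat,Mat,void*);

  int main(int argc,char **argv)
  {
    KSP            ksp;
    DM             da;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc,&argv,NULL,NULL);CHKERRQ(ierr);
    /* 33^3 vertex-centered grid, coarsenable several times by PCMG */
    ierr = DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,
                        DM_BOUNDARY_NONE,DMDA_STENCIL_STAR,33,33,33,
                        PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,1,
                        NULL,NULL,NULL,&da);CHKERRQ(ierr);
    ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
    ierr = KSPSetDM(ksp,da);CHKERRQ(ierr);  /* PCMG takes its hierarchy from the DMDA */
    ierr = KSPSetComputeRHS(ksp,ComputeRHS,NULL);CHKERRQ(ierr);
    ierr = KSPSetComputeOperators(ksp,ComputeMatrix,NULL);CHKERRQ(ierr);
    /* picks up -pc_type mg -pc_mg_galerkin -mg_coarse_pc_type gamg -ksp_view, etc. */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
    ierr = KSPSolve(ksp,NULL,NULL);CHKERRQ(ierr);
    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
    ierr = DMDestroy(&da);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return 0;
  }
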
> >>>
> >>>
> >>>
> >>> Barry
> >>>
> >>>
> >>>
> >>> >
> >>> > Thanks,
> >>> > Michele
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > On Thu, 2015-07-16 at 17:30 -0500, Barry Smith wrote:
> >>> >> Michel,
> >>> >>
> >>> >> This is a very annoying feature that has been fixed in master
> >>> >> http://www.mcs.anl.gov/petsc/developers/index.html
> >>> >> I would like to have changed it in maint but Jed would have a
> >>> >> shit-fit :-) since it changes behavior.
> >>> >>
> >>> >> Barry
> >>> >>
> >>> >>
> >>> >> > On Jul 16, 2015, at 4:53 PM, Michele Rosso <[email protected]> wrote:
> >>> >> >
> >>> >> > Hi,
> >>> >> >
> >>> >> > I am performing a series of solves inside a loop. The matrix for
> >>> >> > each solve changes, but not enough to justify a rebuild of the PC at
> >>> >> > each solve.
> >>> >> > Therefore I am using KSPSetReusePreconditioner to avoid rebuilding
> >>> >> > unless necessary. The solver is CG + MG with a custom PC at the
> >>> >> > coarse level.
> >>> >> > If KSP is not updated each time, everything works as it is supposed
> >>> >> > to.
> >>> >> > When instead I allow the default PETSc behavior, i.e. updating the
> >>> >> > PC every time the matrix changes, the coarse level KSP, initially set
> >>> >> > to PREONLY, is changed into GMRES after the first solve. I am not sure
> >>> >> > where the problem lies (my PC or PETSc), so I would like to have your
> >>> >> > opinion on this.
> >>> >> > I attached the ksp_view for the 2 successive solves and the options
> >>> >> > stack.
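
For reference, a minimal sketch of the reuse pattern described above (assumed
shape only; UpdateMatrixValues is a placeholder for the application's own
matrix update, not the actual code in question):

  #include <petscksp.h>

  /* Solve a sequence of systems whose matrix values change slightly from
     step to step, keeping the preconditioner built on the first solve. */
  PetscErrorCode SolveSequence(KSP ksp,Mat A,Vec b,Vec x,PetscInt nsteps,
                               PetscErrorCode (*UpdateMatrixValues)(Mat,PetscInt))
  {
    PetscErrorCode ierr;
    PetscInt       step;

    PetscFunctionBeginUser;
    for (step = 0; step < nsteps; step++) {
      ierr = (*UpdateMatrixValues)(A,step);CHKERRQ(ierr);  /* new values, same pattern */
      ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
      /* keep the existing PC (MG hierarchy, coarse factorization) instead of rebuilding */
      ierr = KSPSetReusePreconditioner(ksp,PETSC_TRUE);CHKERRQ(ierr);
      ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
    }
    PetscFunctionReturn(0);
  }

With the flag left at its default (PETSC_FALSE), the PC is rebuilt whenever the
matrix changes, which is the case in which the coarse-level KSP type change
described above shows up.
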
> >>> >> >
> >>> >> > Thanks for your help,
> >>> >> > Michel
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> > <ksp_view.txt><petsc_options.txt>
> >>> >>
> >>> >>
> >>> >>
> >>> >
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >
> > <info.txt><ksp_view.txt><log_gamg.txt>
>
KSP Object: 8192 MPI processes
type: cg
maximum iterations=10000
tolerances: relative=1e-09, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 8192 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 8192 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 8192 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 0, needed 0
Factored matrix follows:
Mat Object: 8192 MPI processes
type: mpiaij
rows=8192, cols=8192
package used to perform factorization: superlu_dist
total: nonzeros=0, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
SuperLU_DIST run parameters:
Process grid nprow 128 x npcol 64
Equilibrate matrix TRUE
Matrix input mode 1
Replace tiny pivots TRUE
Use iterative refinement FALSE
Processors in row 128 col partition 64
Row permutation LargeDiag
Column permutation METIS_AT_PLUS_A
Parallel symbolic factorization FALSE
Repeated factorization SamePattern_SameRowPerm
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=8192, cols=8192
total: nonzeros=54784, allocated nonzeros=54784
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=65536, cols=65536
total: nonzeros=448512, allocated nonzeros=448512
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=524288, cols=524288
total: nonzeros=3.62906e+06, allocated nonzeros=3.62906e+06
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=4194304, cols=4194304
total: nonzeros=2.91963e+07, allocated nonzeros=2.91963e+07
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=33554432, cols=33554432
total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
total number of mallocs used during MatSetValues calls =0
has attached null space
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=33554432, cols=33554432
total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
total number of mallocs used during MatSetValues calls =0
has attached null space
KSP Object: 8192 MPI processes
type: cg
maximum iterations=10000
tolerances: relative=1e-09, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 8192 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 8192 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 8192 MPI processes
type: dmdarepart
DMDARepart: parent comm size reduction factor = 64
DMDARepart: subcomm_size = 128
KSP Object: (mg_coarse_dmdarepart_) 128 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_dmdarepart_) 128 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=2 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_dmdarepart_mg_coarse_) 128 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_dmdarepart_mg_coarse_) 128 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 0, needed 0
Factored matrix follows:
Mat Object: 128 MPI processes
type: mpiaij
rows=1024, cols=1024
package used to perform factorization: superlu_dist
total: nonzeros=0, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
SuperLU_DIST run parameters:
Process grid nprow 16 x npcol 8
Equilibrate matrix TRUE
Matrix input mode 1
Replace tiny pivots TRUE
Use iterative refinement FALSE
Processors in row 16 col partition 8
Row permutation LargeDiag
Column permutation METIS_AT_PLUS_A
Parallel symbolic factorization FALSE
Repeated factorization SamePattern_SameRowPerm
linear system matrix = precond matrix:
Mat Object: 128 MPI processes
type: mpiaij
rows=1024, cols=1024
total: nonzeros=6528, allocated nonzeros=6528
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_coarse_dmdarepart_mg_levels_1_) 128 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_dmdarepart_mg_levels_1_) 128 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 128 MPI processes
type: mpiaij
rows=8192, cols=8192
total: nonzeros=54784, allocated nonzeros=54784
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 128 MPI processes
type: mpiaij
rows=8192, cols=8192
total: nonzeros=54784, allocated nonzeros=54784
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=8192, cols=8192
total: nonzeros=54784, allocated nonzeros=54784
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=65536, cols=65536
total: nonzeros=448512, allocated nonzeros=448512
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=524288, cols=524288
total: nonzeros=3.62906e+06, allocated nonzeros=3.62906e+06
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=4194304, cols=4194304
total: nonzeros=2.91963e+07, allocated nonzeros=2.91963e+07
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=33554432, cols=33554432
total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
total number of mallocs used during MatSetValues calls =0
has attached null space
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=33554432, cols=33554432
total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
total number of mallocs used during MatSetValues calls =0
has attached null space
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/u/sciteam/mrosso/mrosso-repo/build/bin/test_droplet_box.exe on a gnu-opt-32idx
named p���� with 8192 processors, by mrosso Fri Jul 24 14:11:55 2015
Using Petsc Development GIT revision: v3.6-233-g4936542 GIT Date: 2015-07-17
10:15:47 -0500
Max Max/Min Avg Total
Time (sec): 7.565e+01 1.00002 7.565e+01
Objects: 7.230e+02 1.00000 7.230e+02
Flops: 5.717e+07 1.01632 5.707e+07 4.675e+11
Flops/sec: 7.557e+05 1.01634 7.544e+05 6.180e+09
MPI Messages: 9.084e+03 2.00000 8.611e+03 7.054e+07
MPI Message Lengths: 6.835e+06 2.00000 7.524e+02 5.307e+10
MPI Reductions: 1.000e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 7.5651e+01 100.0%  4.6755e+11 100.0%  7.054e+07 100.0%  7.524e+02      100.0%  9.990e+02  99.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct
%T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecTDot 174 1.0 1.8118e-01 1.9 1.43e+06 1.0 0.0e+00 0.0e+00
1.7e+02 0 2 0 0 17 0 2 0 0 17 64440
VecNorm 94 1.0 6.4223e-02 2.1 7.70e+05 1.0 0.0e+00 0.0e+00
9.4e+01 0 1 0 0 9 0 1 0 0 9 98224
VecScale 787 1.0 1.0910e-03 1.6 1.48e+05 1.8 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 1059301
VecCopy 179 1.0 1.0858e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 1240 1.0 1.4889e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 522 1.0 5.7485e-03 1.2 4.28e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 7 0 0 0 0 7 0 0 0 6093896
VecAYPX 695 1.0 5.3260e-03 1.4 2.17e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 4 0 0 0 0 4 0 0 0 3335289
VecAssemblyBegin 4 1.0 1.3018e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
1.2e+01 0 0 0 0 1 0 0 0 0 1 0
VecAssemblyEnd 4 1.0 1.6499e-0428.8 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 2182 1.0 2.2002e-02 2.1 0.00e+00 0.0 6.9e+07 7.6e+02
0.0e+00 0 0 98 99 0 0 0 98 99 0 0
VecScatterEnd 2182 1.0 5.0710e+0074.8 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 4 0 0 0 0 4 0 0 0 0 0
MatMult 699 1.0 2.3855e+0031.0 2.40e+07 1.0 3.3e+07 1.4e+03
0.0e+00 0 42 46 84 0 0 42 46 84 0 82105
MatMultAdd 348 1.0 5.8677e-03 1.6 8.14e+05 1.0 0.0e+00 0.0e+00
0.0e+00 0 1 0 0 0 0 1 0 0 0 1136883
MatMultTranspose 352 1.0 5.7197e-03 1.2 8.24e+05 1.0 0.0e+00 0.0e+00
0.0e+00 0 1 0 0 0 0 1 0 0 0 1179718
MatSolve 87 1.0 5.8730e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 72 0 0 0 0 72 0 0 0 0 0
MatSOR 870 1.0 5.0801e+0055.5 2.27e+07 1.0 3.6e+07 2.2e+02
0.0e+00 4 40 52 15 0 4 40 52 15 0 36617
MatLUFactorSym 1 1.0 9.5398e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 1.4040e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 19 0 0 0 0 19 0 0 0 0 0
MatResidual 348 1.0 4.1076e-02 1.8 5.70e+06 1.0 1.6e+07 6.8e+02
0.0e+00 0 10 23 21 0 0 10 23 21 0 1133130
MatAssemblyBegin 21 1.0 2.5973e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
2.6e+01 0 0 0 0 3 0 0 0 0 3 0
MatAssemblyEnd 21 1.0 5.4194e-02 2.0 0.00e+00 0.0 4.7e+05 1.4e+02
7.2e+01 0 0 1 0 7 0 0 1 0 7 0
MatGetRowIJ 1 1.0 5.6028e-0558.8 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.2708e-04 8.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 35 1.0 4.3098e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00
3.5e+01 0 0 0 0 4 0 0 0 0 4 0
MatPtAP 4 1.0 6.8662e-02 1.0 1.03e+05 1.0 9.3e+05 2.9e+02
6.8e+01 0 0 1 1 7 0 0 1 1 7 12233
MatPtAPSymbolic 4 1.0 5.3361e-02 1.0 0.00e+00 0.0 5.6e+05 4.5e+02
2.8e+01 0 0 1 0 3 0 0 1 0 3 0
MatPtAPNumeric 4 1.0 1.6402e-02 1.1 1.03e+05 1.0 3.7e+05 4.4e+01
4.0e+01 0 0 1 0 4 0 0 1 0 4 51212
MatGetLocalMat 4 1.0 2.6742e-0269.8 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 4 1.0 1.5030e-03 2.6 0.00e+00 0.0 5.6e+05 4.5e+02
0.0e+00 0 0 1 0 0 0 0 1 0 0 0
MatGetSymTrans 8 1.0 1.9407e-04 3.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 9 1.0 5.1131e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
1.4e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 4 1.0 7.3904e+01 1.0 5.72e+07 1.0 7.0e+07 7.5e+02
9.1e+02 98100100100 91 98100100100 91 6325
PCSetUp 4 1.0 1.4206e+01 1.0 1.73e+05 1.0 1.3e+06 2.2e+02
2.0e+02 19 0 2 1 20 19 0 2 1 20 100
PCApply 87 1.0 5.9362e+01 1.0 4.79e+07 1.0 6.5e+07 6.8e+02
3.5e+02 78 84 92 83 35 78 84 92 83 35 6596
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 592 592 2160472 0
Vector Scatter 14 13 18512 0
Matrix 38 38 976248 0
Matrix Null Space 1 1 584 0
Distributed Mesh 5 4 19808 0
Star Forest Bipartite Graph 10 8 6720 0
Discrete System 5 4 3360 0
Index Set 32 32 51488 0
IS L to G Mapping 5 4 6020 0
Krylov Solver 7 7 8608 0
DMKSP interface 4 4 2560 0
Preconditioner 7 7 6968 0
Viewer 3 1 752 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 4.22001e-05
Average time for zero size MPI_Send(): 1.56337e-06
#PETSc Option Table entries:
-finput input.txt
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-9
-ksp_type cg
-ksp_view
-log_summary log_gamg.txt
-mg_coarse_ksp_type preonly
-mg_coarse_pc_factor_mat_solver_package superlu_dist
-mg_coarse_pc_type lu
-mg_levels_ksp_type richardson
-pc_dmdarepart_log
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=16384
--known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4
--known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2
--known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8
--known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8
--known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4
--known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1
--known-mpi-c-double-complex=1 --known-sdot-returns-double=0
--known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 "
--known-mpi-shared-libraries=0 --known-memcmp-ok
--with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a
--COPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
-O3 -march=native -mtune=native" --FOPTFLAGS="-Wall -Wno-unused-variable
-ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native
-mtune=native" --CXXOPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing
-Wno-unknown-pragmas -O3 -march=native -mtune=native" --with-x="0 "
--with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 "
--with-fortranlib-autodetect="0 " --with-shared-libraries="0 "
--with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn "
--download-hypre=1 --download-blacs="1 " --download-scalapack="1 "
--download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 "
PETSC_ARCH=gnu-opt-32idx
-----------------------------------------
Libraries compiled on Sat Jul 18 19:48:51 2015 on h2ologin1
Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /u/sciteam/mrosso/LIBS/petsc
Using PETSc arch: gnu-opt-32idx
-----------------------------------------
Using C compiler: cc -Wall -Wwrite-strings -Wno-strict-aliasing
-Wno-unknown-pragmas -O3 -march=native -mtune=native ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn -Wall -Wno-unused-variable -ffree-line-length-0
-Wno-unused-dummy-argument -O3 -march=native -mtune=native ${FOPTFLAGS}
${FFLAGS}
-----------------------------------------
Using include paths: -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/include
-I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lpetsc
-Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lsuperlu_dist_4.0 -lHYPRE
-lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib
-L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl
-----------------------------------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/u/sciteam/mrosso/mrosso-repo/build/bin/test_droplet_box.exe on a gnu-opt-32idx
named p���� with 8192 processors, by mrosso Fri Jul 24 14:33:06 2015
Using Petsc Development GIT revision: v3.6-233-g4936542 GIT Date: 2015-07-17
10:15:47 -0500
Max Max/Min Avg Total
Time (sec): 3.447e+00 1.00038 3.446e+00
Objects: 1.368e+03 1.28935 1.066e+03
Flops: 7.647e+07 1.02006 7.608e+07 6.232e+11
Flops/sec: 2.219e+07 1.02020 2.207e+07 1.808e+11
MPI Messages: 2.096e+04 3.38688 1.201e+04 9.840e+07
MPI Message Lengths: 9.104e+06 2.00024 7.189e+02 7.074e+10
MPI Reductions: 1.416e+03 1.08506
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 3.1206e+00  90.5%  6.2314e+11 100.0%  9.376e+07  95.3%  7.181e+02       99.9%  1.261e+03  89.0%
 1: PCRprt_SetUpMat: 2.5313e-02   0.7%  6.5418e+05   0.0%  6.123e+05   0.6%  5.931e-02        0.0%  4.425e+01   3.1%
 2:    PCRprt_Apply: 3.0039e-01   8.7%  8.8424e+07   0.0%  4.029e+06   4.1%  6.738e-01        0.1%  9.062e-01   0.1%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct
%T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecTDot 232 1.0 4.3392e-02 2.6 1.90e+06 1.0 0.0e+00 0.0e+00
2.3e+02 1 2 0 0 16 1 2 0 0 18 358757
VecNorm 123 1.0 1.6137e-02 2.0 1.01e+06 1.0 0.0e+00 0.0e+00
1.2e+02 0 1 0 0 9 0 1 0 0 10 511516
VecScale 1048 1.0 1.1351e-03 1.5 1.92e+05 1.8 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 1318105
VecCopy 121 1.0 1.2727e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 1647 1.0 1.6043e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 696 1.0 7.1111e-03 1.4 5.70e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 7 0 0 0 0 7 0 0 0 6568316
VecAYPX 927 1.0 4.7853e-03 1.4 2.90e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 4 0 0 0 0 4 0 0 0 4961251
VecAssemblyBegin 4 1.0 1.2280e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
1.2e+01 0 0 0 0 1 0 0 0 0 1 0
VecAssemblyEnd 4 1.0 1.6284e-0434.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 2907 1.0 2.7515e-02 2.1 0.00e+00 0.0 9.2e+07 7.6e+02
0.0e+00 1 0 94 99 0 1 0 98 99 0 0
VecScatterEnd 2907 1.0 1.5621e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 3 0 0 0 0 4 0 0 0 0 0
MatMult 931 1.0 2.1213e-01 2.2 3.19e+07 1.0 4.3e+07 1.4e+03
0.0e+00 5 42 44 84 0 5 42 46 84 0 1228981
MatMultAdd 464 1.0 4.5297e-03 1.1 1.09e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 1 0 0 0 0 1 0 0 0 1963600
MatMultTranspose 468 1.0 7.2241e-03 1.2 1.10e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 1 0 0 0 0 1 0 0 0 1241849
MatSOR 1160 1.0 1.4814e-01 1.2 3.03e+07 1.0 4.9e+07 2.2e+02
0.0e+00 4 40 49 15 0 4 40 52 15 0 1673981
MatResidual 464 1.0 5.4564e-02 1.8 7.60e+06 1.0 2.2e+07 6.8e+02
0.0e+00 1 10 22 21 0 1 10 23 21 0 1137383
MatAssemblyBegin 26 1.0 2.9964e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
3.6e+01 1 0 0 0 3 1 0 0 0 3 0
MatAssemblyEnd 26 1.0 3.6304e-02 1.0 0.00e+00 0.0 4.8e+05 1.3e+02
8.0e+01 1 0 0 0 6 1 0 1 0 6 0
MatView 50 1.7 5.7154e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00
3.0e+01 2 0 0 0 2 2 0 0 0 2 0
MatPtAP 8 1.0 4.8214e-02 1.0 2.06e+05 1.0 1.1e+06 3.5e+02
7.6e+01 1 0 1 1 5 2 0 1 1 6 34843
MatPtAPSymbolic 4 1.0 2.7914e-02 1.1 0.00e+00 0.0 5.6e+05 4.5e+02
2.8e+01 1 0 1 0 2 1 0 1 0 2 0
MatPtAPNumeric 8 1.0 2.1734e-02 1.1 2.06e+05 1.0 5.6e+05 2.6e+02
4.8e+01 1 0 1 0 3 1 0 1 0 4 77294
MatGetLocalMat 8 1.0 6.5875e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 8 1.0 1.9593e-03 2.6 0.00e+00 0.0 7.5e+05 5.1e+02
0.0e+00 0 0 1 1 0 0 0 1 1 0 0
MatGetSymTrans 8 1.0 1.4830e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 14 1.0 6.4659e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
1.4e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 4 1.0 9.5956e-01 1.0 7.65e+07 1.0 9.8e+07 7.2e+02
1.2e+03 28100100100 86 31100105100 97 649356
PCSetUp 4 1.0 1.7332e-01 1.0 2.76e+05 1.0 2.2e+06 1.9e+02
2.8e+02 5 0 2 1 20 5 0 2 1 22 13014
PCApply 116 1.0 7.0218e-01 1.0 6.42e+07 1.0 9.1e+07 6.5e+02
4.6e+02 20 84 92 83 33 22 84 97 83 37 743519
--- Event Stage 1: PCRprt_SetUpMat
VecSet 3 1.5 1.0014e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 10 1.2 4.3280e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00
4.1e+00 0 0 0 0 0 8 0 0 0 9 0
MatAssemblyEnd 10 1.2 8.4145e-03 1.1 0.00e+00 0.0 1.9e+05 4.2e+00
1.6e+01 0 0 0 0 1 30 0 31 13 36 0
MatGetRow 192 0.0 4.4584e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 2 1.0 1.0426e-02 2.3 0.00e+00 0.0 8.1e+04 2.3e+01
6.0e+00 0 0 0 0 0 23 0 13 32 14 0
MatZeroEntries 1 0.0 6.9141e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 2 1.0 1.8841e-02 1.0 8.40e+01 2.6 5.3e+05 7.4e+00
3.4e+01 1 0 1 0 2 74100 87 67 77 35
MatPtAPSymbolic 2 1.0 9.2332e-03 1.1 0.00e+00 0.0 3.3e+05 7.0e+00
1.4e+01 0 0 0 0 1 35 0 54 40 32 0
MatPtAPNumeric 2 1.0 1.0050e-02 1.1 8.40e+01 2.6 2.0e+05 7.9e+00
2.0e+01 0 0 0 0 1 39100 33 28 45 65
MatGetLocalMat 2 1.0 5.9128e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 2 1.0 5.0616e-04 3.8 0.00e+00 0.0 2.8e+05 5.3e+00
0.0e+00 0 0 0 0 0 1 0 46 26 0 0
MatGetSymTrans 4 1.0 1.0729e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 2: PCRprt_Apply
VecScale 348 0.0 2.4199e-04 0.0 3.34e+04 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 4 0 0 0 13989
VecCopy 116 0.0 6.5565e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 1049 3.0 3.4976e-04 6.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAYPX 116 0.0 8.7500e-05 0.0 7.42e+03 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 1 0 0 0 10860
VecScatterBegin 1161 2.5 1.2123e-0240.8 0.00e+00 0.0 4.0e+06 1.6e+01
0.0e+00 0 0 4 0 0 0 0100100 0 0
VecScatterEnd 1161 2.5 3.0874e-0110.7 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 9 0 0 0 0 98 0 0 0 0 0
MatMult 232 2.0 9.2895e-0368.7 9.67e+04834.0 1.0e+06 1.6e+01
0.0e+00 0 0 1 0 0 1 15 25 25 0 1469
MatMultAdd 116 0.0 3.1829e-04 0.0 1.48e+04 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 2 0 0 0 5971
MatMultTranspose 233 2.0 1.1170e-0233.1 1.52e+0465.6 9.4e+05 8.0e+00
0.0e+00 0 0 1 0 0 1 4 23 11 0 342
MatSolve 116 0.0 1.6799e-01 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatSOR 232 0.0 1.7143e-02 0.0 5.50e+05 0.0 2.1e+05 1.3e+02
0.0e+00 0 0 0 0 0 0 77 5 41 0 3947
MatLUFactorSym 1 0.0 4.6492e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 2 0.0 6.0585e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatResidual 116 0.0 4.7536e-03 0.0 1.04e+05 0.0 7.1e+04 1.3e+02
0.0e+00 0 0 0 0 0 0 14 2 14 0 2674
MatAssemblyBegin 5 0.0 4.3392e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
9.4e-02 0 0 0 0 0 0 0 0 0 10 0
MatAssemblyEnd 5 0.0 8.8215e-04 0.0 0.00e+00 0.0 1.2e+03 1.0e+01
2.5e-01 0 0 0 0 0 0 0 0 0 28 0
MatGetRowIJ 1 0.0 4.0531e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 0.0 2.7895e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 2 0.0 1.5361e-03 0.0 2.82e+03 0.0 3.6e+03 6.7e+01
3.0e-01 0 0 0 0 0 0 0 0 0 33 221
MatPtAPSymbolic 1 0.0 6.6018e-04 0.0 0.00e+00 0.0 1.8e+03 8.5e+01
1.1e-01 0 0 0 0 0 0 0 0 0 12 0
MatPtAPNumeric 2 0.0 8.8406e-04 0.0 2.82e+03 0.0 1.8e+03 4.9e+01
1.9e-01 0 0 0 0 0 0 0 0 0 21 385
MatGetLocalMat 2 0.0 3.2187e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 2 0.0 1.9097e-04 0.0 0.00e+00 0.0 2.4e+03 9.6e+01
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSymTrans 2 0.0 4.0531e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 6 0.0 1.2183e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
3.1e-02 0 0 0 0 0 0 0 0 0 3 0
KSPSolve 116 0.0 2.7114e-01 0.0 6.87e+05 0.0 2.9e+05 1.3e+02
9.1e-01 0 0 0 0 0 1 96 7 55100 312
PCSetUp 2 0.0 6.5762e-02 0.0 3.78e+03 0.0 4.9e+03 5.3e+01
9.1e-01 0 0 0 0 0 0 1 0 0100 7
PCApply 116 0.0 2.0491e-01 0.0 6.83e+05 0.0 2.8e+05 1.3e+02
0.0e+00 0 0 0 0 0 1 95 7 54 0 411
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 778 787 2743704 0
Vector Scatter 18 21 27616 0
Matrix 38 52 1034136 0
Matrix Null Space 1 1 584 0
Distributed Mesh 7 7 34664 0
Star Forest Bipartite Graph 14 14 11760 0
Discrete System 7 7 5880 0
Index Set 36 38 56544 0
IS L to G Mapping 7 7 8480 0
Krylov Solver 11 10 12240 0
DMKSP interface 4 5 3200 0
Preconditioner 11 10 10056 0
Viewer 8 6 4512 0
--- Event Stage 1: PCRprt_SetUpMat
Vector 6 5 7840 0
Vector Scatter 3 2 2128 0
Matrix 15 12 43656 0
Index Set 10 10 7896 0
--- Event Stage 2: PCRprt_Apply
Vector 364 356 685152 0
Vector Scatter 3 0 0 0
Matrix 11 0 0 0
Distributed Mesh 1 0 0 0
Star Forest Bipartite Graph 2 0 0 0
Discrete System 1 0 0 0
Index Set 10 8 6304 0
IS L to G Mapping 1 0 0 0
Krylov Solver 0 1 1136 0
DMKSP interface 1 0 0 0
Preconditioner 0 1 984 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 5.24044e-05
Average time for zero size MPI_Send(): 2.16223e-05
#PETSc Option Table entries:
-finput input.txt
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-9
-ksp_type cg
-ksp_view
-log_summary log_mg7.txt
-mg_coarse_dmdarepart_ksp_constant_null_space
-mg_coarse_dmdarepart_ksp_type preonly
-mg_coarse_dmdarepart_mg_coarse_ksp_type preonly
-mg_coarse_dmdarepart_mg_coarse_pc_factor_mat_solver_package superlu_dist
-mg_coarse_dmdarepart_mg_coarse_pc_type lu
-mg_coarse_dmdarepart_mg_levels_ksp_type richardson
-mg_coarse_dmdarepart_pc_mg_galerkin
-mg_coarse_dmdarepart_pc_mg_levels 2
-mg_coarse_dmdarepart_pc_type mg
-mg_coarse_ksp_type preonly
-mg_coarse_pc_dmdarepart_factor 64
-mg_coarse_pc_type dmdarepart
-mg_levels_ksp_type richardson
-options_left
-pc_dmdarepart_log
-pc_dmdarepart_monitor
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=16384
--known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4
--known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2
--known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8
--known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8
--known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4
--known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1
--known-mpi-c-double-complex=1 --known-sdot-returns-double=0
--known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 "
--known-mpi-shared-libraries=0 --known-memcmp-ok
--with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a
--COPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
-O3 -march=native -mtune=native" --FOPTFLAGS="-Wall -Wno-unused-variable
-ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native
-mtune=native" --CXXOPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing
-Wno-unknown-pragmas -O3 -march=native -mtune=native" --with-x="0 "
--with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 "
--with-fortranlib-autodetect="0 " --with-shared-libraries="0 "
--with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn "
--download-hypre=1 --download-blacs="1 " --download-scalapack="1 "
--download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 "
PETSC_ARCH=gnu-opt-32idx
-----------------------------------------
Libraries compiled on Sat Jul 18 19:48:51 2015 on h2ologin1
Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /u/sciteam/mrosso/LIBS/petsc
Using PETSc arch: gnu-opt-32idx
-----------------------------------------
Using C compiler: cc -Wall -Wwrite-strings -Wno-strict-aliasing
-Wno-unknown-pragmas -O3 -march=native -mtune=native ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn -Wall -Wno-unused-variable -ffree-line-length-0
-Wno-unused-dummy-argument -O3 -march=native -mtune=native ${FOPTFLAGS}
${FFLAGS}
-----------------------------------------
Using include paths: -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/include
-I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lpetsc
-Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lsuperlu_dist_4.0 -lHYPRE
-lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib
-L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl
-----------------------------------------