Hi Barry,
I tried what you suggested:
1) 5 levels of MG + defaults at the coarse level (PCREDUNDANT)
2) 5 levels of MG + 2 levels of MG via DMDAREPART + defaults at the
coarse level (PCREDUNDANT)
I attached ksp_view and log_summary for both cases.
The use of PCREDUNDANT halves the time for case 1 (from ~20 sec per solve
to ~10 sec per solve), while it seems to have little effect on case 2.
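For reference, the coarse-level options for the two cases were roughly as
follows (condensed from the option tables in the attached logs; the common
options -ksp_type cg -pc_type mg -pc_mg_levels 5 -pc_mg_galerkin are the same
in both):

Case 1 (defaults at the coarse level, i.e. PCREDUNDANT + sequential LU):
  -mg_coarse_ksp_type preonly

Case 2 (2 extra levels of MG via DMDAREPART, PCREDUNDANT at its coarse level):
  -mg_coarse_pc_type dmdarepart
  -mg_coarse_pc_dmdarepart_factor 64
  -mg_coarse_dmdarepart_pc_type mg
  -mg_coarse_dmdarepart_pc_mg_levels 2
  -mg_coarse_dmdarepart_pc_mg_galerkin
  -mg_coarse_dmdarepart_mg_coarse_pc_type redundant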
Any thoughts on this?
Thanks,
Michele
On Sat, 2015-07-25 at 22:18 -0500, Barry Smith wrote:
> This dmdarepart business, which I am guessing is running PCMG on smaller
> sets of processes with a DMDA on that smaller set of processes for a coarse
> problem, is a fine idea, but you should keep in mind the rule of thumb that
> parallel iterative (and even more so direct) solvers don't do well when there
> are roughly 10,000 or fewer degrees of freedom per processor. So you should
> definitely not be using SuperLU_DIST in parallel to solve a problem with 1024
> degrees of freedom on 128 processes; just use PCREDUNDANT and its default
> (sequential) LU. That should be faster.
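> For your current setup that would be something along the lines of (just a
> sketch; PCREDUNDANT's inner solve already defaults to a sequential LU, so the
> second option only makes the default explicit):
>
>    -mg_coarse_dmdarepart_mg_coarse_pc_type redundant
>    -mg_coarse_dmdarepart_mg_coarse_redundant_pc_type lu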
>
> Barry
>
> > On Jul 25, 2015, at 10:09 PM, Barry Smith <[email protected]> wrote:
> >
> >
> > Don't use
> >
> > -mg_coarse_pc_factor_mat_solver_package superlu_dist
> > -mg_coarse_pc_type lu
> >
> > with 8000+ processes and 1 degree of freedom per process; SuperLU_DIST will
> > be terrible. Just leave the defaults for this and send the -log_summary
> >
> > Barry
> >
> >> On Jul 24, 2015, at 2:44 PM, Michele Rosso <[email protected]> wrote:
> >>
> >> Barry,
> >>
> >> I attached ksp_view and log_summary for two different setups:
> >>
> >> 1) Plain MG on 5 levels + LU at the coarse level (files ending in mg5)
> >> 2) Plain MG on 5 levels + custom PC + LU at the coarse level (files ending
> >> in mg7)
> >>
> >> The custom PC works on a subset of processes, thus allowing the use of two
> >> more levels of MG, for a total of 7.
> >> Case 1) is extremely slow (~20 sec per solve) and converges in 21
> >> iterations.
> >> Case 2) is way faster (~0.25 sec per solve) and converges in 29
> >> iterations.
> >>
> >> Thanks for your help!
> >>
> >> Michele
> >>
> >>
> >> On Fri, 2015-07-24 at 13:56 -0500, Barry Smith wrote:
> >>> The coarse problem for the PCMG (geometric multigrid) is
> >>>
> >>> Mat Object: 8192 MPI processes
> >>> type: mpiaij
> >>> rows=8192, cols=8192
> >>>
> >>> then it tries to solve it with algebraic multigrid on 8192 processes
> >>> (which is completely insane). A lot of the time is spent in setting up
> >>> the algebraic multigrid (not surprisingly).
> >>>
> >>> 8192 is kind of small to parallelize. Please run the same code but with
> >>> the default coarse grid problem instead of PCGAMG and send us the
> >>> -log_summary again
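> >>> That is, keep the current options but drop the GAMG coarse solve
> >>> (e.g. -mg_coarse_pc_type gamg) and run with something like
> >>>
> >>>   -pc_type mg -pc_mg_levels 5 -pc_mg_galerkin -ksp_view -log_summary
> >>>
> >>> so that the coarse problem is handled by PCMG's default coarse solver.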
> >>>
> >>> Barry
> >>>
> >>>
> >>>> On Jul 24, 2015, at 1:35 PM, Michele Rosso <[email protected]> wrote:
> >>>>
> >>>> Hi Mark and Barry,
> >>>>
> >>>> I am sorry for my late reply: it was a busy week!
> >>>> I ran a test case for a larger problem with as many levels of MG as I
> >>>> could (i.e. 5) and GAMG as the PC at the coarse level. I attached the
> >>>> output of -info (after grepping for "gamg"), ksp_view and log_summary.
> >>>> The solve takes about 2 seconds on 8192 cores, which is way too much.
> >>>> The number of iterations to convergence is 24.
> >>>> I hope there is a way to speed it up.
> >>>>
> >>>> Thanks,
> >>>> Michele
> >>>>
> >>>>
> >>>> On Fri, 2015-07-17 at 09:38 -0400, Mark Adams wrote:
> >>>>>
> >>>>>
> >>>>> On Thu, Jul 16, 2015 at 8:18 PM, Michele Rosso <[email protected]> wrote:
> >>>>> Barry,
> >>>>>
> >>>>> thank you very much for the detailed answer. I tried what you
> >>>>> suggested and it works.
> >>>>> So far I tried on a small system but the final goal is to use it for
> >>>>> very large runs. How does PCGAMG compare to PCMG as far as
> >>>>> performance and scalability are concerned?
> >>>>> Also, could you help me tune the GAMG part (my current setup is in
> >>>>> the attached ksp_view.txt file)?
> >>>>>
> >>>>>
> >>>>>
> >>>>> I am going to add this to the document today but you can run with
> >>>>> -info. This is very noisy so you might want to do the next step at run
> >>>>> time. Then grep on GAMG. This will be about 20 lines. Send that to
> >>>>> us and we can go from there.
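> >>>>> For example, something like (just a sketch; substitute your own
> >>>>> launcher and executable):
> >>>>>
> >>>>>   mpiexec -n 8192 ./your_app <usual options> -info 2>&1 | grep GAMG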
> >>>>>
> >>>>>
> >>>>> Mark
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> I also tried to use superlu_dist for the LU decomposition on
> >>>>> mg_coarse_mg_sub_
> >>>>> -mg_coarse_mg_coarse_sub_pc_type lu
> >>>>> -mg_coarse_mg_coarse_sub_pc_factor_mat_solver_package superlu_dist
> >>>>>
> >>>>> but I got an error:
> >>>>>
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>>
> >>>>>
> >>>>> Thank you,
> >>>>> Michele
> >>>>>
> >>>>>
> >>>>> On Thu, 2015-07-16 at 18:07 -0500, Barry Smith wrote:
> >>>>>>
> >>>>>>> On Jul 16, 2015, at 5:42 PM, Michele Rosso <[email protected]> wrote:
> >>>>>>>
> >>>>>>> Barry,
> >>>>>>>
> >>>>>>> thanks for your reply. So if I want it fixed, I will have to use the
> >>>>>>> master branch, correct?
> >>>>>>
> >>>>>>
> >>>>>> Yes, or edit mg.c and remove the offending lines of code (easy
> >>>>>> enough).
> >>>>>>
> >>>>>>>
> >>>>>>> On a side note, what I am trying to achieve is to be able to use as
> >>>>>>> many levels of MG as I want, despite the limitation imposed by the
> >>>>>>> local number of grid nodes.
> >>>>>>
> >>>>>>
> >>>>>> I assume you are talking about DMDA? There is no generic
> >>>>>> limitation for PETSc's multigrid; it is only the way the DMDA
> >>>>>> code figures out the interpolation that causes a restriction.
> >>>>>>
> >>>>>>
> >>>>>>> So far I am using a borrowed code that implements a PC that creates a
> >>>>>>> sub-communicator and performs MG on it.
> >>>>>>> While reading the documentation I found out that PCMGSetLevels takes
> >>>>>>> in an optional array of communicators. How does this work?
> >>>>>>
> >>>>>>
> >>>>>> It doesn't work. It was an idea that never got pursued.
> >>>>>>
> >>>>>>
> >>>>>>> Can I simply define my matrix and rhs on the fine grid as I would
> >>>>>>> do normally (I do not use kspsetoperators and kspsetrhs), and would
> >>>>>>> KSP take care of it by using the correct communicator for each
> >>>>>>> level?
> >>>>>>
> >>>>>>
> >>>>>> No.
> >>>>>>
> >>>>>> You can use the PCMG geometric multigrid with DMDA for as many
> >>>>>> levels as it works and then use PCGAMG as the coarse grid solver.
> >>>>>> PCGAMG automatically uses fewer processes for the coarse level
> >>>>>> matrices and vectors. You could do this all from the command line
> >>>>>> without writing code.
> >>>>>>
> >>>>>> For example, if your code uses a DMDA and calls KSPSetDM(), use
> >>>>>> -da_refine 3 -pc_type mg -pc_mg_galerkin -mg_coarse_pc_type gamg
> >>>>>> -ksp_view
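> >>>>>>
> >>>>>> Roughly, the pieces do the following (sketch):
> >>>>>>
> >>>>>> -da_refine 3 (refine the DMDA 3 times; PCMG uses that grid hierarchy
> >>>>>> as its levels)
> >>>>>> -pc_type mg -pc_mg_galerkin (geometric MG, coarse operators formed as
> >>>>>> Galerkin products R A P instead of rediscretized)
> >>>>>> -mg_coarse_pc_type gamg (hand the coarsest DMDA level to PCGAMG)
> >>>>>> -ksp_view (print the resulting solver hierarchy to verify)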
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Barry
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Michele
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, 2015-07-16 at 17:30 -0500, Barry Smith wrote:
> >>>>>>>> Michel,
> >>>>>>>>
> >>>>>>>> This is a very annoying feature that has been fixed in master
> >>>>>>>> http://www.mcs.anl.gov/petsc/developers/index.html
> >>>>>>>> I would like to have changed it in maint but Jed would have a
> >>>>>>>> shit-fit :-) since it changes behavior.
> >>>>>>>>
> >>>>>>>> Barry
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Jul 16, 2015, at 4:53 PM, Michele Rosso <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I am performing a series of solves inside a loop. The matrix for
> >>>>>>>>> each solve changes, but not enough to justify rebuilding the PC at
> >>>>>>>>> each solve.
> >>>>>>>>> Therefore I am using KSPSetReusePreconditioner to avoid rebuilding
> >>>>>>>>> unless necessary. The solver is CG + MG with a custom PC at the
> >>>>>>>>> coarse level.
> >>>>>>>>> If KSP is not updated each time, everything works as it is supposed
> >>>>>>>>> to.
> >>>>>>>>> When instead I allow the default PETSc behavior, i.e. updating the PC
> >>>>>>>>> every time the matrix changes, the coarse level KSP, initially set
> >>>>>>>>> to PREONLY, is changed to GMRES
> >>>>>>>>> after the first solve. I am not sure where the problem lies (my PC
> >>>>>>>>> or PETSc), so I would like to have your opinion on this.
> >>>>>>>>> I attached the ksp_view for the two successive solves and the options
> >>>>>>>>> stack.
> >>>>>>>>>
> >>>>>>>>> Thanks for your help,
> >>>>>>>>> Michel
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> <ksp_view.txt><petsc_options.txt>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>> <info.txt><ksp_view.txt><log_gamg.txt>
> >>>
> >>>
> >>>
> >>
> >> <ksp_view_mg5.txt><ksp_view_mg7.txt><log_mg5.txt><log_mg7.txt>
> >
>
KSP Object: 8192 MPI processes
type: cg
maximum iterations=10000
tolerances: relative=1e-09, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 8192 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 8192 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 8192 MPI processes
type: dmdarepart
DMDARepart: parent comm size reduction factor = 64
DMDARepart: subcomm_size = 128
KSP Object: (mg_coarse_dmdarepart_) 128 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_dmdarepart_) 128 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=2 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_dmdarepart_mg_coarse_) 128
MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_dmdarepart_mg_coarse_) 128
MPI processes
type: redundant
Redundant preconditioner: First (color=0) of 128 PCs follows
KSP Object: (mg_coarse_dmdarepart_mg_coarse_redundant_)
1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_dmdarepart_mg_coarse_redundant_)
1 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5, needed 9.76317
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=1024, cols=1024
package used to perform factorization: petsc
total: nonzeros=63734, allocated nonzeros=63734
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=1024, cols=1024
total: nonzeros=6528, allocated nonzeros=6528
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 128 MPI processes
type: mpiaij
rows=1024, cols=1024
total: nonzeros=6528, allocated nonzeros=6528
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_coarse_dmdarepart_mg_levels_1_)
128 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_dmdarepart_mg_levels_1_) 128
MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations =
1, omega = 1
linear system matrix = precond matrix:
Mat Object: 128 MPI processes
type: mpiaij
rows=8192, cols=8192
total: nonzeros=54784, allocated nonzeros=54784
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 128 MPI processes
type: mpiaij
rows=8192, cols=8192
total: nonzeros=54784, allocated nonzeros=54784
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=8192, cols=8192
total: nonzeros=54784, allocated nonzeros=54784
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=65536, cols=65536
total: nonzeros=448512, allocated nonzeros=448512
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=524288, cols=524288
total: nonzeros=3.62906e+06, allocated nonzeros=3.62906e+06
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=4194304, cols=4194304
total: nonzeros=2.91963e+07, allocated nonzeros=2.91963e+07
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=33554432, cols=33554432
total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
total number of mallocs used during MatSetValues calls =0
has attached null space
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=33554432, cols=33554432
total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
total number of mallocs used during MatSetValues calls =0
has attached null space
#PETSc Option Table entries:
-finput input.txt
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-9
-ksp_type cg
-ksp_view
-log_summary log_mg7.txt
-mg_coarse_dmdarepart_ksp_constant_null_space
-mg_coarse_dmdarepart_ksp_type preonly
-mg_coarse_dmdarepart_mg_coarse_ksp_type preonly
-mg_coarse_dmdarepart_mg_coarse_pc_type redundant
-mg_coarse_dmdarepart_mg_levels_ksp_type richardson
-mg_coarse_dmdarepart_pc_mg_galerkin
-mg_coarse_dmdarepart_pc_mg_levels 2
-mg_coarse_dmdarepart_pc_type mg
-mg_coarse_ksp_type preonly
-mg_coarse_pc_dmdarepart_factor 64
-mg_coarse_pc_type dmdarepart
-mg_levels_ksp_type richardson
-options_left
-pc_dmdarepart_log
-pc_dmdarepart_monitor
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
There are 3 unused database options. They are:
Option left: name:-finput value: input.txt
Option left: name:-mg_coarse_dmdarepart_ksp_constant_null_space (no value)
Option left: name:-pc_dmdarepart_monitor (no value)
Application 25736695 resources: utime ~29149s, stime ~48455s, Rss ~64608,
inblocks ~6174814, outblocks ~18104253
KSP Object: 8192 MPI processes
type: cg
maximum iterations=10000
tolerances: relative=1e-09, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 8192 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 8192 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 8192 MPI processes
type: redundant
Redundant preconditioner: First (color=0) of 8192 PCs follows
KSP Object: (mg_coarse_redundant_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_redundant_) 1 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5, needed 23.9038
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=8192, cols=8192
package used to perform factorization: petsc
total: nonzeros=1.30955e+06, allocated nonzeros=1.30955e+06
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=8192, cols=8192
total: nonzeros=54784, allocated nonzeros=54784
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=8192, cols=8192
total: nonzeros=54784, allocated nonzeros=54784
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=65536, cols=65536
total: nonzeros=448512, allocated nonzeros=448512
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=524288, cols=524288
total: nonzeros=3.62906e+06, allocated nonzeros=3.62906e+06
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=4194304, cols=4194304
total: nonzeros=2.91963e+07, allocated nonzeros=2.91963e+07
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=33554432, cols=33554432
total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
total number of mallocs used during MatSetValues calls =0
has attached null space
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=33554432, cols=33554432
total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
total number of mallocs used during MatSetValues calls =0
has attached null space
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
-fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary:
----------------------------------------------
/u/sciteam/mrosso/mrosso-repo/build/bin/test_droplet_box.exe on a gnu-opt-32idx
named p���� with 8192 processors, by mrosso Tue Jul 28 16:20:21 2015
Using Petsc Development GIT revision: v3.6-233-g4936542 GIT Date: 2015-07-17
10:15:47 -0500
Max Max/Min Avg Total
Time (sec): 7.498e+00 1.01676 7.375e+00
Objects: 1.385e+03 1.30537 1.066e+03
Flops: 9.815e+07 1.30922 7.642e+07 6.260e+11
Flops/sec: 1.331e+07 1.30928 1.036e+07 8.488e+10
MPI Messages: 3.595e+04 5.80931 1.225e+04 1.003e+08
MPI Message Lengths: 9.104e+06 2.00024 7.063e+02 7.086e+10
MPI Reductions: 1.427e+03 1.09349
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N
flops
and VecAXPY() for complex vectors of length N -->
8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- --
Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total
Avg %Total counts %Total
0: Main Stage: 7.0526e+00 95.6% 6.2314e+11 99.5% 9.376e+07 93.5%
7.044e+02 99.7% 1.260e+03 88.3%
1: PCRprt_SetUpMat: 2.7279e-02 0.4% 6.5418e+05 0.0% 6.123e+05 0.6%
5.817e-02 0.0% 4.425e+01 3.1%
2: PCRprt_Apply: 2.9504e-01 4.0% 2.8632e+09 0.5% 5.947e+06 5.9%
1.880e+00 0.3% 1.156e+00 0.1%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in
this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all
processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct
%T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecTDot 232 1.0 3.9837e-02 2.6 1.90e+06 1.0 0.0e+00 0.0e+00
2.3e+02 0 2 0 0 16 0 2 0 0 18 390775
VecNorm 123 1.0 1.7174e-02 1.9 1.01e+06 1.0 0.0e+00 0.0e+00
1.2e+02 0 1 0 0 9 0 1 0 0 10 480626
VecScale 1048 1.0 1.5078e-0218.8 1.92e+05 1.8 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 99231
VecCopy 121 1.0 1.2872e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 1647 1.0 1.6298e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 696 1.0 6.7093e-03 1.4 5.70e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 7 0 0 0 0 7 0 0 0 6961607
VecAYPX 927 1.0 4.6690e-03 1.4 2.90e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 4 0 0 0 0 4 0 0 0 5084883
VecAssemblyBegin 4 1.0 1.3000e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
1.2e+01 0 0 0 0 1 0 0 0 0 1 0
VecAssemblyEnd 4 1.0 1.4210e-0429.8 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 2907 1.0 2.7453e-02 2.1 0.00e+00 0.0 9.2e+07 7.6e+02
0.0e+00 0 0 92 99 0 0 0 98 99 0 0
VecScatterEnd 2907 1.0 1.8748e-01 3.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatMult 931 1.0 2.3768e-01 2.6 3.19e+07 1.0 4.3e+07 1.4e+03
0.0e+00 2 42 43 84 0 2 42 46 84 0 1096892
MatMultAdd 464 1.0 4.9362e-03 1.2 1.09e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 1 0 0 0 0 1 0 0 0 1801895
MatMultTranspose 468 1.0 1.6587e-02 2.6 1.10e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 1 0 0 0 0 1 0 0 0 540858
MatSOR 1160 1.0 1.8799e-01 1.6 3.03e+07 1.0 4.9e+07 2.2e+02
0.0e+00 2 40 48 15 0 2 40 52 15 0 1319153
MatResidual 464 1.0 7.4724e-02 2.5 7.60e+06 1.0 2.2e+07 6.8e+02
0.0e+00 1 10 22 21 0 1 10 23 21 0 830522
MatAssemblyBegin 26 1.0 3.0778e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
3.6e+01 0 0 0 0 3 0 0 0 0 3 0
MatAssemblyEnd 26 1.0 3.6265e-02 1.0 0.00e+00 0.0 4.8e+05 1.3e+02
8.0e+01 0 0 0 0 6 0 0 1 0 6 0
MatView 55 1.8 3.3602e-01 9.8 0.00e+00 0.0 0.0e+00 0.0e+00
3.0e+01 4 0 0 0 2 5 0 0 0 2 0
MatPtAP 8 1.0 4.7572e-02 1.0 2.06e+05 1.0 1.1e+06 3.5e+02
7.6e+01 1 0 1 1 5 1 0 1 1 6 35313
MatPtAPSymbolic 4 1.0 2.7729e-02 1.1 0.00e+00 0.0 5.6e+05 4.5e+02
2.8e+01 0 0 1 0 2 0 0 1 0 2 0
MatPtAPNumeric 8 1.0 2.1160e-02 1.1 2.06e+05 1.0 5.6e+05 2.6e+02
4.8e+01 0 0 1 0 3 0 0 1 0 4 79392
MatGetLocalMat 8 1.0 6.5184e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 8 1.0 1.9581e-03 2.4 0.00e+00 0.0 7.5e+05 5.1e+02
0.0e+00 0 0 1 1 0 0 0 1 1 0 0
MatGetSymTrans 8 1.0 1.2302e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 14 1.0 6.8645e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
1.4e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 4 1.0 1.0214e+00 1.0 9.81e+07 1.3 1.0e+08 7.1e+02
1.2e+03 14100100100 86 14100107100 97 612784
PCSetUp 4 1.0 1.7279e-01 1.0 2.76e+05 1.0 2.2e+06 1.9e+02
2.8e+02 2 0 2 1 20 2 0 2 1 22 13054
PCApply 116 1.0 7.6665e-01 1.0 8.58e+07 1.4 9.2e+07 6.4e+02
4.7e+02 10 84 92 83 33 11 84 99 83 37 684611
--- Event Stage 1: PCRprt_SetUpMat
VecSet 3 1.5 1.3113e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 10 1.2 5.4898e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00
4.1e+00 0 0 0 0 0 11 0 0 0 9 0
MatAssemblyEnd 10 1.2 9.6285e-03 1.1 0.00e+00 0.0 1.9e+05 4.2e+00
1.6e+01 0 0 0 0 1 33 0 31 13 36 0
MatGetRow 192 0.0 4.2677e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 2 1.0 1.0698e-02 2.3 0.00e+00 0.0 8.1e+04 2.3e+01
6.0e+00 0 0 0 0 0 22 0 13 32 14 0
MatZeroEntries 1 0.0 3.0994e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 2 1.0 2.0634e-02 1.0 8.40e+01 2.6 5.3e+05 7.4e+00
3.4e+01 0 0 1 0 2 75100 87 67 77 32
MatPtAPSymbolic 2 1.0 8.6851e-03 1.1 0.00e+00 0.0 3.3e+05 7.0e+00
1.4e+01 0 0 0 0 1 31 0 54 40 32 0
MatPtAPNumeric 2 1.0 1.2376e-02 1.0 8.40e+01 2.6 2.0e+05 7.9e+00
2.0e+01 0 0 0 0 1 44100 33 28 45 53
MatGetLocalMat 2 1.0 6.1274e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 2 1.0 4.8995e-04 3.7 0.00e+00 0.0 2.8e+05 5.3e+00
0.0e+00 0 0 0 0 0 1 0 46 26 0 0
MatGetSymTrans 4 1.0 2.0742e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 2: PCRprt_Apply
VecScale 348 0.0 2.3985e-04 0.0 3.34e+04 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 14114
VecSet 1167 3.4 5.2118e-04 9.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAYPX 116 0.0 7.3195e-05 0.0 7.42e+03 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 12983
VecScatterBegin 1393 3.0 3.2119e-02112.6 0.00e+00 0.0 5.9e+06 3.2e+01
0.0e+00 0 0 6 0 0 0 0 99 99 0 0
VecScatterEnd 1393 3.0 3.2946e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 4 0 0 0 0 99 0 0 0 0 0
MatMult 232 2.0 4.5841e-02336.1 9.67e+04834.0 1.0e+06 1.6e+01
0.0e+00 0 0 1 0 0 1 0 17 9 0 298
MatMultAdd 116 0.0 2.9373e-04 0.0 1.48e+04 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 6470
MatMultTranspose 233 2.0 3.0067e-0290.1 1.52e+0465.6 9.4e+05 8.0e+00
0.0e+00 0 0 1 0 0 1 0 16 4 0 127
MatSolve 116 0.0 2.3469e-02 0.0 1.47e+07 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 66 0 0 0 79995
MatSOR 232 0.0 4.8394e-02 0.0 5.50e+05 0.0 2.1e+05 1.3e+02
0.0e+00 0 0 0 0 0 0 2 4 14 0 1398
MatLUFactorSym 1 0.0 2.5880e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 2 0.0 1.0722e-02 0.0 7.01e+06 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 31 0 0 0 83692
MatCopy 1 0.0 3.0041e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatConvert 1 0.0 7.4148e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatResidual 116 0.0 4.5305e-02 0.0 1.04e+05 0.0 7.1e+04 1.3e+02
0.0e+00 0 0 0 0 0 0 0 1 5 0 281
MatAssemblyBegin 6 0.0 4.5967e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
9.4e-02 0 0 0 0 0 0 0 0 0 8 0
MatAssemblyEnd 6 0.0 9.6583e-04 0.0 0.00e+00 0.0 1.2e+03 1.0e+01
2.5e-01 0 0 0 0 0 0 0 0 0 22 0
MatGetRowIJ 1 0.0 9.5844e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 2 0.0 2.3339e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
9.4e-02 0 0 0 0 0 0 0 0 0 8 0
MatGetOrdering 1 0.0 8.8000e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 2 0.0 1.5650e-03 0.0 2.82e+03 0.0 3.6e+03 6.7e+01
3.0e-01 0 0 0 0 0 0 0 0 0 26 217
MatPtAPSymbolic 1 0.0 6.5613e-04 0.0 0.00e+00 0.0 1.8e+03 8.5e+01
1.1e-01 0 0 0 0 0 0 0 0 0 9 0
MatPtAPNumeric 2 0.0 9.1791e-04 0.0 2.82e+03 0.0 1.8e+03 4.9e+01
1.9e-01 0 0 0 0 0 0 0 0 0 16 370
MatRedundantMat 2 0.0 2.4142e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
9.4e-02 0 0 0 0 0 0 0 0 0 8 0
MatGetLocalMat 2 0.0 3.7909e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 2 0.0 2.0623e-04 0.0 0.00e+00 0.0 2.4e+03 9.6e+01
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSymTrans 2 0.0 4.0531e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 8 0.0 1.2207e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
3.1e-02 0 0 0 0 0 0 0 0 0 3 0
KSPSolve 116 0.0 2.6315e-01 0.0 2.24e+07 0.0 2.2e+06 7.2e+01
1.2e+00 0 0 2 0 0 1100 37 84100 10866
PCSetUp 2 0.0 4.0980e-02 0.0 7.01e+06 0.0 3.8e+04 5.0e+01
1.2e+00 0 0 0 0 0 0 31 1 1100 21909
PCApply 116 0.0 2.2205e-01 0.0 1.54e+07 0.0 2.2e+06 7.2e+01
0.0e+00 0 0 2 0 0 1 69 36 83 0 8834
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 778 791 2774488 0
Vector Scatter 18 23 29872 0
Matrix 38 52 1988092 0
Matrix Null Space 1 1 584 0
Distributed Mesh 7 7 34664 0
Star Forest Bipartite Graph 14 14 11760 0
Discrete System 7 7 5880 0
Index Set 36 41 67040 0
IS L to G Mapping 7 7 8480 0
Krylov Solver 11 11 13376 0
DMKSP interface 4 5 3200 0
Preconditioner 11 11 10864 0
Viewer 13 11 8272 0
--- Event Stage 1: PCRprt_SetUpMat
Vector 6 5 7840 0
Vector Scatter 3 2 2128 0
Matrix 15 12 43656 0
Index Set 10 10 7896 0
--- Event Stage 2: PCRprt_Apply
Vector 369 357 686800 0
Vector Scatter 5 0 0 0
Matrix 11 0 0 0
Distributed Mesh 1 0 0 0
Star Forest Bipartite Graph 2 0 0 0
Discrete System 1 0 0 0
Index Set 15 10 16000 0
IS L to G Mapping 1 0 0 0
DMKSP interface 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
Average time for MPI_Barrier(): 5.19753e-05
Average time for zero size MPI_Send(): 2.16846e-05
#PETSc Option Table entries:
-finput input.txt
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-9
-ksp_type cg
-ksp_view
-log_summary log_mg7.txt
-mg_coarse_dmdarepart_ksp_constant_null_space
-mg_coarse_dmdarepart_ksp_type preonly
-mg_coarse_dmdarepart_mg_coarse_ksp_type preonly
-mg_coarse_dmdarepart_mg_coarse_pc_type redundant
-mg_coarse_dmdarepart_mg_levels_ksp_type richardson
-mg_coarse_dmdarepart_pc_mg_galerkin
-mg_coarse_dmdarepart_pc_mg_levels 2
-mg_coarse_dmdarepart_pc_type mg
-mg_coarse_ksp_type preonly
-mg_coarse_pc_dmdarepart_factor 64
-mg_coarse_pc_type dmdarepart
-mg_levels_ksp_type richardson
-options_left
-pc_dmdarepart_log
-pc_dmdarepart_monitor
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=16384
--known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4
--known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2
--known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8
--known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8
--known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4
--known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1
--known-mpi-c-double-complex=1 --known-sdot-returns-double=0
--known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 "
--known-mpi-shared-libraries=0 --known-memcmp-ok
--with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a
--COPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
-O3 -march=native -mtune=native" --FOPTFLAGS="-Wall -Wno-unused-variable
-ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native
-mtune=native" --CXXOPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing
-Wno-unknown-pragmas -O3 -march=native -mtune=native" --with-x="0 "
--with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 "
--with-fortranlib-autodetect="0 " --with-shared-libraries="0 "
--with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn "
--download-hypre=1 --download-blacs="1 " --download-scalapack="1 "
--download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 "
PETSC_ARCH=gnu-opt-32idx
-----------------------------------------
Libraries compiled on Sat Jul 18 19:48:51 2015 on h2ologin1
Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /u/sciteam/mrosso/LIBS/petsc
Using PETSc arch: gnu-opt-32idx
-----------------------------------------
Using C compiler: cc -Wall -Wwrite-strings -Wno-strict-aliasing
-Wno-unknown-pragmas -O3 -march=native -mtune=native ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn -Wall -Wno-unused-variable -ffree-line-length-0
-Wno-unused-dummy-argument -O3 -march=native -mtune=native ${FOPTFLAGS}
${FFLAGS}
-----------------------------------------
Using include paths: -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/include
-I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lpetsc
-Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lsuperlu_dist_4.0 -lHYPRE
-lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib
-L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl
-----------------------------------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
-fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary:
----------------------------------------------
/u/sciteam/mrosso/mrosso-repo/build/bin/test_droplet_box.exe on a gnu-opt-32idx
named p���� with 8192 processors, by mrosso Tue Jul 28 15:28:29 2015
Using Petsc Development GIT revision: v3.6-233-g4936542 GIT Date: 2015-07-17
10:15:47 -0500
Max Max/Min Avg Total
Time (sec): 5.098e+02 1.00007 5.098e+02
Objects: 7.400e+02 1.00000 7.400e+02
Flops: 5.499e+08 1.00167 5.498e+08 4.504e+12
Flops/sec: 1.079e+06 1.00174 1.078e+06 8.834e+09
MPI Messages: 7.381e+05 1.00619 7.376e+05 6.043e+09
MPI Message Lengths: 1.267e+07 1.36946 1.669e+01 1.008e+11
MPI Reductions: 1.009e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N
flops
and VecAXPY() for complex vectors of length N -->
8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- --
Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total
Avg %Total counts %Total
0: Main Stage: 5.0982e+02 100.0% 4.5037e+12 100.0% 6.043e+09 100.0%
1.669e+01 100.0% 1.008e+03 99.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in
this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all
processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct
%T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecTDot 174 1.0 1.5646e-01 1.5 1.43e+06 1.0 0.0e+00 0.0e+00
1.7e+02 0 0 0 0 17 0 0 0 0 17 74621
VecNorm 94 1.0 5.5188e-02 2.5 7.70e+05 1.0 0.0e+00 0.0e+00
9.4e+01 0 0 0 0 9 0 0 0 0 9 114305
VecScale 787 1.0 1.4017e-03 1.9 1.48e+05 1.8 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 824521
VecCopy 92 1.0 1.0190e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 1329 1.0 3.7305e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 522 1.0 5.5845e-03 1.3 4.28e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 1 0 0 0 0 1 0 0 0 6272892
VecAYPX 695 1.0 3.0615e-02 9.2 2.17e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 580237
VecAssemblyBegin 4 1.0 1.3102e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
1.2e+01 0 0 0 0 1 0 0 0 0 1 0
VecAssemblyEnd 4 1.0 1.8620e-0432.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 2356 1.0 1.6390e+01 4.7 0.00e+00 0.0 5.9e+09 1.7e+01
0.0e+00 2 0 98 99 0 2 0 98 99 0 0
VecScatterEnd 2356 1.0 4.1647e+02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 69 0 0 0 0 69 0 0 0 0 0
MatMult 699 1.0 5.2895e+01643.0 2.40e+07 1.0 3.3e+07 1.4e+03
0.0e+00 1 4 1 44 0 1 4 1 44 0 3703
MatMultAdd 348 1.0 5.8870e-03 1.5 8.14e+05 1.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 1133153
MatMultTranspose 352 1.0 6.3620e-03 1.3 8.24e+05 1.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 1060614
MatSolve 87 1.0 3.9927e-01 1.3 2.27e+08 1.0 0.0e+00 0.0e+00
0.0e+00 0 41 0 0 0 0 41 0 0 0 4660544
MatSOR 870 1.0 1.1567e+02523.3 2.27e+07 1.0 3.6e+07 2.2e+02
0.0e+00 7 4 1 8 0 7 4 1 8 0 1608
MatLUFactorSym 1 1.0 5.9881e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 5.9217e-01 1.1 2.66e+08 1.0 0.0e+00 0.0e+00
0.0e+00 0 48 0 0 0 0 48 0 0 0 3673552
MatConvert 1 1.0 1.0331e-03 2.6 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatResidual 348 1.0 3.3047e-0113.8 5.70e+06 1.0 1.6e+07 6.8e+02
0.0e+00 0 1 0 11 0 0 1 0 11 0 140845
MatAssemblyBegin 22 1.0 2.4983e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
2.6e+01 0 0 0 0 3 0 0 0 0 3 0
MatAssemblyEnd 22 1.0 3.3268e-02 1.1 0.00e+00 0.0 4.7e+05 1.4e+02
7.2e+01 0 0 0 0 7 0 0 0 0 7 0
MatGetRowIJ 1 1.0 5.8293e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 1 1.0 2.2252e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 9.7980e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 40 1.3 3.3014e-01 3.5 0.00e+00 0.0 0.0e+00 0.0e+00
3.0e+01 0 0 0 0 3 0 0 0 0 3 0
MatPtAP 4 1.0 4.4705e-02 1.0 1.03e+05 1.0 9.3e+05 2.9e+02
6.8e+01 0 0 0 0 7 0 0 0 0 7 18789
MatPtAPSymbolic 4 1.0 2.9025e-02 1.0 0.00e+00 0.0 5.6e+05 4.5e+02
2.8e+01 0 0 0 0 3 0 0 0 0 3 0
MatPtAPNumeric 4 1.0 1.6840e-02 1.1 1.03e+05 1.0 3.7e+05 4.4e+01
4.0e+01 0 0 0 0 4 0 0 0 0 4 49879
MatRedundantMat 1 1.0 2.3107e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetLocalMat 4 1.0 6.1631e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 4 1.0 1.4648e-03 2.8 0.00e+00 0.0 5.6e+05 4.5e+02
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSymTrans 8 1.0 1.4162e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 10 1.0 4.6747e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
1.4e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 4 1.0 5.0087e+02 1.0 5.50e+08 1.0 6.0e+09 1.7e+01
9.2e+02 98100100100 91 98100100100 92 8992
PCSetUp 4 1.0 6.8538e+01 1.0 2.66e+08 1.0 1.4e+08 1.0e+01
2.1e+02 13 48 2 1 21 13 48 2 1 21 31760
PCApply 87 1.0 4.3206e+02 1.0 2.75e+08 1.0 5.9e+09 1.5e+01
3.5e+02 85 50 98 90 34 85 50 98 90 35 5213
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 597 597 2364880 0
Vector Scatter 16 15 20656 0
Matrix 38 38 18267636 0
Matrix Null Space 1 1 584 0
Distributed Mesh 5 4 19808 0
Star Forest Bipartite Graph 10 8 6720 0
Discrete System 5 4 3360 0
Index Set 37 37 186396 0
IS L to G Mapping 5 4 6020 0
Krylov Solver 7 7 8608 0
DMKSP interface 4 4 2560 0
Preconditioner 7 7 6792 0
Viewer 8 6 4512 0
========================================================================================================================
Average time to get PetscTime(): 2.14577e-07
Average time for MPI_Barrier(): 7.26223e-05
Average time for zero size MPI_Send(): 1.60854e-06
#PETSc Option Table entries:
-finput input.txt
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-9
-ksp_type cg
-ksp_view
-log_summary log_mg_defaults.txt
-mg_coarse_ksp_type preonly
-mg_levels_ksp_type richardson
-pc_dmdarepart_log
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=16384
--known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4
--known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2
--known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8
--known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8
--known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4
--known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1
--known-mpi-c-double-complex=1 --known-sdot-returns-double=0
--known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 "
--known-mpi-shared-libraries=0 --known-memcmp-ok
--with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a
--COPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
-O3 -march=native -mtune=native" --FOPTFLAGS="-Wall -Wno-unused-variable
-ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native
-mtune=native" --CXXOPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing
-Wno-unknown-pragmas -O3 -march=native -mtune=native" --with-x="0 "
--with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 "
--with-fortranlib-autodetect="0 " --with-shared-libraries="0 "
--with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn "
--download-hypre=1 --download-blacs="1 " --download-scalapack="1 "
--download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 "
PETSC_ARCH=gnu-opt-32idx
-----------------------------------------
Libraries compiled on Sat Jul 18 19:48:51 2015 on h2ologin1
Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /u/sciteam/mrosso/LIBS/petsc
Using PETSc arch: gnu-opt-32idx
-----------------------------------------
Using C compiler: cc -Wall -Wwrite-strings -Wno-strict-aliasing
-Wno-unknown-pragmas -O3 -march=native -mtune=native ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn -Wall -Wno-unused-variable -ffree-line-length-0
-Wno-unused-dummy-argument -O3 -march=native -mtune=native ${FOPTFLAGS}
${FFLAGS}
-----------------------------------------
Using include paths: -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/include
-I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lpetsc
-Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lsuperlu_dist_4.0 -lHYPRE
-lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib
-L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl
-----------------------------------------