Hi Mark and Barry,
I am sorry for my late reply: it was a busy week!
I ran a test case for a larger problem with as many levels of MG as I
could (i.e. 5) and GAMG as the PC at the coarse level. I attached the output
of -info (after grepping for "gamg"), ksp_view and log_summary.
The solve takes about 2 seconds on 8192 cores, which is way too much.
The number of iterations to convergence is 24.
I hope there is a way to speed it up.
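
For reference, the multigrid-related options for this run were (they also
appear in the option table at the end of log_summary):

-ksp_type cg
-ksp_rtol 1e-9
-ksp_norm_type unpreconditioned
-ksp_initial_guess_nonzero yes
-pc_type mg
-pc_mg_levels 5
-pc_mg_galerkin
-mg_levels_ksp_type richardson
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type gamg
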
Thanks,
Michele
On Fri, 2015-07-17 at 09:38 -0400, Mark Adams wrote:
>
>
>
>
> On Thu, Jul 16, 2015 at 8:18 PM, Michele Rosso <[email protected]> wrote:
>
> Barry,
>
> thank you very much for the detailed answer. I tried what you
> suggested and it works.
> So far I have tried it only on a small system, but the final goal is to use
> it for very large runs. How does PCGAMG compare to PCMG as
> far as performance and scalability are concerned?
> Also, could you help me tune the GAMG part (my current
> setup is in the attached ksp_view.txt file)?
>
>
>
> I am going to add this to the documentation today, but you can run with
> -info. This is very noisy, so you might want to do this as a separate run.
> Then grep for GAMG; that will be about 20 lines. Send that
> to us and we can go from there.
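
As a rough illustration (the launcher, executable name, and <usual options>
below are placeholders, not taken from this thread):

  mpiexec -n 8192 ./your_app <usual options> -info 2>&1 | tee info.log
  grep GAMG info.log
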
>
>
> Mark
>
>
>
>
> I also tried to use superlu_dist for the LU factorization on the coarse
> blocks (prefix mg_coarse_mg_coarse_sub_):
> -mg_coarse_mg_coarse_sub_pc_type lu
> -mg_coarse_mg_coarse_sub_pc_factor_mat_solver_package superlu_dist
>
> but I got an error:
>
> ****** Error in MC64A/AD. INFO(1) = -2
> ****** Error in MC64A/AD. INFO(1) = -2
> ****** Error in MC64A/AD. INFO(1) = -2
> ****** Error in MC64A/AD. INFO(1) = -2
> ****** Error in MC64A/AD. INFO(1) = -2
> ****** Error in MC64A/AD. INFO(1) = -2
> ****** Error in MC64A/AD. INFO(1) = -2
> symbfact() error returns 0
> symbfact() error returns 0
> symbfact() error returns 0
> symbfact() error returns 0
> symbfact() error returns 0
> symbfact() error returns 0
> symbfact() error returns 0
>
>
> Thank you,
> Michele
>
>
> On Thu, 2015-07-16 at 18:07 -0500, Barry Smith wrote:
>
> >
> > > On Jul 16, 2015, at 5:42 PM, Michele Rosso <[email protected]> wrote:
> > >
> > > Barry,
> > >
> > > thanks for your reply. So if I want it fixed, I will have to use
> the master branch, correct?
> >
> > Yes, or edit mg.c and remove the offending lines of code (easy
> enough).
> > >
> > > On a side note, what I am trying to achieve is to be able to use
> as many levels of MG as I want, despite the limitation imposed by the local
> number of grid nodes.
> >
> > I assume you are talking about DMDA? There is no generic
> limitation in PETSc's multigrid; it is only the way the DMDA code
> figures out the interpolation that causes a restriction.
> >
> > > So far I am using a borrowed code that implements a PC that
> creates a sub-communicator and performs MG on it.
> > > While reading the documentation I found out that PCMGSetLevels
> takes an optional array of communicators. How does this work?
> >
> > It doesn't work. It was an idea that never got pursued.
> >
> > > Can I simply define my matrix and rhs on the fine grid as I
> would do normally (I do not use kspsetoperators and kspsetrhs), and would KSP
> take care of it by using the correct communicator for each level?
> >
> > No.
> >
> > You can use the PCMG geometric multigrid with DMDA for as many
> levels as it works and then use PCGAMG as the coarse grid solver. PCGAMG
> automatically uses fewer processes for the coarse level matrices and vectors.
> You could do this all from the command line without writing code.
> >
> > For example, if your code uses a DMDA and calls KSPSetDM(), use
> something like -da_refine 3 -pc_type mg -pc_mg_galerkin -mg_coarse_pc_type gamg
> -ksp_view
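
A minimal sketch of that setup, modeled on the PETSc 3.6-era KSP/DMDA
tutorials (the 7-point Poisson-type ComputeMatrix/ComputeRHS below are
illustrative stand-ins for the application's own discretization, not code
from this thread):

#include <petscksp.h>
#include <petscdmda.h>

static PetscErrorCode ComputeRHS(KSP ksp,Vec b,void *ctx)
{
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  ierr = VecSet(b,1.0);CHKERRQ(ierr);                       /* placeholder right-hand side */
  PetscFunctionReturn(0);
}

static PetscErrorCode ComputeMatrix(KSP ksp,Mat J,Mat A,void *ctx)
{
  PetscErrorCode ierr;
  DM             da;
  PetscInt       i,j,k,xs,ys,zs,xm,ym,zm,mx,my,mz,n;
  MatStencil     row,col[7];
  PetscScalar    v[7];

  PetscFunctionBeginUser;
  ierr = KSPGetDM(ksp,&da);CHKERRQ(ierr);
  ierr = DMDAGetInfo(da,0,&mx,&my,&mz,0,0,0,0,0,0,0,0,0);CHKERRQ(ierr);
  ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr);
  /* 7-point Laplacian-type stencil on the locally owned part of the grid,
     with a simple Dirichlet-style treatment of the boundary rows */
  for (k=zs; k<zs+zm; k++) {
    for (j=ys; j<ys+ym; j++) {
      for (i=xs; i<xs+xm; i++) {
        row.i = i; row.j = j; row.k = k;
        n = 0;
        v[n] = 6.0;  col[n].i = i;   col[n].j = j;   col[n].k = k;   n++;
        if (i>0)    {v[n] = -1.0; col[n].i = i-1; col[n].j = j;   col[n].k = k;   n++;}
        if (i<mx-1) {v[n] = -1.0; col[n].i = i+1; col[n].j = j;   col[n].k = k;   n++;}
        if (j>0)    {v[n] = -1.0; col[n].i = i;   col[n].j = j-1; col[n].k = k;   n++;}
        if (j<my-1) {v[n] = -1.0; col[n].i = i;   col[n].j = j+1; col[n].k = k;   n++;}
        if (k>0)    {v[n] = -1.0; col[n].i = i;   col[n].j = j;   col[n].k = k-1; n++;}
        if (k<mz-1) {v[n] = -1.0; col[n].i = i;   col[n].j = j;   col[n].k = k+1; n++;}
        ierr = MatSetValuesStencil(A,1,&row,n,col,v,INSERT_VALUES);CHKERRQ(ierr);
      }
    }
  }
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

int main(int argc,char **argv)
{
  KSP            ksp;
  DM             da;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  /* Coarsest DMDA; negative sizes mean the defaults can be changed with
     -da_grid_x/y/z, and -da_refine N refines the grid N times (3.6-era convention) */
  ierr = DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,
                      DMDA_STENCIL_STAR,-9,-9,-9,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,
                      1,1,NULL,NULL,NULL,&da);CHKERRQ(ierr);
  ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
  ierr = KSPSetDM(ksp,da);CHKERRQ(ierr);                    /* lets PCMG build the level hierarchy */
  ierr = KSPSetComputeRHS(ksp,ComputeRHS,NULL);CHKERRQ(ierr);
  ierr = KSPSetComputeOperators(ksp,ComputeMatrix,NULL);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);              /* picks up -pc_type mg, -mg_coarse_pc_type gamg, ... */
  ierr = KSPSolve(ksp,NULL,NULL);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

With something like this the whole solver stack is chosen at run time, e.g.
the options above plus -da_grid_x/-da_grid_y/-da_grid_z for the coarse grid
size.
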
> >
> >
> >
> > Barry
> >
> >
> > >
> > > Thanks,
> > > Michele
> > >
> > >
> > >
> > >
> > > On Thu, 2015-07-16 at 17:30 -0500, Barry Smith wrote:
> > >> Michele,
> > >>
> > >> This is a very annoying feature that has been fixed in
> master
> > >> http://www.mcs.anl.gov/petsc/developers/index.html
> > >> I would like to have changed it in maint but Jed would have a
> shit-fit :-) since it changes behavior.
> > >>
> > >> Barry
> > >>
> > >>
> > >> > On Jul 16, 2015, at 4:53 PM, Michele Rosso <[email protected]>
> wrote:
> > >> >
> > >> > Hi,
> > >> >
> > >> > I am performing a series of solves inside a loop. The matrix
> for each solve changes, but not enough to justify a rebuild of the PC at each
> solve.
> > >> > Therefore I am using KSPSetReusePreconditioner to avoid
> rebuilding unless necessary. The solver is CG + MG with a custom PC at the
> coarse level.
> > >> > If KSP is not updated each time, everything works as it is
> supposed to.
> > >> > When instead I allow the default PETSc behavior, i.e.
> updating the PC every time the matrix changes, the coarse level KSP, initially
> set to PREONLY, is changed into GMRES
> > >> > after the first solve. I am not sure where the problem lies
> (my PC or PETSc), so I would like to have your opinion on this.
> > >> > I attached the ksp_view for the two successive solves and the
> options stack.
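
A minimal sketch of the reuse pattern described above (AppUpdateMatrix and
AppPCNeedsRebuild are hypothetical application routines; the KSP, matrix and
vectors are assumed to be created and configured elsewhere):

#include <petscksp.h>

/* Hypothetical application routines, assumed to exist elsewhere. */
extern PetscErrorCode AppUpdateMatrix(Mat,PetscInt);
extern PetscBool      AppPCNeedsRebuild(PetscInt);

/* Solve a sequence of slightly different systems, rebuilding the
   preconditioner only when the application decides it is necessary. */
PetscErrorCode SolveSequence(KSP ksp,Mat A,Vec b,Vec x,PetscInt nsteps)
{
  PetscErrorCode ierr;
  PetscInt       step;

  PetscFunctionBeginUser;
  for (step = 0; step < nsteps; step++) {
    ierr = AppUpdateMatrix(A,step);CHKERRQ(ierr);      /* matrix values change slightly      */
    ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);     /* tell KSP the operator was modified */
    /* PETSC_TRUE keeps the current preconditioner; PETSC_FALSE restores the
       default behavior of rebuilding the PC whenever the operator changes. */
    ierr = KSPSetReusePreconditioner(ksp,AppPCNeedsRebuild(step) ? PETSC_FALSE : PETSC_TRUE);CHKERRQ(ierr);
    ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}
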
> > >> >
> > >> > Thanks for your help,
> > >> > Michele
> > >> >
> > >> >
> > >> >
> > >> > <ksp_view.txt><petsc_options.txt>
> > >>
> > >>
> > >>
> > >
> >
>
>
>
>
>
>
[0] PCSetUp_GAMG(): level 0) N=8192, n data rows=1, n data cols=1, nnz/row
(ave)=7, np=8192
[0] PCGAMGFilterGraph(): 100% nnz after filtering, with threshold 0, 4
nnz ave. (N=8192)
[0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
[0] PCGAMGProlongator_AGG(): New grid 1005 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.876420e+00
min=1.105137e-01 PC=jacobi
[0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 0 with simple aggregation
[0] PCSetUp_GAMG(): 1) N=1005, n data cols=1, nnz/row (ave)=27, 16 active pes
[0] PCGAMGFilterGraph(): 100% nnz after filtering, with threshold 0,
20.5645 nnz ave. (N=1005)
[0] PCGAMGProlongator_AGG(): New grid 103 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.461408e+00
min=1.226917e-03 PC=jacobi
[0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 8 with simple
aggregation
[0] PCSetUp_GAMG(): 2) N=103, n data cols=1, nnz/row (ave)=55, 2 active pes
[0] PCGAMGFilterGraph(): 100% nnz after filtering, with threshold 0, 55.5049 nnz ave. (N=103)
[0] PCGAMGProlongator_AGG(): New grid 6 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.697064e+00
min=2.669349e-04 PC=jacobi
[0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 6 with simple
aggregation
[0] PCSetUp_GAMG(): 3) N=6, n data cols=1, nnz/row (ave)=6, 1 active pes
[0] PCSetUp_GAMG(): 4 levels, grid complexity = 1.60036
type: gamg
GAMG specific options
[0] PCSetUp_GAMG(): level 0) N=8192, n data rows=1, n data cols=1, nnz/row
(ave)=7, np=8192
[0] PCGAMGFilterGraph(): 100% nnz after filtering, with threshold 0, 4 nnz ave. (N=8192)
[0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
[0] PCGAMGProlongator_AGG(): New grid 1005 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.876420e+00
min=1.105137e-01 PC=jacobi
[0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 0 with simple
aggregation
[0] PCSetUp_GAMG(): 1) N=1005, n data cols=1, nnz/row (ave)=27, 16 active pes
[0] PCGAMGFilterGraph(): 100% nnz after filtering, with threshold 0,
20.5645 nnz ave. (N=1005)
[0] PCGAMGProlongator_AGG(): New grid 103 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.461408e+00
min=1.226917e-03 PC=jacobi
[0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 8 with simple aggregation
[0] PCSetUp_GAMG(): 2) N=103, n data cols=1, nnz/row (ave)=55, 2 active pes
[0] PCGAMGFilterGraph(): 100% nnz after filtering, with threshold 0,
55.5049 nnz ave. (N=103)
[0] PCGAMGProlongator_AGG(): New grid 6 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.697064e+00
min=2.669349e-04 PC=jacobi
[0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 6 with simple
aggregation
[0] PCSetUp_GAMG(): 3) N=6, n data cols=1, nnz/row (ave)=6, 1 active pes
[0] PCSetUp_GAMG(): 4 levels, grid complexity = 1.60036
type: gamg
GAMG specific options
type: gamg
GAMG specific options
type: gamg
GAMG specific options
KSP Object: 8192 MPI processes
type: cg
maximum iterations=10000
tolerances: relative=1e-09, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 8192 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 8192 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 8192 MPI processes
type: gamg
MG: type is MULTIPLICATIVE, levels=4 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
GAMG specific options
Threshold for dropping small values from graph 0
AGG specific options
Symmetric graph false
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_mg_coarse_) 8192 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_mg_coarse_) 8192 MPI processes
type: bjacobi
block Jacobi: number of blocks = 8192
Local solve is same for all blocks, in the following KSP and PC
objects:
KSP Object: (mg_coarse_mg_coarse_sub_) 1 MPI
processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_mg_coarse_sub_) 1 MPI
processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5, needed 1
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=6, cols=6
package used to perform factorization: petsc
total: nonzeros=36, allocated nonzeros=36
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 2 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=6, cols=6
total: nonzeros=36, allocated nonzeros=36
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 2 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=6, cols=6
total: nonzeros=36, allocated nonzeros=36
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 2 nodes, limit used
is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_coarse_mg_levels_1_) 8192 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0.0995252, max = 1.09478
Chebyshev: eigenvalues estimated using gmres with translations [0
0.1; 0 1.1]
KSP Object: (mg_coarse_mg_levels_1_esteig_)
8192 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_mg_levels_1_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=103, cols=103
total: nonzeros=5717, allocated nonzeros=5717
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_coarse_mg_levels_2_) 8192 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0.15748, max = 1.73228
Chebyshev: eigenvalues estimated using gmres with translations [0
0.1; 0 1.1]
KSP Object: (mg_coarse_mg_levels_2_esteig_)
8192 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_mg_levels_2_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=1005, cols=1005
total: nonzeros=27137, allocated nonzeros=27137
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_coarse_mg_levels_3_) 8192 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0.191092, max = 2.10202
Chebyshev: eigenvalues estimated using gmres with translations [0
0.1; 0 1.1]
KSP Object: (mg_coarse_mg_levels_3_esteig_)
8192 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_mg_levels_3_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=8192, cols=8192
total: nonzeros=54784, allocated nonzeros=54784
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=8192, cols=8192
total: nonzeros=54784, allocated nonzeros=54784
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=65536, cols=65536
total: nonzeros=448512, allocated nonzeros=448512
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=524288, cols=524288
total: nonzeros=3.62906e+06, allocated nonzeros=3.62906e+06
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=4194304, cols=4194304
total: nonzeros=2.91963e+07, allocated nonzeros=2.91963e+07
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=33554432, cols=33554432
total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
total number of mallocs used during MatSetValues calls =0
has attached null space
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=33554432, cols=33554432
total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
total number of mallocs used during MatSetValues calls =0
has attached null space
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
-fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary:
----------------------------------------------
/u/sciteam/mrosso/mrosso-repo/build/bin/test_droplet_box.exe on a gnu-opt-32idx
named p���� with 8192 processors, by mrosso Fri Jul 24 13:09:23 2015
Using Petsc Development GIT revision: v3.6-233-g4936542 GIT Date: 2015-07-17
10:15:47 -0500
Max Max/Min Avg Total
Time (sec): 1.130e+02 1.00023 1.130e+02
Objects: 1.587e+03 1.00253 1.583e+03
Flops: 8.042e+07 1.28093 6.371e+07 5.219e+11
Flops/sec: 7.115e+05 1.28065 5.639e+05 4.619e+09
MPI Messages: 1.267e+05 13.76755 1.879e+04 1.539e+08
MPI Message Lengths: 8.176e+06 2.12933 3.881e+02 5.972e+10
MPI Reductions: 2.493e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N
flops
and VecAXPY() for complex vectors of length N -->
8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- --
Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total
Avg %Total counts %Total
0: Main Stage: 1.1300e+02 100.0% 5.2195e+11 100.0% 1.539e+08 100.0%
3.881e+02 100.0% 2.492e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in
this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all
processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct
%T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMDot 120 1.0 2.4560e-01 1.6 7.24e+04329.0 0.0e+00 0.0e+00
1.2e+02 0 0 0 0 5 0 0 0 0 5 9
VecTDot 194 1.0 2.6155e-01 1.3 1.59e+06 1.0 0.0e+00 0.0e+00
1.9e+02 0 2 0 0 8 0 2 0 0 8 49771
VecNorm 236 1.0 4.8733e-01 1.4 8.67e+05 1.0 0.0e+00 0.0e+00
2.4e+02 0 1 0 0 9 0 1 0 0 9 14323
VecScale 1009 1.0 1.1008e-03 1.5 1.63e+05 1.8 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 1156928
VecCopy 405 1.0 3.2604e-03 3.8 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 2648 1.0 9.1252e-03 6.7 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 594 1.0 1.5715e-02 3.8 4.77e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 7 0 0 0 0 7 0 0 0 2485378
VecAYPX 3103 1.0 8.7631e-03 2.4 2.58e+06 1.1 0.0e+00 0.0e+00
0.0e+00 0 4 0 0 0 0 4 0 0 0 2263359
VecAXPBYCZ 1164 1.0 3.5439e-0313.2 3.22e+05166.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 5091
VecMAXPY 132 1.0 3.7217e-04 6.8 8.63e+04166.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 12994
VecAssemblyBegin 36 1.0 3.7324e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
9.0e+01 0 0 0 0 4 0 0 0 0 4 0
VecAssemblyEnd 36 1.0 1.5278e-0364.7 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 66 1.0 2.8777e-0426.8 3.65e+03166.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 711
VecScatterBegin 4914 1.0 3.9190e-0128.1 0.00e+00 0.0 1.1e+08 5.3e+02
0.0e+00 0 0 72 99 0 0 0 72 99 0 0
VecScatterEnd 4914 1.0 8.6240e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 6 0 0 0 0 6 0 0 0 0 0
VecSetRandom 6 1.0 1.0859e-022168.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 132 1.0 3.4323e-01 1.5 2.19e+04166.0 0.0e+00 0.0e+00
1.3e+02 0 0 0 0 5 0 0 0 0 5 4
MatMult 2645 1.0 3.6311e+0032.2 3.45e+07 1.3 6.5e+07 7.6e+02
0.0e+00 1 42 42 83 0 1 42 42 83 0 60126
MatMultAdd 679 1.0 4.8583e+0031.6 1.08e+06 1.2 1.6e+06 1.4e+01
0.0e+00 4 1 1 0 0 4 1 1 0 0 1532
MatMultTranspose 683 1.0 4.2303e+00667.0 1.09e+06 1.2 1.6e+06 1.4e+01
0.0e+00 0 1 1 0 0 0 1 1 0 0 1778
MatSolve 97 0.0 2.9469e-04 0.0 6.40e+03 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 22
MatSOR 2782 1.0 3.6662e+0035.7 3.29e+07 1.3 4.1e+07 2.2e+02
0.0e+00 1 40 26 15 0 1 40 26 15 0 56576
MatLUFactorSym 2 1.0 1.5128e-02358.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 2 1.0 6.1989e-0513.0 2.58e+02 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 4
MatConvert 6 1.0 9.1314e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 18 1.0 1.3737e-02219.9 3.16e+041579.8 9.3e+04 8.6e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 36
MatResidual 679 1.0 3.0610e-0110.0 7.51e+06 1.2 2.3e+07 5.5e+02
0.0e+00 0 10 15 21 0 0 10 15 21 0 169592
MatAssemblyBegin 119 1.0 1.7048e+01 2.8 0.00e+00 0.0 4.3e+04 7.2e+00
1.3e+02 10 0 0 0 5 10 0 0 0 5 0
MatAssemblyEnd 119 1.0 4.1777e+01 1.4 0.00e+00 0.0 1.7e+06 4.1e+01
3.8e+02 31 0 1 0 15 31 0 1 0 15 0
MatGetRow 1328166.0 3.9291e-0454.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 2 0.0 1.5020e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrix 12 1.0 2.5183e+01 1.0 0.00e+00 0.0 7.6e+04 1.6e+01
1.9e+02 22 0 0 0 8 22 0 0 0 8 0
MatGetOrdering 2 0.0 9.5510e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 6 1.0 1.5275e+00 6.0 0.00e+00 0.0 3.8e+07 4.0e+00
2.4e+02 1 0 25 0 10 1 0 25 0 10 0
MatView 60 1.2 1.9943e+0026.6 0.00e+00 0.0 0.0e+00 0.0e+00
5.0e+01 1 0 0 0 2 1 0 0 0 2 0
MatAXPY 6 1.0 4.5854e+00326.5 0.00e+00 0.0 0.0e+00 0.0e+00
1.2e+01 0 0 0 0 0 0 0 0 0 0 0
MatMatMult 6 1.0 1.7226e+01 1.3 2.80e+041749.0 4.8e+05 5.9e+00
9.6e+01 12 0 0 0 4 12 0 0 0 4 0
MatMatMultSym 6 1.0 1.3093e+01 1.0 0.00e+00 0.0 3.9e+05 5.3e+00
8.4e+01 12 0 0 0 3 12 0 0 0 3 0
MatMatMultNum 6 1.0 4.1413e+0087.5 2.80e+041749.0 9.3e+04 8.6e+00
1.2e+01 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 14 1.0 2.3092e+01 1.2 3.94e+05 2.0 1.7e+06 2.4e+02
1.8e+02 20 0 1 1 7 20 0 1 1 7 73
MatPtAPSymbolic 10 1.0 1.6246e+01 1.7 0.00e+00 0.0 1.0e+06 2.6e+02
7.0e+01 12 0 1 0 3 12 0 1 0 3 0
MatPtAPNumeric 14 1.0 9.1005e+00 1.3 3.94e+05 2.0 7.2e+05 2.1e+02
1.1e+02 8 0 0 0 4 8 0 0 0 4 185
MatTrnMatMult 2 1.0 5.6152e+00 1.0 3.64e+02 2.9 1.1e+06 1.2e+01
3.8e+01 5 0 1 0 2 5 0 1 0 2 0
MatTrnMatMultSym 2 1.0 5.5943e+00 1.0 0.00e+00 0.0 1.0e+06 7.6e+00
3.4e+01 5 0 1 0 1 5 0 1 0 1 0
MatTrnMatMultNum 2 1.0 2.8538e-02 4.6 3.64e+02 2.9 9.3e+04 5.4e+01
4.0e+00 0 0 0 0 0 0 0 0 0 0 96
MatGetLocalMat 30 1.0 4.4808e+001435.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 26 1.0 4.5314e+00292.0 0.00e+00 0.0 1.4e+06 2.8e+02
0.0e+00 0 0 1 1 0 0 0 1 1 0 0
MatGetSymTrans 20 1.0 3.3071e-0347.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetGraph 6 1.0 7.4315e-0443.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFBcastBegin 252 1.0 1.3895e+0014.8 0.00e+00 0.0 3.8e+07 4.0e+00
0.0e+00 1 0 25 0 0 1 0 25 0 0 0
SFBcastEnd 252 1.0 7.1653e-0218.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 120 1.0 2.4587e-01 1.6 1.45e+05220.3 0.0e+00 0.0e+00
1.2e+02 0 0 0 0 5 0 0 0 0 5 26
KSPSetUp 36 1.0 2.2627e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
2.6e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 4 1.0 1.0763e+02 1.0 8.04e+07 1.3 1.5e+08 3.9e+02
2.4e+03 95100100100 96 95100100100 96 4848
PCGAMGGraph_AGG 6 1.0 4.2442e+00 1.0 2.80e+041749.0 2.8e+05 5.7e+00
7.2e+01 4 0 0 0 3 4 0 0 0 3 0
PCGAMGCoarse_AGG 6 1.0 7.2416e+00 1.0 3.64e+02 2.9 4.1e+07 4.4e+00
3.1e+02 6 0 26 0 12 6 0 26 0 12 0
PCGAMGProl_AGG 6 1.0 1.0400e+01 1.0 0.00e+00 0.0 7.9e+05 8.0e+00
1.4e+02 9 0 1 0 6 9 0 1 0 6 0
PCGAMGPOpt_AGG 6 1.0 2.3133e+01 1.2 4.03e+05647.5 1.4e+06 7.7e+00
3.0e+02 17 0 1 0 12 17 0 1 0 12 0
GAMG: createProl 6 1.0 4.4975e+01 1.1 4.31e+05565.4 4.3e+07 4.6e+00
8.3e+02 36 0 28 0 33 36 0 28 0 33 0
Graph 12 1.0 4.2435e+00 1.0 2.80e+041749.0 2.8e+05 5.7e+00
7.2e+01 4 0 0 0 3 4 0 0 0 3 0
MIS/Agg 6 1.0 1.5276e+00 6.0 0.00e+00 0.0 3.8e+07 4.0e+00
2.4e+02 1 0 25 0 10 1 0 25 0 10 0
SA: col data 6 1.0 6.9602e+00 1.6 0.00e+00 0.0 7.2e+05 8.2e+00
6.0e+01 6 0 0 0 2 6 0 0 0 2 0
SA: frmProl0 6 1.0 3.3989e+00 1.0 0.00e+00 0.0 7.2e+04 5.9e+00
6.0e+01 3 0 0 0 2 3 0 0 0 2 0
SA: smooth 6 1.0 2.3133e+01 1.2 4.03e+05647.5 1.4e+06 7.7e+00
3.0e+02 17 0 1 0 12 17 0 1 0 12 0
GAMG: partLevel 6 1.0 4.3421e+01 1.1 1.97e+053512.3 6.9e+05 2.0e+01
4.1e+02 38 0 0 0 16 38 0 0 0 16 0
repartition 6 1.0 3.0290e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
3.6e+01 0 0 0 0 1 0 0 0 0 1 0
Invert-Sort 6 1.0 2.2100e+00 7.9 0.00e+00 0.0 0.0e+00 0.0e+00
2.4e+01 2 0 0 0 1 2 0 0 0 1 0
Move A 6 1.0 1.3769e+01 1.0 0.00e+00 0.0 1.1e+04 8.4e+01
1.0e+02 12 0 0 0 4 12 0 0 0 4 0
Move P 6 1.0 1.1463e+01 1.0 0.00e+00 0.0 6.5e+04 5.5e+00
1.0e+02 10 0 0 0 4 10 0 0 0 4 0
PCSetUp 6 1.0 9.5437e+01 1.0 8.96e+05 3.3 4.5e+07 1.4e+01
1.5e+03 84 0 29 1 59 84 0 29 1 59 24
PCSetUpOnBlocks 97 1.0 1.7256e-0221.5 2.58e+02 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 97 1.0 1.1589e+01 1.0 6.95e+07 1.3 1.0e+08 4.8e+02
5.1e+02 10 84 67 83 21 10 84 67 83 21 37682
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 1032 1032 3077992 0
Vector Scatter 64 63 71936 0
Matrix 211 211 2308880 0
Matrix Coarsen 6 6 3720 0
Matrix Null Space 1 1 584 0
Distributed Mesh 5 4 19808 0
Star Forest Bipartite Graph 16 14 11760 0
Discrete System 5 4 3360 0
Index Set 180 180 169588 0
IS L to G Mapping 5 4 6020 0
Krylov Solver 22 22 374160 0
DMKSP interface 4 4 2560 0
Preconditioner 22 22 20924 0
PetscRandom 6 6 3696 0
Viewer 8 6 4512 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 4.57764e-05
Average time for zero size MPI_Send(): 1.04982e-05
#PETSc Option Table entries:
-finput input.txt
-info
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-9
-ksp_type cg
-ksp_view
-log_summary log_gamg.txt
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type gamg
-mg_levels_ksp_type richardson
-pc_dmdarepart_log
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=16384
--known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4
--known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2
--known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8
--known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8
--known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4
--known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1
--known-mpi-c-double-complex=1 --known-sdot-returns-double=0
--known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 "
--known-mpi-shared-libraries=0 --known-memcmp-ok
--with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a
--COPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
-O3 -march=native -mtune=native" --FOPTFLAGS="-Wall -Wno-unused-variable
-ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native
-mtune=native" --CXXOPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing
-Wno-unknown-pragmas -O3 -march=native -mtune=native" --with-x="0 "
--with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 "
--with-fortranlib-autodetect="0 " --with-shared-libraries="0 "
--with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn "
--download-hypre=1 --download-blacs="1 " --download-scalapack="1 "
--download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 "
PETSC_ARCH=gnu-opt-32idx
-----------------------------------------
Libraries compiled on Sat Jul 18 19:48:51 2015 on h2ologin1
Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /u/sciteam/mrosso/LIBS/petsc
Using PETSc arch: gnu-opt-32idx
-----------------------------------------
Using C compiler: cc -Wall -Wwrite-strings -Wno-strict-aliasing
-Wno-unknown-pragmas -O3 -march=native -mtune=native ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn -Wall -Wno-unused-variable -ffree-line-length-0
-Wno-unused-dummy-argument -O3 -march=native -mtune=native ${FOPTFLAGS}
${FFLAGS}
-----------------------------------------
Using include paths: -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/include
-I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lpetsc
-Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib
-L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lsuperlu_dist_4.0 -lHYPRE
-lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib
-L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl
-----------------------------------------