Thanks, Barry. It works.
GAMG is about three times better than ASM in terms of the number of linear iterations, but the overall solve is about five times slower. Any suggestions for improving the performance of GAMG? Log files are attached.

Fande,

On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>
> > On Apr 6, 2017, at 9:39 AM, Kong, Fande <fande.k...@inl.gov> wrote:
> >
> > Thanks, Mark and Barry,
> >
> > It works pretty well in terms of the number of linear iterations (using "-pc_gamg_sym_graph true"), but it is horrible in compute time. I am using the two-level method via "-pc_mg_levels 2". The reason the compute time is larger than with other preconditioning options is that a matrix-free method is used on the fine level, and in my particular problem the function evaluation is expensive.
> >
> > I am using "-snes_mf_operator 1" to turn on Jacobian-free Newton, but I do not think I want to make the preconditioning part matrix-free. Do you guys know how to turn off the matrix-free method for GAMG?
>
>    -pc_use_amat false
>
> > Here is the detailed solver:
> >
> > SNES Object: 384 MPI processes
> >   type: newtonls
> >   maximum iterations=200, maximum function evaluations=10000
> >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
> >   total number of linear solver iterations=20
> >   total number of function evaluations=166
> >   norm schedule ALWAYS
> >   SNESLineSearch Object: 384 MPI processes
> >     type: bt
> >       interpolation: cubic
> >       alpha=1.000000e-04
> >     maxstep=1.000000e+08, minlambda=1.000000e-12
> >     tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
> >     maximum iterations=40
> >   KSP Object: 384 MPI processes
> >     type: gmres
> >       GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> >       GMRES: happy breakdown tolerance 1e-30
> >     maximum iterations=100, initial guess is zero
> >     tolerances: relative=0.001, absolute=1e-50, divergence=10000.
> >     right preconditioning
> >     using UNPRECONDITIONED norm type for convergence test
> >   PC Object: 384 MPI processes
> >     type: gamg
> >       MG: type is MULTIPLICATIVE, levels=2 cycles=v
> >         Cycles per PCApply=1
> >         Using Galerkin computed coarse grid matrices
> >         GAMG specific options
> >           Threshold for dropping small values from graph 0.
> >           AGG specific options
> >             Symmetric graph true
> >     Coarse grid solver -- level -------------------------------
> >       KSP Object: (mg_coarse_) 384 MPI processes
> >         type: preonly
> >         maximum iterations=10000, initial guess is zero
> >         tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >         left preconditioning
> >         using NONE norm type for convergence test
> >       PC Object: (mg_coarse_) 384 MPI processes
> >         type: bjacobi
> >           block Jacobi: number of blocks = 384
> >           Local solve is same for all blocks, in the following KSP and PC objects:
> >         KSP Object: (mg_coarse_sub_) 1 MPI processes
> >           type: preonly
> >           maximum iterations=1, initial guess is zero
> >           tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >           left preconditioning
> >           using NONE norm type for convergence test
> >         PC Object: (mg_coarse_sub_) 1 MPI processes
> >           type: lu
> >             LU: out-of-place factorization
> >             tolerance for zero pivot 2.22045e-14
> >             using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
> >             matrix ordering: nd
> >             factor fill ratio given 5., needed 1.31367
> >               Factored matrix follows:
> >                 Mat Object: 1 MPI processes
> >                   type: seqaij
> >                   rows=37, cols=37
> >                   package used to perform factorization: petsc
> >                   total: nonzeros=913, allocated nonzeros=913
> >                   total number of mallocs used during MatSetValues calls =0
> >                     not using I-node routines
> >           linear system matrix = precond matrix:
> >           Mat Object: 1 MPI processes
> >             type: seqaij
> >             rows=37, cols=37
> >             total: nonzeros=695, allocated nonzeros=695
> >             total number of mallocs used during MatSetValues calls =0
> >               not using I-node routines
> >         linear system matrix = precond matrix:
> >         Mat Object: 384 MPI processes
> >           type: mpiaij
> >           rows=18145, cols=18145
> >           total: nonzeros=1709115, allocated nonzeros=1709115
> >           total number of mallocs used during MatSetValues calls =0
> >             not using I-node (on process 0) routines
> >     Down solver (pre-smoother) on level 1 -------------------------------
> >       KSP Object: (mg_levels_1_) 384 MPI processes
> >         type: chebyshev
> >           Chebyshev: eigenvalue estimates: min = 0.133339, max = 1.46673
> >           Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
> >           KSP Object: (mg_levels_1_esteig_) 384 MPI processes
> >             type: gmres
> >               GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> >               GMRES: happy breakdown tolerance 1e-30
> >             maximum iterations=10, initial guess is zero
> >             tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
> >             left preconditioning
> >             using PRECONDITIONED norm type for convergence test
> >         maximum iterations=2
> >         tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >         left preconditioning
> >         using nonzero initial guess
> >         using NONE norm type for convergence test
> >       PC Object: (mg_levels_1_) 384 MPI processes
> >         type: sor
> >           SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> >         linear system matrix followed by preconditioner matrix:
> >         Mat Object: 384 MPI processes
> >           type: mffd
> >           rows=3020875, cols=3020875
> >             Matrix-free approximation:
> >               err=1.49012e-08 (relative error in function evaluation)
> >               Using wp compute h routine
> >                   Does not compute normU
> >         Mat Object: () 384 MPI processes
> >           type: mpiaij
> >           rows=3020875, cols=3020875
> >           total: nonzeros=215671710, allocated nonzeros=241731750
> >           total number of mallocs used during MatSetValues calls =0
> >             not using I-node (on process 0) routines
> >     Up solver (post-smoother) same as down solver (pre-smoother)
> >   linear system matrix followed by preconditioner matrix:
> >   Mat Object: 384 MPI processes
> >     type: mffd
> >     rows=3020875, cols=3020875
> >       Matrix-free approximation:
> >         err=1.49012e-08 (relative error in function evaluation)
> >         Using wp compute h routine
> >             Does not compute normU
> >   Mat Object: () 384 MPI processes
> >     type: mpiaij
> >     rows=3020875, cols=3020875
> >     total: nonzeros=215671710, allocated nonzeros=241731750
> >     total number of mallocs used during MatSetValues calls =0
> >       not using I-node (on process 0) routines
> >
> > Fande,
> >
> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <mfad...@lbl.gov> wrote:
> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <bsm...@mcs.anl.gov> wrote:
> >
> > >> Does this mean that GAMG works for the symmetric matrix only?
> > >
> > >    No, it means that for a non-symmetric nonzero structure you need the extra flag. So use the extra flag. The reason we don't always use the flag is that it adds extra cost and isn't needed if the matrix already has a symmetric nonzero structure.
> >
> > BTW, if you have a symmetric nonzero structure you can just set "-pc_gamg_threshold -1.0"; note the "or" in the message.
> >
> > If you want to mess with the threshold then you need to use the symmetrized flag.
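For reference, the option tables at the end of the attached logs give the exact settings for the two runs. The GAMG run corresponds to roughly the following invocation (the launcher line itself is reconstructed here, not taken from the logs):

  mpirun -n 384 ./yak-opt -i treat-cube_transient.i \
      -snes_mf_operator \
      -ksp_gmres_restart 100 \
      -pc_type gamg -pc_gamg_sym_graph true -pc_mg_levels 2 \
      -pc_use_amat false \
      -snes_view -log_view

The ASM baseline is the same except "-pc_type asm" and no GAMG/MG options.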
Time Step 10, time = 0.1
                dt = 0.01
 0 Nonlinear |R| = 2.004779e-03
      0 Linear |R| = 2.004779e-03
      1 Linear |R| = 1.080152e-03
      2 Linear |R| = 5.066679e-04
      3 Linear |R| = 3.045271e-04
      4 Linear |R| = 1.925133e-04
      5 Linear |R| = 1.404396e-04
      6 Linear |R| = 1.087962e-04
      7 Linear |R| = 9.433190e-05
      8 Linear |R| = 8.650164e-05
      9 Linear |R| = 7.511298e-05
     10 Linear |R| = 6.116103e-05
     11 Linear |R| = 5.097880e-05
     12 Linear |R| = 4.528093e-05
     13 Linear |R| = 4.238188e-05
     14 Linear |R| = 3.852598e-05
     15 Linear |R| = 3.211727e-05
     16 Linear |R| = 2.655089e-05
     17 Linear |R| = 2.308499e-05
     18 Linear |R| = 1.988423e-05
     19 Linear |R| = 1.686685e-05
     20 Linear |R| = 1.453042e-05
     21 Linear |R| = 1.227912e-05
     22 Linear |R| = 9.829701e-06
     23 Linear |R| = 7.695993e-06
     24 Linear |R| = 6.092649e-06
     25 Linear |R| = 5.293533e-06
     26 Linear |R| = 4.583670e-06
     27 Linear |R| = 3.427266e-06
     28 Linear |R| = 2.442730e-06
     29 Linear |R| = 1.855485e-06
 1 Nonlinear |R| = 1.855485e-06
      0 Linear |R| = 1.855485e-06
      1 Linear |R| = 1.626392e-06
      2 Linear |R| = 1.505583e-06
      3 Linear |R| = 1.258325e-06
      4 Linear |R| = 8.295100e-07
      5 Linear |R| = 6.184171e-07
      6 Linear |R| = 5.114149e-07
      7 Linear |R| = 4.146942e-07
      8 Linear |R| = 3.335395e-07
      9 Linear |R| = 2.647491e-07
     10 Linear |R| = 2.099801e-07
     11 Linear |R| = 1.774148e-07
     12 Linear |R| = 1.508766e-07
     13 Linear |R| = 1.214361e-07
     14 Linear |R| = 1.009707e-07
     15 Linear |R| = 9.148193e-08
     16 Linear |R| = 8.608036e-08
     17 Linear |R| = 7.997930e-08
     18 Linear |R| = 7.004223e-08
     19 Linear |R| = 5.671891e-08
     20 Linear |R| = 4.909039e-08
     21 Linear |R| = 4.690188e-08
     22 Linear |R| = 4.309895e-08
     23 Linear |R| = 3.325854e-08
     24 Linear |R| = 2.375529e-08
     25 Linear |R| = 1.690025e-08
     26 Linear |R| = 1.237871e-08
     27 Linear |R| = 8.720643e-09
     28 Linear |R| = 5.961891e-09
     29 Linear |R| = 4.283073e-09
     30 Linear |R| = 3.126338e-09
     31 Linear |R| = 2.185008e-09
     32 Linear |R| = 1.411854e-09
 2 Nonlinear |R| = 1.411854e-09
SNES Object: 384 MPI processes
  type: newtonls
  maximum iterations=200, maximum function evaluations=10000
  tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
  total number of linear solver iterations=61
  total number of function evaluations=66
  norm schedule ALWAYS
  SNESLineSearch Object: 384 MPI processes
    type: bt
      interpolation: cubic
      alpha=1.000000e-04
    maxstep=1.000000e+08, minlambda=1.000000e-12
    tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
    maximum iterations=40
  KSP Object: 384 MPI processes
    type: gmres
      GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
      GMRES: happy breakdown tolerance 1e-30
    maximum iterations=100, initial guess is zero
    tolerances: relative=0.001, absolute=1e-50, divergence=10000.
    right preconditioning
    using UNPRECONDITIONED norm type for convergence test
  PC Object: 384 MPI processes
    type: asm
      Additive Schwarz: total subdomain blocks = 384, amount of overlap = 1
      Additive Schwarz: restriction/interpolation type - RESTRICT
      Local solve is same for all blocks, in the following KSP and PC objects:
    KSP Object: (sub_) 1 MPI processes
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (sub_) 1 MPI processes
      type: ilu
        ILU: out-of-place factorization
        0 levels of fill
        tolerance for zero pivot 2.22045e-14
        matrix ordering: natural
        factor fill ratio given 1., needed 1.
          Factored matrix follows:
            Mat Object: 1 MPI processes
              type: seqaij
              rows=20493, cols=20493
              package used to perform factorization: petsc
              total: nonzeros=1270950, allocated nonzeros=1270950
              total number of mallocs used during MatSetValues calls =0
                not using I-node routines
      linear system matrix = precond matrix:
      Mat Object: () 1 MPI processes
        type: seqaij
        rows=20493, cols=20493
        total: nonzeros=1270950, allocated nonzeros=1270950
        total number of mallocs used during MatSetValues calls =0
          not using I-node routines
  linear system matrix followed by preconditioner matrix:
  Mat Object: 384 MPI processes
    type: mffd
    rows=3020875, cols=3020875
      Matrix-free approximation:
        err=1.49012e-08 (relative error in function evaluation)
        Using wp compute h routine
            Does not compute normU
  Mat Object: () 384 MPI processes
    type: mpiaij
    rows=3020875, cols=3020875
    total: nonzeros=215671710, allocated nonzeros=241731750
    total number of mallocs used during MatSetValues calls =0
      not using I-node (on process 0) routines
 Solve Converged!

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/home/kongf/workhome/projects/yak/yak-opt on a arch-linux2-c-opt named r4i0n1 with 384 processors, by kongf Tue Mar 14 16:28:04 2017
Using Petsc Release Version 3.7.5, unknown

                         Max       Max/Min        Avg      Total
Time (sec):           4.387e+02      1.00001   4.387e+02
Objects:              1.279e+03      1.00000   1.279e+03
Flops:                4.230e+09      1.99161   2.946e+09  1.131e+12
Flops/sec:            9.642e+06      1.99162   6.716e+06  2.579e+09
MPI Messages:         2.935e+05      4.95428   1.810e+05  6.951e+07
MPI Message Lengths:  3.105e+09      3.16103   1.072e+04  7.449e+11
MPI Reductions:       5.022e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 4.3875e+02 100.0%  1.1314e+12 100.0%  6.951e+07 100.0%  1.072e+04      100.0%  5.022e+04 100.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot 20 1.0 3.2134e-03 2.4 4.53e+05 2.3 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 0 0 0 0 0 0 37601
VecMDot 839 1.0 6.7209e-01 1.2 3.52e+08 2.3 0.0e+00 0.0e+00 8.4e+02 0 8 0 0 2 0 8 0 0 2 139634
VecNorm 1802 1.0 6.7932e+00 2.5 4.08e+07 2.3 0.0e+00 0.0e+00 1.8e+03 1 1 0 0 4 1 1 0 0 4 1603
VecScale 3877 1.0 1.0508e-01 1.4 1.34e+08 1.3 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 439546
VecCopy 4153 1.0 7.2803e-01 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 5493 1.0 5.1735e-01 6.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 5365 1.0 4.0282e-01 2.3 3.01e+08 1.4 0.0e+00 0.0e+00 0.0e+00 0 9 0 0 0 0 9 0 0 0 251646
VecWAXPY 884 1.0 5.5227e-02 3.5 1.97e+07 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 95341
VecMAXPY 864 1.0 1.7126e-01 2.6 3.71e+08 2.3 0.0e+00 0.0e+00 0.0e+00 0 9 0 0 0 0 9 0 0 0 577621
VecAssemblyBegin 15491 1.0 1.3738e+02 3.0 0.00e+00 0.0 8.9e+06 1.8e+04 4.6e+04 28 0 13 22 93 28 0 13 22 93 0
VecAssemblyEnd 15491 1.0 7.9072e-0128.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 13390 1.0 2.5097e+00 3.6 0.00e+00 0.0 5.9e+07 8.4e+03 2.8e+01 0 0 85 67 0 0 0 85 67 0 0
VecScatterEnd 13362 1.0 5.7428e+00 7.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecReduceArith 55 1.0 1.2808e-03 2.2 1.25e+06 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 259431
VecReduceComm 25 1.0 5.5003e-02 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 2.5e+01 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 864 1.0 4.4664e+00 3.5 2.93e+07 2.3 0.0e+00 0.0e+00 8.6e+02 1 1 0 0 2 1 1 0 0 2 1753
MatMult MF 859 1.0 3.1339e+02 1.0 4.12e+08 1.4 5.7e+07 9.6e+03 4.2e+04 71 12 81 73 83 71 12 81 73 83 439
MatMult 859 1.0 3.1340e+02 1.0 4.12e+08 1.4 5.7e+07 9.6e+03 4.2e+04 71 12 81 73 83 71 12 81 73 83 439
MatSolve 864 1.0 2.1255e+00 2.0 1.83e+09 2.1 0.0e+00 0.0e+00 0.0e+00 0 43 0 0 0 0 43 0 0 0 226791
MatLUFactorNum 25 1.0 1.0920e+00 2.4 1.20e+09 2.5 0.0e+00 0.0e+00 0.0e+00 0 26 0 0 0 0 26 0 0 0 267745
MatILUFactorSym 13 1.0 1.0606e-01 5.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 150 1.0 2.0643e+00 1.2 0.00e+00 0.0 1.7e+05 1.7e+05 2.0e+02 0 0 0 4 0 0 0 0 4 0 0
MatAssemblyEnd 150 1.0 4.3198e+00 1.1 0.00e+00 0.0 1.9e+04 1.1e+03 2.1e+02 1 0 0 0 0 1 0 0 0 0 0
MatGetRowIJ 13 1.0 1.3113e-0513.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 25 1.0 4.4022e+00 2.8 0.00e+00 0.0 5.9e+05 8.4e+04 7.5e+01 1 0 1 7 0 1 0 1 7 0 0
MatGetOrdering 13 1.0 1.7283e-0217.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 13 1.0 2.0244e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.3e+01 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 29 1.0 5.0908e-02 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 52 2.0 5.5351e-02 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.3e+01 0 0 0 0 0 0 0 0 0 0 0
SNESSolve 13 1.0 3.7214e+02 1.0 4.21e+09 2.0 6.6e+07 1.0e+04 4.8e+04 85100 95 92 95 85100 95 92 95 3026
SNESFunctionEval 897 1.0 3.2606e+02 1.0 3.62e+08 1.3 5.9e+07 9.6e+03 4.3e+04 74 11 85 76 85 74 11 85 76 85 384
SNESJacobianEval 25 1.0 3.4770e+01 1.0 1.95e+07 1.4 2.3e+06 2.3e+04 1.9e+03 8 1 3 7 4 8 1 3 7 4 195
SNESLineSearch 25 1.0 1.8090e+01 1.0 2.57e+07 1.4 3.1e+06 1.0e+04 2.3e+03 4 1 4 4 5 4 1 4 4 5 475
BuildTwoSided 25 1.0 4.6378e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetGraph 25 1.0 2.7061e-04 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceBegin 25 1.0 4.6412e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceEnd 25 1.0 8.1301e-05 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 839 1.0 8.0119e-01 1.2 7.03e+08 2.3 0.0e+00 0.0e+00 8.4e+02 0 17 0 0 2 0 17 0 0 2 234277
KSPSetUp 50 1.0 3.0220e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 25 1.0 3.1444e+02 1.0 4.16e+09 2.0 6.0e+07 9.9e+03 4.3e+04 72 98 86 80 85 72 98 86 80 85 3526
PCSetUp 50 1.0 5.4896e+00 2.4 1.20e+09 2.5 7.1e+05 7.0e+04 1.8e+02 1 26 1 7 0 1 26 1 7 0 53260
PCSetUpOnBlocks 25 1.0 1.1928e+00 2.4 1.20e+09 2.5 0.0e+00 0.0e+00 0.0e+00 0 26 0 0 0 0 26 0 0 0 245124
PCApply 864 1.0 2.4803e+00 2.0 1.83e+09 2.1 4.1e+06 4.4e+03 0.0e+00 0 43 6 2 0 0 43 6 2 0 194354
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector   740            740    732012968     0.
      Vector Scatter    76             76      1212680     0.
           Index Set   176            176      4673716     0.
   IS L to G Mapping    33             33      3228828     0.
             MatMFFD    13             13        10088     0.
              Matrix    45             45    364469360     0.
                SNES    13             13        17316     0.
      SNESLineSearch    13             13        12896     0.
              DMSNES    13             13         8632     0.
    Distributed Mesh    13             13        60320     0.
Star Forest Bipartite Graph    51      51        43248     0.
     Discrete System    13             13        11232     0.
       Krylov Solver    26             26      2223520     0.
     DMKSP interface    13             13         8424     0.
      Preconditioner    26             26        25688     0.
              Viewer    15             13        10816     0.
========================================================================================================================
Average time to get PetscTime(): 0.
Average time for MPI_Barrier(): 1.27792e-05
Average time for zero size MPI_Send(): 2.08554e-06
#PETSc Option Table entries:
--n-threads=1
-i treat-cube_transient.i
-ksp_gmres_restart 100
-log_view
-pc_hypre_boomeramg_max_iter 4
-pc_hypre_boomeramg_strong_threshold 0.7
-pc_hypre_boomeramg_tol 1.0e-6
-pc_hypre_type boomeramg
-pc_type asm
-snes_mf_operator
-snes_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --download-hypre=1 --with-ssl=0 --with-debugging=no --with-pic=1 --with-shared-libraries=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-fblaslapack=1 --download-metis=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 -CC=mpicc -CXX=mpicxx -FC=mpif90 -F77=mpif77 -F90=mpif90 -CFLAGS="-fPIC -fopenmp" -CXXFLAGS="-fPIC -fopenmp" -FFLAGS="-fPIC -fopenmp" -FCFLAGS="-fPIC -fopenmp" -F90FLAGS="-fPIC -fopenmp" -F77FLAGS="-fPIC -fopenmp" PETSC_DIR=/home/kongf/workhome/projects/petsc -download-cmake=1
-----------------------------------------
Libraries compiled on Tue Feb 7 16:47:41 2017 on falcon1
Machine characteristics: Linux-3.0.101-84.1.11909.0.PTF-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /home/kongf/workhome/projects/petsc
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -fopenmp -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -fPIC -fopenmp -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/include -I/home/kongf/workhome/projects/petsc/include -I/home/kongf/workhome/projects/petsc/include -I/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -L/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -L/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -lsuperlu_dist -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lHYPRE -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -L/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -Wl,-rpath,/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -L/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -Wl,-rpath,/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -lmpichcxx -lstdc++ -lscalapack -lflapack -lfblas -lX11 -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -L/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -L/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -Wl,-rpath,/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -ldl -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -lmpich -lopa -lmpl -lgomp -lgcc_s -lpthread -ldl
-----------------------------------------
Time Step 10, time = 0.1
                dt = 0.01
 0 Nonlinear |R| = 2.004778e-03
      0 Linear |R| = 2.004778e-03
      1 Linear |R| = 4.440581e-04
      2 Linear |R| = 1.283930e-04
      3 Linear |R| = 9.874954e-05
      4 Linear |R| = 6.589984e-05
      5 Linear |R| = 4.483411e-05
      6 Linear |R| = 2.787575e-05
      7 Linear |R| = 1.435839e-05
      8 Linear |R| = 8.720579e-06
      9 Linear |R| = 3.704796e-06
     10 Linear |R| = 2.317054e-06
     11 Linear |R| = 9.060942e-07
 1 Nonlinear |R| = 9.060942e-07
      0 Linear |R| = 9.060942e-07
      1 Linear |R| = 6.874101e-07
      2 Linear |R| = 3.052995e-07
      3 Linear |R| = 1.728171e-07
      4 Linear |R| = 7.805237e-08
      5 Linear |R| = 5.011253e-08
      6 Linear |R| = 2.903814e-08
      7 Linear |R| = 2.421108e-08
      8 Linear |R| = 1.594860e-08
      9 Linear |R| = 1.116189e-08
     10 Linear |R| = 4.372907e-09
     11 Linear |R| = 1.575997e-09
     12 Linear |R| = 5.765413e-10
 2 Nonlinear |R| = 5.765413e-10
SNES Object: 384 MPI processes
  type: newtonls
  maximum iterations=200, maximum function evaluations=10000
  tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
  total number of linear solver iterations=23
  total number of function evaluations=28
  norm schedule ALWAYS
  SNESLineSearch Object: 384 MPI processes
    type: bt
      interpolation: cubic
      alpha=1.000000e-04
    maxstep=1.000000e+08, minlambda=1.000000e-12
    tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
    maximum iterations=40
  KSP Object: 384 MPI processes
    type: gmres
      GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
      GMRES: happy breakdown tolerance 1e-30
    maximum iterations=100, initial guess is zero
    tolerances: relative=0.001, absolute=1e-50, divergence=10000.
    right preconditioning
    using UNPRECONDITIONED norm type for convergence test
  PC Object: 384 MPI processes
    type: gamg
      MG: type is MULTIPLICATIVE, levels=2 cycles=v
        Cycles per PCApply=1
        Using Galerkin computed coarse grid matrices
        GAMG specific options
          Threshold for dropping small values from graph 0.
          AGG specific options
            Symmetric graph true
    Coarse grid solver -- level -------------------------------
      KSP Object: (mg_coarse_) 384 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object: (mg_coarse_) 384 MPI processes
        type: bjacobi
          block Jacobi: number of blocks = 384
          Local solve is same for all blocks, in the following KSP and PC objects:
        KSP Object: (mg_coarse_sub_) 1 MPI processes
          type: preonly
          maximum iterations=1, initial guess is zero
          tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
          left preconditioning
          using NONE norm type for convergence test
        PC Object: (mg_coarse_sub_) 1 MPI processes
          type: lu
            LU: out-of-place factorization
            tolerance for zero pivot 2.22045e-14
            using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
            matrix ordering: nd
            factor fill ratio given 5., needed 1.31367
              Factored matrix follows:
                Mat Object: 1 MPI processes
                  type: seqaij
                  rows=37, cols=37
                  package used to perform factorization: petsc
                  total: nonzeros=913, allocated nonzeros=913
                  total number of mallocs used during MatSetValues calls =0
                    not using I-node routines
          linear system matrix = precond matrix:
          Mat Object: 1 MPI processes
            type: seqaij
            rows=37, cols=37
            total: nonzeros=695, allocated nonzeros=695
            total number of mallocs used during MatSetValues calls =0
              not using I-node routines
        linear system matrix = precond matrix:
        Mat Object: 384 MPI processes
          type: mpiaij
          rows=18145, cols=18145
          total: nonzeros=1709115, allocated nonzeros=1709115
          total number of mallocs used during MatSetValues calls =0
            not using I-node (on process 0) routines
    Down solver (pre-smoother) on level 1 -------------------------------
      KSP Object: (mg_levels_1_) 384 MPI processes
        type: chebyshev
          Chebyshev: eigenvalue estimates: min = 0.138116, max = 1.51927
          Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
          KSP Object: (mg_levels_1_esteig_) 384 MPI processes
            type: gmres
              GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
              GMRES: happy breakdown tolerance 1e-30
            maximum iterations=10, initial guess is zero
            tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
            left preconditioning
            using PRECONDITIONED norm type for convergence test
        maximum iterations=2
        tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using nonzero initial guess
        using NONE norm type for convergence test
      PC Object: (mg_levels_1_) 384 MPI processes
        type: sor
          SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
        linear system matrix = precond matrix:
        Mat Object: () 384 MPI processes
          type: mpiaij
          rows=3020875, cols=3020875
          total: nonzeros=215671710, allocated nonzeros=241731750
          total number of mallocs used during MatSetValues calls =0
            not using I-node (on process 0) routines
    Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix followed by preconditioner matrix:
  Mat Object: 384 MPI processes
    type: mffd
    rows=3020875, cols=3020875
      Matrix-free approximation:
        err=1.49012e-08 (relative error in function evaluation)
        Using wp compute h routine
            Does not compute normU
  Mat Object: () 384 MPI processes
    type: mpiaij
    rows=3020875, cols=3020875
    total: nonzeros=215671710, allocated nonzeros=241731750
    total number of mallocs used during MatSetValues calls =0
      not using I-node (on process 0) routines
 Solve Converged!
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/home/kongf/workhome/projects/yak/yak-opt on a arch-linux2-c-opt named r4i4n2 with 384 processors, by kongf Fri Apr 7 13:36:35 2017
Using Petsc Release Version 3.7.5, unknown

                         Max       Max/Min        Avg      Total
Time (sec):           2.266e+03      1.00001   2.266e+03
Objects:              6.020e+03      1.00000   6.020e+03
Flops:                1.064e+10      2.27050   7.337e+09  2.817e+12
Flops/sec:            4.695e+06      2.27050   3.237e+06  1.243e+09
MPI Messages:         3.459e+05      5.11666   2.112e+05  8.111e+07
MPI Message Lengths:  3.248e+09      3.35280   9.453e+03  7.667e+11
MPI Reductions:       4.610e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 2.2663e+03 100.0%  2.8172e+12 100.0%  8.111e+07 100.0%  9.453e+03      100.0%  4.610e+04 100.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot 20 1.0 6.1171e-01 1.6 4.53e+05 2.3 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 0 0 0 0 0 0 198
VecMDot 1091 1.0 3.4823e+01 1.7 1.05e+08 2.3 0.0e+00 0.0e+00 1.1e+03 1 1 0 0 2 1 1 0 0 2 803
VecNorm 1943 1.0 6.9656e+01 1.6 3.66e+07 2.3 0.0e+00 0.0e+00 1.9e+03 3 0 0 0 4 3 0 0 0 4 140
VecScale 2928 1.0 1.1091e-01 2.8 7.24e+07 1.4 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 219463
VecCopy 3086 1.0 6.0201e-01 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 7168 1.0 4.2314e-01 7.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 3263 1.0 3.7908e-01 4.1 1.59e+08 1.4 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 138504
VecAYPX 4112 1.0 1.1982e-01 4.2 3.59e+07 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 80071
VecAXPBYCZ 2056 1.0 7.5538e-02 3.3 7.18e+07 2.3 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 254030
VecWAXPY 743 1.0 7.8864e-02 4.9 1.65e+07 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 55963
VecMAXPY 1196 1.0 7.9660e-02 3.3 1.23e+08 2.3 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 411137
VecAssemblyBegin 12333 1.0 1.1090e+03 1.2 0.00e+00 0.0 7.6e+06 1.9e+04 3.7e+04 48 0 9 19 80 48 0 9 19 80 0
VecAssemblyEnd 12333 1.0 4.2957e-0124.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 440 1.0 2.2301e-02 5.7 3.12e+06 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 37433
VecScatterBegin 13638 1.0 2.3693e+00 4.9 0.00e+00 0.0 6.4e+07 5.6e+03 2.8e+01 0 0 79 46 0 0 0 79 46 0 0
VecScatterEnd 13610 1.0 2.1648e+0213.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecSetRandom 40 1.0 4.5372e-02 5.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 55 1.0 1.3552e-03 2.7 1.25e+06 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 245191
VecReduceComm 25 1.0 2.3911e+00 4.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.5e+01 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 1196 1.0 2.8596e+01 1.1 2.95e+07 2.3 0.0e+00 0.0e+00 1.2e+03 1 0 0 0 3 1 0 0 0 3 275
MatMult MF 718 1.0 1.4078e+03 1.0 2.00e+08 1.4 4.2e+07 8.2e+03 3.2e+04 62 2 52 45 69 62 2 52 45 69 46
MatMult 4195 1.0 1.4272e+03 1.0 3.33e+09 2.2 5.8e+07 6.6e+03 3.2e+04 63 32 72 50 69 63 32 72 50 69 627
MatMultAdd 514 1.0 9.7981e+0016.1 3.84e+07 2.4 2.0e+06 1.3e+02 0.0e+00 0 0 2 0 0 0 0 2 0 0 995
MatMultTranspose 514 1.0 6.0183e+0019.9 3.84e+07 2.4 2.0e+06 1.3e+02 0.0e+00 0 0 2 0 0 0 0 2 0 0 1620
MatSolve 316 1.3 1.7905e-0219.7 1.76e+06 4.6 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 18236
MatSOR 3524 1.0 6.6987e+00 3.9 2.50e+09 2.6 0.0e+00 0.0e+00 0.0e+00 0 23 0 0 0 0 23 0 0 0 97291
MatLUFactorSym 25 1.0 1.7944e-0217.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 25 1.0 2.2082e-03 6.0 2.10e+0610.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 136111
MatConvert 40 1.0 2.6915e-01 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 120 1.0 1.0204e+0022.5 3.86e+07 2.3 1.9e+05 2.9e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 10018
MatResidual 514 1.0 5.3226e+01 1.1 4.35e+08 2.3 3.7e+06 4.2e+03 1.1e+03 2 4 5 2 2 2 4 5 2 2 2165
MatAssemblyBegin 1010 1.0 6.0257e+01 2.2 0.00e+00 0.0 1.7e+06 3.5e+04 8.4e+02 2 0 2 8 2 2 0 2 8 2 0
MatAssemblyEnd 1010 1.0 7.7316e+01 1.0 0.00e+00 0.0 2.5e+06 4.6e+02 2.1e+03 3 0 3 0 5 3 0 3 0 5 0
MatGetRow 1078194 2.3 2.4485e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 25 1.2 3.7956e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrix 30 1.0 1.6949e+01 1.0 0.00e+00 0.0 1.2e+05 2.8e+02 5.1e+02 1 0 0 0 1 1 0 0 0 1 0
MatGetOrdering 25 1.2 1.8878e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 40 1.0 1.5944e+01 1.1 0.00e+00 0.0 2.6e+06 2.3e+03 3.0e+02 1 0 3 1 1 1 0 3 1 1 0
MatZeroEntries 69 1.0 7.3145e-02 7.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 90 1.4 1.1229e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.8e+01 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 40 1.0 3.4301e+00 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+01 0 0 0 0 0 0 0 0 0 0 0
MatTranspose 20 1.0 1.2561e+01 1.1 0.00e+00 0.0 7.1e+05 2.0e+04 2.4e+02 1 0 1 2 1 1 0 1 2 1 0
MatMatMult 40 1.0 2.6365e+01 1.0 3.56e+07 2.3 1.2e+06 1.4e+03 6.4e+02 1 0 1 0 1 1 0 1 0 1 358
MatMatMultSym 40 1.0 2.3430e+01 1.0 0.00e+00 0.0 9.8e+05 1.1e+03 5.6e+02 1 0 1 0 1 1 0 1 0 1 0
MatMatMultNum 40 1.0 2.9809e+00 1.1 3.56e+07 2.3 1.9e+05 2.9e+03 8.0e+01 0 0 0 0 0 0 0 0 0 0 3170
MatPtAP 40 1.0 3.1763e+01 1.0 2.59e+08 2.3 2.7e+06 2.6e+03 6.8e+02 1 2 3 1 1 1 2 3 1 1 2012
MatPtAPSymbolic 40 1.0 1.7240e+01 1.1 0.00e+00 0.0 1.2e+06 4.6e+03 2.8e+02 1 0 1 1 1 1 0 1 1 1 0
MatPtAPNumeric 40 1.0 1.5004e+01 1.1 2.59e+08 2.3 1.5e+06 1.0e+03 4.0e+02 1 2 2 0 1 1 2 2 0 1 4259
MatTrnMatMult 25 1.0 1.1522e+02 1.0 4.05e+09 2.3 7.5e+05 2.6e+05 4.8e+02 5 37 1 25 1 5 37 1 25 1 9105
MatTrnMatMultSym 25 1.0 7.3735e+01 1.0 0.00e+00 0.0 6.3e+05 1.0e+05 4.2e+02 3 0 1 8 1 3 0 1 8 1 0
MatTrnMatMultNum 25 1.0 4.1508e+01 1.0 4.05e+09 2.3 1.2e+05 1.1e+06 5.0e+01 2 37 0 17 0 2 37 0 17 0 25275
MatGetLocalMat 170 1.0 6.0506e-01 4.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 120 1.0 3.7906e+00 5.3 0.00e+00 0.0 1.3e+06 5.0e+03 0.0e+00 0 0 2 1 0 0 0 2 1 0 0
SNESSolve 13 1.0 1.9975e+03 1.0 1.06e+10 2.3 7.8e+07 9.1e+03 4.3e+04 88100 96 92 94 88100 96 92 94 1408
SNESFunctionEval 756 1.0 1.4539e+03 1.0 1.62e+08 1.4 4.4e+07 8.3e+03 3.3e+04 64 2 55 48 71 64 2 55 48 71 38
SNESJacobianEval 25 1.0 1.0415e+02 1.0 1.95e+07 1.4 2.3e+06 2.3e+04 1.9e+03 5 0 3 7 4 5 0 3 7 4 65
SNESLineSearch 25 1.0 1.0113e+02 1.0 2.57e+07 1.4 3.1e+06 1.0e+04 2.3e+03 4 0 4 4 5 4 0 4 4 5 85
BuildTwoSided 85 1.0 5.0838e+00 1.5 0.00e+00 0.0 1.5e+05 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetGraph 85 1.0 3.2002e-02 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFBcastBegin 382 1.0 3.1338e+00 1.4 0.00e+00 0.0 2.6e+06 2.3e+03 0.0e+00 0 0 3 1 0 0 0 3 1 0 0
SFBcastEnd 382 1.0 5.2611e+00 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceBegin 45 1.0 2.5858e+00 1.5 0.00e+00 0.0 2.4e+05 1.8e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceEnd 45 1.0 3.6487e-01253.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 1091 1.0 3.4858e+01 1.7 2.09e+08 2.3 0.0e+00 0.0e+00 1.1e+03 1 2 0 0 2 1 2 0 0 2 1604
KSPSetUp 195 1.0 2.9202e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+01 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 25 1.0 1.7661e+03 1.0 1.06e+10 2.3 7.2e+07 8.6e+03 3.9e+04 78 99 88 80 84 78 99 88 80 84 1582
PCGAMGGraph_AGG 40 1.0 3.5930e+01 1.0 3.56e+07 2.3 1.6e+06 1.9e+04 7.6e+02 2 0 2 4 2 2 0 2 4 2 263
PCGAMGCoarse_AGG 40 1.0 1.4450e+02 1.0 4.05e+09 2.3 4.0e+06 5.1e+04 1.2e+03 6 37 5 27 3 6 37 5 27 3 7260
PCGAMGProl_AGG 40 1.0 3.2209e+01 1.0 0.00e+00 0.0 9.8e+05 2.9e+03 9.6e+02 1 0 1 0 2 1 0 1 0 2 0
PCGAMGPOpt_AGG 40 1.0 6.3251e+01 1.0 4.72e+08 2.3 3.1e+06 2.3e+03 1.9e+03 3 4 4 1 4 3 4 4 1 4 1987
GAMG: createProl 40 1.0 2.7631e+02 1.0 4.56e+09 2.3 9.6e+06 2.5e+04 4.8e+03 12 42 12 32 10 12 42 12 32 10 4286
  Graph 80 1.0 3.5926e+01 1.0 3.56e+07 2.3 1.6e+06 1.9e+04 7.6e+02 2 0 2 4 2 2 0 2 4 2 263
  MIS/Agg 40 1.0 1.5945e+01 1.1 0.00e+00 0.0 2.6e+06 2.3e+03 3.0e+02 1 0 3 1 1 1 0 3 1 1 0
  SA: col data 40 1.0 1.3401e+01 1.1 0.00e+00 0.0 4.2e+05 6.1e+03 4.0e+02 1 0 1 0 1 1 0 1 0 1 0
  SA: frmProl0 40 1.0 1.4033e+01 1.1 0.00e+00 0.0 5.6e+05 4.6e+02 4.0e+02 1 0 1 0 1 1 0 1 0 1 0
  SA: smooth 40 1.0 6.3251e+01 1.0 4.72e+08 2.3 3.1e+06 2.3e+03 1.9e+03 3 4 4 1 4 3 4 4 1 4 1987
GAMG: partLevel 40 1.0 5.8738e+01 1.0 2.59e+08 2.3 2.9e+06 2.5e+03 1.5e+03 3 2 4 1 3 3 2 4 1 3 1088
  repartition 35 1.0 3.3741e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 9.0e+01 0 0 0 0 0 0 0 0 0 0 0
  Invert-Sort 15 1.0 2.7445e+00 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+01 0 0 0 0 0 0 0 0 0 0 0
  Move A 15 1.0 9.3221e+00 1.0 0.00e+00 0.0 6.6e+04 4.9e+02 2.7e+02 0 0 0 0 1 0 0 0 0 1 0
  Move P 15 1.0 8.7196e+00 1.0 0.00e+00 0.0 5.7e+04 3.6e+01 2.7e+02 0 0 0 0 1 0 0 0 0 1 0
PCSetUp 50 1.0 3.4248e+02 1.0 4.81e+09 2.3 1.2e+07 2.0e+04 6.5e+03 15 44 15 33 14 15 44 15 33 14 3645
PCSetUpOnBlocks 316 1.0 2.1314e-02 6.3 2.10e+0610.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14102
PCApply 316 1.0 7.8870e+02 1.0 5.52e+09 2.4 4.0e+07 4.4e+03 1.7e+04 34 52 49 23 37 34 52 49 23 37 1863
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector  2951           2951    828338752     0.
      Vector Scatter   353            353       367264     0.
           Index Set   833            833      6198336     0.
   IS L to G Mapping    33             33      3228828     0.
             MatMFFD    13             13        10088     0.
              Matrix  1334           1334   3083683516     0.
      Matrix Coarsen    40             40        25120     0.
                SNES    13             13        17316     0.
      SNESLineSearch    13             13        12896     0.
              DMSNES    13             13         8632     0.
    Distributed Mesh    13             13        60320     0.
Star Forest Bipartite Graph   111     111        94128     0.
     Discrete System    13             13        11232     0.
       Krylov Solver   123            123      4660776     0.
     DMKSP interface    13             13         8424     0.
      Preconditioner   123            123       117692     0.
         PetscRandom    13             13         8294     0.
              Viewer    15             13        10816     0.
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 0.0217308
Average time for zero size MPI_Send(): 0.000133693
#PETSc Option Table entries:
--n-threads=1
-i treat-cube_transient.i
-ksp_gmres_restart 100
-log_view
-pc_gamg_sym_graph true
-pc_hypre_boomeramg_max_iter 4
-pc_hypre_boomeramg_strong_threshold 0.7
-pc_hypre_boomeramg_tol 1.0e-6
-pc_hypre_type boomeramg
-pc_mg_levels 2
-pc_type gamg
-pc_use_amat false
-snes_mf_operator
-snes_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --download-hypre=1 --with-ssl=0 --with-debugging=no --with-pic=1 --with-shared-libraries=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-fblaslapack=1 --download-metis=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 -CC=mpicc -CXX=mpicxx -FC=mpif90 -F77=mpif77 -F90=mpif90 -CFLAGS="-fPIC -fopenmp" -CXXFLAGS="-fPIC -fopenmp" -FFLAGS="-fPIC -fopenmp" -FCFLAGS="-fPIC -fopenmp" -F90FLAGS="-fPIC -fopenmp" -F77FLAGS="-fPIC -fopenmp" PETSC_DIR=/home/kongf/workhome/projects/petsc -download-cmake=1
-----------------------------------------
Libraries compiled on Tue Feb 7 16:47:41 2017 on falcon1
Machine characteristics: Linux-3.0.101-84.1.11909.0.PTF-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /home/kongf/workhome/projects/petsc
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -fopenmp -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -fPIC -fopenmp -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/include -I/home/kongf/workhome/projects/petsc/include -I/home/kongf/workhome/projects/petsc/include -I/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -L/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -L/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -lsuperlu_dist -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lHYPRE -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -L/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -Wl,-rpath,/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -L/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -Wl,-rpath,/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -lmpichcxx -lstdc++ -lscalapack -lflapack -lfblas -lX11 -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -L/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -L/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -Wl,-rpath,/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -ldl -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -lmpich -lopa -lmpl -lgomp -lgcc_s -lpthread -ldl
-----------------------------------------
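For anyone wiring this up in code rather than on the command line, below is a minimal sketch of the same setup with the PETSc C API. It is an illustration only: the toy residual F(x)_i = x_i^2 - 4 stands in for the real physics, GAMG is overkill for the toy diagonal preconditioning matrix and is only there to mirror the option set, and error checking (CHKERRQ) is omitted for brevity.

#include <petscsnes.h>

/* Toy residual F(x)_i = x_i^2 - 4; stands in for the real physics. */
static PetscErrorCode FormFunction(SNES snes, Vec x, Vec f, void *ctx)
{
  PetscFunctionBeginUser;
  VecPointwiseMult(f, x, x); /* f_i = x_i * x_i */
  VecShift(f, -4.0);         /* f_i = x_i^2 - 4 */
  PetscFunctionReturn(0);
}

/* Assemble only the preconditioning matrix P = diag(2 x_i); the operator J
   stays matrix-free, and its assembly calls just refresh the MFFD base. */
static PetscErrorCode FormJacobian(SNES snes, Vec x, Mat J, Mat P, void *ctx)
{
  const PetscScalar *xa;
  PetscInt           i, Istart, Iend;
  PetscFunctionBeginUser;
  MatGetOwnershipRange(P, &Istart, &Iend);
  VecGetArrayRead(x, &xa);
  for (i = Istart; i < Iend; i++) {
    PetscScalar v = 2.0 * xa[i - Istart];
    MatSetValues(P, 1, &i, 1, &i, &v, INSERT_VALUES);
  }
  VecRestoreArrayRead(x, &xa);
  MatAssemblyBegin(P, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(P, MAT_FINAL_ASSEMBLY);
  MatAssemblyBegin(J, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(J, MAT_FINAL_ASSEMBLY);
  PetscFunctionReturn(0);
}

int main(int argc, char **argv)
{
  SNES     snes;
  KSP      ksp;
  PC       pc;
  Mat      J, P;
  Vec      x, r;
  PetscInt n = 128;

  PetscInitialize(&argc, &argv, NULL, NULL);
  VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, n, &x);
  VecDuplicate(x, &r);
  VecSet(x, 1.0); /* initial guess */

  /* Assembled preconditioning matrix (diagonal here; a real application
     would assemble its approximate Jacobian). */
  MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, 1, NULL, 0, NULL, &P);

  SNESCreate(PETSC_COMM_WORLD, &snes);
  SNESSetFunction(snes, r, FormFunction, NULL);
  MatCreateSNESMF(snes, &J);                       /* programmatic analogue of -snes_mf_operator */
  SNESSetJacobian(snes, J, P, FormJacobian, NULL); /* Amat matrix-free, Pmat assembled */

  SNESGetKSP(snes, &ksp);
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCGAMG);         /* -pc_type gamg */
  PCSetUseAmat(pc, PETSC_FALSE); /* -pc_use_amat false: smoothers use Pmat, not the MFFD Amat */
  SNESSetFromOptions(snes);      /* picks up -pc_gamg_sym_graph true, -pc_mg_levels 2, ... */

  SNESSolve(snes, NULL, x);

  SNESDestroy(&snes);
  MatDestroy(&J);
  MatDestroy(&P);
  VecDestroy(&x);
  VecDestroy(&r);
  PetscFinalize();
  return 0;
}

The key line is SNESSetJacobian: the first Mat (the operator) is the MFFD matrix from MatCreateSNESMF, while the second Mat (the preconditioning matrix) is assembled, so with PCSetUseAmat(pc, PETSC_FALSE) the multigrid smoothers never apply the expensive matrix-free operator.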