On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams <mfad...@lbl.gov> wrote: > You seem to have two levels here and 3M eqs on the fine grid and 37 on > the coarse grid.
37 is on the sub domain. rows=18145, cols=18145 on the entire coarse grid. > I don't understand that. > > You are also calling the AMG setup a lot, but not spending much time > in it. Try running with -info and grep on "GAMG". > > > On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande <fande.k...@inl.gov> wrote: > > Thanks, Barry. > > > > It works. > > > > GAMG is three times better than ASM in terms of the number of linear > > iterations, but it is five times slower than ASM. Any suggestions to > improve > > the performance of GAMG? Log files are attached. > > > > Fande, > > > > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <bsm...@mcs.anl.gov> wrote: > >> > >> > >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande <fande.k...@inl.gov> wrote: > >> > > >> > Thanks, Mark and Barry, > >> > > >> > It works pretty wells in terms of the number of linear iterations > (using > >> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I > am > >> > using the two-level method via "-pc_mg_levels 2". The reason why the > compute > >> > time is larger than other preconditioning options is that a matrix > free > >> > method is used in the fine level and in my particular problem the > function > >> > evaluation is expensive. > >> > > >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton, > >> > but I do not think I want to make the preconditioning part > matrix-free. Do > >> > you guys know how to turn off the matrix-free method for GAMG? > >> > >> -pc_use_amat false > >> > >> > > >> > Here is the detailed solver: > >> > > >> > SNES Object: 384 MPI processes > >> > type: newtonls > >> > maximum iterations=200, maximum function evaluations=10000 > >> > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50 > >> > total number of linear solver iterations=20 > >> > total number of function evaluations=166 > >> > norm schedule ALWAYS > >> > SNESLineSearch Object: 384 MPI processes > >> > type: bt > >> > interpolation: cubic > >> > alpha=1.000000e-04 > >> > maxstep=1.000000e+08, minlambda=1.000000e-12 > >> > tolerances: relative=1.000000e-08, absolute=1.000000e-15, > >> > lambda=1.000000e-08 > >> > maximum iterations=40 > >> > KSP Object: 384 MPI processes > >> > type: gmres > >> > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt > >> > Orthogonalization with no iterative refinement > >> > GMRES: happy breakdown tolerance 1e-30 > >> > maximum iterations=100, initial guess is zero > >> > tolerances: relative=0.001, absolute=1e-50, divergence=10000. > >> > right preconditioning > >> > using UNPRECONDITIONED norm type for convergence test > >> > PC Object: 384 MPI processes > >> > type: gamg > >> > MG: type is MULTIPLICATIVE, levels=2 cycles=v > >> > Cycles per PCApply=1 > >> > Using Galerkin computed coarse grid matrices > >> > GAMG specific options > >> > Threshold for dropping small values from graph 0. > >> > AGG specific options > >> > Symmetric graph true > >> > Coarse grid solver -- level ------------------------------- > >> > KSP Object: (mg_coarse_) 384 MPI processes > >> > type: preonly > >> > maximum iterations=10000, initial guess is zero > >> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> > left preconditioning > >> > using NONE norm type for convergence test > >> > PC Object: (mg_coarse_) 384 MPI processes > >> > type: bjacobi > >> > block Jacobi: number of blocks = 384 > >> > Local solve is same for all blocks, in the following KSP and > >> > PC objects: > >> > KSP Object: (mg_coarse_sub_) 1 MPI processes > >> > type: preonly > >> > maximum iterations=1, initial guess is zero > >> > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > >> > left preconditioning > >> > using NONE norm type for convergence test > >> > PC Object: (mg_coarse_sub_) 1 MPI processes > >> > type: lu > >> > LU: out-of-place factorization > >> > tolerance for zero pivot 2.22045e-14 > >> > using diagonal shift on blocks to prevent zero pivot > >> > [INBLOCKS] > >> > matrix ordering: nd > >> > factor fill ratio given 5., needed 1.31367 > >> > Factored matrix follows: > >> > Mat Object: 1 MPI processes > >> > type: seqaij > >> > rows=37, cols=37 > >> > package used to perform factorization: petsc > >> > total: nonzeros=913, allocated nonzeros=913 > >> > total number of mallocs used during MatSetValues > calls > >> > =0 > >> > not using I-node routines > >> > linear system matrix = precond matrix: > >> > Mat Object: 1 MPI processes > >> > type: seqaij > >> > rows=37, cols=37 > >> > total: nonzeros=695, allocated nonzeros=695 > >> > total number of mallocs used during MatSetValues calls =0 > >> > not using I-node routines > >> > linear system matrix = precond matrix: > >> > Mat Object: 384 MPI processes > >> > type: mpiaij > >> > rows=18145, cols=18145 > >> > total: nonzeros=1709115, allocated nonzeros=1709115 > >> > total number of mallocs used during MatSetValues calls =0 > >> > not using I-node (on process 0) routines > >> > Down solver (pre-smoother) on level 1 > >> > ------------------------------- > >> > KSP Object: (mg_levels_1_) 384 MPI processes > >> > type: chebyshev > >> > Chebyshev: eigenvalue estimates: min = 0.133339, max = > >> > 1.46673 > >> > Chebyshev: eigenvalues estimated using gmres with > translations > >> > [0. 0.1; 0. 1.1] > >> > KSP Object: (mg_levels_1_esteig_) 384 MPI > >> > processes > >> > type: gmres > >> > GMRES: restart=30, using Classical (unmodified) > >> > Gram-Schmidt Orthogonalization with no iterative refinement > >> > GMRES: happy breakdown tolerance 1e-30 > >> > maximum iterations=10, initial guess is zero > >> > tolerances: relative=1e-12, absolute=1e-50, > >> > divergence=10000. > >> > left preconditioning > >> > using PRECONDITIONED norm type for convergence test > >> > maximum iterations=2 > >> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> > left preconditioning > >> > using nonzero initial guess > >> > using NONE norm type for convergence test > >> > PC Object: (mg_levels_1_) 384 MPI processes > >> > type: sor > >> > SOR: type = local_symmetric, iterations = 1, local > iterations > >> > = 1, omega = 1. > >> > linear system matrix followed by preconditioner matrix: > >> > Mat Object: 384 MPI processes > >> > type: mffd > >> > rows=3020875, cols=3020875 > >> > Matrix-free approximation: > >> > err=1.49012e-08 (relative error in function evaluation) > >> > Using wp compute h routine > >> > Does not compute normU > >> > Mat Object: () 384 MPI processes > >> > type: mpiaij > >> > rows=3020875, cols=3020875 > >> > total: nonzeros=215671710, allocated nonzeros=241731750 > >> > total number of mallocs used during MatSetValues calls =0 > >> > not using I-node (on process 0) routines > >> > Up solver (post-smoother) same as down solver (pre-smoother) > >> > linear system matrix followed by preconditioner matrix: > >> > Mat Object: 384 MPI processes > >> > type: mffd > >> > rows=3020875, cols=3020875 > >> > Matrix-free approximation: > >> > err=1.49012e-08 (relative error in function evaluation) > >> > Using wp compute h routine > >> > Does not compute normU > >> > Mat Object: () 384 MPI processes > >> > type: mpiaij > >> > rows=3020875, cols=3020875 > >> > total: nonzeros=215671710, allocated nonzeros=241731750 > >> > total number of mallocs used during MatSetValues calls =0 > >> > not using I-node (on process 0) routines > >> > > >> > > >> > Fande, > >> > > >> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <mfad...@lbl.gov> wrote: > >> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <bsm...@mcs.anl.gov> > wrote: > >> > > > >> > >> Does this mean that GAMG works for the symmetrical matrix only? > >> > > > >> > > No, it means that for non symmetric nonzero structure you need the > >> > > extra flag. So use the extra flag. The reason we don't always use > the flag > >> > > is because it adds extra cost and isn't needed if the matrix > already has a > >> > > symmetric nonzero structure. > >> > > >> > BTW, if you have symmetric non-zero structure you can just set > >> > -pc_gamg_threshold -1.0', note the "or" in the message. > >> > > >> > If you want to mess with the threshold then you need to use the > >> > symmetrized flag. > >> > > >> > > >