On Fri, Oct 20, 2017 at 11:10 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
> > David, > > GAMG picks the number of levels based on how the coarsening process etc > proceeds. You cannot hardwire it to a particular value. Yes you can. GAMG will respect -pc_mg_levels N, but we don't recommend using it. > You can run with -info to get more info potentially on the decisions GAMG > is making. > this is noisy but grep on GAMG and you will see the levels and sizes, etc. > > Barry > > > On Oct 20, 2017, at 2:06 PM, David Nolte <dno...@dim.uchile.cl> wrote: > > > > PS: I didn't realize at first, it looks as if the -pc_mg_levels 3 option > > was not taken into account: > > type: gamg > > MG: type is MULTIPLICATIVE, levels=1 cycles=v > > > > > > > > On 10/20/2017 03:32 PM, David Nolte wrote: > >> Dear all, > >> > >> I have some problems using GAMG as a preconditioner for (F)GMRES. > >> Background: I am solving the incompressible, unsteady Navier-Stokes > >> equations with a coupled mixed FEM approach, using P1/P1 elements for > >> velocity and pressure on an unstructured tetrahedron mesh with about > >> 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG, > >> hence, no zeros on the diagonal of the pressure block. Time > >> discretization with semi-implicit backward Euler. The flow is a > >> convection dominated flow through a nozzle. > >> > >> So far, for this setup, I have been quite happy with a simple FGMRES/ML > >> solver for the full system (rather bruteforce, I admit, but much faster > >> than any block/Schur preconditioners I tried): > >> > >> -ksp_converged_reason > >> -ksp_monitor_true_residual > >> -ksp_type fgmres > >> -ksp_rtol 1.0e-6 > >> -ksp_initial_guess_nonzero > >> > >> -pc_type ml > >> -pc_ml_Threshold 0.03 > >> -pc_ml_maxNlevels 3 > >> > >> This setup converges in ~100 iterations (see below the ksp_view output) > >> to rtol: > >> > >> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm > >> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06 > >> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm > >> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06 > >> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm > >> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06 > >> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm > >> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07 > >> > >> > >> Now I'd like to try GAMG instead of ML. However, I don't know how to set > >> it up to get similar performance. > >> The obvious/naive > >> > >> -pc_type gamg > >> -pc_gamg_type agg > >> > >> # with and without > >> -pc_gamg_threshold 0.03 > >> -pc_mg_levels 3 > >> > >> converges very slowly on 1 proc and much worse on 8 (~200k dofs per > >> proc), for instance: > >> np = 1: > >> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm > >> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04 > >> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm > >> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04 > >> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm > >> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04 > >> > >> np = 8: > >> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm > >> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03 > >> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm > >> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 > >> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm > >> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 > >> > >> A very high threshold seems to improve the GAMG PC, for instance with > >> 0.75 I get convergence to rtol=1e-6 after 744 iterations. > >> What else should I try? > >> > >> I would very much appreciate any advice on configuring GAMG and > >> differences w.r.t ML to be taken into account (not a multigrid expert > >> though). > >> > >> Thanks, best wishes > >> David > >> > >> > >> ------ > >> ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels 3 > >> > >> KSP Object: 1 MPI processes > >> type: fgmres > >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > >> Orthogonalization with no iterative refinement > >> GMRES: happy breakdown tolerance 1e-30 > >> maximum iterations=10000 > >> tolerances: relative=1e-06, absolute=1e-50, divergence=10000. > >> right preconditioning > >> using nonzero initial guess > >> using UNPRECONDITIONED norm type for convergence test > >> PC Object: 1 MPI processes > >> type: gamg > >> MG: type is MULTIPLICATIVE, levels=1 cycles=v > >> Cycles per PCApply=1 > >> Using Galerkin computed coarse grid matrices > >> GAMG specific options > >> Threshold for dropping small values from graph 0.75 > >> AGG specific options > >> Symmetric graph false > >> Coarse grid solver -- level ------------------------------- > >> KSP Object: (mg_levels_0_) 1 MPI processes > >> type: preonly > >> maximum iterations=2, initial guess is zero > >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> left preconditioning > >> using NONE norm type for convergence test > >> PC Object: (mg_levels_0_) 1 MPI processes > >> type: sor > >> SOR: type = local_symmetric, iterations = 1, local iterations = > >> 1, omega = 1. > >> linear system matrix = precond matrix: > >> Mat Object: 1 MPI processes > >> type: seqaij > >> rows=1745224, cols=1745224 > >> total: nonzeros=99452608, allocated nonzeros=99452608 > >> total number of mallocs used during MatSetValues calls =0 > >> using I-node routines: found 1037847 nodes, limit used is 5 > >> linear system matrix = precond matrix: > >> Mat Object: 1 MPI processes > >> type: seqaij > >> rows=1745224, cols=1745224 > >> total: nonzeros=99452608, allocated nonzeros=99452608 > >> total number of mallocs used during MatSetValues calls =0 > >> using I-node routines: found 1037847 nodes, limit used is 5 > >> > >> > >> ------ > >> ksp_view for -pc_type ml: > >> > >> KSP Object: 8 MPI processes > >> type: fgmres > >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > >> Orthogonalization with no iterative refinement > >> GMRES: happy breakdown tolerance 1e-30 > >> maximum iterations=10000 > >> tolerances: relative=1e-06, absolute=1e-50, divergence=10000. > >> right preconditioning > >> using nonzero initial guess > >> using UNPRECONDITIONED norm type for convergence test > >> PC Object: 8 MPI processes > >> type: ml > >> MG: type is MULTIPLICATIVE, levels=3 cycles=v > >> Cycles per PCApply=1 > >> Using Galerkin computed coarse grid matrices > >> Coarse grid solver -- level ------------------------------- > >> KSP Object: (mg_coarse_) 8 MPI processes > >> type: preonly > >> maximum iterations=10000, initial guess is zero > >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> left preconditioning > >> using NONE norm type for convergence test > >> PC Object: (mg_coarse_) 8 MPI processes > >> type: redundant > >> Redundant preconditioner: First (color=0) of 8 PCs follows > >> KSP Object: (mg_coarse_redundant_) 1 MPI > processes > >> type: preonly > >> maximum iterations=10000, initial guess is zero > >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> left preconditioning > >> using NONE norm type for convergence test > >> PC Object: (mg_coarse_redundant_) 1 MPI processes > >> type: lu > >> LU: out-of-place factorization > >> tolerance for zero pivot 2.22045e-14 > >> using diagonal shift on blocks to prevent zero pivot > [INBLOCKS] > >> matrix ordering: nd > >> factor fill ratio given 5., needed 10.4795 > >> Factored matrix follows: > >> Mat Object: 1 MPI processes > >> type: seqaij > >> rows=6822, cols=6822 > >> package used to perform factorization: petsc > >> total: nonzeros=9575688, allocated nonzeros=9575688 > >> total number of mallocs used during MatSetValues > calls =0 > >> not using I-node routines > >> linear system matrix = precond matrix: > >> Mat Object: 1 MPI processes > >> type: seqaij > >> rows=6822, cols=6822 > >> total: nonzeros=913758, allocated nonzeros=913758 > >> total number of mallocs used during MatSetValues calls =0 > >> not using I-node routines > >> linear system matrix = precond matrix: > >> Mat Object: 8 MPI processes > >> type: mpiaij > >> rows=6822, cols=6822 > >> total: nonzeros=913758, allocated nonzeros=913758 > >> total number of mallocs used during MatSetValues calls =0 > >> not using I-node (on process 0) routines > >> Down solver (pre-smoother) on level 1 ------------------------------- > >> KSP Object: (mg_levels_1_) 8 MPI processes > >> type: richardson > >> Richardson: damping factor=1. > >> maximum iterations=2 > >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> left preconditioning > >> using nonzero initial guess > >> using NONE norm type for convergence test > >> PC Object: (mg_levels_1_) 8 MPI processes > >> type: sor > >> SOR: type = local_symmetric, iterations = 1, local iterations = > >> 1, omega = 1. > >> linear system matrix = precond matrix: > >> Mat Object: 8 MPI processes > >> type: mpiaij > >> rows=67087, cols=67087 > >> total: nonzeros=9722749, allocated nonzeros=9722749 > >> total number of mallocs used during MatSetValues calls =0 > >> not using I-node (on process 0) routines > >> Up solver (post-smoother) same as down solver (pre-smoother) > >> Down solver (pre-smoother) on level 2 ------------------------------- > >> KSP Object: (mg_levels_2_) 8 MPI processes > >> type: richardson > >> Richardson: damping factor=1. > >> maximum iterations=2 > >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> left preconditioning > >> using nonzero initial guess > >> using NONE norm type for convergence test > >> PC Object: (mg_levels_2_) 8 MPI processes > >> type: sor > >> SOR: type = local_symmetric, iterations = 1, local iterations = > >> 1, omega = 1. > >> linear system matrix = precond matrix: > >> Mat Object: 8 MPI processes > >> type: mpiaij > >> rows=1745224, cols=1745224 > >> total: nonzeros=99452608, allocated nonzeros=99452608 > >> total number of mallocs used during MatSetValues calls =0 > >> using I-node (on process 0) routines: found 126690 nodes, > >> limit used is 5 > >> Up solver (post-smoother) same as down solver (pre-smoother) > >> linear system matrix = precond matrix: > >> Mat Object: 8 MPI processes > >> type: mpiaij > >> rows=1745224, cols=1745224 > >> total: nonzeros=99452608, allocated nonzeros=99452608 > >> total number of mallocs used during MatSetValues calls =0 > >> using I-node (on process 0) routines: found 126690 nodes, limit > >> used is 5 > >> > > > >