Dear all, I have some problems using GAMG as a preconditioner for (F)GMRES. Background: I am solving the incompressible, unsteady Navier-Stokes equations with a coupled mixed FEM approach, using P1/P1 elements for velocity and pressure on an unstructured tetrahedron mesh with about 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG, hence, no zeros on the diagonal of the pressure block. Time discretization with semi-implicit backward Euler. The flow is a convection dominated flow through a nozzle.
So far, for this setup, I have been quite happy with a simple FGMRES/ML solver for the full system (rather bruteforce, I admit, but much faster than any block/Schur preconditioners I tried): -ksp_converged_reason -ksp_monitor_true_residual -ksp_type fgmres -ksp_rtol 1.0e-6 -ksp_initial_guess_nonzero -pc_type ml -pc_ml_Threshold 0.03 -pc_ml_maxNlevels 3 This setup converges in ~100 iterations (see below the ksp_view output) to rtol: 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07 Now I'd like to try GAMG instead of ML. However, I don't know how to set it up to get similar performance. The obvious/naive -pc_type gamg -pc_gamg_type agg # with and without -pc_gamg_threshold 0.03 -pc_mg_levels 3 converges very slowly on 1 proc and much worse on 8 (~200k dofs per proc), for instance: np = 1: 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04 np = 8: 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 A very high threshold seems to improve the GAMG PC, for instance with 0.75 I get convergence to rtol=1e-6 after 744 iterations. What else should I try? I would very much appreciate any advice on configuring GAMG and differences w.r.t ML to be taken into account (not a multigrid expert though). Thanks, best wishes David ------ ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels 3 KSP Object: 1 MPI processes type: fgmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10000 tolerances: relative=1e-06, absolute=1e-50, divergence=10000. right preconditioning using nonzero initial guess using UNPRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: gamg MG: type is MULTIPLICATIVE, levels=1 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices GAMG specific options Threshold for dropping small values from graph 0.75 AGG specific options Symmetric graph false Coarse grid solver -- level ------------------------------- KSP Object: (mg_levels_0_) 1 MPI processes type: preonly maximum iterations=2, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_0_) 1 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=1745224, cols=1745224 total: nonzeros=99452608, allocated nonzeros=99452608 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 1037847 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=1745224, cols=1745224 total: nonzeros=99452608, allocated nonzeros=99452608 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 1037847 nodes, limit used is 5 ------ ksp_view for -pc_type ml: KSP Object: 8 MPI processes type: fgmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10000 tolerances: relative=1e-06, absolute=1e-50, divergence=10000. right preconditioning using nonzero initial guess using UNPRECONDITIONED norm type for convergence test PC Object: 8 MPI processes type: ml MG: type is MULTIPLICATIVE, levels=3 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 8 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 8 MPI processes type: redundant Redundant preconditioner: First (color=0) of 8 PCs follows KSP Object: (mg_coarse_redundant_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_redundant_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 10.4795 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=6822, cols=6822 package used to perform factorization: petsc total: nonzeros=9575688, allocated nonzeros=9575688 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=6822, cols=6822 total: nonzeros=913758, allocated nonzeros=913758 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=6822, cols=6822 total: nonzeros=913758, allocated nonzeros=913758 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 8 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=67087, cols=67087 total: nonzeros=9722749, allocated nonzeros=9722749 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 8 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 8 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=1745224, cols=1745224 total: nonzeros=99452608, allocated nonzeros=99452608 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 126690 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=1745224, cols=1745224 total: nonzeros=99452608, allocated nonzeros=99452608 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 126690 nodes, limit used is 5
signature.asc
Description: OpenPGP digital signature