Re: [petsc-users] GAMG advice
On Thu, Nov 9, 2017 at 2:19 PM, David Nolte wrote:

> Hi Mark,
>
> thanks for clarifying.
> When I wrote the initial question I had somehow overlooked the fact that
> the GAMG standard smoother was Chebychev while ML uses SOR. All the other
> comments concerning threshold etc were based on this mistake.
>
> The following settings work quite well, of course LU is used on the coarse
> level.
>
> -pc_type gamg
> -pc_gamg_type agg
> -pc_gamg_threshold 0.03
> -pc_gamg_square_graph 10 # no effect ?
> -pc_gamg_sym_graph
> -mg_levels_ksp_type richardson
> -mg_levels_pc_type sor
>
> -pc_gamg_agg_nsmooths 0 does not seem to improve the convergence.
>

Looks reasonable. This smoothing is good for the convergence of elliptic operators, but it makes the operator more expensive. It's worth doing for elliptic operators but, in my experience, not for others. If your convergence rate does not change, then you probably want -pc_gamg_agg_nsmooths 0. It is the cheaper (if smoothing does not help convergence a lot) and simpler method, so you want to use it.

>
> The ksp view now looks like this: (does this seem reasonable?)
>
>
> KSP Object: 4 MPI processes
> type: fgmres
> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> GMRES: happy breakdown tolerance 1e-30
> maximum iterations=1
> tolerances: relative=1e-06, absolute=1e-50, divergence=1.
> right preconditioning
> using nonzero initial guess
> using UNPRECONDITIONED norm type for convergence test
> PC Object: 4 MPI processes
> type: gamg
> MG: type is MULTIPLICATIVE, levels=5 cycles=v
> Cycles per PCApply=1
> Using Galerkin computed coarse grid matrices
> GAMG specific options
> Threshold for dropping small values from graph 0.03
> AGG specific options
> Symmetric graph true
> Coarse grid solver -- level ---
> KSP Object:(mg_coarse_) 4 MPI processes
> type: preonly
> maximum iterations=1, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using NONE norm type for convergence test
> PC Object:(mg_coarse_) 4 MPI processes
> type: bjacobi
> block Jacobi: number of blocks = 4
> Local solve is same for all blocks, in the following KSP and PC
> objects:
> KSP Object: (mg_coarse_sub_) 1 MPI processes
> type: preonly
> maximum iterations=1, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (mg_coarse_sub_) 1 MPI processes
> type: lu
> LU: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
> matrix ordering: nd
> factor fill ratio given 5., needed 1.
> Factored matrix follows:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=38, cols=38
> package used to perform factorization: petsc
> total: nonzeros=1444, allocated nonzeros=1444
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 8 nodes, limit used is 5
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=38, cols=38
> total: nonzeros=1444, allocated nonzeros=1444
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 8 nodes, limit used is 5
> linear system matrix = precond matrix:
> Mat Object: 4 MPI processes
> type: mpiaij
> rows=38, cols=38
> total: nonzeros=1444, allocated nonzeros=1444
> total number of mallocs used during MatSetValues calls =0
> using I-node (on process 0) routines: found 8 nodes, limit used
> is 5
> Down solver (pre-smoother) on level 1 ---
> KSP Object:(mg_levels_1_) 4 MPI processes
> type: richardson
> Richardson: damping factor=1.
> maximum iterations=2
> tolerances: relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using nonzero initial guess
> using NONE norm type for convergence test
> PC Object:(mg_levels_1_) 4 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations = 1,
> omega = 1.
> linear system matrix = precond matrix:
> Mat Object: 4 MPI processes
> type: mpiaij
> rows=168, cols=168
> total: nonzeros=19874, allocated
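The settings discussed in this message can be kept in a PETSc options file rather than on the command line. The following is only a sketch: the option names and values are copied from the thread, while the helper name `format_petsc_options` and the dictionary layout are illustrative assumptions. Option names reflect the 2017 discussion, and newer PETSc releases may have renamed some of them, so check them against your installed version.

```python
# Options copied from the thread; None marks a flag that takes no value.
# The dict layout and function name are illustrative, not part of PETSc.
GAMG_OPTIONS = {
    "pc_type": "gamg",
    "pc_gamg_type": "agg",
    "pc_gamg_threshold": "0.03",
    "pc_gamg_sym_graph": None,           # reported as needed in parallel for asymmetric matrices
    "mg_levels_ksp_type": "richardson",  # replaces the default Chebyshev smoother
    "mg_levels_pc_type": "sor",
    "pc_gamg_agg_nsmooths": "0",         # suggested when smoothing does not change convergence
}


def format_petsc_options(options):
    """Return the options as text, one '-name value' entry per line."""
    lines = []
    for name, value in options.items():
        lines.append(f"-{name}" if value is None else f"-{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the text; redirect it to a file of your choice.
    print(format_petsc_options(GAMG_OPTIONS), end="")
```

The resulting file can then be passed with `-options_file <file>`, where the file name and the executable are whatever your application uses.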
Re: [petsc-users] GAMG advice
Hi Mark, thanks for clarifying. When I wrote the initial question I had somehow overlooked the fact that the GAMG standard smoother was Chebychev while ML uses SOR. All the other comments concerning threshold etc were based on this mistake. The following settings work quite well, of course LU is used on the coarse level. -pc_type gamg -pc_gamg_type agg -pc_gamg_threshold 0.03 -pc_gamg_square_graph 10 # no effect ? -pc_gamg_sym_graph -mg_levels_ksp_type richardson -mg_levels_pc_type sor -pc_gamg_agg_nsmooths 0 does not seem to improve the convergence. The ksp view now looks like this: (does this seem reasonable?) KSP Object: 4 MPI processes type: fgmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=1 tolerances: relative=1e-06, absolute=1e-50, divergence=1. right preconditioning using nonzero initial guess using UNPRECONDITIONED norm type for convergence test PC Object: 4 MPI processes type: gamg MG: type is MULTIPLICATIVE, levels=5 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices GAMG specific options Threshold for dropping small values from graph 0.03 AGG specific options Symmetric graph true Coarse grid solver -- level --- KSP Object: (mg_coarse_) 4 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=1. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 4 MPI processes type: bjacobi block Jacobi: number of blocks = 4 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=1. 
left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 1. Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=38, cols=38 package used to perform factorization: petsc total: nonzeros=1444, allocated nonzeros=1444 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=38, cols=38 total: nonzeros=1444, allocated nonzeros=1444 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 8 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=38, cols=38 total: nonzeros=1444, allocated nonzeros=1444 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 8 nodes, limit used is 5 Down solver (pre-smoother) on level 1 --- KSP Object: (mg_levels_1_) 4 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=1. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 4 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=168, cols=168 total: nonzeros=19874, allocated nonzeros=19874 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 17 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 --- KSP Object: (mg_levels_2_) 4 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=1. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 4 MPI processes type:
Re: [petsc-users] GAMG advice
On Wed, Nov 1, 2017 at 5:45 PM, David Nolte wrote:

> Thanks Barry.
> By simply replacing chebychev by richardson I get similar performance
> with GAMG and ML

That too (I assumed you were using the same, I could not see cheby in your view data). I guess SOR works for the coarse grid solver because the coarse grid is small. Using lu should help.

> (GAMG even slightly faster):

These are "random" fluctuations.

>
> -pc_type gamg
> -pc_gamg_type agg
> -pc_gamg_threshold 0.03
> -pc_gamg_square_graph 10
> -pc_gamg_sym_graph
> -mg_levels_ksp_type richardson
> -mg_levels_pc_type sor
>
> Is it still true that I need to set "-pc_gamg_sym_graph" if the matrix
> is asymmetric?

yes,

> For serial runs it doesn't seem to matter,

yes,

> but in
> parallel the PC setup hangs (after calls of
> PCGAMGFilterGraph()) if -pc_gamg_sym_graph is not set.
>

yep,

>
> David
>
>
> On 10/21/2017 12:10 AM, Barry Smith wrote:
> > David,
> >
> > GAMG picks the number of levels based on how the coarsening process
> > etc proceeds. You cannot hardwire it to a particular value. You can run
> > with -info to get more info potentially on the decisions GAMG is making.
> >
> > Barry
> >
> >> On Oct 20, 2017, at 2:06 PM, David Nolte wrote:
> >>
> >> PS: I didn't realize at first, it looks as if the -pc_mg_levels 3 option
> >> was not taken into account:
> >> type: gamg
> >> MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >>
> >>
> >>
> >> On 10/20/2017 03:32 PM, David Nolte wrote:
> >>> Dear all,
> >>>
> >>> I have some problems using GAMG as a preconditioner for (F)GMRES.
> >>> Background: I am solving the incompressible, unsteady Navier-Stokes
> >>> equations with a coupled mixed FEM approach, using P1/P1 elements for
> >>> velocity and pressure on an unstructured tetrahedron mesh with about
> >>> 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG,
> >>> hence, no zeros on the diagonal of the pressure block.
Time > >>> discretization with semi-implicit backward Euler. The flow is a > >>> convection dominated flow through a nozzle. > >>> > >>> So far, for this setup, I have been quite happy with a simple FGMRES/ML > >>> solver for the full system (rather bruteforce, I admit, but much faster > >>> than any block/Schur preconditioners I tried): > >>> > >>> -ksp_converged_reason > >>> -ksp_monitor_true_residual > >>> -ksp_type fgmres > >>> -ksp_rtol 1.0e-6 > >>> -ksp_initial_guess_nonzero > >>> > >>> -pc_type ml > >>> -pc_ml_Threshold 0.03 > >>> -pc_ml_maxNlevels 3 > >>> > >>> This setup converges in ~100 iterations (see below the ksp_view output) > >>> to rtol: > >>> > >>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm > >>> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06 > >>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm > >>> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06 > >>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm > >>> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06 > >>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm > >>> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07 > >>> > >>> > >>> Now I'd like to try GAMG instead of ML. However, I don't know how to > set > >>> it up to get similar performance. 
> >>> The obvious/naive > >>> > >>> -pc_type gamg > >>> -pc_gamg_type agg > >>> > >>> # with and without > >>> -pc_gamg_threshold 0.03 > >>> -pc_mg_levels 3 > >>> > >>> converges very slowly on 1 proc and much worse on 8 (~200k dofs per > >>> proc), for instance: > >>> np = 1: > >>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm > >>> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04 > >>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm > >>> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04 > >>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm > >>> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04 > >>> > >>> np = 8: > >>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm > >>> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03 > >>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm > >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 > >>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm > >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 > >>> > >>> A very high threshold seems to improve the GAMG PC, for instance with > >>> 0.75 I get convergence to rtol=1e-6 after 744 iterations. > >>> What else should I try? > >>> > >>> I would very much appreciate any advice on configuring GAMG and > >>> differences w.r.t ML to be taken into account (not a multigrid expert > >>> though). > >>> > >>> Thanks, best wishes > >>> David > >>> > >>> > >>> -- > >>> ksp_view for -pc_type gamg
Re: [petsc-users] GAMG advice
On Fri, Oct 20, 2017 at 11:10 PM, Barry Smith wrote:

>
> David,
>
> GAMG picks the number of levels based on how the coarsening process etc
> proceeds. You cannot hardwire it to a particular value.

Yes you can. GAMG will respect -pc_mg_levels N, but we don't recommend using it.

> You can run with -info to get more info potentially on the decisions GAMG
> is making.
>

This is noisy, but grep on GAMG and you will see the levels and sizes, etc.

>
> Barry
>
> > On Oct 20, 2017, at 2:06 PM, David Nolte wrote:
> >
> > PS: I didn't realize at first, it looks as if the -pc_mg_levels 3 option
> > was not taken into account:
> > type: gamg
> > MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >
> >
> >
> > On 10/20/2017 03:32 PM, David Nolte wrote:
> >> Dear all,
> >>
> >> I have some problems using GAMG as a preconditioner for (F)GMRES.
> >> Background: I am solving the incompressible, unsteady Navier-Stokes
> >> equations with a coupled mixed FEM approach, using P1/P1 elements for
> >> velocity and pressure on an unstructured tetrahedron mesh with about
> >> 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG,
> >> hence, no zeros on the diagonal of the pressure block. Time
> >> discretization with semi-implicit backward Euler. The flow is a
> >> convection dominated flow through a nozzle.
> >> > >> So far, for this setup, I have been quite happy with a simple FGMRES/ML > >> solver for the full system (rather bruteforce, I admit, but much faster > >> than any block/Schur preconditioners I tried): > >> > >> -ksp_converged_reason > >> -ksp_monitor_true_residual > >> -ksp_type fgmres > >> -ksp_rtol 1.0e-6 > >> -ksp_initial_guess_nonzero > >> > >> -pc_type ml > >> -pc_ml_Threshold 0.03 > >> -pc_ml_maxNlevels 3 > >> > >> This setup converges in ~100 iterations (see below the ksp_view output) > >> to rtol: > >> > >> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm > >> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06 > >> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm > >> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06 > >> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm > >> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06 > >> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm > >> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07 > >> > >> > >> Now I'd like to try GAMG instead of ML. However, I don't know how to set > >> it up to get similar performance. 
> >> The obvious/naive > >> > >> -pc_type gamg > >> -pc_gamg_type agg > >> > >> # with and without > >> -pc_gamg_threshold 0.03 > >> -pc_mg_levels 3 > >> > >> converges very slowly on 1 proc and much worse on 8 (~200k dofs per > >> proc), for instance: > >> np = 1: > >> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm > >> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04 > >> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm > >> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04 > >> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm > >> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04 > >> > >> np = 8: > >> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm > >> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03 > >> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm > >> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 > >> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm > >> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 > >> > >> A very high threshold seems to improve the GAMG PC, for instance with > >> 0.75 I get convergence to rtol=1e-6 after 744 iterations. > >> What else should I try? > >> > >> I would very much appreciate any advice on configuring GAMG and > >> differences w.r.t ML to be taken into account (not a multigrid expert > >> though). > >> > >> Thanks, best wishes > >> David > >> > >> > >> -- > >> ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels 3 > >> > >> KSP Object: 1 MPI processes > >> type: fgmres > >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > >> Orthogonalization with no iterative refinement > >> GMRES: happy breakdown tolerance 1e-30 > >> maximum iterations=1 > >> tolerances: relative=1e-06, absolute=1e-50, divergence=1. 
> >> right preconditioning > >> using nonzero initial guess > >> using UNPRECONDITIONED norm type for convergence test > >> PC Object: 1 MPI processes > >> type: gamg > >> MG: type is MULTIPLICATIVE, levels=1 cycles=v > >> Cycles per PCApply=1 > >> Using Galerkin computed coarse grid matrices > >> GAMG specific options > >> Threshold for dropping small values from graph 0.75 > >> AGG specific options > >> Symmetric graph false >
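Mark's suggestion in this message, to grep the noisy -info output for GAMG, can be sketched as a small filter. The thread does not show any actual -info lines, so the sample lines below are explicit placeholders, not real PETSc output, and the function name `gamg_lines` is an illustrative assumption.

```python
def gamg_lines(lines, keyword="GAMG"):
    """Return only the lines that mention the keyword (case-sensitive)."""
    return [line.rstrip("\n") for line in lines if keyword in line]


if __name__ == "__main__":
    # Placeholder text standing in for captured -info output.
    sample = [
        "placeholder line about something unrelated\n",
        "placeholder line that mentions GAMG level information\n",
        "another unrelated placeholder line\n",
    ]
    for line in gamg_lines(sample):
        print(line)
```

In practice the list would come from a file holding the captured -info output, for example `gamg_lines(open(path))` with `path` pointing at that file.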
Re: [petsc-users] GAMG advice
> Now I'd like to try GAMG instead of ML. However, I don't know how to set
> it up to get similar performance.
> The obvious/naive
>
> -pc_type gamg
> -pc_gamg_type agg
>
> # with and without
> -pc_gamg_threshold 0.03
> -pc_mg_levels 3
>

This looks fine. I would not set the number of levels, but if it helps then go for it.

> converges very slowly on 1 proc and much worse on 8 (~200k dofs per
> proc), for instance:
> np = 1:
> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
>
> np = 8:
> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>
> A very high threshold seems to improve the GAMG PC, for instance with
> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
> What else should I try?
>

Not sure. ML uses the same algorithm as GAMG (so the threshold means pretty much the same thing). ML is a good solver and the leader, Ray Tuminaro, has had a lot of NS experience. But I'm not sure what differences are causing this performance gap.

* It looks like you are using sor for the coarse grid solver in gamg:

Coarse grid solver -- level ---
KSP Object:(mg_levels_0_) 1 MPI processes
type: preonly
maximum iterations=2, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=1.
left preconditioning
using NONE norm type for convergence test
PC Object:(mg_levels_0_) 1 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations =

You should/must use lu, like in ML. Using sor on the coarse grid will kill you.

* smoothed aggregation vs unsmoothed: GAMG's view data does not say if it is smoothing. Damn, I need to fix that. For NS, you probably want unsmoothed (-pc_gamg_agg_nsmooths 0). I'm not sure what the ML parameter is for this, nor do I know the default. It should make a noticeable difference (good or bad).

* Threshold for dropping small values from graph 0.75 -- this is crazy :)

This is all that I can think of now.

Mark

>
> I would very much appreciate any advice on configuring GAMG and
> differences w.r.t ML to be taken into account (not a multigrid expert
> though).
>
> Thanks, best wishes
> David
>
>
> --
> ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels 3
>
> KSP Object: 1 MPI processes
> type: fgmres
> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> GMRES: happy breakdown tolerance 1e-30
> maximum iterations=1
> tolerances: relative=1e-06, absolute=1e-50, divergence=1.
> right preconditioning
> using nonzero initial guess
> using UNPRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
> type: gamg
> MG: type is MULTIPLICATIVE, levels=1 cycles=v
> Cycles per PCApply=1
> Using Galerkin computed coarse grid matrices
> GAMG specific options
> Threshold for dropping small values from graph 0.75
> AGG specific options
> Symmetric graph false
> Coarse grid solver -- level ---
> KSP Object:(mg_levels_0_) 1 MPI processes
> type: preonly
> maximum iterations=2, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using NONE norm type for convergence test
> PC Object:(mg_levels_0_) 1 MPI processes
> type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations =
> 1, omega = 1.
> linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=1745224, cols=1745224 > total: nonzeros=99452608, allocated nonzeros=99452608 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 1037847 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=1745224, cols=1745224 > total: nonzeros=99452608, allocated nonzeros=99452608 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 1037847 nodes, limit used is 5 > > > -- > ksp_view for -pc_type ml: > > KSP Object: 8 MPI processes > type: fgmres > GMRES:
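To put numbers on "converges very slowly", one can estimate the average per-iteration reduction of the relative residual from the -ksp_monitor_true_residual lines quoted in the thread. This is only a sketch: the regular expression assumes each monitor record sits on a single line in the layout shown in the thread (the email wrapping has been undone), and the function name `mean_reduction_factor` is an illustrative assumption. On the quoted data the ML run gives a factor of roughly 0.85 per iteration, while the GAMG run on 8 processes gives a factor that is essentially 1, i.e. stagnation.

```python
import re

# One monitor record per line: iteration number, two norms, relative residual.
MONITOR_RE = re.compile(
    r"^\s*(\d+) KSP unpreconditioned resid norm \S+ "
    r"true resid norm \S+ \|\|r\(i\)\|\|/\|\|b\|\| (\S+)\s*$",
    re.MULTILINE,
)

# Residual histories copied from the thread, with line wrapping removed.
ML_MONITOR = """
119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
"""

GAMG_NP8_MONITOR = """
980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
"""


def mean_reduction_factor(monitor_text):
    """Geometric-mean reduction of ||r||/||b|| per iteration over the given records."""
    records = MONITOR_RE.findall(monitor_text)
    if len(records) < 2:
        raise ValueError("need at least two monitor records")
    first_it, first_rel = int(records[0][0]), float(records[0][1])
    last_it, last_rel = int(records[-1][0]), float(records[-1][1])
    return (last_rel / first_rel) ** (1.0 / (last_it - first_it))


if __name__ == "__main__":
    print("ML   :", mean_reduction_factor(ML_MONITOR))
    print("GAMG :", mean_reduction_factor(GAMG_NP8_MONITOR))
```

A factor close to 1 means each iteration barely reduces the residual, which is the behaviour the coarse-solver and threshold comments above are meant to address.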
Re: [petsc-users] GAMG advice
Thanks Barry.
By simply replacing chebychev by richardson I get similar performance with GAMG and ML (GAMG even slightly faster):

-pc_type gamg
-pc_gamg_type agg
-pc_gamg_threshold 0.03
-pc_gamg_square_graph 10
-pc_gamg_sym_graph
-mg_levels_ksp_type richardson
-mg_levels_pc_type sor

Is it still true that I need to set "-pc_gamg_sym_graph" if the matrix is asymmetric? For serial runs it doesn't seem to matter, but in parallel the PC setup hangs (after calls of PCGAMGFilterGraph()) if -pc_gamg_sym_graph is not set.

David

On 10/21/2017 12:10 AM, Barry Smith wrote:
> David,
>
> GAMG picks the number of levels based on how the coarsening process etc
> proceeds. You cannot hardwire it to a particular value. You can run with
> -info to get more info potentially on the decisions GAMG is making.
>
> Barry
>
>> On Oct 20, 2017, at 2:06 PM, David Nolte wrote:
>>
>> PS: I didn't realize at first, it looks as if the -pc_mg_levels 3 option
>> was not taken into account:
>> type: gamg
>> MG: type is MULTIPLICATIVE, levels=1 cycles=v
>>
>>
>>
>> On 10/20/2017 03:32 PM, David Nolte wrote:
>>> Dear all,
>>>
>>> I have some problems using GAMG as a preconditioner for (F)GMRES.
>>> Background: I am solving the incompressible, unsteady Navier-Stokes
>>> equations with a coupled mixed FEM approach, using P1/P1 elements for
>>> velocity and pressure on an unstructured tetrahedron mesh with about
>>> 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG,
>>> hence, no zeros on the diagonal of the pressure block. Time
>>> discretization with semi-implicit backward Euler. The flow is a
>>> convection dominated flow through a nozzle.
>>> >>> So far, for this setup, I have been quite happy with a simple FGMRES/ML >>> solver for the full system (rather bruteforce, I admit, but much faster >>> than any block/Schur preconditioners I tried): >>> >>> -ksp_converged_reason >>> -ksp_monitor_true_residual >>> -ksp_type fgmres >>> -ksp_rtol 1.0e-6 >>> -ksp_initial_guess_nonzero >>> >>> -pc_type ml >>> -pc_ml_Threshold 0.03 >>> -pc_ml_maxNlevels 3 >>> >>> This setup converges in ~100 iterations (see below the ksp_view output) >>> to rtol: >>> >>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm >>> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06 >>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm >>> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06 >>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm >>> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06 >>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm >>> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07 >>> >>> >>> Now I'd like to try GAMG instead of ML. However, I don't know how to set >>> it up to get similar performance. 
>>> The obvious/naive >>> >>> -pc_type gamg >>> -pc_gamg_type agg >>> >>> # with and without >>> -pc_gamg_threshold 0.03 >>> -pc_mg_levels 3 >>> >>> converges very slowly on 1 proc and much worse on 8 (~200k dofs per >>> proc), for instance: >>> np = 1: >>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm >>> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04 >>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm >>> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04 >>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm >>> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04 >>> >>> np = 8: >>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm >>> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03 >>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 >>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 >>> >>> A very high threshold seems to improve the GAMG PC, for instance with >>> 0.75 I get convergence to rtol=1e-6 after 744 iterations. >>> What else should I try? >>> >>> I would very much appreciate any advice on configuring GAMG and >>> differences w.r.t ML to be taken into account (not a multigrid expert >>> though). >>> >>> Thanks, best wishes >>> David >>> >>> >>> -- >>> ksp_view for -pc_type gamg -pc_gamg_threshold 0.75
Re: [petsc-users] GAMG advice
David,

GAMG picks the number of levels based on how the coarsening process etc proceeds. You cannot hardwire it to a particular value. You can run with -info to get more info potentially on the decisions GAMG is making.

Barry

> On Oct 20, 2017, at 2:06 PM, David Nolte wrote:
>
> PS: I didn't realize at first, it looks as if the -pc_mg_levels 3 option
> was not taken into account:
> type: gamg
> MG: type is MULTIPLICATIVE, levels=1 cycles=v
>
>
>
> On 10/20/2017 03:32 PM, David Nolte wrote:
>> Dear all,
>>
>> I have some problems using GAMG as a preconditioner for (F)GMRES.
>> Background: I am solving the incompressible, unsteady Navier-Stokes
>> equations with a coupled mixed FEM approach, using P1/P1 elements for
>> velocity and pressure on an unstructured tetrahedron mesh with about
>> 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG,
>> hence, no zeros on the diagonal of the pressure block. Time
>> discretization with semi-implicit backward Euler. The flow is a
>> convection dominated flow through a nozzle.
>> >> So far, for this setup, I have been quite happy with a simple FGMRES/ML >> solver for the full system (rather bruteforce, I admit, but much faster >> than any block/Schur preconditioners I tried): >> >> -ksp_converged_reason >> -ksp_monitor_true_residual >> -ksp_type fgmres >> -ksp_rtol 1.0e-6 >> -ksp_initial_guess_nonzero >> >> -pc_type ml >> -pc_ml_Threshold 0.03 >> -pc_ml_maxNlevels 3 >> >> This setup converges in ~100 iterations (see below the ksp_view output) >> to rtol: >> >> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm >> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06 >> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm >> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06 >> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm >> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06 >> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm >> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07 >> >> >> Now I'd like to try GAMG instead of ML. However, I don't know how to set >> it up to get similar performance. 
>> The obvious/naive >> >> -pc_type gamg >> -pc_gamg_type agg >> >> # with and without >> -pc_gamg_threshold 0.03 >> -pc_mg_levels 3 >> >> converges very slowly on 1 proc and much worse on 8 (~200k dofs per >> proc), for instance: >> np = 1: >> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm >> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04 >> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm >> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04 >> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm >> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04 >> >> np = 8: >> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm >> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03 >> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm >> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 >> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm >> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 >> >> A very high threshold seems to improve the GAMG PC, for instance with >> 0.75 I get convergence to rtol=1e-6 after 744 iterations. >> What else should I try? >> >> I would very much appreciate any advice on configuring GAMG and >> differences w.r.t ML to be taken into account (not a multigrid expert >> though). >> >> Thanks, best wishes >> David >> >> >> -- >> ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels 3 >> >> KSP Object: 1 MPI processes >> type: fgmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=1 >> tolerances: relative=1e-06, absolute=1e-50, divergence=1. 
>> right preconditioning >> using nonzero initial guess >> using UNPRECONDITIONED norm type for convergence test >> PC Object: 1 MPI processes >> type: gamg >> MG: type is MULTIPLICATIVE, levels=1 cycles=v >> Cycles per PCApply=1 >> Using Galerkin computed coarse grid matrices >> GAMG specific options >> Threshold for dropping small values from graph 0.75 >> AGG specific options >> Symmetric graph false >> Coarse grid solver -- level --- >> KSP Object:(mg_levels_0_) 1 MPI processes >> type: preonly >> maximum iterations=2, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=1. >> left preconditioning >> using NONE norm type for convergence test >> PC Object:(mg_levels_0_) 1 MPI processes >> type: sor >> SOR: type =
Re: [petsc-users] GAMG advice
PS: I didn't realize at first, it looks as if the -pc_mg_levels 3 option was not taken into account: type: gamg MG: type is MULTIPLICATIVE, levels=1 cycles=v On 10/20/2017 03:32 PM, David Nolte wrote: > Dear all, > > I have some problems using GAMG as a preconditioner for (F)GMRES. > Background: I am solving the incompressible, unsteady Navier-Stokes > equations with a coupled mixed FEM approach, using P1/P1 elements for > velocity and pressure on an unstructured tetrahedron mesh with about > 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG, > hence, no zeros on the diagonal of the pressure block. Time > discretization with semi-implicit backward Euler. The flow is a > convection dominated flow through a nozzle. > > So far, for this setup, I have been quite happy with a simple FGMRES/ML > solver for the full system (rather bruteforce, I admit, but much faster > than any block/Schur preconditioners I tried): > > -ksp_converged_reason > -ksp_monitor_true_residual > -ksp_type fgmres > -ksp_rtol 1.0e-6 > -ksp_initial_guess_nonzero > > -pc_type ml > -pc_ml_Threshold 0.03 > -pc_ml_maxNlevels 3 > > This setup converges in ~100 iterations (see below the ksp_view output) > to rtol: > > 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm > 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06 > 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm > 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06 > 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm > 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06 > 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm > 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07 > > > Now I'd like to try GAMG instead of ML. However, I don't know how to set > it up to get similar performance. 
> The obvious/naive
>
> -pc_type gamg
> -pc_gamg_type agg
>
> # with and without
> -pc_gamg_threshold 0.03
> -pc_mg_levels 3
>
> converges very slowly on 1 proc and much worse on 8 (~200k dofs per
> proc), for instance:
> np = 1:
> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
>
> np = 8:
> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>
> A very high threshold seems to improve the GAMG PC, for instance with
> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
> What else should I try?
>
> I would very much appreciate any advice on configuring GAMG and
> differences w.r.t ML to be taken into account (not a multigrid expert
> though).
>
> Thanks, best wishes
> David
>
>
> --
> ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels 3
>
> KSP Object: 1 MPI processes
>   type: fgmres
>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
>     GMRES: happy breakdown tolerance 1e-30
>   maximum iterations=1
>   tolerances: relative=1e-06, absolute=1e-50, divergence=1.
>   right preconditioning
>   using nonzero initial guess
>   using UNPRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
>   type: gamg
>     MG: type is MULTIPLICATIVE, levels=1 cycles=v
>       Cycles per PCApply=1
>       Using Galerkin computed coarse grid matrices
>       GAMG specific options
>         Threshold for dropping small values from graph 0.75
>         AGG specific options
>           Symmetric graph false
>   Coarse grid solver -- level ---
>     KSP Object: (mg_levels_0_) 1 MPI processes
>       type: preonly
>       maximum iterations=2, initial guess is zero
>       tolerances: relative=1e-05, absolute=1e-50, divergence=1.
>       left preconditioning
>       using NONE norm type for convergence test
>     PC Object: (mg_levels_0_) 1 MPI processes
>       type: sor
>         SOR: type = local_symmetric, iterations = 1, local iterations =
> 1, omega = 1.
>       linear system matrix = precond matrix:
>       Mat Object: 1 MPI processes
>         type: seqaij
>         rows=1745224, cols=1745224
>         total: nonzeros=99452608, allocated nonzeros=99452608
>         total number of mallocs used during MatSetValues calls =0
>           using I-node routines: found 1037847 nodes, limit used is 5
>   linear system matrix = precond matrix:
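
The mismatch noted in the PS (levels=1 in the ksp_view although -pc_mg_levels 3 was given) is easy to miss in a long view dump. Below is a minimal sketch of a check that reads the level count back from saved -ksp_view text. The helper names (mg_levels_in_view, check_levels) are hypothetical and not part of PETSc; the only assumption about the output is the "MG: type is ..., levels=N" line exactly as it appears in this thread.

```python
import re


def mg_levels_in_view(ksp_view_text):
    """Return the level count from the 'MG: type is ..., levels=N' line, or None."""
    match = re.search(r"MG: type is \w+, levels=(\d+)", ksp_view_text)
    return int(match.group(1)) if match else None


def check_levels(ksp_view_text, requested):
    """Return (levels reported by ksp_view, whether they match the request)."""
    actual = mg_levels_in_view(ksp_view_text)
    return actual, actual == requested


# Excerpt of the ksp_view output quoted in this thread.
view = """PC Object: 1 MPI processes
  type: gamg
    MG: type is MULTIPLICATIVE, levels=1 cycles=v
"""

actual, ok = check_levels(view, requested=3)
status = "ok" if ok else "mismatch"
print(f"requested 3 levels, ksp_view reports {actual}: {status}")
```

Running such a check on the output of every configuration tried would show right away whether an option had any effect before comparing iteration counts.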
[petsc-users] GAMG advice
Dear all,

I have some problems using GAMG as a preconditioner for (F)GMRES.
Background: I am solving the incompressible, unsteady Navier-Stokes
equations with a coupled mixed FEM approach, using P1/P1 elements for
velocity and pressure on an unstructured tetrahedral mesh with about
2 million DOFs (and up to 15 million). The method is stabilized with
SUPG/PSPG, hence, no zeros on the diagonal of the pressure block. Time
discretization with semi-implicit backward Euler. The flow is a
convection-dominated flow through a nozzle.

So far, for this setup, I have been quite happy with a simple FGMRES/ML
solver for the full system (rather brute-force, I admit, but much faster
than any block/Schur preconditioners I tried):

-ksp_converged_reason
-ksp_monitor_true_residual
-ksp_type fgmres
-ksp_rtol 1.0e-6
-ksp_initial_guess_nonzero

-pc_type ml
-pc_ml_Threshold 0.03
-pc_ml_maxNlevels 3

This setup converges in ~100 iterations (see below the ksp_view output)
to rtol:

119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07

Now I'd like to try GAMG instead of ML. However, I don't know how to set
it up to get similar performance.
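
To compare runs like these on more than the final iteration count, one can estimate the average residual reduction per iteration from the -ksp_monitor_true_residual lines. The following is only a sketch: the function names are hypothetical, the regular expression assumes the monitor format exactly as printed in this thread, and the sample lines are copied from the ML run and the np = 1 GAMG run reported here.

```python
import re

# One -ksp_monitor_true_residual line; \s+ also tolerates mail line wraps.
MONITOR_LINE = re.compile(
    r"(\d+)\s+KSP unpreconditioned resid norm\s+\S+\s+"
    r"true resid norm\s+\S+\s+\|\|r\(i\)\|\|/\|\|b\|\|\s+(\S+)"
)


def parse_monitor(text):
    """Return a list of (iteration, relative true residual) pairs."""
    return [(int(it), float(rel)) for it, rel in MONITOR_LINE.findall(text)]


def mean_reduction_factor(history):
    """Geometric-mean residual reduction per iteration between first and last entry."""
    (i0, r0), (i1, r1) = history[0], history[-1]
    return (r1 / r0) ** (1.0 / (i1 - i0))


ml_log = """
119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
"""

gamg_log = """
980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
"""

ml_rate = mean_reduction_factor(parse_monitor(ml_log))
gamg_rate = mean_reduction_factor(parse_monitor(gamg_log))
print(f"ML   reduction factor per iteration: {ml_rate:.6f}")
print(f"GAMG reduction factor per iteration: {gamg_rate:.6f}")
```

A factor close to 1 means the iteration has essentially stagnated, so the number separates "slow but converging" from "stalled" more clearly than the raw residual norms do.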
The obvious/naive

-pc_type gamg
-pc_gamg_type agg

# with and without
-pc_gamg_threshold 0.03
-pc_mg_levels 3

converges very slowly on 1 proc and much worse on 8 (~200k dofs per
proc), for instance:

np = 1:
980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04

np = 8:
980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03

A very high threshold seems to improve the GAMG PC, for instance with
0.75 I get convergence to rtol=1e-6 after 744 iterations.
What else should I try?

I would very much appreciate any advice on configuring GAMG and
differences w.r.t ML to be taken into account (not a multigrid expert
though).

Thanks, best wishes
David

--
ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels 3

KSP Object: 1 MPI processes
  type: fgmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1
  tolerances: relative=1e-06, absolute=1e-50, divergence=1.
  right preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: gamg
    MG: type is MULTIPLICATIVE, levels=1 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
      GAMG specific options
        Threshold for dropping small values from graph 0.75
        AGG specific options
          Symmetric graph false
  Coarse grid solver -- level ---
    KSP Object: (mg_levels_0_) 1 MPI processes
      type: preonly
      maximum iterations=2, initial guess is zero
      tolerances: relative=1e-05, absolute=1e-50, divergence=1.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_0_) 1 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object: 1 MPI processes
        type: seqaij
        rows=1745224, cols=1745224
        total: nonzeros=99452608, allocated nonzeros=99452608
        total number of mallocs used during MatSetValues calls =0
          using I-node routines: found 1037847 nodes, limit used is 5
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: seqaij
    rows=1745224, cols=1745224
    total: nonzeros=99452608, allocated nonzeros=99452608
    total number of mallocs used during MatSetValues calls =0
      using I-node routines: found 1037847 nodes, limit used is 5

--
ksp_view for -pc_type ml:

KSP Object: 8 MPI processes
  type: fgmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with