On 21 Jul 2014, at 12:52, Dave May <dave.mayhe...@gmail.com> wrote:
>
> -pc_type mg -mg_levels_ksp_type richardson -mg_levels_pc_type jacobi
> -mg_levels_ksp_max_it 2
>
> then I get identical convergence in serial and parallel
>
>
> Good. That's the correct result.
>
> if, however, I run with
>
> -pc_type mg -mg_levels_ksp_type chebyshev -mg_levels_pc_type sor
> -mg_levels_ksp_max_it 2
> (the default according to -ksp_view)
>
> then I get very differing convergence in serial and parallel as described.
>
>
> It's normal that the behaviour is different. The PETSc SOR implementation is
> not parallel. It only performs SOR on your local subdomain.
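For reference, the Richardson/Jacobi level smoother referred to above behaves the same on any number of processes because Jacobi only needs the matrix diagonal, whereas PETSc's SOR sweeps only the on-process block. Below is a minimal sketch of setting that smoother up in code rather than via the options database; it assumes a KSP whose multigrid hierarchy (interpolation operators or a DM, plus coarse operators) is supplied elsewhere, and the function and variable names are purely illustrative.

#include <petscksp.h>

/* Sketch: make every PCMG level smoother 2 iterations of Richardson
 * preconditioned by Jacobi, the programmatic analogue of
 *   -mg_levels_ksp_type richardson -mg_levels_pc_type jacobi -mg_levels_ksp_max_it 2
 * Assumes interpolation/restriction and operators for the hierarchy are
 * provided elsewhere (e.g. through a DM). */
PetscErrorCode ConfigureRichardsonJacobiSmoothers(KSP ksp, PetscInt nlevels)
{
  PC             pc, lpc;
  KSP            smoother;
  PetscInt       l;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCMG);CHKERRQ(ierr);
  ierr = PCMGSetLevels(pc, nlevels, NULL);CHKERRQ(ierr);
  for (l = 1; l < nlevels; l++) {   /* level 0 is the coarse solve */
    ierr = PCMGGetSmoother(pc, l, &smoother);CHKERRQ(ierr);
    ierr = KSPSetType(smoother, KSPRICHARDSON);CHKERRQ(ierr);
    ierr = KSPSetTolerances(smoother, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT, 2);CHKERRQ(ierr);
    ierr = KSPGetPC(smoother, &lpc);CHKERRQ(ierr);
    ierr = PCSetType(lpc, PCJACOBI);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}

Calling KSPSetFromOptions() afterwards should still allow command-line options to override these settings.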
Sure, however, with only two subdomains, I was not expecting to see such poor behaviour. Below I show output from a run on 1 process and then two (along with ksp_view) for the following options:

-pc_type mg -ksp_rtol 1e-8 -ksp_max_it 6 -pc_mg_levels 2 -mg_levels_pc_type sor -ksp_monitor

On 1 process:

  0 KSP Residual norm 5.865090856053e+02
  1 KSP Residual norm 1.293159126247e+01
  2 KSP Residual norm 5.181199296299e-01
  3 KSP Residual norm 1.268870802643e-02
  4 KSP Residual norm 5.116058930806e-04
  5 KSP Residual norm 3.735036960550e-05
  6 KSP Residual norm 1.755288530515e-06

KSP Object: 1 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=6, initial guess is zero
  tolerances: relative=1e-08, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: mg
    MG: type is MULTIPLICATIVE, levels=2 cycles=v
      Cycles per PCApply=1
      Not using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object: (mg_coarse_) 1 MPI processes
      type: preonly
      maximum iterations=1, initial guess is zero
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_coarse_) 1 MPI processes
      type: lu
        LU: out-of-place factorization
        tolerance for zero pivot 2.22045e-14
        using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
        matrix ordering: nd
        factor fill ratio given 5, needed 3.17724
          Factored matrix follows:
            Mat Object: 1 MPI processes
              type: seqaij
              rows=144, cols=144
              package used to perform factorization: petsc
              total: nonzeros=2904, allocated nonzeros=2904
              total number of mallocs used during MatSetValues calls =0
                not using I-node routines
      linear system matrix = precond matrix:
      Mat Object: 1 MPI processes
        type: seqaij
        rows=144, cols=144
        total: nonzeros=914, allocated nonzeros=0
        total number of mallocs used during MatSetValues calls =0
          not using I-node routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object: (mg_levels_1_) 1 MPI processes
      type: chebyshev
        Chebyshev: eigenvalue estimates: min = 0.0999972, max = 1.09997
        Chebyshev: estimated using: [0 0.1; 0 1.1]
        KSP Object: (mg_levels_1_est_) 1 MPI processes
          type: gmres
            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            GMRES: happy breakdown tolerance 1e-30
          maximum iterations=10
          tolerances: relative=1e-05, absolute=1e-50, divergence=10000
          left preconditioning
          using nonzero initial guess
          using NONE norm type for convergence test
        PC Object: (mg_levels_1_) 1 MPI processes
          type: sor
            SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
          linear system matrix = precond matrix:
          Mat Object: 1 MPI processes
            type: seqaij
            rows=529, cols=529
            total: nonzeros=3521, allocated nonzeros=0
            total number of mallocs used during MatSetValues calls =0
              not using I-node routines
      maximum iterations=2
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object: (mg_levels_1_) 1 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object: 1 MPI processes
        type: seqaij
        rows=529, cols=529
        total: nonzeros=3521, allocated nonzeros=0
        total number of mallocs used during MatSetValues calls =0
          not using I-node routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: seqaij
    rows=529, cols=529
    total: nonzeros=3521, allocated nonzeros=0
    total number of mallocs used during MatSetValues calls =0
      not using I-node routines

On 2:

  0 KSP Residual norm 5.867749653193e+02
  1 KSP Residual norm 1.353369658350e+01
  2 KSP Residual norm 1.350163644248e+01
  3 KSP Residual norm 1.007552895680e+01
  4 KSP Residual norm 1.294191582208e+00
  5 KSP Residual norm 9.409953768968e-01
  6 KSP Residual norm 9.409360529590e-01

KSP Object: 2 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=6, initial guess is zero
  tolerances: relative=1e-08, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 2 MPI processes
  type: mg
    MG: type is MULTIPLICATIVE, levels=2 cycles=v
      Cycles per PCApply=1
      Not using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object: (mg_coarse_) 2 MPI processes
      type: preonly
      maximum iterations=1, initial guess is zero
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_coarse_) 2 MPI processes
      type: redundant
        Redundant preconditioner: First (color=0) of 2 PCs follows
      KSP Object: (mg_coarse_redundant_) 1 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances: relative=1e-05, absolute=1e-50, divergence=10000
        left preconditioning
        using NONE norm type for convergence test
      PC Object: (mg_coarse_redundant_) 1 MPI processes
        type: lu
          LU: out-of-place factorization
          tolerance for zero pivot 2.22045e-14
          using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
          matrix ordering: nd
          factor fill ratio given 5, needed 2.72494
            Factored matrix follows:
              Mat Object: 1 MPI processes
                type: seqaij
                rows=144, cols=144
                package used to perform factorization: petsc
                total: nonzeros=2120, allocated nonzeros=2120
                total number of mallocs used during MatSetValues calls =0
                  not using I-node routines
        linear system matrix = precond matrix:
        Mat Object: 1 MPI processes
          type: seqaij
          rows=144, cols=144
          total: nonzeros=778, allocated nonzeros=778
          total number of mallocs used during MatSetValues calls =0
            not using I-node routines
      linear system matrix = precond matrix:
      Mat Object: 2 MPI processes
        type: mpiaij
        rows=144, cols=144
        total: nonzeros=778, allocated nonzeros=914
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object: (mg_levels_1_) 2 MPI processes
      type: chebyshev
        Chebyshev: eigenvalue estimates: min = 0.099992, max = 1.09991
        Chebyshev: estimated using: [0 0.1; 0 1.1]
        KSP Object: (mg_levels_1_est_) 2 MPI processes
          type: gmres
            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            GMRES: happy breakdown tolerance 1e-30
          maximum iterations=10
          tolerances: relative=1e-05, absolute=1e-50, divergence=10000
          left preconditioning
          using nonzero initial guess
          using NONE norm type for convergence test
        PC Object: (mg_levels_1_) 2 MPI processes
          type: sor
            SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
          linear system matrix = precond matrix:
          Mat Object: 2 MPI processes
            type: mpiaij
            rows=529, cols=529
            total: nonzeros=3253, allocated nonzeros=3521
            total number of mallocs used during MatSetValues calls =0
              not using I-node (on process 0) routines
      maximum iterations=2
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object: (mg_levels_1_) 2 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object: 2 MPI processes
        type: mpiaij
        rows=529, cols=529
        total: nonzeros=3253, allocated nonzeros=3521
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object: 2 MPI processes
    type: mpiaij
    rows=529, cols=529
    total: nonzeros=3253, allocated nonzeros=3521
    total number of mallocs used during MatSetValues calls =0
      not using I-node (on process 0) routines

So notice that in the parallel case the residual reduction was ~10^3, rather than ~10^8 for the serial case.

> I see that this is a nested Krylov solve. Using fgmres on the outer sometimes
> is not enough. I've had problems where I needed to use the more stable
> orthogonalization routine in gmres.
>
> Do you also observe different convergence behaviour (serial versus parallel)
> with these choices
> 1) -mg_coarse_ksp_type gmres -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it 1

Full options are (in addition to the above):

-ksp_type fgmres -pc_mg_levels 2 -ksp_monitor -ksp_max_it 6 -ksp_rtol 1e-8 -pc_type mg

1 process:

  0 KSP Residual norm 2.802543487620e+02
  1 KSP Residual norm 1.294103921871e+01
  2 KSP Residual norm 4.325949294172e+00
  3 KSP Residual norm 1.373260455913e+00
  4 KSP Residual norm 1.612639229769e-01
  5 KSP Residual norm 1.896600662807e-02
  6 KSP Residual norm 5.900847991084e-03

2 processes:

  0 KSP Residual norm 2.802543487620e+02
  1 KSP Residual norm 1.242896923248e+01
  2 KSP Residual norm 1.092088559774e+01
  3 KSP Residual norm 7.383276000966e+00
  4 KSP Residual norm 5.634790202135e+00
  5 KSP Residual norm 4.329897745238e+00
  6 KSP Residual norm 3.754170628391e+00

> 2) -mg_coarse_ksp_type gmres -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it
> 100 -mg_coarse_ksp_gmres_modifiedgramschmidt

1 process:

  0 KSP Residual norm 2.802543487620e+02
  1 KSP Residual norm 1.030455192067e+01
  2 KSP Residual norm 4.628068378242e-01
  3 KSP Residual norm 1.965313019262e-02
  4 KSP Residual norm 1.204109484597e-03
  5 KSP Residual norm 5.812650812813e-05
  6 KSP Residual norm 3.161780444565e-06

2 processes:

  0 KSP Residual norm 2.802543487620e+02
  1 KSP Residual norm 1.324768309183e+01
  2 KSP Residual norm 1.225921121405e+01
  3 KSP Residual norm 1.173286143250e+01
  4 KSP Residual norm 7.033886488294e+00
  5 KSP Residual norm 4.825036058054e+00
  6 KSP Residual norm 4.265434976636e+00

> 3) -mg_coarse_ksp_type cg -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it 100

1 process:

  0 KSP Residual norm 2.802543487620e+02
  1 KSP Residual norm 1.030455192067e+01
  2 KSP Residual norm 4.628068378242e-01
  3 KSP Residual norm 1.965313019262e-02
  4 KSP Residual norm 1.204109484597e-03
  5 KSP Residual norm 5.812650812814e-05
  6 KSP Residual norm 3.161780444567e-06

2 processes:

  0 KSP Residual norm 2.802543487620e+02
  1 KSP Residual norm 1.324768309183e+01
  2 KSP Residual norm 1.225921121405e+01
  3 KSP Residual norm 1.173286143250e+01
  4 KSP Residual norm 7.033886488294e+00
  5 KSP Residual norm 4.825036058053e+00
  6 KSP Residual norm 4.265434976635e+00
> Sure - this wasn't a convergence test. I just wanted to see that the methods
> which should be identical in serial and parallel are in fact behaving as
> expected. Seems they are. So I'm inclined to think the problem is associated
> with having nested Krylov solves.

My observation appears to be that if I use unpreconditioned chebyshev as a smoother, then convergence in serial and parallel is identical and good. As soon as I turn on SOR preconditioning for the smoother, the parallel convergence falls to pieces (and the preconditioner becomes indefinite): e.g. with

-pc_type mg -ksp_rtol 1e-8 -ksp_max_it 6 -pc_mg_levels 2 -ksp_monitor -ksp_type fgmres -mg_coarse_ksp_type cg -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it 100 -mg_levels_pc_type none

1 process:

  0 KSP Residual norm 2.802543487620e+02
  1 KSP Residual norm 1.530397174638e+01
  2 KSP Residual norm 1.027554200472e+00
  3 KSP Residual norm 3.809236982955e-02
  4 KSP Residual norm 2.445633720099e-03
  5 KSP Residual norm 1.192136916270e-04
  6 KSP Residual norm 7.067629143105e-06

2 processes:

  0 KSP Residual norm 2.802543487620e+02
  1 KSP Residual norm 1.530397174638e+01
  2 KSP Residual norm 1.027554200472e+00
  3 KSP Residual norm 3.809236982955e-02
  4 KSP Residual norm 2.445633720099e-03
  5 KSP Residual norm 1.192136916270e-04
  6 KSP Residual norm 7.067629143079e-06

with sor as a preconditioner:

-pc_type mg -ksp_rtol 1e-8 -ksp_max_it 6 -pc_mg_levels 2 -ksp_monitor -ksp_type fgmres -mg_coarse_ksp_type cg -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it 100 -mg_levels_pc_type sor

1 process:

  0 KSP Residual norm 2.802543487620e+02
  1 KSP Residual norm 1.030455192067e+01
  2 KSP Residual norm 4.628068378242e-01
  3 KSP Residual norm 1.965313019262e-02
  4 KSP Residual norm 1.204109484597e-03
  5 KSP Residual norm 5.812650812814e-05
  6 KSP Residual norm 3.161780444567e-06

2 processes:

  0 KSP Residual norm 2.802543487620e+02
  1 KSP Residual norm 1.324768309183e+01
  2 KSP Residual norm 1.225921121405e+01
  3 KSP Residual norm 1.173286143250e+01
  4 KSP Residual norm 7.033886488294e+00
  5 KSP Residual norm 4.825036058053e+00
  6 KSP Residual norm 4.265434976635e+00

Maybe it's just that I shouldn't be expecting this to work, but it seems odd to me.

Cheers,

Lawrence
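One way to narrow down where the serial/parallel difference enters is to take the Chebyshev eigenvalue estimation out of the picture by pinning the bounds explicitly, so that both runs apply the same smoothing polynomial and the only remaining difference is the processor-local SOR sweep. A minimal sketch under that assumption; the function name and the numeric bounds (taken loosely from the estimates in the -ksp_view output above) are illustrative only, and if automatic estimation has been requested through options it may still take precedence.

#include <petscksp.h>

/* Sketch: fix the Chebyshev eigenvalue bounds on each level smoother.
 * Assumes the PC is already PCMG with its levels created, e.g. call this
 * after KSPSetFromOptions() with -pc_type mg -pc_mg_levels <n>, and before
 * KSPSolve(). */
PetscErrorCode PinChebyshevBounds(KSP ksp, PetscInt nlevels)
{
  PC             pc;
  KSP            smoother;
  PetscInt       l;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  for (l = 1; l < nlevels; l++) {   /* level 0 is the coarse solve */
    ierr = PCMGGetSmoother(pc, l, &smoother);CHKERRQ(ierr);
    ierr = KSPSetType(smoother, KSPCHEBYSHEV);CHKERRQ(ierr);
    /* illustrative bounds, roughly the estimates reported above (~[0.1, 1.1]) */
    ierr = KSPChebyshevSetEigenvalues(smoother, 1.1, 0.1);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}

If the 2-process run still stalls with fixed bounds, the remaining difference is coming from the local SOR application itself rather than from the eigenvalue estimation.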