On 21 Jul 2014, at 12:52, Dave May <[email protected]> wrote:

>> -pc_type mg -mg_levels_ksp_type richardson -mg_levels_pc_type jacobi
>> -mg_levels_ksp_max_it 2
>>
>> then I get identical convergence in serial and parallel
>
> Good. That's the correct result.
>
>> if, however, I run with
>>
>> -pc_type mg -mg_levels_ksp_type chebyshev -mg_levels_pc_type sor
>> -mg_levels_ksp_max_it 2
>> (the default according to -ksp_view)
>>
>> then I get very differing convergence in serial and parallel as described.
>
> It's normal that the behaviour is different. The PETSc SOR implementation is
> not parallel. It only performs SOR on your local subdomain.
Sure, however, with only two subdomains, I was not expecting to see such poor
behaviour.
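(As an aside: if the process-local SOR is the whole story, I would have thought
I could reproduce the 2-process smoother on a single process by spelling it out
as block Jacobi with SOR inside each block, something like the sketch below. I
haven't run this, and the two row-contiguous blocks won't necessarily line up
with the parallel partition, so take it only as a sketch; since there is just
one smoothed level here I've used the numbered prefix shown by -ksp_view:

-mg_levels_1_pc_type bjacobi -mg_levels_1_pc_bjacobi_blocks 2
-mg_levels_1_sub_pc_type sor

If that serial run matched the 2-process convergence, the diagnosis would at
least be confirmed.)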
Below I show output from a run on 1 process and then two (along with ksp_view)
for the following options:
-pc_type mg -ksp_rtol 1e-8 -ksp_max_it 6 -pc_mg_levels 2 -mg_levels_pc_type
sor -ksp_monitor
On 1 process:
0 KSP Residual norm 5.865090856053e+02
1 KSP Residual norm 1.293159126247e+01
2 KSP Residual norm 5.181199296299e-01
3 KSP Residual norm 1.268870802643e-02
4 KSP Residual norm 5.116058930806e-04
5 KSP Residual norm 3.735036960550e-05
6 KSP Residual norm 1.755288530515e-06
KSP Object: 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=6, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=2 cycles=v
Cycles per PCApply=1
Not using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 1 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5, needed 3.17724
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=144, cols=144
package used to perform factorization: petsc
total: nonzeros=2904, allocated nonzeros=2904
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=144, cols=144
total: nonzeros=914, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 1 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0.0999972, max = 1.09997
Chebyshev: estimated using: [0 0.1; 0 1.1]
KSP Object: (mg_levels_1_est_) 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 1 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=529, cols=529
total: nonzeros=3521, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
not using I-node routines
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 1 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=529, cols=529
total: nonzeros=3521, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=529, cols=529
total: nonzeros=3521, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
not using I-node routines
On 2 processes:
0 KSP Residual norm 5.867749653193e+02
1 KSP Residual norm 1.353369658350e+01
2 KSP Residual norm 1.350163644248e+01
3 KSP Residual norm 1.007552895680e+01
4 KSP Residual norm 1.294191582208e+00
5 KSP Residual norm 9.409953768968e-01
6 KSP Residual norm 9.409360529590e-01
KSP Object: 2 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=6, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 2 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=2 cycles=v
Cycles per PCApply=1
Not using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 2 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 2 MPI processes
type: redundant
Redundant preconditioner: First (color=0) of 2 PCs follows
KSP Object: (mg_coarse_redundant_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_redundant_) 1 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5, needed 2.72494
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=144, cols=144
package used to perform factorization: petsc
total: nonzeros=2120, allocated nonzeros=2120
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=144, cols=144
total: nonzeros=778, allocated nonzeros=778
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 2 MPI processes
type: mpiaij
rows=144, cols=144
total: nonzeros=778, allocated nonzeros=914
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 2 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0.099992, max = 1.09991
Chebyshev: estimated using: [0 0.1; 0 1.1]
KSP Object: (mg_levels_1_est_) 2 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 2 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 2 MPI processes
type: mpiaij
rows=529, cols=529
total: nonzeros=3253, allocated nonzeros=3521
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 2 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 2 MPI processes
type: mpiaij
rows=529, cols=529
total: nonzeros=3253, allocated nonzeros=3521
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 2 MPI processes
type: mpiaij
rows=529, cols=529
total: nonzeros=3253, allocated nonzeros=3521
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Notice that in the parallel case the residual reduction was only ~10^3, rather
than the ~10^8 achieved in the serial case.
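(Concretely, from the monitors above: 5.87e+02 down to 9.41e-01 is a factor of
roughly 6e+02 on 2 processes, versus 5.87e+02 down to 1.76e-06, roughly 3e+08,
on 1.)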
> I see that this is a nested Krylov solve. Using fgmres on the outer sometimes
> is not enough. I've had problems where I needed to use the more stable
> orthogonalization routine in gmres.
>
> Do you also observe different convergence behaviour (serial versus parallel)
> with these choices
> 1) -mg_coarse_ksp_type gmres -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it 1
Full options are (in addition to the above):
-ksp_type fgmres -pc_mg_levels 2 -ksp_monitor -ksp_max_it 6 -ksp_rtol 1e-8
-pc_type mg
1 process:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.294103921871e+01
2 KSP Residual norm 4.325949294172e+00
3 KSP Residual norm 1.373260455913e+00
4 KSP Residual norm 1.612639229769e-01
5 KSP Residual norm 1.896600662807e-02
6 KSP Residual norm 5.900847991084e-03
2 processes:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.242896923248e+01
2 KSP Residual norm 1.092088559774e+01
3 KSP Residual norm 7.383276000966e+00
4 KSP Residual norm 5.634790202135e+00
5 KSP Residual norm 4.329897745238e+00
6 KSP Residual norm 3.754170628391e+00
> 2) -mg_coarse_ksp_type gmres -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it
> 100 -mg_coarse_ksp_gmres_modifiedgramschmidt
1 process:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.030455192067e+01
2 KSP Residual norm 4.628068378242e-01
3 KSP Residual norm 1.965313019262e-02
4 KSP Residual norm 1.204109484597e-03
5 KSP Residual norm 5.812650812813e-05
6 KSP Residual norm 3.161780444565e-06
2 processes:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.324768309183e+01
2 KSP Residual norm 1.225921121405e+01
3 KSP Residual norm 1.173286143250e+01
4 KSP Residual norm 7.033886488294e+00
5 KSP Residual norm 4.825036058054e+00
6 KSP Residual norm 4.265434976636e+00
> 3) -mg_coarse_ksp_type cg -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it 100
1 process:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.030455192067e+01
2 KSP Residual norm 4.628068378242e-01
3 KSP Residual norm 1.965313019262e-02
4 KSP Residual norm 1.204109484597e-03
5 KSP Residual norm 5.812650812814e-05
6 KSP Residual norm 3.161780444567e-06
2 processes:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.324768309183e+01
2 KSP Residual norm 1.225921121405e+01
3 KSP Residual norm 1.173286143250e+01
4 KSP Residual norm 7.033886488294e+00
5 KSP Residual norm 4.825036058053e+00
6 KSP Residual norm 4.265434976635e+00
> Sure - this wasn't a convergence test. I just wanted to see that the methods
> which should be identical in serial and parallel are in fact behaving as
> expected. Seems they are. So I'm inclined to think the problem is associated
> with having nested Krylov solves.
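To take the nested Krylov solve out of the picture entirely, I suppose I could
also pin the coarse solve to a direct solve (which is what the default in the
-ksp_view output above does anyway), i.e. something like

-mg_coarse_ksp_type preonly -mg_coarse_pc_type redundant
-mg_coarse_redundant_pc_type lu

and vary only the smoother. I haven't repeated all the runs that way, but the
comparison below already points at the smoother rather than the coarse solve.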
My observation is that if I use unpreconditioned Chebyshev as a smoother, then
convergence in serial and parallel is identical and good. As soon as I turn on
SOR preconditioning for the smoother, the parallel convergence falls to pieces
(and the preconditioner becomes indefinite):
e.g. with
-pc_type mg -ksp_rtol 1e-8 -ksp_max_it 6 -pc_mg_levels 2 -ksp_monitor
-ksp_type fgmres -mg_coarse_ksp_type cg -mg_coarse_pc_type jacobi
-mg_coarse_ksp_max_it 100 -mg_levels_pc_type none
1 process:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.530397174638e+01
2 KSP Residual norm 1.027554200472e+00
3 KSP Residual norm 3.809236982955e-02
4 KSP Residual norm 2.445633720099e-03
5 KSP Residual norm 1.192136916270e-04
6 KSP Residual norm 7.067629143105e-06
2 processes:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.530397174638e+01
2 KSP Residual norm 1.027554200472e+00
3 KSP Residual norm 3.809236982955e-02
4 KSP Residual norm 2.445633720099e-03
5 KSP Residual norm 1.192136916270e-04
6 KSP Residual norm 7.067629143079e-06
With SOR as the smoother's preconditioner:
-pc_type mg -ksp_rtol 1e-8 -ksp_max_it 6 -pc_mg_levels 2 -ksp_monitor
-ksp_type fgmres -mg_coarse_ksp_type cg -mg_coarse_pc_type jacobi
-mg_coarse_ksp_max_it 100 -mg_levels_pc_type sor
1 process:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.030455192067e+01
2 KSP Residual norm 4.628068378242e-01
3 KSP Residual norm 1.965313019262e-02
4 KSP Residual norm 1.204109484597e-03
5 KSP Residual norm 5.812650812814e-05
6 KSP Residual norm 3.161780444567e-06
2 processes:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.324768309183e+01
2 KSP Residual norm 1.225921121405e+01
3 KSP Residual norm 1.173286143250e+01
4 KSP Residual norm 7.033886488294e+00
5 KSP Residual norm 4.825036058053e+00
6 KSP Residual norm 4.265434976635e+00
Maybe it's just that I shouldn't be expecting this to work, but it seems odd to
me.
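One further check I could do (sketched only, I haven't run it): put cg on the
outside instead of fgmres, with the coarse solve pinned to the direct solve so
the preconditioner stays linear, since PETSc's CG stops with
KSP_DIVERGED_INDEFINITE_PC if the preconditioner really is indefinite:

-ksp_type cg -ksp_converged_reason -pc_type mg -pc_mg_levels 2
-mg_coarse_ksp_type preonly -mg_coarse_pc_type redundant
-mg_levels_pc_type sor

Assuming the V-cycle with the (local) symmetric SOR smoothing really is a
symmetric preconditioner, a serial run that converges alongside a 2-process run
that bails out with that reason would back up the indefiniteness observation.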
Cheers,
Lawrence