On 21 Jul 2014, at 12:52, Dave May <[email protected]> wrote:

>> -pc_type mg -mg_levels_ksp_type richardson -mg_levels_pc_type jacobi
>> -mg_levels_ksp_max_it 2
>>
>> then I get identical convergence in serial and parallel
>
> Good. That's the correct result.
>
>> if, however, I run with
>>
>> -pc_type mg -mg_levels_ksp_type chebyshev -mg_levels_pc_type sor
>> -mg_levels_ksp_max_it 2
>> (the default according to -ksp_view)
>>
>> then I get very differing convergence in serial and parallel as described.
>
> It's normal that the behaviour is different. The PETSc SOR implementation is
> not parallel. It only performs SOR on your local subdomain.
Sure, however, with only two subdomains, I was not expecting to see such poor
behaviour.
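(As an aside: if the process-local SOR is the whole story, I would have thought
I could reproduce the 2-process smoother on a single process by spelling it out
as block Jacobi with SOR inside each block, something like the sketch below. I
haven't run this, and the two row-contiguous blocks won't necessarily line up
with the parallel partition, so take it only as a sketch; since there is just
one smoothed level here I've used the numbered prefix shown by -ksp_view:

-mg_levels_1_pc_type bjacobi -mg_levels_1_pc_bjacobi_blocks 2
-mg_levels_1_sub_pc_type sor

If that serial run matched the 2-process convergence, the diagnosis would at
least be confirmed.)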
Below I show output from a run on 1 process and then two (along with ksp_view)
for the following options:
-pc_type mg -ksp_rtol 1e-8 -ksp_max_it 6 -pc_mg_levels 2 -mg_levels_pc_type
sor -ksp_monitor
On 1 process:
0 KSP Residual norm 5.865090856053e+02
1 KSP Residual norm 1.293159126247e+01
2 KSP Residual norm 5.181199296299e-01
3 KSP Residual norm 1.268870802643e-02
4 KSP Residual norm 5.116058930806e-04
5 KSP Residual norm 3.735036960550e-05
6 KSP Residual norm 1.755288530515e-06
KSP Object: 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=6, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=2 cycles=v
Cycles per PCApply=1
Not using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 1 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5, needed 3.17724
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=144, cols=144
package used to perform factorization: petsc
total: nonzeros=2904, allocated nonzeros=2904
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=144, cols=144
total: nonzeros=914, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 1 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0.0999972, max = 1.09997
Chebyshev: estimated using: [0 0.1; 0 1.1]
KSP Object: (mg_levels_1_est_) 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 1 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=529, cols=529
total: nonzeros=3521, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
not using I-node routines
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 1 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=529, cols=529
total: nonzeros=3521, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=529, cols=529
total: nonzeros=3521, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
not using I-node routines
On 2 processes:
0 KSP Residual norm 5.867749653193e+02
1 KSP Residual norm 1.353369658350e+01
2 KSP Residual norm 1.350163644248e+01
3 KSP Residual norm 1.007552895680e+01
4 KSP Residual norm 1.294191582208e+00
5 KSP Residual norm 9.409953768968e-01
6 KSP Residual norm 9.409360529590e-01
KSP Object: 2 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=6, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 2 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=2 cycles=v
Cycles per PCApply=1
Not using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 2 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 2 MPI processes
type: redundant
Redundant preconditioner: First (color=0) of 2 PCs follows
KSP Object: (mg_coarse_redundant_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_redundant_) 1 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5, needed 2.72494
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=144, cols=144
package used to perform factorization: petsc
total: nonzeros=2120, allocated nonzeros=2120
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=144, cols=144
total: nonzeros=778, allocated nonzeros=778
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 2 MPI processes
type: mpiaij
rows=144, cols=144
total: nonzeros=778, allocated nonzeros=914
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 2 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0.099992, max = 1.09991
Chebyshev: estimated using: [0 0.1; 0 1.1]
KSP Object: (mg_levels_1_est_) 2 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 2 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 2 MPI processes
type: mpiaij
rows=529, cols=529
total: nonzeros=3253, allocated nonzeros=3521
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 2 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1,
omega = 1
linear system matrix = precond matrix:
Mat Object: 2 MPI processes
type: mpiaij
rows=529, cols=529
total: nonzeros=3253, allocated nonzeros=3521
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 2 MPI processes
type: mpiaij
rows=529, cols=529
total: nonzeros=3253, allocated nonzeros=3521
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Notice that in the parallel case the residual reduction was only ~10^3, rather
than the ~10^8 achieved in the serial case.
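(Concretely, from the monitors above: 5.87e+02 down to 9.41e-01 is a factor of
roughly 6e+02 on 2 processes, versus 5.87e+02 down to 1.76e-06, roughly 3e+08,
on 1.)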
> I see that this is a nested Krylov solve. Using fgmres on the outer sometimes
> is not enough. I've had problems where I needed to use the more stable
> orthogonalization routine in gmres.
>
> Do you also observe different convergence behaviour (serial versus parallel)
> with these choices
> 1) -mg_coarse_ksp_type gmres -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it 1
Full options are (in addition to the above):
-ksp_type fgmres -pc_mg_levels 2 -ksp_monitor -ksp_max_it 6 -ksp_rtol 1e-8
-pc_type mg
1 process:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.294103921871e+01
2 KSP Residual norm 4.325949294172e+00
3 KSP Residual norm 1.373260455913e+00
4 KSP Residual norm 1.612639229769e-01
5 KSP Residual norm 1.896600662807e-02
6 KSP Residual norm 5.900847991084e-03
2 processes:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.242896923248e+01
2 KSP Residual norm 1.092088559774e+01
3 KSP Residual norm 7.383276000966e+00
4 KSP Residual norm 5.634790202135e+00
5 KSP Residual norm 4.329897745238e+00
6 KSP Residual norm 3.754170628391e+00
> 2) -mg_coarse_ksp_type gmres -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it
> 100 -mg_coarse_ksp_gmres_modifiedgramschmidt
1 process:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.030455192067e+01
2 KSP Residual norm 4.628068378242e-01
3 KSP Residual norm 1.965313019262e-02
4 KSP Residual norm 1.204109484597e-03
5 KSP Residual norm 5.812650812813e-05
6 KSP Residual norm 3.161780444565e-06
2 processes:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.324768309183e+01
2 KSP Residual norm 1.225921121405e+01
3 KSP Residual norm 1.173286143250e+01
4 KSP Residual norm 7.033886488294e+00
5 KSP Residual norm 4.825036058054e+00
6 KSP Residual norm 4.265434976636e+00
> 3) -mg_coarse_ksp_type cg -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it 100
1 process:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.030455192067e+01
2 KSP Residual norm 4.628068378242e-01
3 KSP Residual norm 1.965313019262e-02
4 KSP Residual norm 1.204109484597e-03
5 KSP Residual norm 5.812650812814e-05
6 KSP Residual norm 3.161780444567e-06
2 processes:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.324768309183e+01
2 KSP Residual norm 1.225921121405e+01
3 KSP Residual norm 1.173286143250e+01
4 KSP Residual norm 7.033886488294e+00
5 KSP Residual norm 4.825036058053e+00
6 KSP Residual norm 4.265434976635e+00
> Sure - this wasn't a convergence test. I just wanted to see that the methods
> which should be identical in serial and parallel are in fact behaving as
> expected. Seems they are. So I'm inclined to think the problem is associated
> with having nested Krylov solves.
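To take the nested Krylov solve out of the picture entirely, I suppose I could
also pin the coarse solve to a direct solve (which is what the default in the
-ksp_view output above does anyway), i.e. something like

-mg_coarse_ksp_type preonly -mg_coarse_pc_type redundant
-mg_coarse_redundant_pc_type lu

and vary only the smoother. I haven't repeated all the runs that way, but the
comparison below already points at the smoother rather than the coarse solve.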
My observation is that if I use unpreconditioned Chebyshev as a smoother, then
convergence in serial and parallel is identical and good. As soon as I turn on
SOR preconditioning for the smoother, the parallel convergence falls to pieces
(and the preconditioner becomes indefinite):
e.g. with
-pc_type mg -ksp_rtol 1e-8 -ksp_max_it 6 -pc_mg_levels 2 -ksp_monitor
-ksp_type fgmres -mg_coarse_ksp_type cg -mg_coarse_pc_type jacobi
-mg_coarse_ksp_max_it 100 -mg_levels_pc_type none
1 process:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.530397174638e+01
2 KSP Residual norm 1.027554200472e+00
3 KSP Residual norm 3.809236982955e-02
4 KSP Residual norm 2.445633720099e-03
5 KSP Residual norm 1.192136916270e-04
6 KSP Residual norm 7.067629143105e-06
2 processes:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.530397174638e+01
2 KSP Residual norm 1.027554200472e+00
3 KSP Residual norm 3.809236982955e-02
4 KSP Residual norm 2.445633720099e-03
5 KSP Residual norm 1.192136916270e-04
6 KSP Residual norm 7.067629143079e-06
With SOR as the smoother's preconditioner:
-pc_type mg -ksp_rtol 1e-8 -ksp_max_it 6 -pc_mg_levels 2 -ksp_monitor
-ksp_type fgmres -mg_coarse_ksp_type cg -mg_coarse_pc_type jacobi
-mg_coarse_ksp_max_it 100 -mg_levels_pc_type sor
1 process:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.030455192067e+01
2 KSP Residual norm 4.628068378242e-01
3 KSP Residual norm 1.965313019262e-02
4 KSP Residual norm 1.204109484597e-03
5 KSP Residual norm 5.812650812814e-05
6 KSP Residual norm 3.161780444567e-06
2 processes:
0 KSP Residual norm 2.802543487620e+02
1 KSP Residual norm 1.324768309183e+01
2 KSP Residual norm 1.225921121405e+01
3 KSP Residual norm 1.173286143250e+01
4 KSP Residual norm 7.033886488294e+00
5 KSP Residual norm 4.825036058053e+00
6 KSP Residual norm 4.265434976635e+00
Maybe it's just that I shouldn't be expecting this to work, but it seems odd to
me.
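One further check I could do (sketched only, I haven't run it): put cg on the
outside instead of fgmres, with the coarse solve pinned to the direct solve so
the preconditioner stays linear, since PETSc's CG stops with
KSP_DIVERGED_INDEFINITE_PC if the preconditioner really is indefinite:

-ksp_type cg -ksp_converged_reason -pc_type mg -pc_mg_levels 2
-mg_coarse_ksp_type preonly -mg_coarse_pc_type redundant
-mg_levels_pc_type sor

Assuming the V-cycle with the (local) symmetric SOR smoothing really is a
symmetric preconditioner, a serial run that converges alongside a 2-process run
that bails out with that reason would back up the indefiniteness observation.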
Cheers,
Lawrence