[petsc-users] GAMG advice

David Nolte Fri, 20 Oct 2017 11:33:07 -0700

Dear all,

I have some problems using GAMG as a preconditioner for (F)GMRES.
Background: I am solving the incompressible, unsteady Navier-Stokes
equations with a coupled mixed FEM approach, using P1/P1 elements for
velocity and pressure on an unstructured tetrahedron mesh with about
2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG,
hence, no zeros on the diagonal of the pressure block. Time
discretization with semi-implicit backward Euler. The flow is a
convection dominated flow through a nozzle.


So far, for this setup, I have been quite happy with a simple FGMRES/ML
solver for the full system (rather bruteforce, I admit, but much faster
than any block/Schur preconditioners I tried):

    -ksp_converged_reason
    -ksp_monitor_true_residual
    -ksp_type fgmres
    -ksp_rtol 1.0e-6
    -ksp_initial_guess_nonzero

    -pc_type ml
    -pc_ml_Threshold 0.03
    -pc_ml_maxNlevels 3

This setup converges in ~100 iterations (see below the ksp_view output)
to rtol:

119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm
3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm
2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07


Now I'd like to try GAMG instead of ML. However, I don't know how to set
it up to get similar performance.
The obvious/naive

    -pc_type gamg
    -pc_gamg_type agg

# with and without
    -pc_gamg_threshold 0.03
    -pc_mg_levels 3

converges very slowly on 1 proc and much worse on 8 (~200k dofs per
proc), for instance:
np = 1:
980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04

np = 8:
980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03

A very high threshold seems to improve the GAMG PC, for instance with
0.75 I get convergence to rtol=1e-6 after 744 iterations.
What else should I try?

I would very much appreciate any advice on configuring GAMG and
differences w.r.t ML to be taken into account (not a multigrid expert
though).

Thanks, best wishes
David


------
ksp_view for -pc_type gamg      -pc_gamg_threshold 0.75 -pc_mg_levels 3

KSP Object: 1 MPI processes
  type: fgmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=10000
  tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.
  right preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: gamg
    MG: type is MULTIPLICATIVE, levels=1 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
      GAMG specific options
        Threshold for dropping small values from graph 0.75
        AGG specific options
          Symmetric graph false
  Coarse grid solver -- level -------------------------------
    KSP Object:    (mg_levels_0_)     1 MPI processes
      type: preonly
      maximum iterations=2, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_levels_0_)     1 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations =
1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object:       1 MPI processes
        type: seqaij
        rows=1745224, cols=1745224
        total: nonzeros=99452608, allocated nonzeros=99452608
        total number of mallocs used during MatSetValues calls =0
          using I-node routines: found 1037847 nodes, limit used is 5
  linear system matrix = precond matrix:
  Mat Object:   1 MPI processes
    type: seqaij
    rows=1745224, cols=1745224
    total: nonzeros=99452608, allocated nonzeros=99452608
    total number of mallocs used during MatSetValues calls =0
      using I-node routines: found 1037847 nodes, limit used is 5


------
ksp_view for -pc_type ml:

KSP Object: 8 MPI processes
  type: fgmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=10000
  tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.
  right preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: ml
    MG: type is MULTIPLICATIVE, levels=3 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object:    (mg_coarse_)     8 MPI processes
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_coarse_)     8 MPI processes
      type: redundant
        Redundant preconditioner: First (color=0) of 8 PCs follows
        KSP Object:        (mg_coarse_redundant_)         1 MPI processes
          type: preonly
          maximum iterations=10000, initial guess is zero
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
          left preconditioning
          using NONE norm type for convergence test
        PC Object:        (mg_coarse_redundant_)         1 MPI processes
          type: lu
            LU: out-of-place factorization
            tolerance for zero pivot 2.22045e-14
            using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
            matrix ordering: nd
            factor fill ratio given 5., needed 10.4795
              Factored matrix follows:
                Mat Object:                 1 MPI processes
                  type: seqaij
                  rows=6822, cols=6822
                  package used to perform factorization: petsc
                  total: nonzeros=9575688, allocated nonzeros=9575688
                  total number of mallocs used during MatSetValues calls =0
                    not using I-node routines
          linear system matrix = precond matrix:
          Mat Object:           1 MPI processes
            type: seqaij
            rows=6822, cols=6822
            total: nonzeros=913758, allocated nonzeros=913758
            total number of mallocs used during MatSetValues calls =0
              not using I-node routines
      linear system matrix = precond matrix:
      Mat Object:       8 MPI processes
        type: mpiaij
        rows=6822, cols=6822
        total: nonzeros=913758, allocated nonzeros=913758
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object:    (mg_levels_1_)     8 MPI processes
      type: richardson
        Richardson: damping factor=1.
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_1_)     8 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations =
1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object:       8 MPI processes
        type: mpiaij
        rows=67087, cols=67087
        total: nonzeros=9722749, allocated nonzeros=9722749
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object:    (mg_levels_2_)     8 MPI processes
      type: richardson
        Richardson: damping factor=1.
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_2_)     8 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations =
1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object:       8 MPI processes
        type: mpiaij
        rows=1745224, cols=1745224
        total: nonzeros=99452608, allocated nonzeros=99452608
        total number of mallocs used during MatSetValues calls =0
          using I-node (on process 0) routines: found 126690 nodes,
limit used is 5
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object:   8 MPI processes
    type: mpiaij
    rows=1745224, cols=1745224
    total: nonzeros=99452608, allocated nonzeros=99452608
    total number of mallocs used during MatSetValues calls =0
      using I-node (on process 0) routines: found 126690 nodes, limit
used is 5

signature.asc
Description: OpenPGP digital signature

[petsc-users] GAMG advice

Reply via email to