Re: [petsc-users] GAMG advice

2017-11-10 Thread Mark Adams
On Thu, Nov 9, 2017 at 2:19 PM, David Nolte  wrote:

> Hi Mark,
>
> thanks for clarifying.
> When I wrote the initial question I had somehow overlooked the fact that
> the GAMG standard smoother was Chebychev while ML uses SOR. All the other
> comments concerning threshold etc were based on this mistake.
>
> The following settings work quite well, of course LU is used on the coarse
> level.
>
> -pc_type gamg
> -pc_gamg_type agg
> -pc_gamg_threshold 0.03
> -pc_gamg_square_graph 10        # no effect ?
> -pc_gamg_sym_graph
> -mg_levels_ksp_type richardson
> -mg_levels_pc_type sor
>
> -pc_gamg_agg_nsmooths 0 does not seem to improve the convergence.
>

Looks reasonable. The smoothing that -pc_gamg_agg_nsmooths controls is good
for the convergence of elliptic operators, but it makes the operator more
expensive. It's worth doing for elliptic operators but, in my experience, not
for others. If your convergence rate does not change then you probably want
-pc_gamg_agg_nsmooths 0. It is a cheaper and simpler method, so you want to
use it whenever smoothing does not help convergence a lot.
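
For comparing runs with and without smoothing, here is a minimal sketch that assembles the options discussed in this thread into a command line. The helper name petsc_args is hypothetical (it is not part of PETSc); only the option names come from the messages above.

```python
# Hypothetical helper (not a PETSc API): turn an options mapping into
# PETSc-style command-line arguments. A value of None means a bare flag.
def petsc_args(options):
    args = []
    for name, value in options.items():
        args.append("-" + name)
        if value is not None:
            args.append(str(value))
    return args


# Options quoted in this thread, plus the unsmoothed-aggregation switch.
gamg_options = {
    "pc_type": "gamg",
    "pc_gamg_type": "agg",
    "pc_gamg_threshold": 0.03,
    "pc_gamg_sym_graph": None,
    "pc_gamg_agg_nsmooths": 0,
    "mg_levels_ksp_type": "richardson",
    "mg_levels_pc_type": "sor",
}

print(" ".join(petsc_args(gamg_options)))
```

Dropping the pc_gamg_agg_nsmooths entry gives the other variant, so the two iteration counts can be compared side by side.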


>
> The ksp view now looks like this: (does this seem reasonable?)
>
>
> KSP Object: 4 MPI processes
>   type: fgmres
> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> GMRES: happy breakdown tolerance 1e-30
>   maximum iterations=1
>   tolerances:  relative=1e-06, absolute=1e-50, divergence=1.
>   right preconditioning
>   using nonzero initial guess
>   using UNPRECONDITIONED norm type for convergence test
> PC Object: 4 MPI processes
>   type: gamg
> MG: type is MULTIPLICATIVE, levels=5 cycles=v
>   Cycles per PCApply=1
>   Using Galerkin computed coarse grid matrices
>   GAMG specific options
> Threshold for dropping small values from graph 0.03
> AGG specific options
>   Symmetric graph true
>   Coarse grid solver -- level ---
> KSP Object:(mg_coarse_) 4 MPI processes
>   type: preonly
>   maximum iterations=1, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>   left preconditioning
>   using NONE norm type for convergence test
> PC Object:(mg_coarse_) 4 MPI processes
>   type: bjacobi
> block Jacobi: number of blocks = 4
> Local solve is same for all blocks, in the following KSP and PC
> objects:
>   KSP Object:  (mg_coarse_sub_)   1 MPI processes
> type: preonly
> maximum iterations=1, initial guess is zero
> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using NONE norm type for convergence test
>   PC Object:  (mg_coarse_sub_)   1 MPI processes
> type: lu
>   LU: out-of-place factorization
>   tolerance for zero pivot 2.22045e-14
>   using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>   matrix ordering: nd
>   factor fill ratio given 5., needed 1.
> Factored matrix follows:
>   Mat Object:   1 MPI processes
> type: seqaij
> rows=38, cols=38
> package used to perform factorization: petsc
> total: nonzeros=1444, allocated nonzeros=1444
> total number of mallocs used during MatSetValues calls =0
>   using I-node routines: found 8 nodes, limit used is 5
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
>   type: seqaij
>   rows=38, cols=38
>   total: nonzeros=1444, allocated nonzeros=1444
>   total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 8 nodes, limit used is 5
>   linear system matrix = precond matrix:
>   Mat Object:   4 MPI processes
> type: mpiaij
> rows=38, cols=38
> total: nonzeros=1444, allocated nonzeros=1444
> total number of mallocs used during MatSetValues calls =0
>   using I-node (on process 0) routines: found 8 nodes, limit used
> is 5
>   Down solver (pre-smoother) on level 1 ---
> KSP Object:(mg_levels_1_) 4 MPI processes
>   type: richardson
> Richardson: damping factor=1.
>   maximum iterations=2
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>   left preconditioning
>   using nonzero initial guess
>   using NONE norm type for convergence test
> PC Object:(mg_levels_1_) 4 MPI processes
>   type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations = 1,
> omega = 1.
>   linear system matrix = precond matrix:
>   Mat Object:   4 MPI processes
> type: mpiaij
> rows=168, cols=168
> total: nonzeros=19874, allocated 

Re: [petsc-users] GAMG advice

2017-11-09 Thread David Nolte
Hi Mark,

thanks for clarifying.
When I wrote the initial question I had somehow overlooked the fact that
the GAMG standard smoother was Chebychev while ML uses SOR. All the
other comments concerning threshold etc were based on this mistake.

The following settings work quite well, of course LU is used on the
coarse level.

    -pc_type gamg
    -pc_gamg_type agg
    -pc_gamg_threshold 0.03
    -pc_gamg_square_graph 10        # no effect ?
    -pc_gamg_sym_graph
    -mg_levels_ksp_type richardson
    -mg_levels_pc_type sor

-pc_gamg_agg_nsmooths 0 does not seem to improve the convergence.

The ksp view now looks like this: (does this seem reasonable?)


KSP Object: 4 MPI processes
  type: fgmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1
  tolerances:  relative=1e-06, absolute=1e-50, divergence=1.
  right preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
  type: gamg
    MG: type is MULTIPLICATIVE, levels=5 cycles=v
  Cycles per PCApply=1
  Using Galerkin computed coarse grid matrices
  GAMG specific options
    Threshold for dropping small values from graph 0.03
    AGG specific options
  Symmetric graph true
  Coarse grid solver -- level ---
    KSP Object:    (mg_coarse_) 4 MPI processes
  type: preonly
  maximum iterations=1, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using NONE norm type for convergence test
    PC Object:    (mg_coarse_) 4 MPI processes
  type: bjacobi
    block Jacobi: number of blocks = 4
    Local solve is same for all blocks, in the following KSP and PC
objects:
  KSP Object:  (mg_coarse_sub_)   1 MPI processes
    type: preonly
    maximum iterations=1, initial guess is zero
    tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
    left preconditioning
    using NONE norm type for convergence test
  PC Object:  (mg_coarse_sub_)   1 MPI processes
    type: lu
  LU: out-of-place factorization
  tolerance for zero pivot 2.22045e-14
  using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
  matrix ordering: nd
  factor fill ratio given 5., needed 1.
    Factored matrix follows:
  Mat Object:   1 MPI processes
    type: seqaij
    rows=38, cols=38
    package used to perform factorization: petsc
    total: nonzeros=1444, allocated nonzeros=1444
    total number of mallocs used during MatSetValues calls =0
  using I-node routines: found 8 nodes, limit used is 5
    linear system matrix = precond matrix:
    Mat Object: 1 MPI processes
  type: seqaij
  rows=38, cols=38
  total: nonzeros=1444, allocated nonzeros=1444
  total number of mallocs used during MatSetValues calls =0
    using I-node routines: found 8 nodes, limit used is 5
  linear system matrix = precond matrix:
  Mat Object:   4 MPI processes
    type: mpiaij
    rows=38, cols=38
    total: nonzeros=1444, allocated nonzeros=1444
    total number of mallocs used during MatSetValues calls =0
  using I-node (on process 0) routines: found 8 nodes, limit
used is 5
  Down solver (pre-smoother) on level 1 ---
    KSP Object:    (mg_levels_1_) 4 MPI processes
  type: richardson
    Richardson: damping factor=1.
  maximum iterations=2
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using nonzero initial guess
  using NONE norm type for convergence test
    PC Object:    (mg_levels_1_) 4 MPI processes
  type: sor
    SOR: type = local_symmetric, iterations = 1, local iterations =
1, omega = 1.
  linear system matrix = precond matrix:
  Mat Object:   4 MPI processes
    type: mpiaij
    rows=168, cols=168
    total: nonzeros=19874, allocated nonzeros=19874
    total number of mallocs used during MatSetValues calls =0
  using I-node (on process 0) routines: found 17 nodes, limit
used is 5
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 ---
    KSP Object:    (mg_levels_2_) 4 MPI processes
  type: richardson
    Richardson: damping factor=1.
  maximum iterations=2
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using nonzero initial guess
  using NONE norm type for convergence test
    PC Object:    (mg_levels_2_) 4 MPI processes
  type: 

Re: [petsc-users] GAMG advice

2017-11-08 Thread Mark Adams
On Wed, Nov 1, 2017 at 5:45 PM, David Nolte  wrote:

> Thanks Barry.
> By simply replacing chebychev by richardson I get similar performance
> with GAMG and ML


That too (I assumed you were using the same smoother; I could not see cheby
in your view data).

I guess SOR works for the coarse grid solver because the coarse grid is
small. Using lu should help.
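
For readers less familiar with the smoother being discussed: Richardson with damping factor 1 wrapped around symmetric SOR with omega = 1 amounts to repeated forward-plus-backward Gauss-Seidel sweeps. Below is a serial, dense, pure-Python sketch of that idea on a small made-up matrix. It is an illustration under those assumptions, not PETSc's implementation (in parallel, the local_symmetric variant sweeps only within each process).

```python
def symmetric_gauss_seidel_sweep(A, b, x):
    """One forward then one backward Gauss-Seidel sweep (SOR, omega = 1), in place."""
    n = len(b)
    for order in (range(n), reversed(range(n))):
        for i in order:
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x


def residual_norm(A, b, x):
    """Euclidean norm of b - A x."""
    n = len(b)
    r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(v * v for v in r) ** 0.5


# Made-up, diagonally dominant test system (not from the thread).
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [1.0, 2.0, 3.0]
x = [0.0, 0.0, 0.0]

before = residual_norm(A, b, x)
for _ in range(2):  # two smoother iterations per application
    symmetric_gauss_seidel_sweep(A, b, x)
after = residual_norm(A, b, x)
print(before, after)
```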


> (GAMG even slightly faster):
>

These are "random" fluctuations.


>
> -pc_type gamg
> -pc_gamg_type agg
> -pc_gamg_threshold 0.03
> -pc_gamg_square_graph 10
> -pc_gamg_sym_graph
> -mg_levels_ksp_type richardson
> -mg_levels_pc_type sor
>
> Is it still true that I need to set "-pc_gamg_sym_graph" if the matrix
> is asymmetric?


yes,


> For serial runs it doesn't seem to matter,


yes,


> but in
> parallel the PC setup hangs (after calls of
> PCGAMGFilterGraph()) if -pc_gamg_sym_graph is not set.
>

yep,
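
As a purely conceptual picture of what a "symmetric graph" means here, the sketch below treats the matrix graph as a set of directed edges (i, j) and symmetrizes it by adding every missing reverse edge. The pattern is made up, and this is not a description of how PETSc implements -pc_gamg_sym_graph internally.

```python
def is_structurally_symmetric(edges):
    """True if every edge (i, j) has its reverse (j, i) in the set."""
    return all((j, i) in edges for (i, j) in edges)


def symmetrize(edges):
    """Return the edge set with all reverse edges added."""
    return edges | {(j, i) for (i, j) in edges}


# Hypothetical 3x3 pattern: entry (0, 2) is present, (2, 0) is not.
pattern = {(0, 0), (1, 1), (2, 2), (0, 1), (1, 0), (0, 2)}

print(is_structurally_symmetric(pattern))
print(is_structurally_symmetric(symmetrize(pattern)))
```

The first check reports False and the second True, since symmetrizing adds the missing (2, 0) edge.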


>
> David
>
>
> On 10/21/2017 12:10 AM, Barry Smith wrote:
> >   David,
> >
> >    GAMG picks the number of levels based on how the coarsening process etc
> > proceeds. You cannot hardwire it to a particular value. You can run with
> > -info to get more info potentially on the decisions GAMG is making.
> >
> >   Barry
> >
> >> On Oct 20, 2017, at 2:06 PM, David Nolte  wrote:
> >>
> >> PS: I didn't realize at first, it looks as if the -pc_mg_levels 3 option
> >> was not taken into account:
> >> type: gamg
> >> MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >>
> >>
> >>
> >> On 10/20/2017 03:32 PM, David Nolte wrote:
> >>> Dear all,
> >>>
> >>> I have some problems using GAMG as a preconditioner for (F)GMRES.
> >>> Background: I am solving the incompressible, unsteady Navier-Stokes
> >>> equations with a coupled mixed FEM approach, using P1/P1 elements for
> >>> velocity and pressure on an unstructured tetrahedron mesh with about
> >>> 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG,
> >>> hence, no zeros on the diagonal of the pressure block. Time
> >>> discretization with semi-implicit backward Euler. The flow is a
> >>> convection dominated flow through a nozzle.
> >>>
> >>> So far, for this setup, I have been quite happy with a simple FGMRES/ML
> >>> solver for the full system (rather bruteforce, I admit, but much faster
> >>> than any block/Schur preconditioners I tried):
> >>>
> >>> -ksp_converged_reason
> >>> -ksp_monitor_true_residual
> >>> -ksp_type fgmres
> >>> -ksp_rtol 1.0e-6
> >>> -ksp_initial_guess_nonzero
> >>>
> >>> -pc_type ml
> >>> -pc_ml_Threshold 0.03
> >>> -pc_ml_maxNlevels 3
> >>>
> >>> This setup converges in ~100 iterations (see below the ksp_view output)
> >>> to rtol:
> >>>
> >>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
> >>> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
> >>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm
> >>> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
> >>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm
> >>> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
> >>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
> >>> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
> >>>
> >>>
> >>> Now I'd like to try GAMG instead of ML. However, I don't know how to set
> >>> it up to get similar performance.
> >>> The obvious/naive
> >>>
> >>> -pc_type gamg
> >>> -pc_gamg_type agg
> >>>
> >>> # with and without
> >>> -pc_gamg_threshold 0.03
> >>> -pc_mg_levels 3
> >>>
> >>> converges very slowly on 1 proc and much worse on 8 (~200k dofs per
> >>> proc), for instance:
> >>> np = 1:
> >>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
> >>> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
> >>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
> >>> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
> >>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
> >>> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
> >>>
> >>> np = 8:
> >>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
> >>> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
> >>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> >>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> >>>
> >>> A very high threshold seems to improve the GAMG PC, for instance with
> >>> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
> >>> What else should I try?
> >>>
> >>> I would very much appreciate any advice on configuring GAMG and
> >>> differences w.r.t ML to be taken into account (not a multigrid expert
> >>> though).
> >>>
> >>> Thanks, best wishes
> >>> David
> >>>
> >>>
> >>> --
> >>> ksp_view for -pc_type gamg  

Re: [petsc-users] GAMG advice

2017-11-08 Thread Mark Adams
On Fri, Oct 20, 2017 at 11:10 PM, Barry Smith  wrote:

>
>   David,
>
>    GAMG picks the number of levels based on how the coarsening process etc
> proceeds. You cannot hardwire it to a particular value.


Yes you can. GAMG will respect -pc_mg_levels N, but we don't recommend
using it.
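
A quick way to check whether a requested level count was actually used is to read the "levels=" field from the -ksp_view output, as David did by eye. A minimal sketch follows; the function name is hypothetical and the sample text is the view excerpt quoted in this thread.

```python
import re


def mg_levels_in_view(view_text):
    """Return the integer after 'levels=' in -ksp_view text, or None if absent."""
    match = re.search(r"levels=(\d+)", view_text)
    return int(match.group(1)) if match else None


# Excerpt from the view output quoted in this thread.
view = """PC Object: 1 MPI processes
  type: gamg
    MG: type is MULTIPLICATIVE, levels=1 cycles=v
"""

requested = 3
actual = mg_levels_in_view(view)
if actual != requested:
    print(f"requested {requested} levels, view reports {actual}")
```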


> You can run with -info to get more info potentially on the decisions GAMG
> is making.
>

This is noisy, but grep for GAMG and you will see the levels, sizes, etc.
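
If grep is not at hand, the same filtering can be sketched in a few lines of Python. The sample lines below are placeholders only; they are NOT real PETSc -info output, whose exact format is not shown in this thread.

```python
def gamg_lines(lines):
    """Keep only the lines that mention GAMG, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if "GAMG" in line]


# Placeholder text, not real PETSc output.
sample = [
    "placeholder line about something else\n",
    "placeholder line mentioning GAMG\n",
    "another unrelated placeholder line\n",
]

for line in gamg_lines(sample):
    print(line)
```

In practice the input would be the captured output of a run with -info, read line by line from a file.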


>
>   Barry
>
> > On Oct 20, 2017, at 2:06 PM, David Nolte  wrote:
> >
> > PS: I didn't realize at first, it looks as if the -pc_mg_levels 3 option
> > was not taken into account:
> > type: gamg
> > MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >
> >
> >
> > On 10/20/2017 03:32 PM, David Nolte wrote:
> >> Dear all,
> >>
> >> I have some problems using GAMG as a preconditioner for (F)GMRES.
> >> Background: I am solving the incompressible, unsteady Navier-Stokes
> >> equations with a coupled mixed FEM approach, using P1/P1 elements for
> >> velocity and pressure on an unstructured tetrahedron mesh with about
> >> 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG,
> >> hence, no zeros on the diagonal of the pressure block. Time
> >> discretization with semi-implicit backward Euler. The flow is a
> >> convection dominated flow through a nozzle.
> >>
> >> So far, for this setup, I have been quite happy with a simple FGMRES/ML
> >> solver for the full system (rather bruteforce, I admit, but much faster
> >> than any block/Schur preconditioners I tried):
> >>
> >> -ksp_converged_reason
> >> -ksp_monitor_true_residual
> >> -ksp_type fgmres
> >> -ksp_rtol 1.0e-6
> >> -ksp_initial_guess_nonzero
> >>
> >> -pc_type ml
> >> -pc_ml_Threshold 0.03
> >> -pc_ml_maxNlevels 3
> >>
> >> This setup converges in ~100 iterations (see below the ksp_view output)
> >> to rtol:
> >>
> >> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
> >> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
> >> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm
> >> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
> >> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm
> >> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
> >> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
> >> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
> >>
> >>
> >> Now I'd like to try GAMG instead of ML. However, I don't know how to set
> >> it up to get similar performance.
> >> The obvious/naive
> >>
> >> -pc_type gamg
> >> -pc_gamg_type agg
> >>
> >> # with and without
> >> -pc_gamg_threshold 0.03
> >> -pc_mg_levels 3
> >>
> >> converges very slowly on 1 proc and much worse on 8 (~200k dofs per
> >> proc), for instance:
> >> np = 1:
> >> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
> >> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
> >> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
> >> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
> >> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
> >> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
> >>
> >> np = 8:
> >> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
> >> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
> >> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> >> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> >> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> >> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> >>
> >> A very high threshold seems to improve the GAMG PC, for instance with
> >> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
> >> What else should I try?
> >>
> >> I would very much appreciate any advice on configuring GAMG and
> >> differences w.r.t ML to be taken into account (not a multigrid expert
> >> though).
> >>
> >> Thanks, best wishes
> >> David
> >>
> >>
> >> --
> >> ksp_view for -pc_type gamg  -pc_gamg_threshold 0.75 -pc_mg_levels 3
> >>
> >> KSP Object: 1 MPI processes
> >>   type: fgmres
> >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> >> Orthogonalization with no iterative refinement
> >> GMRES: happy breakdown tolerance 1e-30
> >>   maximum iterations=1
> >>   tolerances:  relative=1e-06, absolute=1e-50, divergence=1.
> >>   right preconditioning
> >>   using nonzero initial guess
> >>   using UNPRECONDITIONED norm type for convergence test
> >> PC Object: 1 MPI processes
> >>   type: gamg
> >> MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >>   Cycles per PCApply=1
> >>   Using Galerkin computed coarse grid matrices
> >>   GAMG specific options
> >> Threshold for dropping small values from graph 0.75
> >> AGG specific options
> >>   Symmetric graph false
> 

Re: [petsc-users] GAMG advice

2017-11-08 Thread Mark Adams
>
>
> Now I'd like to try GAMG instead of ML. However, I don't know how to set
> it up to get similar performance.
> The obvious/naive
>
> -pc_type gamg
> -pc_gamg_type agg
>
> # with and without
> -pc_gamg_threshold 0.03
> -pc_mg_levels 3
>
>
This looks fine. I would not set the number of levels but if it helps then
go for it.


> converges very slowly on 1 proc and much worse on 8 (~200k dofs per
> proc), for instance:
> np = 1:
> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
>
> np = 8:
> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>
> A very high threshold seems to improve the GAMG PC, for instance with
> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
> What else should I try?
>

Not sure. ML uses the same algorithm as GAMG (so the threshold means pretty
much the same thing). ML is a good solver and its lead, Ray Tuminaro, has had
a lot of NS experience. But I'm not sure which differences are causing this
gap in performance.
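
To put a number on the gap, one can estimate the average residual reduction per iteration from the -ksp_monitor_true_residual lines quoted in this thread. The sketch below is only an illustration: the function name is hypothetical, and it uses just the first and last record of each quoted excerpt (iterations 119 and 122 for ML, 980 and 982 for GAMG on one process).

```python
import re

# Each monitor record is wrapped over two lines in the emails, so \s+ is
# used where the line break falls.
MONITOR = re.compile(
    r"(\d+) KSP unpreconditioned resid norm \S+ true resid norm\s+\S+ "
    r"\|\|r\(i\)\|\|/\|\|b\|\|\s+(\S+)"
)


def mean_reduction_factor(monitor_text):
    """Geometric-mean reduction of ||r||/||b|| per iteration over the excerpt."""
    points = [(int(it), float(rel)) for it, rel in MONITOR.findall(monitor_text)]
    (it0, rel0), (it1, rel1) = points[0], points[-1]
    return (rel1 / rel0) ** (1.0 / (it1 - it0))


ml_tail = """119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
"""

gamg_tail = """980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
"""

print("ML factor per iteration:  ", mean_reduction_factor(ml_tail))
print("GAMG factor per iteration:", mean_reduction_factor(gamg_tail))
```

On these excerpts the ML run reduces the relative residual by a factor of roughly 0.85 per iteration, while the GAMG run sits above 0.9999, i.e. it has essentially stagnated.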

* It looks like you are using sor for the coarse grid solver in gamg:

  Coarse grid solver -- level ---
KSP Object:(mg_levels_0_) 1 MPI processes
  type: preonly
  maximum iterations=2, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using NONE norm type for convergence test
PC Object:(mg_levels_0_) 1 MPI processes
  type: sor
SOR: type = local_symmetric, iterations = 1, local iterations =

You should/must use lu, like in ML. Using sor here will kill you.
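
This kind of check can be automated by scanning the -ksp_view text for the PC type that follows the "Coarse grid solver" header. A minimal sketch, with a hypothetical function name and the view excerpt above as input:

```python
def coarse_pc_type(view_text):
    """Return the 'type:' of the first PC Object after 'Coarse grid solver'."""
    in_coarse = False
    in_pc = False
    for raw in view_text.splitlines():
        line = raw.strip()
        if line.startswith("Coarse grid solver"):
            in_coarse = True
        elif in_coarse and line.startswith("PC Object"):
            in_pc = True
        elif in_pc and line.startswith("type:"):
            return line.split(":", 1)[1].strip()
    return None


# View excerpt quoted in this thread.
view = """  Coarse grid solver -- level ---
KSP Object:(mg_levels_0_) 1 MPI processes
  type: preonly
  maximum iterations=2, initial guess is zero
PC Object:(mg_levels_0_) 1 MPI processes
  type: sor
"""

print(coarse_pc_type(view))
```

Note that it reports only the outermost coarse PC. For a view where the coarse PC is bjacobi with lu on each block, it would return "bjacobi", and the sub-PC type has to be read from the lines that follow.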

* smoothed aggregation vs unsmoothed: GAMG's view data does not say if it
is smoothing. Damn, I need to fix that. For NS, you probably want
unsmoothed (-pc_gamg_agg_nsmooths 0). I'm not sure what the ML parameter is
for this, nor do I know the default. It should make a noticeable difference
(good or bad).

* Threshold for dropping small values from graph 0.75 -- this is crazy :)
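
To see why such a large threshold is extreme, here is a sketch of threshold-based edge dropping using the textbook aggregation strength measure |a_ij| / sqrt(|a_ii| |a_jj|). Whether GAMG uses exactly this formula is an assumption of the sketch, and the matrix is made up.

```python
import math


def strong_edges(A, threshold):
    """Off-diagonal (i, j) whose scaled magnitude exceeds the threshold."""
    n = len(A)
    kept = set()
    for i in range(n):
        for j in range(n):
            if i != j and A[i][j] != 0.0:
                strength = abs(A[i][j]) / math.sqrt(abs(A[i][i]) * abs(A[j][j]))
                if strength > threshold:
                    kept.add((i, j))
    return kept


# Made-up symmetric matrix with strengths 0.25, 0.025 and 0.5.
A = [
    [4.0, -1.0, -0.1],
    [-1.0, 4.0, -2.0],
    [-0.1, -2.0, 4.0],
]

print(len(strong_edges(A, 0.03)))
print(len(strong_edges(A, 0.75)))
```

With 0.03 only the weakest connection is dropped and four directed edges remain; with 0.75 no edges survive at all, which leaves nothing for the aggregation to group.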

This is all that I can think of now.

Mark


>
> I would very much appreciate any advice on configuring GAMG and
> differences w.r.t ML to be taken into account (not a multigrid expert
> though).
>
> Thanks, best wishes
> David
>
>
> --
> ksp_view for -pc_type gamg  -pc_gamg_threshold 0.75 -pc_mg_levels 3
>
> KSP Object: 1 MPI processes
>   type: fgmres
> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> GMRES: happy breakdown tolerance 1e-30
>   maximum iterations=1
>   tolerances:  relative=1e-06, absolute=1e-50, divergence=1.
>   right preconditioning
>   using nonzero initial guess
>   using UNPRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
>   type: gamg
> MG: type is MULTIPLICATIVE, levels=1 cycles=v
>   Cycles per PCApply=1
>   Using Galerkin computed coarse grid matrices
>   GAMG specific options
> Threshold for dropping small values from graph 0.75
> AGG specific options
>   Symmetric graph false
>   Coarse grid solver -- level ---
> KSP Object:(mg_levels_0_) 1 MPI processes
>   type: preonly
>   maximum iterations=2, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>   left preconditioning
>   using NONE norm type for convergence test
> PC Object:(mg_levels_0_) 1 MPI processes
>   type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations =
> 1, omega = 1.
>   linear system matrix = precond matrix:
>   Mat Object:   1 MPI processes
> type: seqaij
> rows=1745224, cols=1745224
> total: nonzeros=99452608, allocated nonzeros=99452608
> total number of mallocs used during MatSetValues calls =0
>   using I-node routines: found 1037847 nodes, limit used is 5
>   linear system matrix = precond matrix:
>   Mat Object:   1 MPI processes
> type: seqaij
> rows=1745224, cols=1745224
> total: nonzeros=99452608, allocated nonzeros=99452608
> total number of mallocs used during MatSetValues calls =0
>   using I-node routines: found 1037847 nodes, limit used is 5
>
>
> --
> ksp_view for -pc_type ml:
>
> KSP Object: 8 MPI processes
>   type: fgmres
> GMRES: 

Re: [petsc-users] GAMG advice

2017-11-01 Thread David Nolte
Thanks Barry.
By simply replacing chebychev by richardson I get similar performance
with GAMG and ML (GAMG even slightly faster):

-pc_type gamg
-pc_gamg_type agg
-pc_gamg_threshold 0.03
-pc_gamg_square_graph 10
-pc_gamg_sym_graph
-mg_levels_ksp_type richardson
-mg_levels_pc_type sor

Is it still true that I need to set "-pc_gamg_sym_graph" if the matrix
is asymmetric? For serial runs it doesn't seem to matter, but in
parallel the PC setup hangs (after calls of
PCGAMGFilterGraph()) if -pc_gamg_sym_graph is not set.

David


On 10/21/2017 12:10 AM, Barry Smith wrote:
>   David,
>
>    GAMG picks the number of levels based on how the coarsening process etc
> proceeds. You cannot hardwire it to a particular value. You can run with 
> -info to get more info potentially on the decisions GAMG is making.
>
>   Barry
>
>> On Oct 20, 2017, at 2:06 PM, David Nolte  wrote:
>>
>> PS: I didn't realize at first, it looks as if the -pc_mg_levels 3 option
>> was not taken into account:  
>> type: gamg
>> MG: type is MULTIPLICATIVE, levels=1 cycles=v
>>
>>
>>
>> On 10/20/2017 03:32 PM, David Nolte wrote:
>>> Dear all,
>>>
>>> I have some problems using GAMG as a preconditioner for (F)GMRES.
>>> Background: I am solving the incompressible, unsteady Navier-Stokes
>>> equations with a coupled mixed FEM approach, using P1/P1 elements for
>>> velocity and pressure on an unstructured tetrahedron mesh with about
>>> 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG,
>>> hence, no zeros on the diagonal of the pressure block. Time
>>> discretization with semi-implicit backward Euler. The flow is a
>>> convection dominated flow through a nozzle.
>>>
>>> So far, for this setup, I have been quite happy with a simple FGMRES/ML
>>> solver for the full system (rather bruteforce, I admit, but much faster
>>> than any block/Schur preconditioners I tried):
>>>
>>> -ksp_converged_reason
>>> -ksp_monitor_true_residual
>>> -ksp_type fgmres
>>> -ksp_rtol 1.0e-6
>>> -ksp_initial_guess_nonzero
>>>
>>> -pc_type ml
>>> -pc_ml_Threshold 0.03
>>> -pc_ml_maxNlevels 3
>>>
>>> This setup converges in ~100 iterations (see below the ksp_view output)
>>> to rtol:
>>>
>>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
>>> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
>>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm
>>> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
>>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm
>>> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
>>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
>>> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
>>>
>>>
>>> Now I'd like to try GAMG instead of ML. However, I don't know how to set
>>> it up to get similar performance.
>>> The obvious/naive
>>>
>>> -pc_type gamg
>>> -pc_gamg_type agg
>>>
>>> # with and without
>>> -pc_gamg_threshold 0.03
>>> -pc_mg_levels 3
>>>
>>> converges very slowly on 1 proc and much worse on 8 (~200k dofs per
>>> proc), for instance:
>>> np = 1:
>>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
>>> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
>>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
>>> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
>>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
>>> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
>>>
>>> np = 8:
>>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
>>> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
>>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
>>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
>>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>>>
>>> A very high threshold seems to improve the GAMG PC, for instance with
>>> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
>>> What else should I try?
>>>
>>> I would very much appreciate any advice on configuring GAMG and
>>> differences w.r.t ML to be taken into account (not a multigrid expert
>>> though).
>>>
>>> Thanks, best wishes
>>> David
>>>
>>>
>>> --
>>> ksp_view for -pc_type gamg  -pc_gamg_threshold 0.75 

Re: [petsc-users] GAMG advice

2017-10-20 Thread Barry Smith

  David,

   GAMG picks the number of levels based on how the coarsening process etc 
proceeds. You cannot hardwire it to a particular value. You can run with -info 
to get more info potentially on the decisions GAMG is making.

  Barry

> On Oct 20, 2017, at 2:06 PM, David Nolte  wrote:
> 
> PS: I didn't realize at first, it looks as if the -pc_mg_levels 3 option
> was not taken into account:  
> type: gamg
> MG: type is MULTIPLICATIVE, levels=1 cycles=v
> 
> 
> 
> On 10/20/2017 03:32 PM, David Nolte wrote:
>> Dear all,
>> 
>> I have some problems using GAMG as a preconditioner for (F)GMRES.
>> Background: I am solving the incompressible, unsteady Navier-Stokes
>> equations with a coupled mixed FEM approach, using P1/P1 elements for
>> velocity and pressure on an unstructured tetrahedron mesh with about
>> 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG,
>> hence, no zeros on the diagonal of the pressure block. Time
>> discretization with semi-implicit backward Euler. The flow is a
>> convection dominated flow through a nozzle.
>> 
>> So far, for this setup, I have been quite happy with a simple FGMRES/ML
>> solver for the full system (rather bruteforce, I admit, but much faster
>> than any block/Schur preconditioners I tried):
>> 
>> -ksp_converged_reason
>> -ksp_monitor_true_residual
>> -ksp_type fgmres
>> -ksp_rtol 1.0e-6
>> -ksp_initial_guess_nonzero
>> 
>> -pc_type ml
>> -pc_ml_Threshold 0.03
>> -pc_ml_maxNlevels 3
>> 
>> This setup converges in ~100 iterations (see below the ksp_view output)
>> to rtol:
>> 
>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
>> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm
>> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm
>> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
>> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
>> 
>> 
>> Now I'd like to try GAMG instead of ML. However, I don't know how to set
>> it up to get similar performance.
>> The obvious/naive
>> 
>> -pc_type gamg
>> -pc_gamg_type agg
>> 
>> # with and without
>> -pc_gamg_threshold 0.03
>> -pc_mg_levels 3
>> 
>> converges very slowly on 1 proc and much worse on 8 (~200k dofs per
>> proc), for instance:
>> np = 1:
>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
>> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
>> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
>> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
>> 
>> np = 8:
>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
>> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>> 
>> A very high threshold seems to improve the GAMG PC, for instance with
>> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
>> What else should I try?
>> 
>> I would very much appreciate any advice on configuring GAMG and
>> differences w.r.t ML to be taken into account (not a multigrid expert
>> though).
>> 
>> Thanks, best wishes
>> David
>> 
>> 
>> --
>> ksp_view for -pc_type gamg  -pc_gamg_threshold 0.75 -pc_mg_levels 3
>> 
>> KSP Object: 1 MPI processes
>>   type: fgmres
>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>> Orthogonalization with no iterative refinement
>> GMRES: happy breakdown tolerance 1e-30
>>   maximum iterations=1
>>   tolerances:  relative=1e-06, absolute=1e-50, divergence=1.
>>   right preconditioning
>>   using nonzero initial guess
>>   using UNPRECONDITIONED norm type for convergence test
>> PC Object: 1 MPI processes
>>   type: gamg
>> MG: type is MULTIPLICATIVE, levels=1 cycles=v
>>   Cycles per PCApply=1
>>   Using Galerkin computed coarse grid matrices
>>   GAMG specific options
>> Threshold for dropping small values from graph 0.75
>> AGG specific options
>>   Symmetric graph false
>>   Coarse grid solver -- level ---
>> KSP Object:(mg_levels_0_) 1 MPI processes
>>   type: preonly
>>   maximum iterations=2, initial guess is zero
>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>>   left preconditioning
>>   using NONE norm type for convergence test
>> PC Object:(mg_levels_0_) 1 MPI processes
>>   type: sor
>> SOR: type = 

Re: [petsc-users] GAMG advice

2017-10-20 Thread David Nolte
PS: I didn't notice this at first, but it looks as if the -pc_mg_levels 3
option was not taken into account:
type: gamg
    MG: type is MULTIPLICATIVE, levels=1 cycles=v
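
A mismatch like this can be caught automatically by comparing the level count reported in the -ksp_view output with the requested one. The following is only a minimal sketch: the helper name `reported_mg_levels` is hypothetical, and it assumes nothing beyond the `MG: type is ..., levels=N` line format shown above.

```python
import re

def reported_mg_levels(ksp_view_text):
    """Return the level count from the 'MG: ... levels=N' line, or None."""
    match = re.search(r"MG:.*\blevels=(\d+)", ksp_view_text)
    return int(match.group(1)) if match else None

# Excerpt of the -ksp_view output quoted above
view = """type: gamg
    MG: type is MULTIPLICATIVE, levels=1 cycles=v
"""

requested = 3  # value passed via -pc_mg_levels
actual = reported_mg_levels(view)
if actual != requested:
    print(f"requested {requested} levels, but the hierarchy has {actual}")
```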



On 10/20/2017 03:32 PM, David Nolte wrote:
> Dear all,
>
> I have some problems using GAMG as a preconditioner for (F)GMRES.
> Background: I am solving the incompressible, unsteady Navier-Stokes
> equations with a coupled mixed FEM approach, using P1/P1 elements for
> velocity and pressure on an unstructured tetrahedral mesh with about
> 2 million DOFs (and up to 15 million). The method is stabilized with
> SUPG/PSPG, hence there are no zeros on the diagonal of the pressure
> block. Time discretization is semi-implicit backward Euler. The flow is
> a convection-dominated flow through a nozzle.
>
> So far, for this setup, I have been quite happy with a simple FGMRES/ML
> solver for the full system (rather brute force, I admit, but much faster
> than any block/Schur preconditioners I tried):
>
>     -ksp_converged_reason
>     -ksp_monitor_true_residual
>     -ksp_type fgmres
>     -ksp_rtol 1.0e-6
>     -ksp_initial_guess_nonzero
>
>     -pc_type ml
>     -pc_ml_Threshold 0.03
>     -pc_ml_maxNlevels 3
>
> This setup converges to rtol in ~100 iterations (see the ksp_view output
> below):
>
> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm
> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm
> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
>
>
> Now I'd like to try GAMG instead of ML. However, I don't know how to set
> it up to get similar performance.
> The obvious/naive setup
>
>     -pc_type gamg
>     -pc_gamg_type agg
>
> # with and without
>     -pc_gamg_threshold 0.03
>     -pc_mg_levels 3
>
> converges very slowly on 1 proc and much worse on 8 (~200k dofs per
> proc), for instance:
> np = 1:
> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
>
> np = 8:
> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>
> A very high threshold seems to improve the GAMG PC; for instance, with
> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
> What else should I try?
>
> I would very much appreciate any advice on configuring GAMG and on
> differences w.r.t. ML that should be taken into account (I am not a
> multigrid expert, though).
>
> Thanks, best wishes
> David
>
>
> --
> ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels 3
>
> KSP Object: 1 MPI processes
>   type: fgmres
>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>     GMRES: happy breakdown tolerance 1e-30
>   maximum iterations=1
>   tolerances:  relative=1e-06, absolute=1e-50, divergence=1.
>   right preconditioning
>   using nonzero initial guess
>   using UNPRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
>   type: gamg
>     MG: type is MULTIPLICATIVE, levels=1 cycles=v
>       Cycles per PCApply=1
>       Using Galerkin computed coarse grid matrices
>       GAMG specific options
>         Threshold for dropping small values from graph 0.75
>         AGG specific options
>           Symmetric graph false
>   Coarse grid solver -- level ---
>     KSP Object:    (mg_levels_0_) 1 MPI processes
>       type: preonly
>       maximum iterations=2, initial guess is zero
>       tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>       left preconditioning
>       using NONE norm type for convergence test
>     PC Object:    (mg_levels_0_) 1 MPI processes
>       type: sor
>         SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>       linear system matrix = precond matrix:
>       Mat Object:   1 MPI processes
>         type: seqaij
>         rows=1745224, cols=1745224
>         total: nonzeros=99452608, allocated nonzeros=99452608
>         total number of mallocs used during MatSetValues calls =0
>           using I-node routines: found 1037847 nodes, limit used is 5
>   linear system matrix = precond matrix:

[petsc-users] GAMG advice

2017-10-20 Thread David Nolte
Dear all,

I have some problems using GAMG as a preconditioner for (F)GMRES.
Background: I am solving the incompressible, unsteady Navier-Stokes
equations with a coupled mixed FEM approach, using P1/P1 elements for
velocity and pressure on an unstructured tetrahedral mesh with about
2 million DOFs (and up to 15 million). The method is stabilized with
SUPG/PSPG, hence there are no zeros on the diagonal of the pressure
block. Time discretization is semi-implicit backward Euler. The flow is
a convection-dominated flow through a nozzle.

So far, for this setup, I have been quite happy with a simple FGMRES/ML
solver for the full system (rather brute force, I admit, but much faster
than any block/Schur preconditioners I tried):

    -ksp_converged_reason
    -ksp_monitor_true_residual
    -ksp_type fgmres
    -ksp_rtol 1.0e-6
    -ksp_initial_guess_nonzero

    -pc_type ml
    -pc_ml_Threshold 0.03
    -pc_ml_maxNlevels 3

This setup converges to rtol in ~100 iterations (see the ksp_view output
below):

119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm
3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm
2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
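
To put a number on "converges well", the monitor output above can be turned into an average residual reduction factor per iteration. This is a minimal sketch, not part of PETSc: the helper names `relative_residuals` and `average_reduction` are hypothetical, and the parsing assumes only the `||r(i)||/||b||` format printed by -ksp_monitor_true_residual as shown above.

```python
import re

def relative_residuals(monitor_text):
    """Extract the ||r(i)||/||b|| values from the monitor output."""
    pattern = r"\|\|r\(i\)\|\|/\|\|b\|\|\s+(\S+)"
    return [float(v) for v in re.findall(pattern, monitor_text)]

def average_reduction(residuals):
    """Geometric-mean residual reduction factor per iteration."""
    steps = len(residuals) - 1
    return (residuals[-1] / residuals[0]) ** (1.0 / steps)

# Last four iterations of the FGMRES/ML run quoted above
log = """
119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm
3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm
2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
"""

rate = average_reduction(relative_residuals(log))
print(f"average reduction factor per iteration: {rate:.3f}")
```

A factor well below 1 means steady progress; values very close to 1 indicate near-stagnation.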


Now I'd like to try GAMG instead of ML. However, I don't know how to set
it up to get similar performance.
The obvious/naive setup

    -pc_type gamg
    -pc_gamg_type agg

# with and without
    -pc_gamg_threshold 0.03
    -pc_mg_levels 3

converges very slowly on 1 proc and much worse on 8 (~200k dofs per
proc), for instance:
np = 1:
980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04

np = 8:
980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
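
For a rough sense of how slow this is, one can project how many further iterations would be needed to reach rtol if the residual kept shrinking at the rate seen in the logs above. This is only a sketch under that constant-rate assumption (which restarted GMRES need not satisfy), and the helper name `iterations_to_reach` is hypothetical.

```python
import math

def iterations_to_reach(rtol, current, previous, steps=1):
    """Project the remaining iterations, assuming the relative residual keeps
    shrinking by the same per-iteration factor as between `previous` and
    `current`, which are `steps` iterations apart."""
    factor = (current / previous) ** (1.0 / steps)
    if factor >= 1.0:
        return math.inf  # stagnation: no progress at this rate
    return math.ceil(math.log(rtol / current) / math.log(factor))

# np = 1: relative residuals at iterations 980 and 982
np1 = iterations_to_reach(1e-6, 4.532035649508e-04, 4.532259705508e-04, steps=2)

# np = 8: iterations 981 and 982 report identical residuals
np8 = iterations_to_reach(1e-6, 1.353259896634e-03, 1.353259896634e-03)

print(f"np = 1: about {np1} more iterations")
print(f"np = 8: {np8} (stalled)")
```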

A very high threshold seems to improve the GAMG PC; for instance, with
0.75 I get convergence to rtol=1e-6 after 744 iterations.
What else should I try?

I would very much appreciate any advice on configuring GAMG and on
differences w.r.t. ML that should be taken into account (I am not a
multigrid expert, though).

Thanks, best wishes
David


--
ksp_view for -pc_type gamg -pc_gamg_threshold 0.75 -pc_mg_levels 3

KSP Object: 1 MPI processes
  type: fgmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1
  tolerances:  relative=1e-06, absolute=1e-50, divergence=1.
  right preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: gamg
    MG: type is MULTIPLICATIVE, levels=1 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
      GAMG specific options
        Threshold for dropping small values from graph 0.75
        AGG specific options
          Symmetric graph false
  Coarse grid solver -- level ---
    KSP Object:    (mg_levels_0_) 1 MPI processes
      type: preonly
      maximum iterations=2, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_levels_0_) 1 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object:   1 MPI processes
        type: seqaij
        rows=1745224, cols=1745224
        total: nonzeros=99452608, allocated nonzeros=99452608
        total number of mallocs used during MatSetValues calls =0
          using I-node routines: found 1037847 nodes, limit used is 5
  linear system matrix = precond matrix:
  Mat Object:   1 MPI processes
    type: seqaij
    rows=1745224, cols=1745224
    total: nonzeros=99452608, allocated nonzeros=99452608
    total number of mallocs used during MatSetValues calls =0
      using I-node routines: found 1037847 nodes, limit used is 5


--
ksp_view for -pc_type ml:

KSP Object: 8 MPI processes
  type: fgmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with