> On Sep 13, 2017, at 10:56 AM, Federico Golfrè Andreasi 
> <[email protected]> wrote:
> 
> Hi Barry,
> 
> I understand and fully agree with you that the behavior changes from release 
> to release due to better tuning of the defaults.
> 
> In my case, the difference in the solution is negligible, but the runtime 
> increases by up to 70% (with the same number of KSP iterations).

  Ok, this is an important (and bad) difference. 

> So I was wondering whether there are some flags, related to memory 
> preallocation or to the re-use of intermediate solutions, that used to be 
> enabled by default.

   Not likely it is this. 

   Are both compiled with the same level of compiler optimization?

   Please run both with -log_summary and send the output, this will tell us 
WHAT parts are now slower.
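
   For example (the launcher, executable names, and log file names below are 
just placeholders), something along the lines of

     mpiexec -n 128 ./app_petsc34 <your usual options> -log_summary > log_34.txt
     mpiexec -n 128 ./app_petsc37 <your usual options> -log_summary > log_37.txt

so we can see which events account for the extra time.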

  Barry

> 
> Thank you,
> Federico
> 
> 
> 
> On 13 September 2017 at 17:29, Barry Smith <[email protected]> wrote:
> 
>    There will likely always be slight differences in convergence over that 
> many releases. Lots of little defaults, etc., get changed over time as we 
> learn from users and increase the robustness of the defaults.
> 
>     So in your case do the differences matter?
> 
> 1) What is the time to solution in both cases? Is it a few percent different 
> or now much slower?
> 
> 2) What about the number of iterations? Almost identical (say 1 or 2 different) 
> or does it now take 30 iterations when it used to take 5?
> 
>   Barry
> 
> > On Sep 13, 2017, at 10:25 AM, Federico Golfrè Andreasi 
> > <[email protected]> wrote:
> >
> > Dear PETSc users/developers,
> >
> > I recently switched from PETSc-3.4 to PETSc-3.7 and found that some of the 
> > default setup for the "mg" (multigrid) preconditioner has changed.
> >
> > We were solving a linear system passing, through the command line, the 
> > following options:
> > -ksp_type      fgmres
> > -ksp_max_it    100000
> > -ksp_rtol      0.000001
> > -pc_type       mg
> > -ksp_view
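> >
> > (For reference, these options take effect through KSPSetFromOptions(); the 
> > sketch below is only illustrative, not our actual code, and assumes A, b, x 
> > are assembled elsewhere.)
> >
> > #include <petscksp.h>
> >
> > /* Minimal sketch: how the command-line options above are picked up. */
> > static PetscErrorCode SolveWithOptions(Mat A, Vec b, Vec x)
> > {
> >   KSP            ksp;
> >   PetscErrorCode ierr;
> >
> >   PetscFunctionBeginUser;
> >   ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
> >   ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);  /* PETSc-3.4 took an extra MatStructure argument here */
> >   ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);    /* reads -ksp_type, -pc_type, -ksp_rtol, ... */
> >   ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
> >   ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
> >   PetscFunctionReturn(0);
> > }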
> >
> > The output of the KSP view is as follows:
> >
> > KSP Object: 128 MPI processes
> >   type: fgmres
> >     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt 
> > Orthogonalization with no iterative refinement
> >     GMRES: happy breakdown tolerance 1e-30
> >   maximum iterations=100000, initial guess is zero
> >   tolerances:  relative=1e-06, absolute=1e-50, divergence=10000
> >   right preconditioning
> >   using UNPRECONDITIONED norm type for convergence test
> > PC Object: 128 MPI processes
> >   type: mg
> >     MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >       Cycles per PCApply=1
> >       Not using Galerkin computed coarse grid matrices
> >   Coarse grid solver -- level -------------------------------
> >     KSP Object:    (mg_levels_0_)     128 MPI processes
> >       type: chebyshev
> >         Chebyshev: eigenvalue estimates:  min = 0.223549, max = 2.45903
> >         Chebyshev: estimated using:  [0 0.1; 0 1.1]
> >         KSP Object:        (mg_levels_0_est_)         128 MPI processes
> >           type: gmres
> >             GMRES: restart=30, using Classical (unmodified) Gram-Schmidt 
> > Orthogonalization with no iterative refinement
> >             GMRES: happy breakdown tolerance 1e-30
> >           maximum iterations=10, initial guess is zero
> >           tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
> >           left preconditioning
> >           using NONE norm type for convergence test
> >         PC Object:        (mg_levels_0_)         128 MPI processes
> >           type: sor
> >             SOR: type = local_symmetric, iterations = 1, local iterations = 
> > 1, omega = 1
> >           linear system matrix followed by preconditioner matrix:
> >           Matrix Object:           128 MPI processes
> >             type: mpiaij
> >             rows=279669, cols=279669
> >             total: nonzeros=6427943, allocated nonzeros=6427943
> >             total number of mallocs used during MatSetValues calls =0
> >               not using I-node (on process 0) routines
> >           Matrix Object:           128 MPI processes
> >             type: mpiaij
> >             rows=279669, cols=279669
> >             total: nonzeros=6427943, allocated nonzeros=6427943
> >             total number of mallocs used during MatSetValues calls =0
> >               not using I-node (on process 0) routines
> >       maximum iterations=1, initial guess is zero
> >       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
> >       left preconditioning
> >       using NONE norm type for convergence test
> >     PC Object:    (mg_levels_0_)     128 MPI processes
> >       type: sor
> >         SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
> > omega = 1
> >       linear system matrix followed by preconditioner matrix:
> >       Matrix Object:       128 MPI processes
> >         type: mpiaij
> >         rows=279669, cols=279669
> >         total: nonzeros=6427943, allocated nonzeros=6427943
> >         total number of mallocs used during MatSetValues calls =0
> >           not using I-node (on process 0) routines
> >       Matrix Object:       128 MPI processes
> >         type: mpiaij
> >         rows=279669, cols=279669
> >         total: nonzeros=6427943, allocated nonzeros=6427943
> >         total number of mallocs used during MatSetValues calls =0
> >           not using I-node (on process 0) routines
> >   linear system matrix followed by preconditioner matrix:
> >   Matrix Object:   128 MPI processes
> >     type: mpiaij
> >     rows=279669, cols=279669
> >     total: nonzeros=6427943, allocated nonzeros=6427943
> >     total number of mallocs used during MatSetValues calls =0
> >       not using I-node (on process 0) routines
> >   Matrix Object:   128 MPI processes
> >     type: mpiaij
> >     rows=279669, cols=279669
> >     total: nonzeros=6427943, allocated nonzeros=6427943
> >     total number of mallocs used during MatSetValues calls =0
> >       not using I-node (on process 0) routines
> >
> > When I build the same program with PETSc-3.7 and run it with the same 
> > options, we observe that the runtime increases and the convergence is 
> > slightly different. The output of the KSP view is:
> >
> > KSP Object: 128 MPI processes
> >   type: fgmres
> >     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt 
> > Orthogonalization with no iterative refinement
> >     GMRES: happy breakdown tolerance 1e-30
> >   maximum iterations=100000, initial guess is zero
> >   tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.
> >   right preconditioning
> >   using UNPRECONDITIONED norm type for convergence test
> > PC Object: 128 MPI processes
> >   type: mg
> >     MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >       Cycles per PCApply=1
> >       Not using Galerkin computed coarse grid matrices
> >   Coarse grid solver -- level -------------------------------
> >     KSP Object:    (mg_levels_0_)     128 MPI processes
> >       type: chebyshev
> >         Chebyshev: eigenvalue estimates:  min = 0.223549, max = 2.45903
> >         Chebyshev: eigenvalues estimated using gmres with translations  [0. 
> > 0.1; 0. 1.1]
> >         KSP Object:        (mg_levels_0_esteig_)         128 MPI processes
> >           type: gmres
> >             GMRES: restart=30, using Classical (unmodified) Gram-Schmidt 
> > Orthogonalization with no iterative refinement
> >             GMRES: happy breakdown tolerance 1e-30
> >           maximum iterations=10, initial guess is zero
> >           tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
> >           left preconditioning
> >           using PRECONDITIONED norm type for convergence test
> >       maximum iterations=2, initial guess is zero
> >       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >       left preconditioning
> >       using NONE norm type for convergence test
> >     PC Object:    (mg_levels_0_)     128 MPI processes
> >       type: sor
> >         SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
> > omega = 1.
> >       linear system matrix followed by preconditioner matrix:
> >       Mat Object:       128 MPI processes
> >         type: mpiaij
> >         rows=279669, cols=279669
> >         total: nonzeros=6427943, allocated nonzeros=6427943
> >         total number of mallocs used during MatSetValues calls =0
> >           not using I-node (on process 0) routines
> >       Mat Object:       128 MPI processes
> >         type: mpiaij
> >         rows=279669, cols=279669
> >         total: nonzeros=6427943, allocated nonzeros=6427943
> >         total number of mallocs used during MatSetValues calls =0
> >           not using I-node (on process 0) routines
> >   linear system matrix followed by preconditioner matrix:
> >   Mat Object:   128 MPI processes
> >     type: mpiaij
> >     rows=279669, cols=279669
> >     total: nonzeros=6427943, allocated nonzeros=6427943
> >     total number of mallocs used during MatSetValues calls =0
> >       not using I-node (on process 0) routines
> >   Mat Object:   128 MPI processes
> >     type: mpiaij
> >     rows=279669, cols=279669
> >     total: nonzeros=6427943, allocated nonzeros=6427943
> >     total number of mallocs used during MatSetValues calls =0
> >       not using I-node (on process 0) routines
> >
> > I was able to get a closer solution by adding the following options:
> > -mg_levels_0_esteig_ksp_norm_type   none
> > -mg_levels_0_esteig_ksp_rtol        1.0e-5
> > -mg_levels_ksp_max_it               1
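> >
> > Putting it together, the full run now looks roughly like the following (the 
> > launcher and executable name are just placeholders):
> >
> > mpiexec -n 128 ./solver \
> >     -ksp_type fgmres -ksp_max_it 100000 -ksp_rtol 0.000001 \
> >     -pc_type mg -ksp_view \
> >     -mg_levels_0_esteig_ksp_norm_type none \
> >     -mg_levels_0_esteig_ksp_rtol      1.0e-5 \
> >     -mg_levels_ksp_max_it             1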
> >
> > But I still cannot reach the same runtime we were observing with PETSc-3.4; 
> > could you please advise me whether I should specify any other options?
> >
> > Thank you very much for your support,
> > Federico Golfre' Andreasi
> >
> 
> 
