> On Sep 13, 2017, at 10:56 AM, Federico Golfrè Andreasi
> <[email protected]> wrote:
>
> Hi Barry,
>
> I understand and perfectly agree with you that the behavior changed after
> the release due to better tuning.
>
> In my case, the difference in the solution is negligible, but the runtime
> increases by up to +70% (with the same number of ksp_iterations).
Ok, this is an important (and bad) difference.

> So I was wondering if maybe there were just some flags related to memory
> preallocation or re-use of the intermediate solution that were previously set
> by default.

Not likely it is this. Are both compiled with the same level of compiler
optimization? Please run both with -log_summary and send the output; this will
tell us WHAT parts are now slower.

Barry

>
> Thank you,
> Federico
>
>
> On 13 September 2017 at 17:29, Barry Smith <[email protected]> wrote:
>
> There will likely always be slight differences in convergence over that
> many releases. Lots of little defaults etc. get changed over time as we learn
> from users and increase the robustness of the defaults.
>
> So in your case do the differences matter?
>
> 1) What is the time to solution in both cases, is it a few percent different
> or now much slower?
>
> 2) What about number of iterations? Almost identical (say 1 or 2 different)
> or does it now take 30 iterations when it used to take 5?
>
> Barry
>
> > On Sep 13, 2017, at 10:25 AM, Federico Golfrè Andreasi
> > <[email protected]> wrote:
> >
> > Dear PETSc users/developers,
> >
> > I recently switched from PETSc-3.4 to PETSc-3.7 and found that some default
> > settings for the "mg" (multigrid) preconditioner have changed.
> >
> > We were solving a linear system passing, through the command line, the
> > following options:
> > -ksp_type fgmres
> > -ksp_max_it 100000
> > -ksp_rtol 0.000001
> > -pc_type mg
> > -ksp_view
> >
> > The output of the KSP view is as follows:
> >
> > KSP Object: 128 MPI processes
> >   type: fgmres
> >     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> >       Orthogonalization with no iterative refinement
> >     GMRES: happy breakdown tolerance 1e-30
> >   maximum iterations=100000, initial guess is zero
> >   tolerances: relative=1e-06, absolute=1e-50, divergence=10000
> >   right preconditioning
> >   using UNPRECONDITIONED norm type for convergence test
> > PC Object: 128 MPI processes
> >   type: mg
> >     MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >       Cycles per PCApply=1
> >       Not using Galerkin computed coarse grid matrices
> >   Coarse grid solver -- level -------------------------------
> >     KSP Object: (mg_levels_0_) 128 MPI processes
> >       type: chebyshev
> >         Chebyshev: eigenvalue estimates: min = 0.223549, max = 2.45903
> >         Chebyshev: estimated using: [0 0.1; 0 1.1]
> >         KSP Object: (mg_levels_0_est_) 128 MPI processes
> >           type: gmres
> >             GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> >               Orthogonalization with no iterative refinement
> >             GMRES: happy breakdown tolerance 1e-30
> >           maximum iterations=10, initial guess is zero
> >           tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> >           left preconditioning
> >           using NONE norm type for convergence test
> >         PC Object: (mg_levels_0_) 128 MPI processes
> >           type: sor
> >             SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
> >           linear system matrix followed by preconditioner matrix:
> >           Matrix Object: 128 MPI processes
> >             type: mpiaij
> >             rows=279669, cols=279669
> >             total: nonzeros=6427943, allocated nonzeros=6427943
> >             total number of mallocs used during MatSetValues calls =0
> >               not using I-node (on process 0) routines
> >           Matrix Object: 128 MPI processes
> >             type: mpiaij
> >             rows=279669, cols=279669
> >             total: nonzeros=6427943, allocated nonzeros=6427943
> >             total number of mallocs used during MatSetValues calls =0
> >               not using I-node (on process 0) routines
> >       maximum iterations=1, initial guess is zero
> >       tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> >       left preconditioning
> >       using NONE norm type for convergence test
> >       PC Object: (mg_levels_0_) 128 MPI processes
> >         type: sor
> >           SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
> >         linear system matrix followed by preconditioner matrix:
> >         Matrix Object: 128 MPI processes
> >           type: mpiaij
> >           rows=279669, cols=279669
> >           total: nonzeros=6427943, allocated nonzeros=6427943
> >           total number of mallocs used during MatSetValues calls =0
> >             not using I-node (on process 0) routines
> >         Matrix Object: 128 MPI processes
> >           type: mpiaij
> >           rows=279669, cols=279669
> >           total: nonzeros=6427943, allocated nonzeros=6427943
> >           total number of mallocs used during MatSetValues calls =0
> >             not using I-node (on process 0) routines
> >   linear system matrix followed by preconditioner matrix:
> >   Matrix Object: 128 MPI processes
> >     type: mpiaij
> >     rows=279669, cols=279669
> >     total: nonzeros=6427943, allocated nonzeros=6427943
> >     total number of mallocs used during MatSetValues calls =0
> >       not using I-node (on process 0) routines
> >   Matrix Object: 128 MPI processes
> >     type: mpiaij
> >     rows=279669, cols=279669
> >     total: nonzeros=6427943, allocated nonzeros=6427943
> >     total number of mallocs used during MatSetValues calls =0
> >       not using I-node (on process 0) routines
> >
> > When I build the same program using PETSc-3.7 and run it with the same
> > options, we observe that the runtime increases and the convergence is
> > slightly different. The output of the KSP view is:
> >
> > KSP Object: 128 MPI processes
> >   type: fgmres
> >     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> >       Orthogonalization with no iterative refinement
> >     GMRES: happy breakdown tolerance 1e-30
> >   maximum iterations=100000, initial guess is zero
> >   tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
> >   right preconditioning
> >   using UNPRECONDITIONED norm type for convergence test
> > PC Object: 128 MPI processes
> >   type: mg
> >     MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >       Cycles per PCApply=1
> >       Not using Galerkin computed coarse grid matrices
> >   Coarse grid solver -- level -------------------------------
> >     KSP Object: (mg_levels_0_) 128 MPI processes
> >       type: chebyshev
> >         Chebyshev: eigenvalue estimates: min = 0.223549, max = 2.45903
> >         Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
> >         KSP Object: (mg_levels_0_esteig_) 128 MPI processes
> >           type: gmres
> >             GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> >               Orthogonalization with no iterative refinement
> >             GMRES: happy breakdown tolerance 1e-30
> >           maximum iterations=10, initial guess is zero
> >           tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
> >           left preconditioning
> >           using PRECONDITIONED norm type for convergence test
> >       maximum iterations=2, initial guess is zero
> >       tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >       left preconditioning
> >       using NONE norm type for convergence test
> >       PC Object: (mg_levels_0_) 128 MPI processes
> >         type: sor
> >           SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> >         linear system matrix followed by preconditioner matrix:
> >         Mat Object: 128 MPI processes
> >           type: mpiaij
> >           rows=279669, cols=279669
> >           total: nonzeros=6427943, allocated nonzeros=6427943
> >           total number of mallocs used during MatSetValues calls =0
> >             not using I-node (on process 0) routines
> >         Mat Object: 128 MPI processes
> >           type: mpiaij
> >           rows=279669, cols=279669
> >           total: nonzeros=6427943, allocated nonzeros=6427943
> >           total number of mallocs used during MatSetValues calls =0
> >             not using I-node (on process 0) routines
> >   linear system matrix followed by preconditioner matrix:
> >   Mat Object: 128 MPI processes
> >     type: mpiaij
> >     rows=279669, cols=279669
> >     total: nonzeros=6427943, allocated nonzeros=6427943
> >     total number of mallocs used during MatSetValues calls =0
> >       not using I-node (on process 0) routines
> >   Mat Object: 128 MPI processes
> >     type: mpiaij
> >     rows=279669, cols=279669
> >     total: nonzeros=6427943, allocated nonzeros=6427943
> >     total number of mallocs used during MatSetValues calls =0
> >       not using I-node (on process 0) routines
> >
> > I was able to get a closer solution by adding the following options:
> > -mg_levels_0_esteig_ksp_norm_type none
> > -mg_levels_0_esteig_ksp_rtol 1.0e-5
> > -mg_levels_ksp_max_it 1
> >
> > But I still cannot reach the same runtime we were observing with PETSc-3.4;
> > could you please advise me whether I should specify any other options?
> >
> > Thank you very much for your support,
> > Federico Golfre' Andreasi
> >
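
For readers who want to reproduce the comparison between the two PETSc builds, a minimal stand-alone driver along the following lines picks up every solver option quoted above from the options database. This is only a sketch, not the poster's application: the 1D Laplacian is a stand-in for the real 279669-row system, and it assumes PETSc >= 3.5 for the two-argument KSPSetOperators() (3.4 also expects a MatStructure flag) and for MatCreateVecs() (older releases used MatGetVecs()).

#include <petscksp.h>

int main(int argc, char **argv)
{
  KSP            ksp;
  Mat            A;
  Vec            x, b;
  PetscInt       i, Istart, Iend, n = 100000;   /* problem size is arbitrary here */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Stand-in operator: a 1D Laplacian; the real application assembles its own matrix. */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &Istart, &Iend);CHKERRQ(ierr);
  for (i = Istart; i < Iend; i++) {
    if (i > 0)   { ierr = MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
    if (i < n-1) { ierr = MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);

  /* All solver choices (-ksp_type, -pc_type mg, tolerances, -ksp_view, ...) come
     from the command line, exactly as in the thread. */
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Such a driver (the executable name below is hypothetical) would then be run under both installations with the options from the thread plus the profiling flag Barry asks for, e.g.

mpiexec -n 128 ./driver -ksp_type fgmres -ksp_max_it 100000 -ksp_rtol 0.000001 -pc_type mg -ksp_view -log_summary

so that the two timing tables (newer releases also print the same information with -log_view) can be compared stage by stage.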

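The three workaround options listed at the end of the thread map directly onto the differences visible in the two ksp_view outputs: the eigenvalue-estimation KSP prefix changed from mg_levels_0_est_ to mg_levels_0_esteig_, its settings changed from rtol 1e-05 with the NONE norm to rtol 1e-12 with the PRECONDITIONED norm, and the Chebyshev smoother now defaults to 2 iterations instead of 1. If they turn out to be needed permanently, they could also be pinned inside the application instead of being repeated on every command line. A hedged sketch, assuming the PETSc 3.7 calling sequence of PetscOptionsSetValue() (whose first argument selects the options database, NULL meaning the global one); the helper name is made up for illustration:

#include <petscsys.h>

/* Restore the 3.4-like multigrid smoother defaults discussed in the thread.
   Call after PetscInitialize() and before KSPSetFromOptions(). */
PetscErrorCode SetLegacyMGDefaults(void)
{
  PetscErrorCode ierr;
  /* Eigenvalue-estimation KSP: the 3.7 ksp_view above shows rtol 1e-12 and a
     PRECONDITIONED norm; switch back to the 3.4-style settings. */
  ierr = PetscOptionsSetValue(NULL, "-mg_levels_0_esteig_ksp_norm_type", "none");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-mg_levels_0_esteig_ksp_rtol", "1.0e-5");CHKERRQ(ierr);
  /* Smoother: 3.7 defaults to 2 Chebyshev iterations per level, 3.4 used 1. */
  ierr = PetscOptionsSetValue(NULL, "-mg_levels_ksp_max_it", "1");CHKERRQ(ierr);
  return 0;
}

Options set this way are equivalent to passing them on the command line; the profiling output Barry asks for remains the right tool for locating whatever runtime difference is left.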