$ python
Python 2.7.13 (default, Dec 18 2016, 07:03:39)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

MatMult    >>> 3.6272e+03 - 2.0894e+03
1537.7999999999997

KSP Solve  >>> 3.6329e+03 - 2.0949e+03
1538.0
>>>
You are right, all the extra time is within the MatMult(), so for some reason your shell MatMult is much slower. I cannot guess why unless I can see inside your shell MatMult what it is doing. Make sure your configure options are identical and that you are using the same compiler.

   Barry

[A sketch of timing the inside of a shell MatMult with a separate log event is appended after the quoted thread below.]

> On Sep 15, 2017, at 5:08 AM, Federico Golfrè Andreasi <[email protected]> wrote:
>
> Hi Barry,
>
> I have attached an extract of our program output for both versions: PETSc-3.4.4 and PETSc-3.7.3.
>
> In this program the KSP has a shell matrix as operator and an MPIAIJ matrix as preconditioner.
> I was wondering whether the slowdown is related to the operations done in the MatMult of the shell matrix,
> because in a test program where I solve a similar system without a shell matrix I do not see the performance degradation.
>
> Perhaps you could give me some hints.
> Thank you and best regards,
> Federico
>
>
> On 13 September 2017 at 17:58, Barry Smith <[email protected]> wrote:
>
> > On Sep 13, 2017, at 10:56 AM, Federico Golfrè Andreasi <[email protected]> wrote:
> >
> > Hi Barry,
> >
> > I understand and fully agree with you that the behavior improves across releases due to better tuning.
> >
> > In my case, the difference in the solution is negligible, but the runtime increases by up to 70% (with the same number of KSP iterations).
>
> Ok, this is an important (and bad) difference.
>
> > So I was wondering if maybe there were just some flags, related to memory preallocation or reuse of the intermediate solution, that used to be set by default.
>
> Not likely it is this.
>
> Are both compiled with the same level of compiler optimization?
>
> Please run both with -log_summary and send the output; this will tell us WHAT parts are now slower.
>
> Barry
>
> > Thank you,
> > Federico
> >
> >
> > On 13 September 2017 at 17:29, Barry Smith <[email protected]> wrote:
> >
> > There will likely always be slight differences in convergence over that many releases. Lots of little defaults etc. get changed over time as we learn from users and increase the robustness of the defaults.
> >
> > So in your case, do the differences matter?
> >
> > 1) What is the time to solution in both cases? Is it a few percent different, or is it now much slower?
> >
> > 2) What about the number of iterations? Almost identical (say 1 or 2 different), or does it now take 30 iterations when it used to take 5?
> >
> > Barry
> >
> > > On Sep 13, 2017, at 10:25 AM, Federico Golfrè Andreasi <[email protected]> wrote:
> > >
> > > Dear PETSc users/developers,
> > >
> > > I recently switched from PETSc-3.4 to PETSc-3.7 and found that some default settings for the "mg" (multigrid) preconditioner have changed.
> > >
> > > We were solving a linear system, passing the following options through the command line:
> > >   -ksp_type fgmres
> > >   -ksp_max_it 100000
> > >   -ksp_rtol 0.000001
> > >   -pc_type mg
> > >   -ksp_view
> > >
> > > The output of the KSP view is as follows:
> > >
> > > KSP Object: 128 MPI processes
> > >   type: fgmres
> > >     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> > >     GMRES: happy breakdown tolerance 1e-30
> > >   maximum iterations=100000, initial guess is zero
> > >   tolerances: relative=1e-06, absolute=1e-50, divergence=10000
> > >   right preconditioning
> > >   using UNPRECONDITIONED norm type for convergence test
> > > PC Object: 128 MPI processes
> > >   type: mg
> > >     MG: type is MULTIPLICATIVE, levels=1 cycles=v
> > >       Cycles per PCApply=1
> > >       Not using Galerkin computed coarse grid matrices
> > >   Coarse grid solver -- level -------------------------------
> > >     KSP Object: (mg_levels_0_) 128 MPI processes
> > >       type: chebyshev
> > >         Chebyshev: eigenvalue estimates: min = 0.223549, max = 2.45903
> > >         Chebyshev: estimated using: [0 0.1; 0 1.1]
> > >         KSP Object: (mg_levels_0_est_) 128 MPI processes
> > >           type: gmres
> > >             GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> > >             GMRES: happy breakdown tolerance 1e-30
> > >           maximum iterations=10, initial guess is zero
> > >           tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> > >           left preconditioning
> > >           using NONE norm type for convergence test
> > >         PC Object: (mg_levels_0_) 128 MPI processes
> > >           type: sor
> > >             SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
> > >           linear system matrix followed by preconditioner matrix:
> > >           Matrix Object: 128 MPI processes
> > >             type: mpiaij
> > >             rows=279669, cols=279669
> > >             total: nonzeros=6427943, allocated nonzeros=6427943
> > >             total number of mallocs used during MatSetValues calls =0
> > >               not using I-node (on process 0) routines
> > >           Matrix Object: 128 MPI processes
> > >             type: mpiaij
> > >             rows=279669, cols=279669
> > >             total: nonzeros=6427943, allocated nonzeros=6427943
> > >             total number of mallocs used during MatSetValues calls =0
> > >               not using I-node (on process 0) routines
> > >       maximum iterations=1, initial guess is zero
> > >       tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> > >       left preconditioning
> > >       using NONE norm type for convergence test
> > >     PC Object: (mg_levels_0_) 128 MPI processes
> > >       type: sor
> > >         SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
> > >       linear system matrix followed by preconditioner matrix:
> > >       Matrix Object: 128 MPI processes
> > >         type: mpiaij
> > >         rows=279669, cols=279669
> > >         total: nonzeros=6427943, allocated nonzeros=6427943
> > >         total number of mallocs used during MatSetValues calls =0
> > >           not using I-node (on process 0) routines
> > >       Matrix Object: 128 MPI processes
> > >         type: mpiaij
> > >         rows=279669, cols=279669
> > >         total: nonzeros=6427943, allocated nonzeros=6427943
> > >         total number of mallocs used during MatSetValues calls =0
> > >           not using I-node (on process 0) routines
> > >   linear system matrix followed by preconditioner matrix:
> > >   Matrix Object: 128 MPI processes
> > >     type: mpiaij
> > >     rows=279669, cols=279669
> > >     total: nonzeros=6427943, allocated nonzeros=6427943
> > >     total number of mallocs used during MatSetValues calls =0
> > >       not using I-node (on process 0) routines
> > >   Matrix Object: 128 MPI processes
> > >     type: mpiaij
> > >     rows=279669, cols=279669
> > >     total: nonzeros=6427943, allocated nonzeros=6427943
> > >     total number of mallocs used during MatSetValues calls =0
> > >       not using I-node (on process 0) routines
> > >
> > > When I build the same program using PETSc-3.7 and run it with the same options we observe that the runtime increases and the convergence is slightly different. The output of the KSP view is:
> > >
> > > KSP Object: 128 MPI processes
> > >   type: fgmres
> > >     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> > >     GMRES: happy breakdown tolerance 1e-30
> > >   maximum iterations=100000, initial guess is zero
> > >   tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
> > >   right preconditioning
> > >   using UNPRECONDITIONED norm type for convergence test
> > > PC Object: 128 MPI processes
> > >   type: mg
> > >     MG: type is MULTIPLICATIVE, levels=1 cycles=v
> > >       Cycles per PCApply=1
> > >       Not using Galerkin computed coarse grid matrices
> > >   Coarse grid solver -- level -------------------------------
> > >     KSP Object: (mg_levels_0_) 128 MPI processes
> > >       type: chebyshev
> > >         Chebyshev: eigenvalue estimates: min = 0.223549, max = 2.45903
> > >         Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
> > >         KSP Object: (mg_levels_0_esteig_) 128 MPI processes
> > >           type: gmres
> > >             GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> > >             GMRES: happy breakdown tolerance 1e-30
> > >           maximum iterations=10, initial guess is zero
> > >           tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
> > >           left preconditioning
> > >           using PRECONDITIONED norm type for convergence test
> > >       maximum iterations=2, initial guess is zero
> > >       tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> > >       left preconditioning
> > >       using NONE norm type for convergence test
> > >     PC Object: (mg_levels_0_) 128 MPI processes
> > >       type: sor
> > >         SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> > >       linear system matrix followed by preconditioner matrix:
> > >       Mat Object: 128 MPI processes
> > >         type: mpiaij
> > >         rows=279669, cols=279669
> > >         total: nonzeros=6427943, allocated nonzeros=6427943
> > >         total number of mallocs used during MatSetValues calls =0
> > >           not using I-node (on process 0) routines
> > >       Mat Object: 128 MPI processes
> > >         type: mpiaij
> > >         rows=279669, cols=279669
> > >         total: nonzeros=6427943, allocated nonzeros=6427943
> > >         total number of mallocs used during MatSetValues calls =0
> > >           not using I-node (on process 0) routines
> > >   linear system matrix followed by preconditioner matrix:
> > >   Mat Object: 128 MPI processes
> > >     type: mpiaij
> > >     rows=279669, cols=279669
> > >     total: nonzeros=6427943, allocated nonzeros=6427943
> > >     total number of mallocs used during MatSetValues calls =0
> > >       not using I-node (on process 0) routines
> > >   Mat Object: 128 MPI processes
> > >     type: mpiaij
> > >     rows=279669, cols=279669
> > >     total: nonzeros=6427943, allocated nonzeros=6427943
> > >     total number of mallocs used during MatSetValues calls =0
> > >       not using I-node (on process 0) routines
> > >
> > > I was able to get a closer solution by adding the following options:
> > >   -mg_levels_0_esteig_ksp_norm_type none
> > >   -mg_levels_0_esteig_ksp_rtol 1.0e-5
> > >   -mg_levels_ksp_max_it 1
> > >
> > > But I still cannot reach the same runtime we were observing with PETSc-3.4; could you please advise me whether I should specify any other options?
> > >
> > > Thank you very much for your support,
> > > Federico Golfre' Andreasi
> > >
> > > <run_petsc34.txt><run_petsc37.txt>
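
Barry's point above is that the only way to see where the extra ~1538 seconds go is to look inside the shell MatMult itself. Below is a minimal sketch of how a shell operator of the kind described in the thread is typically wired up with a KSP (shell matrix as operator, assembled MPIAIJ matrix as preconditioner), with the body of the user MatMult wrapped in a registered log event so it appears as its own line in -log_summary. The names UserCtx, MyMatMult and SolveWithShell are hypothetical stand-ins for the application's actual routines, which are not shown in the thread; the placeholder body just applies the assembled matrix.

#include <petscksp.h>

/* Hypothetical user context; the application's real shell-matrix context is not shown in the thread. */
typedef struct {
  Mat           P;          /* assembled MPIAIJ matrix used as preconditioner */
  PetscLogEvent shell_mult; /* event used to time the body of the shell MatMult */
} UserCtx;

/* User-defined action of the operator: y = A*x.
   Wrapping the body in a registered log event makes the time spent here
   show up as a separate "UserShellMult" line in -log_summary, so the
   3.4 and 3.7 runs can be compared piece by piece. */
static PetscErrorCode MyMatMult(Mat A, Vec x, Vec y)
{
  UserCtx        *user;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatShellGetContext(A, (void **)&user);CHKERRQ(ierr);
  ierr = PetscLogEventBegin(user->shell_mult, 0, 0, 0, 0);CHKERRQ(ierr);
  /* ... the application's operator apply goes here; as a placeholder we
     simply apply the assembled matrix ... */
  ierr = MatMult(user->P, x, y);CHKERRQ(ierr);
  ierr = PetscLogEventEnd(user->shell_mult, 0, 0, 0, 0);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* Wiring: shell matrix as KSP operator, MPIAIJ matrix as preconditioner. */
static PetscErrorCode SolveWithShell(KSP ksp, Mat P, UserCtx *user, Vec b, Vec x)
{
  Mat            S;
  PetscInt       m, n, M, N;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  user->P = P;
  ierr = PetscLogEventRegister("UserShellMult", MAT_CLASSID, &user->shell_mult);CHKERRQ(ierr);
  ierr = MatGetLocalSize(P, &m, &n);CHKERRQ(ierr);
  ierr = MatGetSize(P, &M, &N);CHKERRQ(ierr);
  ierr = MatCreateShell(PetscObjectComm((PetscObject)P), m, n, M, N, user, &S);CHKERRQ(ierr);
  ierr = MatShellSetOperation(S, MATOP_MULT, (void (*)(void))MyMatMult);CHKERRQ(ierr);
  /* Two-matrix form used since PETSc 3.5; in PETSc 3.4 this call also took a MatStructure flag. */
  ierr = KSPSetOperators(ksp, S, P);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = MatDestroy(&S);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

With this in place, the "UserShellMult" line in the -log_summary output of both builds shows how much of the reported MatMult time is actually spent inside the user routine, which is the comparison Barry asks for above.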
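
For completeness, the three overrides that bring PETSc-3.7 closer to the old behavior can also be inserted into the options database from the code, so they are not forgotten on the command line. This is only a sketch assuming the PETSc-3.7 PetscOptionsSetValue() calling sequence (first argument is the options object, NULL for the global database); it must run before KSPSetFromOptions() so the multigrid/Chebyshev setup picks the values up.

#include <petscksp.h>

/* Hard-wire the PETSc-3.4-like overrides mentioned in the thread.
   Call after PetscInitialize() and before KSPSetFromOptions(). */
static PetscErrorCode SetCompatibilityOptions(void)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscOptionsSetValue(NULL, "-mg_levels_0_esteig_ksp_norm_type", "none");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-mg_levels_0_esteig_ksp_rtol", "1.0e-5");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-mg_levels_ksp_max_it", "1");CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Passing the same strings on the command line, as done in the thread, is equivalent.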
