I ran the code in the debugger, and your conjecture is right. In the case without monitoring on the Richardson KSP, PCApply_HYPRE is called once per outer iteration with the following call stack:

KSPSolve (itfunc.c:446)
KSPSolve_Richardson (rich.c:69)
PCApplyRichardson (precon.c:780)
PCApplyRichardson_HYPRE_BoomerAMG (hypre.c:597)
PCApply_HYPRE (hypre.c:151)

When the monitor is enabled, PCApply_HYPRE is called twice per outer iteration, from two different places in KSPSolve_Richardson:

KSPSolve (itfunc.c:446)
KSPSolve_Richardson (rich.c:124)
PCApply (precon.c:384)
PCApply_HYPRE (hypre.c:151)

and

KSPSolve (itfunc.c:446)
KSPSolve_Richardson (rich.c:147)
PCApply (precon.c:384)
PCApply_HYPRE (hypre.c:151)
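If I read rich.c correctly, the decision between these two paths looks roughly like the following. This is only a paraphrase of the logic, not the literal PETSc source; the hard-coded tolerances just mirror my velocity_ settings, and the work of the generic loop is only indicated in comments.

#include <petscksp.h>

/* Sketch of the branch in KSPSolve_Richardson as I understand it from rich.c;
 * a paraphrase of the logic, NOT the literal PETSc source. */
static PetscErrorCode richardson_solve_sketch(PC pc, Vec b, Vec x,
                                              PetscBool monitor_attached,
                                              PetscInt max_it)
{
  PetscBool                   pc_has_richardson;
  PetscInt                    i, outits;
  PCRichardsonConvergedReason prreason;
  PetscErrorCode              ierr;

  PetscFunctionBegin;
  ierr = PCApplyRichardsonExists(pc, &pc_has_richardson);CHKERRQ(ierr);
  if (!monitor_attached && pc_has_richardson) {
    /* rich.c:69 path: the PC performs all max_it sweeps itself; for PCHYPRE
       this ends up in PCApplyRichardson_HYPRE_BoomerAMG, i.e. one
       PCApply_HYPRE call per KSPSolve. */
    Vec w;
    ierr = VecDuplicate(x, &w);CHKERRQ(ierr);
    ierr = PCApplyRichardson(pc, b, x, w, 0.0, 1e-14, 1e4, max_it, PETSC_TRUE,
                             &outits, &prreason);CHKERRQ(ierr);
    ierr = VecDestroy(&w);CHKERRQ(ierr);
  } else {
    /* Generic path, taken as soon as a monitor is attached: the residual is
       formed and PCApply is called inside the iteration loop (rich.c:124 and
       rich.c:147), so PCApply_HYPRE is reached from both places in my run. */
    for (i = 0; i < max_it; i++) {
      /* r = b - A*x;  z = PCApply(r);  x = x + damping*z;
         call the monitor with ||r||;  test convergence */
    }
  }
  PetscFunctionReturn(0);
}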
By the way, regardless of the options I provide to the inner solver we are talking about, I cannot reproduce the small outer iteration count that I see when using the KSP monitor. Can this be due to the tolerances that are set in PCApplyRichardson_HYPRE_BoomerAMG? And what is the reason for setting these options before calling PCApply_HYPRE, and then resetting the same options before leaving that function?
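To make the question concrete, the pattern I mean looks roughly like this (paraphrased from stepping through hypre.c in the debugger; PC_HYPRE and PCApply_HYPRE are private to that file, and the field names hsolver/maxiter/tol are only my guesses, so this is an illustration rather than the real source):

/* Paraphrase of PCApplyRichardson_HYPRE_BoomerAMG (hypre.c:597), not the
 * literal code; struct/field names are approximations. */
static PetscErrorCode apply_richardson_boomeramg_sketch(PC pc, Vec b, Vec y,
                                                        PetscReal rtol,
                                                        PetscInt its,
                                                        PetscInt *outits)
{
  PC_HYPRE      *jac = (PC_HYPRE*)pc->data; /* holds the HYPRE_Solver and the user's settings */
  HYPRE_Int      oits;
  PetscErrorCode ierr;

  /* Temporarily override hypre's own settings so that ONE hypre call performs
     all Richardson sweeps with the tolerance the KSP asked for ... */
  HYPRE_BoomerAMGSetMaxIter(jac->hsolver, its * jac->maxiter);
  HYPRE_BoomerAMGSetTol(jac->hsolver, rtol);

  ierr = PCApply_HYPRE(pc, b, y);CHKERRQ(ierr);         /* hypre.c:151, called once */

  HYPRE_BoomerAMGGetNumIterations(jac->hsolver, &oits); /* sweeps hypre actually did */
  *outits = (PetscInt)oits;

  /* ... and then restore the options the user configured, because the same
     HYPRE_Solver object is reused by plain PCApply() calls, i.e. the code
     path taken when the monitor is attached. */
  HYPRE_BoomerAMGSetTol(jac->hsolver, jac->tol);
  HYPRE_BoomerAMGSetMaxIter(jac->hsolver, jac->maxiter);
  return 0;
}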
Thomas

Am 07.11.2012 22:28, schrieb Barry Smith:
> Thomas,
>
>     I don't have a complete explanation why in this case it changes but I can
> point you in the right direction of how this happens. You may need to put
> breakpoints in the debugger to see exactly what goes different with and
> without that option.
>
>     1) When richardson is used and no monitoring is done then
> PCApplyRichardson_HYPRE_BoomerAMG() is called to apply the boomerAMG v-cycle.
> Note that it changes the tolerance and its (the iteration count) before
> calling PCApply_HYPRE().
>
>     2) When monitoring is turned on we need to compute the residual norm at
> each iteration so PCApply_HYPRE() is instead called directly by
> KSPSolve_Richardson() once for each iteration.
>
>     Now since you are trying to use just one smoothing step inside richardson
> the two approaches (I think) should be identical. Somehow when
> KSPSolve_Richardson() is used instead of PCApplyRichardson() more inner
> iterations (iterations on the monitored thing) must be happening, thus
> leading to a stronger preconditioner and hence fewer iterations on the
> entire thing.
>
>     You can run (for example on one process but two is ok also) both cases
> with -start_in_debugger and put a breakpoint in PCApply_HYPRE() and then when
> it gets to that function do "where" to see how it is being called. Continue
> repeatedly to see why the one case triggers more (of these inner calls) than
> the other case.
>
>    Barry
>
>    Depending on the outcome (reason for the difference) I might call this
> issue a bug or a strange feature. I am leaning toward bug.
>
> On Nov 7, 2012, at 12:55 PM, Thomas Witkowski <thomas.witkowski at tu-dresden.de> wrote:
>
>> Okay, the outer KSP is as follows:
>>
>> KSP Object:(ns_) 2 MPI processes
>>   type: fgmres
>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>     GMRES: happy breakdown tolerance 1e-30
>>   maximum iterations=100, initial guess is zero
>>   tolerances: relative=1e-06, absolute=1e-08, divergence=10000
>>   right preconditioning
>>   has attached null space
>>   using UNPRECONDITIONED norm type for convergence test
>> PC Object:(ns_) 2 MPI processes
>>   type: fieldsplit
>>     FieldSplit with Schur preconditioner, factorization FULL
>>     Preconditioner for the Schur complement formed from the block diagonal part of A11
>>     Split info:
>>     Split number 0 Defined by IS
>>     Split number 1 Defined by IS
>>     KSP solver for A00 block
>>       KSP Object: (velocity_) 2 MPI processes
>>         type: richardson
>>           Richardson: damping factor=1
>>         maximum iterations=1, initial guess is zero
>>         tolerances: relative=0, absolute=1e-14, divergence=10000
>>         left preconditioning
>>         using PRECONDITIONED norm type for convergence test
>>       PC Object: (velocity_) 2 MPI processes
>>         type: hypre
>>           HYPRE BoomerAMG preconditioning
>>           HYPRE BoomerAMG: Cycle type V
>>           HYPRE BoomerAMG: Maximum number of levels 25
>>           HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>           HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>           HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>           HYPRE BoomerAMG: Interpolation truncation factor 0
>>           HYPRE BoomerAMG: Interpolation: max elements per row 0
>>           HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>           HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>           HYPRE BoomerAMG: Maximum row sums 0.9
>>           HYPRE BoomerAMG: Sweeps down 1
>>           HYPRE BoomerAMG: Sweeps up 1
>>           HYPRE BoomerAMG: Sweeps on coarse 1
>>           HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>>           HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>>           HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>>           HYPRE BoomerAMG: Relax weight (all) 1
>>           HYPRE BoomerAMG: Outer relax weight (all) 1
>>           HYPRE BoomerAMG: Using CF-relaxation
>>           HYPRE BoomerAMG: Measure type local
>>           HYPRE BoomerAMG: Coarsen type Falgout
>>           HYPRE BoomerAMG: Interpolation type classical
>>         linear system matrix = precond matrix:
>>         Matrix Object: 2 MPI processes
>>           type: mpiaij
>>           rows=2754, cols=2754
>>           total: nonzeros=25026, allocated nonzeros=25026
>>           total number of mallocs used during MatSetValues calls =0
>>             not using I-node (on process 0) routines
>>     KSP solver for S = A11 - A10 inv(A00) A01
>>       KSP Object: (ns_fieldsplit_pressure_) 2 MPI processes
>>         type: preonly
>>         maximum iterations=10000, initial guess is zero
>>         tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>         left preconditioning
>>         has attached null space
>>         using NONE norm type for convergence test
>>       PC Object: (ns_fieldsplit_pressure_) 2 MPI processes
>>         type: shell
>>           Shell: no name
>>         linear system matrix followed by preconditioner matrix:
>>         Matrix Object: 2 MPI processes
>>           type: schurcomplement
>>           rows=369, cols=369
>>             Schur complement A11 - A10 inv(A00) A01
>>             A11
>>               Matrix Object: 2 MPI processes
>>                 type: mpiaij
>>                 rows=369, cols=369
>>                 total: nonzeros=0, allocated nonzeros=0
>>                 total number of mallocs used during MatSetValues calls =0
>>                   using I-node (on process 0) routines: found 33 nodes, limit used is 5
>>             A10
>>               Matrix Object: 2 MPI processes
>>                 type: mpiaij
>>                 rows=369, cols=2754
>>                 total: nonzeros=8973, allocated nonzeros=8973
>>                 total number of mallocs used during MatSetValues calls =0
>>                   not using I-node (on process 0) routines
>>             KSP of A00
>>               KSP Object: (velocity_) 2 MPI processes
>>                 type: richardson
>>                   Richardson: damping factor=1
>>                 maximum iterations=1, initial guess is zero
>>                 tolerances: relative=0, absolute=1e-14, divergence=10000
>>                 left preconditioning
>>                 using PRECONDITIONED norm type for convergence test
>>               PC Object: (velocity_) 2 MPI processes
>>                 type: hypre
>>                   HYPRE BoomerAMG preconditioning
>>                   HYPRE BoomerAMG: Cycle type V
>>                   HYPRE BoomerAMG: Maximum number of levels 25
>>                   HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>                   HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>                   HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>                   HYPRE BoomerAMG: Interpolation truncation factor 0
>>                   HYPRE BoomerAMG: Interpolation: max elements per row 0
>>                   HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>                   HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>                   HYPRE BoomerAMG: Maximum row sums 0.9
>>                   HYPRE BoomerAMG: Sweeps down 1
>>                   HYPRE BoomerAMG: Sweeps up 1
>>                   HYPRE BoomerAMG: Sweeps on coarse 1
>>                   HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>>                   HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>>                   HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>>                   HYPRE BoomerAMG: Relax weight (all) 1
>>                   HYPRE BoomerAMG: Outer relax weight (all) 1
>>                   HYPRE BoomerAMG: Using CF-relaxation
>>                   HYPRE BoomerAMG: Measure type local
>>                   HYPRE BoomerAMG: Coarsen type Falgout
>>                   HYPRE BoomerAMG: Interpolation type classical
>>                 linear system matrix = precond matrix:
>>                 Matrix Object: 2 MPI processes
>>                   type: mpiaij
>>                   rows=2754, cols=2754
>>                   total: nonzeros=25026, allocated nonzeros=25026
>>                   total number of mallocs used during MatSetValues calls =0
>>                     not using I-node (on process 0) routines
>>             A01
>>               Matrix Object: 2 MPI processes
>>                 type: mpiaij
>>                 rows=2754, cols=369
>>                 total: nonzeros=7883, allocated nonzeros=7883
>>                 total number of mallocs used during MatSetValues calls =0
>>                   not using I-node (on process 0) routines
>>         Matrix Object: 2 MPI processes
>>           type: mpiaij
>>           rows=369, cols=369
>>           total: nonzeros=0, allocated nonzeros=0
>>           total number of mallocs used during MatSetValues calls =0
>>             using I-node (on process 0) routines: found 33 nodes, limit used is 5
>>   linear system matrix = precond matrix:
>>   Matrix Object: 2 MPI processes
>>     type: mpiaij
>>     rows=3123, cols=3123
>>     total: nonzeros=41882, allocated nonzeros=52732
>>     total number of mallocs used during MatSetValues calls =0
>>       not using I-node (on process 0) routines
>>
>> Note that "ns_fieldsplit_pressure_" is a PCShell. This makes again use of two KSP objects "mass_" and "laplace_":
>>
>> KSP Object:(mass_) 2 MPI processes
>>   type: cg
>>   maximum iterations=2
>>   tolerances: relative=0, absolute=1e-14, divergence=10000
>>   left preconditioning
>>   using nonzero initial guess
>>   using PRECONDITIONED norm type for convergence test
>> PC Object:(mass_) 2 MPI processes
>>   type: jacobi
>>   linear system matrix = precond matrix:
>>   Matrix Object: 2 MPI processes
>>     type: mpiaij
>>     rows=369, cols=369
>>     total: nonzeros=2385, allocated nonzeros=2506
>>     total number of mallocs used during MatSetValues calls =0
>>       not using I-node (on process 0) routines
>>
>> AND
>>
>> KSP Object:(laplace_) 2 MPI processes
>>   type: richardson
>>     Richardson: damping factor=1
>>   maximum iterations=1
>>   tolerances: relative=0, absolute=1e-14, divergence=10000
>>   left preconditioning
>>   has attached null space
>>   using nonzero initial guess
>>   using PRECONDITIONED norm type for convergence test
>> PC Object:(laplace_) 2 MPI processes
>>   type: hypre
>>     HYPRE BoomerAMG preconditioning
>>     HYPRE BoomerAMG: Cycle type V
>>     HYPRE BoomerAMG: Maximum number of levels 25
>>     HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>     HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>     HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>     HYPRE BoomerAMG: Interpolation truncation factor 0
>>     HYPRE BoomerAMG: Interpolation: max elements per row 0
>>     HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>     HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>     HYPRE BoomerAMG: Maximum row sums 0.9
>>     HYPRE BoomerAMG: Sweeps down 1
>>     HYPRE BoomerAMG: Sweeps up 1
>>     HYPRE BoomerAMG: Sweeps on coarse 1
>>     HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>>     HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>>     HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>>     HYPRE BoomerAMG: Relax weight (all) 1
>>     HYPRE BoomerAMG: Outer relax weight (all) 1
>>     HYPRE BoomerAMG: Using CF-relaxation
>>     HYPRE BoomerAMG: Measure type local
>>     HYPRE BoomerAMG: Coarsen type Falgout
>>     HYPRE BoomerAMG: Interpolation type classical
>>   linear system matrix = precond matrix:
>>   Matrix Object: 2 MPI processes
>>     type: mpiaij
>>     rows=369, cols=369
>>     total: nonzeros=1745, allocated nonzeros=2506
>>     total number of mallocs used during MatSetValues calls =0
>>       not using I-node (on process 0) routines
>>
>> The outer iteration count is now influenced when adding "-laplace_ksp_monitor" to the command line options.
>>
>> Thomas
>>
>> Am 07.11.2012 19:49, schrieb Barry Smith:
>>> This is normally not expected but might happen under some combination of
>>> solver options. Please send the output of -ksp_view and the options you
>>> use and we'll try to understand the situation.
>>>
>>> Barry
>>>
>>> On Nov 7, 2012, at 12:12 PM, Thomas Witkowski <thomas.witkowski at tu-dresden.de> wrote:
>>>
>>>> I have a very curious behavior in one of my codes: Whenever I enable a KSP
>>>> monitor for an inner solver, the outer iteration count goes down from 25
>>>> to 18! Okay, this is great :) I like to see iteration counts decreasing,
>>>> but I would like to know what's going on; eventually, a KSP monitor should
>>>> not influence the whole game. And to answer your first question, I ran the
>>>> code through valgrind and it is free of any errors. Any idea what to check
>>>> next? Thanks for any advice.
>>>>
>>>> Thomas
