I ran the code in the debugger, and your conjecture is right.
Without monitoring of the Richardson KSP, PCApply_HYPRE is
called once per outer iteration with the following call stack:

KSPSolve (itfunc.c:446)
KSPSolve_Richardson (rich.c:69)
PCApplyRichardson (precon.c:780)
PCApplyRichardson_HYPRE_BoomerAMG (hypre.c:597)
PCApply_HYPRE (hypre.c:151)


When the monitor is enabled, PCApply_HYPRE is called twice per outer 
iteration from two different places in KSPSolve_Richardson:

KSPSolve (itfunc.c:446)
KSPSolve_Richardson (rich.c:124)
PCApply (precon.c:384)
PCApply_HYPRE (hypre.c:151)

and

KSPSolve (itfunc.c:446)
KSPSolve_Richardson (rich.c:147)
PCApply (precon.c:384)
PCApply_HYPRE (hypre.c:151)
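A stripped-down model of the two paths makes the once-versus-twice pattern concrete. This is a sketch, not PETSc's rich.c: it assumes a hypothetical 4x4 matrix, a Jacobi preconditioner standing in for the BoomerAMG V-cycle, a zero initial guess, damping factor 1 and max_it = 1 (as for velocity_), and it reports a max norm instead of the 2-norm. The two preconditioner calls in the monitored path are only roughly analogous to the two call sites in the traces above.

```c
#include <stdio.h>
#include <string.h>

#define N 4

/* Hypothetical SPD test matrix (1D Laplacian), standing in for A00. */
static const double A[N][N] = {
    { 2.0, -1.0,  0.0,  0.0},
    {-1.0,  2.0, -1.0,  0.0},
    { 0.0, -1.0,  2.0, -1.0},
    { 0.0,  0.0, -1.0,  2.0}};

/* Counts preconditioner applications, like a breakpoint in PCApply_HYPRE. */
static int pc_calls = 0;

/* Hypothetical stand-in for one V-cycle: Jacobi, z = D^{-1} r. */
static void pc_apply(const double *r, double *z) {
  pc_calls++;
  for (int i = 0; i < N; i++) z[i] = r[i] / A[i][i];
}

/* r = b - A x */
static void residual(const double *b, const double *x, double *r) {
  for (int i = 0; i < N; i++) {
    r[i] = b[i];
    for (int j = 0; j < N; j++) r[i] -= A[i][j] * x[j];
  }
}

/* Max norm (used for simplicity in place of the 2-norm). */
static double norm_max(const double *v) {
  double m = 0.0;
  for (int i = 0; i < N; i++) {
    double a = v[i] < 0.0 ? -v[i] : v[i];
    if (a > m) m = a;
  }
  return m;
}

/* Unmonitored path: with x0 = 0 and damping 1,
   x1 = x0 + B (b - A x0) = B b, so one PC application suffices. */
static void richardson_fused(const double *b, double *x) {
  pc_apply(b, x);
}

/* Monitored path: the preconditioned residual norm ||B r|| is reported
   before the loop and after every update, so B is applied once up front
   and once more per iteration. */
static void richardson_monitored(const double *b, double *x, int max_it) {
  double r[N], z[N];
  memset(x, 0, sizeof(double) * N);
  residual(b, x, r);
  pc_apply(r, z);
  printf("%3d preconditioned resid norm %g\n", 0, norm_max(z));
  for (int k = 1; k <= max_it; k++) {
    for (int i = 0; i < N; i++) x[i] += z[i]; /* damping factor 1 */
    residual(b, x, r);
    pc_apply(r, z); /* on the last iteration z only feeds the monitor */
    printf("%3d preconditioned resid norm %g\n", k, norm_max(z));
  }
}
```

With b = (1, 0, 0, 1), richardson_fused leaves pc_calls at 1 and richardson_monitored(b, x, 1) at 2, and both return x = (0.5, 0, 0, 0.5). If the real preconditioner were stateless, the extra call alone should not change the iterate, which points toward something else differing between the two paths, such as the settings the V-cycle runs with.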



By the way, regardless of which options I give to the inner solver
we are talking about, I cannot reproduce the small outer iteration
count I see when using the KSP monitor. Could this be due to the
tolerances that are set in PCApplyRichardson_HYPRE_BoomerAMG? What is
the reason for setting the options before calling PCApply_HYPRE, and then
resetting the same options before leaving this function?
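The pattern the question refers to looks like the common save/override/restore idiom. Below is a minimal sketch of that idiom only; AmgSettings, amg_apply and amg_apply_richardson are hypothetical names, and setting max_iter directly to the Richardson iteration count is an assumption, not a transcription of hypre.c. It illustrates one consequence relevant to the question: the fused Richardson call runs the V-cycle with the caller's values, while a later direct apply (as in the monitored path) sees the configured values again.

```c
/* Hypothetical settings, standing in for the tolerance and iteration
   count stored inside the BoomerAMG preconditioner object. */
typedef struct {
  double tol;
  int max_iter;
} AmgSettings;

/* Records what the most recent apply actually saw (for inspection). */
static AmgSettings last_seen;

/* Hypothetical stand-in for PCApply_HYPRE: it uses whatever settings
   are stored in the object at the time of the call. */
static void amg_apply(const AmgSettings *s) {
  last_seen = *s;
}

/* Save / override / restore: the Richardson tolerance and iteration
   count are pushed into the shared settings for this call only, then
   the configured values are put back so that later direct applies run
   with the user's options. */
static void amg_apply_richardson(AmgSettings *s, double rtol, int its) {
  AmgSettings saved = *s; /* remember the configured values */
  s->tol = rtol;          /* override for this call only */
  s->max_iter = its;      /* assumption: direct mapping of its */
  amg_apply(s);
  *s = saved;             /* restore before returning */
}
```

Starting from {tol = 0, max_iter = 1}, a call amg_apply_richardson(&s, 1e-14, 3) runs the apply with tol = 1e-14 and max_iter = 3 and then leaves s at {0, 1}, so the two paths can run the V-cycle under different settings whenever the overrides differ from the configured values.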

Thomas

On 07.11.2012 22:28, Barry Smith wrote:
>    Thomas,
>
>       I don't have a complete explanation of why it changes in this case, but I
> can point you in the right direction of how this happens. You may need to put
> breakpoints in the debugger to see exactly what goes differently with and
> without that option.
>
>       1) When Richardson is used and no monitoring is done,
> PCApplyRichardson_HYPRE_BoomerAMG() is called to apply the BoomerAMG V-cycle.
> Note that it changes the tolerance and the maximum number of iterations (its) before calling PCApply_HYPRE().
>
>       2) When monitoring is turned on, we need to compute the residual norm at
> each iteration, so PCApply_HYPRE() is instead called directly by
> KSPSolve_Richardson() once for each iteration.
>
>      Now, since you are trying to use just one smoothing step inside
> Richardson, the two approaches (I think) should be identical. Somehow, when
> KSPSolve_Richardson() is used instead of PCApplyRichardson(), more inner
> iterations (iterations on the monitored solver) must be happening, thus
> leading to a stronger preconditioner and hence fewer iterations on the entire
> solve.
>
>      You can run both cases (for example on one process, but two is OK also)
> with -start_in_debugger and put a breakpoint in PCApply_HYPRE(); then, when
> it gets to that function, type "where" to see how it is being called. Continue
> repeatedly to see why the one case triggers more of these inner calls than
> the other case.
>
>      Barry
>
>
>    Depending on the outcome (reason for the difference) I might call this 
> issue a bug or a strange feature. I am leaning toward bug.
>
>
> On Nov 7, 2012, at 12:55 PM, Thomas Witkowski <thomas.witkowski at 
> tu-dresden.de> wrote:
>
>> Okay, the outer KSP is as follows:
>>
>> KSP Object:(ns_) 2 MPI processes
>>   type: fgmres
>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>     GMRES: happy breakdown tolerance 1e-30
>>   maximum iterations=100, initial guess is zero
>>   tolerances:  relative=1e-06, absolute=1e-08, divergence=10000
>>   right preconditioning
>>   has attached null space
>>   using UNPRECONDITIONED norm type for convergence test
>> PC Object:(ns_) 2 MPI processes
>>   type: fieldsplit
>>     FieldSplit with Schur preconditioner, factorization FULL
>>     Preconditioner for the Schur complement formed from the block diagonal part of A11
>>     Split info:
>>     Split number 0 Defined by IS
>>     Split number 1 Defined by IS
>>     KSP solver for A00 block
>>       KSP Object:      (velocity_)       2 MPI processes
>>         type: richardson
>>           Richardson: damping factor=1
>>         maximum iterations=1, initial guess is zero
>>         tolerances:  relative=0, absolute=1e-14, divergence=10000
>>         left preconditioning
>>         using PRECONDITIONED norm type for convergence test
>>       PC Object:      (velocity_)       2 MPI processes
>>         type: hypre
>>           HYPRE BoomerAMG preconditioning
>>           HYPRE BoomerAMG: Cycle type V
>>           HYPRE BoomerAMG: Maximum number of levels 25
>>           HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>           HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>           HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>           HYPRE BoomerAMG: Interpolation truncation factor 0
>>           HYPRE BoomerAMG: Interpolation: max elements per row 0
>>           HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>           HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>           HYPRE BoomerAMG: Maximum row sums 0.9
>>           HYPRE BoomerAMG: Sweeps down         1
>>           HYPRE BoomerAMG: Sweeps up           1
>>           HYPRE BoomerAMG: Sweeps on coarse    1
>>           HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>>           HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>>           HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>>           HYPRE BoomerAMG: Relax weight  (all)      1
>>           HYPRE BoomerAMG: Outer relax weight (all) 1
>>           HYPRE BoomerAMG: Using CF-relaxation
>>           HYPRE BoomerAMG: Measure type        local
>>           HYPRE BoomerAMG: Coarsen type        Falgout
>>           HYPRE BoomerAMG: Interpolation type  classical
>>         linear system matrix = precond matrix:
>>         Matrix Object:         2 MPI processes
>>           type: mpiaij
>>           rows=2754, cols=2754
>>           total: nonzeros=25026, allocated nonzeros=25026
>>           total number of mallocs used during MatSetValues calls =0
>>             not using I-node (on process 0) routines
>>     KSP solver for S = A11 - A10 inv(A00) A01
>>       KSP Object:      (ns_fieldsplit_pressure_)       2 MPI processes
>>         type: preonly
>>         maximum iterations=10000, initial guess is zero
>>         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>         left preconditioning
>>         has attached null space
>>         using NONE norm type for convergence test
>>       PC Object:      (ns_fieldsplit_pressure_)       2 MPI processes
>>         type: shell
>>           Shell: no name
>>         linear system matrix followed by preconditioner matrix:
>>         Matrix Object:         2 MPI processes
>>           type: schurcomplement
>>           rows=369, cols=369
>>             Schur complement A11 - A10 inv(A00) A01
>>             A11
>>               Matrix Object:               2 MPI processes
>>                 type: mpiaij
>>                 rows=369, cols=369
>>                 total: nonzeros=0, allocated nonzeros=0
>>                 total number of mallocs used during MatSetValues calls =0
>>                   using I-node (on process 0) routines: found 33 nodes, limit used is 5
>>             A10
>>               Matrix Object:               2 MPI processes
>>                 type: mpiaij
>>                 rows=369, cols=2754
>>                 total: nonzeros=8973, allocated nonzeros=8973
>>                 total number of mallocs used during MatSetValues calls =0
>>                   not using I-node (on process 0) routines
>>             KSP of A00
>>               KSP Object:              (velocity_)               2 MPI processes
>>                 type: richardson
>>                   Richardson: damping factor=1
>>                 maximum iterations=1, initial guess is zero
>>                 tolerances:  relative=0, absolute=1e-14, divergence=10000
>>                 left preconditioning
>>                 using PRECONDITIONED norm type for convergence test
>>               PC Object:              (velocity_)               2 MPI processes
>>                 type: hypre
>>                   HYPRE BoomerAMG preconditioning
>>                   HYPRE BoomerAMG: Cycle type V
>>                   HYPRE BoomerAMG: Maximum number of levels 25
>>                   HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>                   HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>                   HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>                   HYPRE BoomerAMG: Interpolation truncation factor 0
>>                   HYPRE BoomerAMG: Interpolation: max elements per row 0
>>                   HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>                   HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>                   HYPRE BoomerAMG: Maximum row sums 0.9
>>                   HYPRE BoomerAMG: Sweeps down         1
>>                   HYPRE BoomerAMG: Sweeps up           1
>>                   HYPRE BoomerAMG: Sweeps on coarse    1
>>                   HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>>                   HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>>                   HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>>                   HYPRE BoomerAMG: Relax weight  (all)      1
>>                   HYPRE BoomerAMG: Outer relax weight (all) 1
>>                   HYPRE BoomerAMG: Using CF-relaxation
>>                   HYPRE BoomerAMG: Measure type        local
>>                   HYPRE BoomerAMG: Coarsen type        Falgout
>>                   HYPRE BoomerAMG: Interpolation type  classical
>>                 linear system matrix = precond matrix:
>>                 Matrix Object:                 2 MPI processes
>>                   type: mpiaij
>>                   rows=2754, cols=2754
>>                   total: nonzeros=25026, allocated nonzeros=25026
>>                   total number of mallocs used during MatSetValues calls =0
>>                     not using I-node (on process 0) routines
>>             A01
>>               Matrix Object:               2 MPI processes
>>                 type: mpiaij
>>                 rows=2754, cols=369
>>                 total: nonzeros=7883, allocated nonzeros=7883
>>                 total number of mallocs used during MatSetValues calls =0
>>                   not using I-node (on process 0) routines
>>         Matrix Object:         2 MPI processes
>>           type: mpiaij
>>           rows=369, cols=369
>>           total: nonzeros=0, allocated nonzeros=0
>>           total number of mallocs used during MatSetValues calls =0
>>             using I-node (on process 0) routines: found 33 nodes, limit used is 5
>>   linear system matrix = precond matrix:
>>   Matrix Object:   2 MPI processes
>>     type: mpiaij
>>     rows=3123, cols=3123
>>     total: nonzeros=41882, allocated nonzeros=52732
>>     total number of mallocs used during MatSetValues calls =0
>>       not using I-node (on process 0) routines
>>
>>
>>
>> Note that "ns_fieldsplit_pressure_" is a PCShell. It in turn makes use of two
>> KSP objects, "mass_" and "laplace_".
>>
>>
>>
>> KSP Object:(mass_) 2 MPI processes
>>   type: cg
>>   maximum iterations=2
>>   tolerances:  relative=0, absolute=1e-14, divergence=10000
>>   left preconditioning
>>   using nonzero initial guess
>>   using PRECONDITIONED norm type for convergence test
>> PC Object:(mass_) 2 MPI processes
>>   type: jacobi
>>   linear system matrix = precond matrix:
>>   Matrix Object:   2 MPI processes
>>     type: mpiaij
>>     rows=369, cols=369
>>     total: nonzeros=2385, allocated nonzeros=2506
>>     total number of mallocs used during MatSetValues calls =0
>>       not using I-node (on process 0) routines
>>
>>
>>
>> AND
>>
>>
>>
>> KSP Object:(laplace_) 2 MPI processes
>>   type: richardson
>>     Richardson: damping factor=1
>>   maximum iterations=1
>>   tolerances:  relative=0, absolute=1e-14, divergence=10000
>>   left preconditioning
>>   has attached null space
>>   using nonzero initial guess
>>   using PRECONDITIONED norm type for convergence test
>> PC Object:(laplace_) 2 MPI processes
>>   type: hypre
>>     HYPRE BoomerAMG preconditioning
>>     HYPRE BoomerAMG: Cycle type V
>>     HYPRE BoomerAMG: Maximum number of levels 25
>>     HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>     HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>     HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>     HYPRE BoomerAMG: Interpolation truncation factor 0
>>     HYPRE BoomerAMG: Interpolation: max elements per row 0
>>     HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>     HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>     HYPRE BoomerAMG: Maximum row sums 0.9
>>     HYPRE BoomerAMG: Sweeps down         1
>>     HYPRE BoomerAMG: Sweeps up           1
>>     HYPRE BoomerAMG: Sweeps on coarse    1
>>     HYPRE BoomerAMG: Relax down          symmetric-SOR/Jacobi
>>     HYPRE BoomerAMG: Relax up            symmetric-SOR/Jacobi
>>     HYPRE BoomerAMG: Relax on coarse     Gaussian-elimination
>>     HYPRE BoomerAMG: Relax weight  (all)      1
>>     HYPRE BoomerAMG: Outer relax weight (all) 1
>>     HYPRE BoomerAMG: Using CF-relaxation
>>     HYPRE BoomerAMG: Measure type        local
>>     HYPRE BoomerAMG: Coarsen type        Falgout
>>     HYPRE BoomerAMG: Interpolation type  classical
>>   linear system matrix = precond matrix:
>>   Matrix Object:   2 MPI processes
>>     type: mpiaij
>>     rows=369, cols=369
>>     total: nonzeros=1745, allocated nonzeros=2506
>>     total number of mallocs used during MatSetValues calls =0
>>       not using I-node (on process 0) routines
>>
>>
>> The outer iteration count changes when I add
>> "-laplace_ksp_monitor" to the command line options.
>>
>> Thomas
>>
>>
>> On 07.11.2012 19:49, Barry Smith wrote:
>>>      This is normally not expected but might happen under some combination 
>>> of solver options. Please send the output of -ksp_view and the options you 
>>> use and we'll try to understand the situation.
>>>
>>>     Barry
>>>
>>>
>>> On Nov 7, 2012, at 12:12 PM, Thomas Witkowski <thomas.witkowski at 
>>> tu-dresden.de> wrote:
>>>
>>>> I see very curious behavior in one of my codes: whenever I enable a KSP
>>>> monitor for an inner solver, the outer iteration count goes down from 25
>>>> to 18! Okay, this is great :) I like to see iteration counts
>>>> decreasing, but I would like to know what's going on; after all, a
>>>> KSP monitor should not influence the whole game. And to answer your first
>>>> question: I ran the code through valgrind and it's free of any errors. Any
>>>> idea what to check next? Thanks for any advice.
>>>>
>>>> Thomas
>>>>
