[petsc-dev] Iteration counter depends on KSP monitor!

Barry Smith Thu, 8 Nov 2012 14:21:26 -0600

On Nov 8, 2012, at 2:48 AM, Thomas Witkowski <thomas.witkowski at 
tu-dresden.de> wrote:


> Maybe the following information is also useful to find the problem or/and to 
> understand what's going on: The inner solver solves with a Laplace matrix 
> (discretized with free boundaries). So I set the constant nullspace to the 
> KSP object. This inner solver is used inside the Schur complement solver of a 
> PCFIELDSPLIT object. The Schur complement solver (KSPPREONLY and PCSHELL) is 
> also set to have a constant null space. But the vector, on which the PCSHELL 
> is applied, is not orthogonal to the constant null space. Can you help me to 
> understand why this is still the case?
> 
> To go back to the problem of the influence of the KSP monitor to the solution 
> process: When I project out the constant null space before calling the 
> KSPRICHARDSON with PCHYPRE, the monitor has no influence anymore on the 
> solution process.

    Jed figured this out. Our KSP_PCApply() projects out the given null space, 
when hypre is given the job of applying the preconditioner it doesn't know 
about the null space and hence does not remove it; hence it is like you didn't 
provide a null space.

     The fix is that when a null space has been attached to the KSP we should 
always do the standard richardson and not call the "short cut" routine. 

    So now if you are using petsc-dev you should see no difference between 
monitoring or not

   Barry


> 
> Thomas
> 
> 
> Am 08.11.2012 08:27, schrieb Thomas Witkowski:
>> I run the code in the debugger, and you are right with your conjecture. For 
>> the case without monitoring the richardson KSP, PCApply_HYPRE is called once 
>> per outer iteration with the following call stack:
>> 
>> KSPSolve (itfunc.c:446)
>> KSPSolve_Richardson (rich.c:69)
>> PCApplyRichardson (precon.c:780)
>> PCApplyRichardson_HYPRE_BoomerAMG (hypre.c:597)
>> PCApply_HYPRE (hypre.c:151)
>> 
>> 
>> When the monitor is enabled, PCApply_HYPRE is called twice per outer 
>> iteration from two different places in KSPSolve_Richardson:
>> 
>> KSPSolve (itfunc.c:446)
>> KSPSolve_Richardson (rich.c:124)
>> PCApply (precon.c:384)
>> PCApply_HYPRE (hypre.c:151)
>> 
>> and
>> 
>> KSPSolve (itfunc.c:446)
>> KSPSolve_Richardson (rich.c:147)
>> PCApply (precon.c:384)
>> PCApply_HYPRE (hypre.c:151)
>> 
>> 
>> 
>> By the way, regardless of the options I provide to the inner solver which we 
>> are talking about, I cannot reproduce the small outer iteration number I see 
>> when using the KSP monitor. Can this be due to the tolerances that are set 
>> in PCApplyRichardson_HYPRE_BoomerAMG ? What's the reason you set the options 
>> before calling PCApply_HYPRE, and than the same options are reseted before 
>> leaving this function?
>> 
>> Thomas
>> 
>> Am 07.11.2012 22:28, schrieb Barry Smith:
>>>   Thomas,
>>> 
>>>      I don't have a complete explanation why in this case it changes but I 
>>> can point you in the right direction of how this happens. You may need to 
>>> put breakpoints in the debugger to see exactly what goes different with and 
>>> without that option.
>>> 
>>>      1) When richardson is used and no monitoring is done then 
>>> PCApplyRichardson_HYPRE_BoomerAMG() is called to apply the boomerAMG 
>>> v-cycle.  Note that it changes the tolerance and its before calling 
>>> PCApply_HYPRE()
>>> 
>>>      2) When monitoring is turned on we need to compute the residual norm 
>>> at each iteration so PCApply_HYPRE() is instead called directly by 
>>> KSPSolve_Richardson() once for each iteration.
>>> 
>>>     Now since you are trying to use just one smoothing step inside 
>>> richardson the two approaches (I think) should be identical. Somehow when 
>>> KSPSolve_Richardson() is used instead of PCApplyRichardson() more inner 
>>> iterations (iterations on the monitored thing) must be happening, thus 
>>> leading to a stronger preconditioner and hence less iterations on the 
>>> entire thing.
>>> 
>>>     You can run (for example on one process but two is ok also) both cases 
>>> with -start_in_debugger and put a breakpoint in PCApply_HYPRE() and then 
>>> when it gets to that function do where to see how it is being called. 
>>> Continue repeatedly to see why the one case triggers more (of these inner 
>>> calls) then the other case.
>>> 
>>>     Barry
>>> 
>>> 
>>>   Depending on the outcome (reason for the difference) I might call this 
>>> issue a bug or a strange feature. I am leaning toward bug.
>>> 
>>> 
>>> On Nov 7, 2012, at 12:55 PM, Thomas Witkowski <thomas.witkowski at 
>>> tu-dresden.de> wrote:
>>> 
>>>> Okay, the outer KSP is as follows:
>>>> 
>>>> KSP Object:(ns_) 2 MPI processes
>>>>  type: fgmres
>>>>    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt 
>>>> Orthogonalization with no iterative refinement
>>>>    GMRES: happy breakdown tolerance 1e-30
>>>>  maximum iterations=100, initial guess is zero
>>>>  tolerances:  relative=1e-06, absolute=1e-08, divergence=10000
>>>>  right preconditioning
>>>>  has attached null space
>>>>  using UNPRECONDITIONED norm type for convergence test
>>>> PC Object:(ns_) 2 MPI processes
>>>>  type: fieldsplit
>>>>    FieldSplit with Schur preconditioner, factorization FULL
>>>>    Preconditioner for the Schur complement formed from the block diagonal 
>>>> part of A11
>>>>    Split info:
>>>>    Split number 0 Defined by IS
>>>>    Split number 1 Defined by IS
>>>>    KSP solver for A00 block
>>>>      KSP Object:      (velocity_)       2 MPI processes
>>>>        type: richardson
>>>>          Richardson: damping factor=1
>>>>        maximum iterations=1, initial guess is zero
>>>>        tolerances:  relative=0, absolute=1e-14, divergence=10000
>>>>        left preconditioning
>>>>        using PRECONDITIONED norm type for convergence test
>>>>      PC Object:      (velocity_)       2 MPI processes
>>>>        type: hypre
>>>>          HYPRE BoomerAMG preconditioning
>>>>          HYPRE BoomerAMG: Cycle type V
>>>>          HYPRE BoomerAMG: Maximum number of levels 25
>>>>          HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>>          HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>>          HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>>          HYPRE BoomerAMG: Interpolation truncation factor 0
>>>>          HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>>          HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>>          HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>>          HYPRE BoomerAMG: Maximum row sums 0.9
>>>>          HYPRE BoomerAMG: Sweeps down         1
>>>>          HYPRE BoomerAMG: Sweeps up           1
>>>>          HYPRE BoomerAMG: Sweeps on coarse    1
>>>>          HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>>>>          HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>>>>          HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>>>>          HYPRE BoomerAMG: Relax weight  (all)      1
>>>>          HYPRE BoomerAMG: Outer relax weight (all) 1
>>>>          HYPRE BoomerAMG: Using CF-relaxation
>>>>          HYPRE BoomerAMG: Measure type        local
>>>>          HYPRE BoomerAMG: Coarsen type        Falgout
>>>>          HYPRE BoomerAMG: Interpolation type  classical
>>>>        linear system matrix = precond matrix:
>>>>        Matrix Object:         2 MPI processes
>>>>          type: mpiaij
>>>>          rows=2754, cols=2754
>>>>          total: nonzeros=25026, allocated nonzeros=25026
>>>>          total number of mallocs used during MatSetValues calls =0
>>>>            not using I-node (on process 0) routines
>>>>    KSP solver for S = A11 - A10 inv(A00) A01
>>>>      KSP Object:      (ns_fieldsplit_pressure_)       2 MPI processes
>>>>        type: preonly
>>>>        maximum iterations=10000, initial guess is zero
>>>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>>        left preconditioning
>>>>        has attached null space
>>>>        using NONE norm type for convergence test
>>>>      PC Object:      (ns_fieldsplit_pressure_)       2 MPI processes
>>>>        type: shell
>>>>          Shell: no name
>>>>        linear system matrix followed by preconditioner matrix:
>>>>        Matrix Object:         2 MPI processes
>>>>          type: schurcomplement
>>>>          rows=369, cols=369
>>>>            Schur complement A11 - A10 inv(A00) A01
>>>>            A11
>>>>              Matrix Object:               2 MPI processes
>>>>                type: mpiaij
>>>>                rows=369, cols=369
>>>>                total: nonzeros=0, allocated nonzeros=0
>>>>                total number of mallocs used during MatSetValues calls =0
>>>>                  using I-node (on process 0) routines: found 33 nodes, 
>>>> limit used is 5
>>>>            A10
>>>>              Matrix Object:               2 MPI processes
>>>>                type: mpiaij
>>>>                rows=369, cols=2754
>>>>                total: nonzeros=8973, allocated nonzeros=8973
>>>>                total number of mallocs used during MatSetValues calls =0
>>>>                  not using I-node (on process 0) routines
>>>>            KSP of A00
>>>>              KSP Object: (velocity_)               2 MPI processes
>>>>                type: richardson
>>>>                  Richardson: damping factor=1
>>>>                maximum iterations=1, initial guess is zero
>>>>                tolerances:  relative=0, absolute=1e-14, divergence=10000
>>>>                left preconditioning
>>>>                using PRECONDITIONED norm type for convergence test
>>>>              PC Object: (velocity_)               2 MPI processes
>>>>                type: hypre
>>>>                  HYPRE BoomerAMG preconditioning
>>>>                  HYPRE BoomerAMG: Cycle type V
>>>>                  HYPRE BoomerAMG: Maximum number of levels 25
>>>>                  HYPRE BoomerAMG: Maximum number of iterations PER hypre 
>>>> call 1
>>>>                  HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>>                  HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>>                  HYPRE BoomerAMG: Interpolation truncation factor 0
>>>>                  HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>>                  HYPRE BoomerAMG: Number of levels of aggressive 
>>>> coarsening 0
>>>>                  HYPRE BoomerAMG: Number of paths for aggressive 
>>>> coarsening 1
>>>>                  HYPRE BoomerAMG: Maximum row sums 0.9
>>>>                  HYPRE BoomerAMG: Sweeps down         1
>>>>                  HYPRE BoomerAMG: Sweeps up           1
>>>>                  HYPRE BoomerAMG: Sweeps on coarse    1
>>>>                  HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>>>>                  HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>>>>                  HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>>>>                  HYPRE BoomerAMG: Relax weight  (all)      1
>>>>                  HYPRE BoomerAMG: Outer relax weight (all) 1
>>>>                  HYPRE BoomerAMG: Using CF-relaxation
>>>>                  HYPRE BoomerAMG: Measure type        local
>>>>                  HYPRE BoomerAMG: Coarsen type        Falgout
>>>>                  HYPRE BoomerAMG: Interpolation type classical
>>>>                linear system matrix = precond matrix:
>>>>                Matrix Object:                 2 MPI processes
>>>>                  type: mpiaij
>>>>                  rows=2754, cols=2754
>>>>                  total: nonzeros=25026, allocated nonzeros=25026
>>>>                  total number of mallocs used during MatSetValues calls =0
>>>>                    not using I-node (on process 0) routines
>>>>            A01
>>>>              Matrix Object:               2 MPI processes
>>>>                type: mpiaij
>>>>                rows=2754, cols=369
>>>>                total: nonzeros=7883, allocated nonzeros=7883
>>>>                total number of mallocs used during MatSetValues calls =0
>>>>                  not using I-node (on process 0) routines
>>>>        Matrix Object:         2 MPI processes
>>>>          type: mpiaij
>>>>          rows=369, cols=369
>>>>          total: nonzeros=0, allocated nonzeros=0
>>>>          total number of mallocs used during MatSetValues calls =0
>>>>            using I-node (on process 0) routines: found 33 nodes, limit 
>>>> used is 5
>>>>  linear system matrix = precond matrix:
>>>>  Matrix Object:   2 MPI processes
>>>>    type: mpiaij
>>>>    rows=3123, cols=3123
>>>>    total: nonzeros=41882, allocated nonzeros=52732
>>>>    total number of mallocs used during MatSetValues calls =0
>>>>      not using I-node (on process 0) routines
>>>> 
>>>> 
>>>> 
>>>> Note that "ns_fieldsplit_pressure_" is a PCShell. This make again use of 
>>>> two KSP objects "mass_" and "laplace_"
>>>> 
>>>> 
>>>> 
>>>> KSP Object:(mass_) 2 MPI processes
>>>>  type: cg
>>>>  maximum iterations=2
>>>>  tolerances:  relative=0, absolute=1e-14, divergence=10000
>>>>  left preconditioning
>>>>  using nonzero initial guess
>>>>  using PRECONDITIONED norm type for convergence test
>>>> PC Object:(mass_) 2 MPI processes
>>>>  type: jacobi
>>>>  linear system matrix = precond matrix:
>>>>  Matrix Object:   2 MPI processes
>>>>    type: mpiaij
>>>>    rows=369, cols=369
>>>>    total: nonzeros=2385, allocated nonzeros=2506
>>>>    total number of mallocs used during MatSetValues calls =0
>>>>      not using I-node (on process 0) routines
>>>> 
>>>> 
>>>> 
>>>> AND
>>>> 
>>>> 
>>>> 
>>>> KSP Object:(laplace_) 2 MPI processes
>>>>  type: richardson
>>>>    Richardson: damping factor=1
>>>>  maximum iterations=1
>>>>  tolerances:  relative=0, absolute=1e-14, divergence=10000
>>>>  left preconditioning
>>>>  has attached null space
>>>>  using nonzero initial guess
>>>>  using PRECONDITIONED norm type for convergence test
>>>> PC Object:(laplace_) 2 MPI processes
>>>>  type: hypre
>>>>    HYPRE BoomerAMG preconditioning
>>>>    HYPRE BoomerAMG: Cycle type V
>>>>    HYPRE BoomerAMG: Maximum number of levels 25
>>>>    HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>>    HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>>    HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>>    HYPRE BoomerAMG: Interpolation truncation factor 0
>>>>    HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>>    HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>>    HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>>    HYPRE BoomerAMG: Maximum row sums 0.9
>>>>    HYPRE BoomerAMG: Sweeps down         1
>>>>    HYPRE BoomerAMG: Sweeps up           1
>>>>    HYPRE BoomerAMG: Sweeps on coarse    1
>>>>    HYPRE BoomerAMG: Relax down          symmetric-SOR/Jacobi
>>>>    HYPRE BoomerAMG: Relax up            symmetric-SOR/Jacobi
>>>>    HYPRE BoomerAMG: Relax on coarse     Gaussian-elimination
>>>>    HYPRE BoomerAMG: Relax weight  (all)      1
>>>>    HYPRE BoomerAMG: Outer relax weight (all) 1
>>>>    HYPRE BoomerAMG: Using CF-relaxation
>>>>    HYPRE BoomerAMG: Measure type        local
>>>>    HYPRE BoomerAMG: Coarsen type        Falgout
>>>>    HYPRE BoomerAMG: Interpolation type  classical
>>>>  linear system matrix = precond matrix:
>>>>  Matrix Object:   2 MPI processes
>>>>    type: mpiaij
>>>>    rows=369, cols=369
>>>>    total: nonzeros=1745, allocated nonzeros=2506
>>>>    total number of mallocs used during MatSetValues calls =0
>>>>      not using I-node (on process 0) routines
>>>> 
>>>> 
>>>> The outer iteration count is now influenced when adding 
>>>> "-laplace_ksp_monitor" to the command line options.
>>>> 
>>>> Thomas
>>>> 
>>>> 
>>>> Am 07.11.2012 19:49, schrieb Barry Smith:
>>>>>     This is normally not expected but might happen under some combination 
>>>>> of solver options. Please send the output of -ksp_view and the options 
>>>>> you use and we'll try to understand the situation.
>>>>> 
>>>>>    Barry
>>>>> 
>>>>> 
>>>>> On Nov 7, 2012, at 12:12 PM, Thomas Witkowski <thomas.witkowski at 
>>>>> tu-dresden.de> wrote:
>>>>> 
>>>>>> I have a very curious behavior in one of my codes: Whenever I enable a 
>>>>>> KSP Monitor for an inner solver, the outer iteration count goes down 
>>>>>> from 25 to 18! Okay, this is great :) I like it so see iteration counts 
>>>>>> decreasing, but I would like to know what's going on, and eventually, a 
>>>>>> KSP monitor should not influence the whole game. An to answer your first 
>>>>>> question, I run the code through valgrind and its free of any errors. 
>>>>>> Any idea what to check next? Thanks for any advice.
>>>>>> 
>>>>>> Thomas
>>>>>> 
>> 
>

[petsc-dev] Iteration counter depends on KSP monitor!

Reply via email to