Thank you guys for clarifying this!

Thomas

On 08.11.2012 21:21, Barry Smith wrote:
> On Nov 8, 2012, at 2:48 AM, Thomas Witkowski <thomas.witkowski at 
> tu-dresden.de> wrote:
>
>> Maybe the following information is also useful to find the problem and/or to 
>> understand what's going on: The inner solver solves with a Laplace matrix 
>> (discretized with free boundaries), so I set the constant null space on the 
>> KSP object. This inner solver is used inside the Schur complement solver of 
>> a PCFIELDSPLIT object. The Schur complement solver (KSPPREONLY and PCSHELL) 
>> is also set to have a constant null space. But the vector on which the 
>> PCSHELL is applied is not orthogonal to the constant null space. Can you 
>> help me to understand why this is still the case?
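>>
>> (For reference, a minimal sketch of how such a constant null space is 
>> attached; KSPSetNullSpace is the petsc-3.3 interface, and the KSP variable 
>> names are placeholders, not from my actual code:)
>>
>>   MatNullSpace nullsp;
>>   ierr = MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, PETSC_NULL, &nullsp);CHKERRQ(ierr);
>>   ierr = KSPSetNullSpace(laplaceKsp, nullsp);CHKERRQ(ierr);  /* inner Laplace solver */
>>   ierr = KSPSetNullSpace(schurKsp, nullsp);CHKERRQ(ierr);    /* Schur complement solver */
>>   ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);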
>>
>> To go back to the problem of the influence of the KSP monitor on the 
>> solution process: When I project out the constant null space before calling 
>> the KSPRICHARDSON solve with PCHYPRE, the monitor no longer has any 
>> influence on the solution process.
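>>
>> (The projection itself is a single call; a sketch, where b is the right-hand 
>> side handed to the Richardson solve and nullsp is the attached null space; 
>> in petsc-3.3 the third argument of MatNullSpaceRemove may be PETSC_NULL to 
>> remove the component in place:)
>>
>>   ierr = MatNullSpaceRemove(nullsp, b, PETSC_NULL);CHKERRQ(ierr);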
>      Jed figured this out. Our KSP_PCApply() projects out the given null 
> space. When hypre is given the job of applying the preconditioner, it doesn't 
> know about the null space and hence does not remove it; so it is as if you 
> had not provided a null space at all.
>
>       The fix is that when a null space has been attached to the KSP, we 
> always run the standard Richardson loop and never call the "shortcut" routine.
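>
>      The dispatch then looks roughly like this (a sketch of the logic only, 
> not the literal petsc-dev source):
>
>        MatNullSpace nullsp;
>        ierr = KSPGetNullSpace(ksp, &nullsp);CHKERRQ(ierr);
>        if (!nullsp && pc->ops->applyrichardson) {
>          /* shortcut: hand the whole Richardson loop to the PC,
>             e.g. PCApplyRichardson_HYPRE_BoomerAMG() */
>        } else {
>          /* standard loop: KSP_PCApply() projects out the null space
>             after every preconditioner application */
>        }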
>
>      So now, if you are using petsc-dev, you should see no difference between 
> monitoring and not monitoring.
>
>     Barry
>
>
>> Thomas
>>
>>
>> On 08.11.2012 08:27, Thomas Witkowski wrote:
>>> I ran the code in the debugger, and you are right with your conjecture. For 
>>> the case without monitoring of the Richardson KSP, PCApply_HYPRE is called 
>>> once per outer iteration with the following call stack:
>>>
>>> KSPSolve (itfunc.c:446)
>>> KSPSolve_Richardson (rich.c:69)
>>> PCApplyRichardson (precon.c:780)
>>> PCApplyRichardson_HYPRE_BoomerAMG (hypre.c:597)
>>> PCApply_HYPRE (hypre.c:151)
>>>
>>>
>>> When the monitor is enabled, PCApply_HYPRE is called twice per outer 
>>> iteration from two different places in KSPSolve_Richardson:
>>>
>>> KSPSolve (itfunc.c:446)
>>> KSPSolve_Richardson (rich.c:124)
>>> PCApply (precon.c:384)
>>> PCApply_HYPRE (hypre.c:151)
>>>
>>> and
>>>
>>> KSPSolve (itfunc.c:446)
>>> KSPSolve_Richardson (rich.c:147)
>>> PCApply (precon.c:384)
>>> PCApply_HYPRE (hypre.c:151)
>>>
>>>
>>>
>>> By the way, regardless of the options I provide to the inner solver we are 
>>> talking about, I cannot reproduce the small outer iteration count that I 
>>> see when using the KSP monitor. Can this be due to the tolerances that are 
>>> set in PCApplyRichardson_HYPRE_BoomerAMG? What is the reason for setting 
>>> the options before calling PCApply_HYPRE, and then resetting the same 
>>> options before leaving this function?
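>>>
>>> (The pattern I am referring to is roughly the following; a sketch from 
>>> memory, not the literal hypre.c source, with hsolver standing for the 
>>> HYPRE_Solver object:)
>>>
>>>   HYPRE_BoomerAMGSetTol(hsolver, rtol);    /* per-call tolerance for this solve */
>>>   HYPRE_BoomerAMGSetMaxIter(hsolver, its); /* per-call iteration limit          */
>>>   PCApply_HYPRE(pc, b, y);                 /* run the V-cycles                  */
>>>   HYPRE_BoomerAMGSetTol(hsolver, 0.0);     /* reset to the defaults seen in     */
>>>   HYPRE_BoomerAMGSetMaxIter(hsolver, 1);   /* -ksp_view before returning        */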
>>>
>>> Thomas
>>>
>>> On 07.11.2012 22:28, Barry Smith wrote:
>>>>    Thomas,
>>>>
>>>>       I don't have a complete explanation of why it changes in this case, 
>>>> but I can point you in the right direction of how this happens. You may 
>>>> need to put breakpoints in the debugger to see exactly what goes 
>>>> differently with and without that option.
>>>>
>>>>       1) When richardson is used and no monitoring is done, then 
>>>> PCApplyRichardson_HYPRE_BoomerAMG() is called to apply the BoomerAMG 
>>>> V-cycle.  Note that it changes the tolerance and iteration settings before 
>>>> calling PCApply_HYPRE().
>>>>
>>>>       2) When monitoring is turned on, we need to compute the residual 
>>>> norm at each iteration, so PCApply_HYPRE() is instead called directly by 
>>>> KSPSolve_Richardson() once for each iteration.
>>>>
>>>>      Now, since you are trying to use just one smoothing step inside 
>>>> richardson, the two approaches (I think) should be identical. Somehow, 
>>>> when KSPSolve_Richardson() is used instead of PCApplyRichardson(), more 
>>>> inner iterations (iterations on the monitored thing) must be happening, 
>>>> thus leading to a stronger preconditioner and hence fewer iterations on 
>>>> the entire thing.
>>>>
>>>>      You can run both cases (for example on one process, but two is ok 
>>>> also) with -start_in_debugger, put a breakpoint in PCApply_HYPRE(), and 
>>>> then, when it gets to that function, do "where" to see how it is being 
>>>> called. Continue repeatedly to see why the one case triggers more (of 
>>>> these inner calls) than the other case.
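>>>>
>>>>      Concretely, the gdb session looks something like this:
>>>>
>>>>        (gdb) break PCApply_HYPRE
>>>>        (gdb) run
>>>>        (gdb) where        (the backtrace shows who called PCApply_HYPRE)
>>>>        (gdb) continue     (repeat "where" at each stop and compare stacks)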
>>>>
>>>>      Barry
>>>>
>>>>
>>>>    Depending on the outcome (reason for the difference) I might call this 
>>>> issue a bug or a strange feature. I am leaning toward bug.
>>>>
>>>>
>>>> On Nov 7, 2012, at 12:55 PM, Thomas Witkowski <thomas.witkowski at 
>>>> tu-dresden.de> wrote:
>>>>
>>>>> Okay, the outer KSP is as follows:
>>>>>
>>>>> KSP Object:(ns_) 2 MPI processes
>>>>>   type: fgmres
>>>>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt 
>>>>> Orthogonalization with no iterative refinement
>>>>>     GMRES: happy breakdown tolerance 1e-30
>>>>>   maximum iterations=100, initial guess is zero
>>>>>   tolerances:  relative=1e-06, absolute=1e-08, divergence=10000
>>>>>   right preconditioning
>>>>>   has attached null space
>>>>>   using UNPRECONDITIONED norm type for convergence test
>>>>> PC Object:(ns_) 2 MPI processes
>>>>>   type: fieldsplit
>>>>>     FieldSplit with Schur preconditioner, factorization FULL
>>>>>     Preconditioner for the Schur complement formed from the block 
>>>>> diagonal part of A11
>>>>>     Split info:
>>>>>     Split number 0 Defined by IS
>>>>>     Split number 1 Defined by IS
>>>>>     KSP solver for A00 block
>>>>>       KSP Object:      (velocity_)       2 MPI processes
>>>>>         type: richardson
>>>>>           Richardson: damping factor=1
>>>>>         maximum iterations=1, initial guess is zero
>>>>>         tolerances:  relative=0, absolute=1e-14, divergence=10000
>>>>>         left preconditioning
>>>>>         using PRECONDITIONED norm type for convergence test
>>>>>       PC Object:      (velocity_)       2 MPI processes
>>>>>         type: hypre
>>>>>           HYPRE BoomerAMG preconditioning
>>>>>           HYPRE BoomerAMG: Cycle type V
>>>>>           HYPRE BoomerAMG: Maximum number of levels 25
>>>>>           HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>>>           HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>>>           HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>>>           HYPRE BoomerAMG: Interpolation truncation factor 0
>>>>>           HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>>>           HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>>>           HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>>>           HYPRE BoomerAMG: Maximum row sums 0.9
>>>>>           HYPRE BoomerAMG: Sweeps down         1
>>>>>           HYPRE BoomerAMG: Sweeps up           1
>>>>>           HYPRE BoomerAMG: Sweeps on coarse    1
>>>>>           HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>>>>>           HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>>>>>           HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>>>>>           HYPRE BoomerAMG: Relax weight  (all)      1
>>>>>           HYPRE BoomerAMG: Outer relax weight (all) 1
>>>>>           HYPRE BoomerAMG: Using CF-relaxation
>>>>>           HYPRE BoomerAMG: Measure type        local
>>>>>           HYPRE BoomerAMG: Coarsen type        Falgout
>>>>>           HYPRE BoomerAMG: Interpolation type  classical
>>>>>         linear system matrix = precond matrix:
>>>>>         Matrix Object:         2 MPI processes
>>>>>           type: mpiaij
>>>>>           rows=2754, cols=2754
>>>>>           total: nonzeros=25026, allocated nonzeros=25026
>>>>>           total number of mallocs used during MatSetValues calls =0
>>>>>             not using I-node (on process 0) routines
>>>>>     KSP solver for S = A11 - A10 inv(A00) A01
>>>>>       KSP Object:      (ns_fieldsplit_pressure_)       2 MPI processes
>>>>>         type: preonly
>>>>>         maximum iterations=10000, initial guess is zero
>>>>>         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>>>         left preconditioning
>>>>>         has attached null space
>>>>>         using NONE norm type for convergence test
>>>>>       PC Object:      (ns_fieldsplit_pressure_)       2 MPI processes
>>>>>         type: shell
>>>>>           Shell: no name
>>>>>         linear system matrix followed by preconditioner matrix:
>>>>>         Matrix Object:         2 MPI processes
>>>>>           type: schurcomplement
>>>>>           rows=369, cols=369
>>>>>             Schur complement A11 - A10 inv(A00) A01
>>>>>             A11
>>>>>               Matrix Object:               2 MPI processes
>>>>>                 type: mpiaij
>>>>>                 rows=369, cols=369
>>>>>                 total: nonzeros=0, allocated nonzeros=0
>>>>>                 total number of mallocs used during MatSetValues calls =0
>>>>>                   using I-node (on process 0) routines: found 33 nodes, 
>>>>> limit used is 5
>>>>>             A10
>>>>>               Matrix Object:               2 MPI processes
>>>>>                 type: mpiaij
>>>>>                 rows=369, cols=2754
>>>>>                 total: nonzeros=8973, allocated nonzeros=8973
>>>>>                 total number of mallocs used during MatSetValues calls =0
>>>>>                   not using I-node (on process 0) routines
>>>>>             KSP of A00
>>>>>               KSP Object: (velocity_)               2 MPI processes
>>>>>                 type: richardson
>>>>>                   Richardson: damping factor=1
>>>>>                 maximum iterations=1, initial guess is zero
>>>>>                 tolerances:  relative=0, absolute=1e-14, divergence=10000
>>>>>                 left preconditioning
>>>>>                 using PRECONDITIONED norm type for convergence test
>>>>>               PC Object: (velocity_)               2 MPI processes
>>>>>                 type: hypre
>>>>>                   HYPRE BoomerAMG preconditioning
>>>>>                   HYPRE BoomerAMG: Cycle type V
>>>>>                   HYPRE BoomerAMG: Maximum number of levels 25
>>>>>                   HYPRE BoomerAMG: Maximum number of iterations PER hypre 
>>>>> call 1
>>>>>                   HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>>>                   HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>>>                   HYPRE BoomerAMG: Interpolation truncation factor 0
>>>>>                   HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>>>                   HYPRE BoomerAMG: Number of levels of aggressive 
>>>>> coarsening 0
>>>>>                   HYPRE BoomerAMG: Number of paths for aggressive 
>>>>> coarsening 1
>>>>>                   HYPRE BoomerAMG: Maximum row sums 0.9
>>>>>                   HYPRE BoomerAMG: Sweeps down         1
>>>>>                   HYPRE BoomerAMG: Sweeps up           1
>>>>>                   HYPRE BoomerAMG: Sweeps on coarse    1
>>>>>                   HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>>>>>                   HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>>>>>                   HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>>>>>                   HYPRE BoomerAMG: Relax weight  (all)      1
>>>>>                   HYPRE BoomerAMG: Outer relax weight (all) 1
>>>>>                   HYPRE BoomerAMG: Using CF-relaxation
>>>>>                   HYPRE BoomerAMG: Measure type        local
>>>>>                   HYPRE BoomerAMG: Coarsen type        Falgout
>>>>>                   HYPRE BoomerAMG: Interpolation type classical
>>>>>                 linear system matrix = precond matrix:
>>>>>                 Matrix Object:                 2 MPI processes
>>>>>                   type: mpiaij
>>>>>                   rows=2754, cols=2754
>>>>>                   total: nonzeros=25026, allocated nonzeros=25026
>>>>>                   total number of mallocs used during MatSetValues calls 
>>>>> =0
>>>>>                     not using I-node (on process 0) routines
>>>>>             A01
>>>>>               Matrix Object:               2 MPI processes
>>>>>                 type: mpiaij
>>>>>                 rows=2754, cols=369
>>>>>                 total: nonzeros=7883, allocated nonzeros=7883
>>>>>                 total number of mallocs used during MatSetValues calls =0
>>>>>                   not using I-node (on process 0) routines
>>>>>         Matrix Object:         2 MPI processes
>>>>>           type: mpiaij
>>>>>           rows=369, cols=369
>>>>>           total: nonzeros=0, allocated nonzeros=0
>>>>>           total number of mallocs used during MatSetValues calls =0
>>>>>             using I-node (on process 0) routines: found 33 nodes, limit 
>>>>> used is 5
>>>>>   linear system matrix = precond matrix:
>>>>>   Matrix Object:   2 MPI processes
>>>>>     type: mpiaij
>>>>>     rows=3123, cols=3123
>>>>>     total: nonzeros=41882, allocated nonzeros=52732
>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>       not using I-node (on process 0) routines
>>>>>
>>>>>
>>>>>
>>>>> Note that "ns_fieldsplit_pressure_" is a PCShell. It in turn makes use of 
>>>>> two KSP objects, "mass_" and "laplace_":
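>>>>>
>>>>> Roughly, the shell is wired up as follows (a minimal sketch; the context 
>>>>> struct, the work vectors, and the way the "mass_" and "laplace_" solves 
>>>>> are combined are simplified placeholders, not my exact code):
>>>>>
>>>>>   typedef struct { KSP massKsp, laplaceKsp; Vec tmp1, tmp2; } SchurCtx;
>>>>>
>>>>>   static PetscErrorCode PCApplySchur(PC pc, Vec x, Vec y)
>>>>>   {
>>>>>     SchurCtx       *ctx;
>>>>>     PetscErrorCode ierr;
>>>>>
>>>>>     PetscFunctionBegin;
>>>>>     ierr = PCShellGetContext(pc, (void**)&ctx);CHKERRQ(ierr);
>>>>>     ierr = KSPSolve(ctx->massKsp, x, ctx->tmp1);CHKERRQ(ierr);    /* "mass_" solve    */
>>>>>     ierr = KSPSolve(ctx->laplaceKsp, x, ctx->tmp2);CHKERRQ(ierr); /* "laplace_" solve */
>>>>>     ierr = VecWAXPY(y, 1.0, ctx->tmp1, ctx->tmp2);CHKERRQ(ierr);  /* y = tmp1 + tmp2  */
>>>>>     PetscFunctionReturn(0);
>>>>>   }
>>>>>
>>>>>   /* registration on the "ns_fieldsplit_pressure_" PC: */
>>>>>   ierr = PCShellSetContext(pc, &schurCtx);CHKERRQ(ierr);
>>>>>   ierr = PCShellSetApply(pc, PCApplySchur);CHKERRQ(ierr);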
>>>>>
>>>>>
>>>>>
>>>>> KSP Object:(mass_) 2 MPI processes
>>>>>   type: cg
>>>>>   maximum iterations=2
>>>>>   tolerances:  relative=0, absolute=1e-14, divergence=10000
>>>>>   left preconditioning
>>>>>   using nonzero initial guess
>>>>>   using PRECONDITIONED norm type for convergence test
>>>>> PC Object:(mass_) 2 MPI processes
>>>>>   type: jacobi
>>>>>   linear system matrix = precond matrix:
>>>>>   Matrix Object:   2 MPI processes
>>>>>     type: mpiaij
>>>>>     rows=369, cols=369
>>>>>     total: nonzeros=2385, allocated nonzeros=2506
>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>       not using I-node (on process 0) routines
>>>>>
>>>>>
>>>>>
>>>>> AND
>>>>>
>>>>>
>>>>>
>>>>> KSP Object:(laplace_) 2 MPI processes
>>>>>   type: richardson
>>>>>     Richardson: damping factor=1
>>>>>   maximum iterations=1
>>>>>   tolerances:  relative=0, absolute=1e-14, divergence=10000
>>>>>   left preconditioning
>>>>>   has attached null space
>>>>>   using nonzero initial guess
>>>>>   using PRECONDITIONED norm type for convergence test
>>>>> PC Object:(laplace_) 2 MPI processes
>>>>>   type: hypre
>>>>>     HYPRE BoomerAMG preconditioning
>>>>>     HYPRE BoomerAMG: Cycle type V
>>>>>     HYPRE BoomerAMG: Maximum number of levels 25
>>>>>     HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>>>     HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>>>     HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>>>     HYPRE BoomerAMG: Interpolation truncation factor 0
>>>>>     HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>>>     HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>>>     HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>>>     HYPRE BoomerAMG: Maximum row sums 0.9
>>>>>     HYPRE BoomerAMG: Sweeps down         1
>>>>>     HYPRE BoomerAMG: Sweeps up           1
>>>>>     HYPRE BoomerAMG: Sweeps on coarse    1
>>>>>     HYPRE BoomerAMG: Relax down          symmetric-SOR/Jacobi
>>>>>     HYPRE BoomerAMG: Relax up            symmetric-SOR/Jacobi
>>>>>     HYPRE BoomerAMG: Relax on coarse     Gaussian-elimination
>>>>>     HYPRE BoomerAMG: Relax weight  (all)      1
>>>>>     HYPRE BoomerAMG: Outer relax weight (all) 1
>>>>>     HYPRE BoomerAMG: Using CF-relaxation
>>>>>     HYPRE BoomerAMG: Measure type        local
>>>>>     HYPRE BoomerAMG: Coarsen type        Falgout
>>>>>     HYPRE BoomerAMG: Interpolation type  classical
>>>>>   linear system matrix = precond matrix:
>>>>>   Matrix Object:   2 MPI processes
>>>>>     type: mpiaij
>>>>>     rows=369, cols=369
>>>>>     total: nonzeros=1745, allocated nonzeros=2506
>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>       not using I-node (on process 0) routines
>>>>>
>>>>>
>>>>> The outer iteration count is influenced by adding "-laplace_ksp_monitor" 
>>>>> to the command-line options.
>>>>>
>>>>> Thomas
>>>>>
>>>>>
>>>>> On 07.11.2012 19:49, Barry Smith wrote:
>>>>>>      This is normally not expected but might happen under some 
>>>>>> combination of solver options. Please send the output of -ksp_view and 
>>>>>> the options you use and we'll try to understand the situation.
>>>>>>
>>>>>>     Barry
>>>>>>
>>>>>>
>>>>>> On Nov 7, 2012, at 12:12 PM, Thomas Witkowski <thomas.witkowski at 
>>>>>> tu-dresden.de> wrote:
>>>>>>
>>>>>>> I have a very curious behavior in one of my codes: Whenever I enable a 
>>>>>>> KSP monitor for an inner solver, the outer iteration count goes down 
>>>>>>> from 25 to 18! Okay, this is great :) I like to see iteration counts 
>>>>>>> decreasing, but I would like to know what's going on; after all, a KSP 
>>>>>>> monitor should not influence the whole game. And to answer your first 
>>>>>>> question: I ran the code through valgrind and it is free of any errors. 
>>>>>>> Any idea what to check next? Thanks for any advice.
>>>>>>>
>>>>>>> Thomas
>>>>>>>
