On Nov 8, 2012, at 2:48 AM, Thomas Witkowski <thomas.witkowski at
tu-dresden.de> wrote:
> Maybe the following information is also useful to find the problem or/and to
> understand what's going on: The inner solver solves with a Laplace matrix
> (discretized with free boundaries). So I set the constant nullspace to the
> KSP object. This inner solver is used inside the Schur complement solver of a
> PCFIELDSPLIT object. The Schur complement solver (KSPPREONLY and PCSHELL) is
> also set to have a constant null space. But the vector, on which the PCSHELL
> is applied, is not orthogonal to the constant null space. Can you help me to
> understand why this is still the case?
>
> To go back to the problem of the influence of the KSP monitor to the solution
> process: When I project out the constant null space before calling the
> KSPRICHARDSON with PCHYPRE, the monitor has no influence anymore on the
> solution process.
Jed figured this out. Our KSP_PCApply() projects out the given null space,
when hypre is given the job of applying the preconditioner it doesn't know
about the null space and hence does not remove it; hence it is like you didn't
provide a null space.
The fix is that when a null space has been attached to the KSP we should
always do the standard richardson and not call the "short cut" routine.
So now if you are using petsc-dev you should see no difference between
monitoring or not
Barry
>
> Thomas
>
>
> Am 08.11.2012 08:27, schrieb Thomas Witkowski:
>> I run the code in the debugger, and you are right with your conjecture. For
>> the case without monitoring the richardson KSP, PCApply_HYPRE is called once
>> per outer iteration with the following call stack:
>>
>> KSPSolve (itfunc.c:446)
>> KSPSolve_Richardson (rich.c:69)
>> PCApplyRichardson (precon.c:780)
>> PCApplyRichardson_HYPRE_BoomerAMG (hypre.c:597)
>> PCApply_HYPRE (hypre.c:151)
>>
>>
>> When the monitor is enabled, PCApply_HYPRE is called twice per outer
>> iteration from two different places in KSPSolve_Richardson:
>>
>> KSPSolve (itfunc.c:446)
>> KSPSolve_Richardson (rich.c:124)
>> PCApply (precon.c:384)
>> PCApply_HYPRE (hypre.c:151)
>>
>> and
>>
>> KSPSolve (itfunc.c:446)
>> KSPSolve_Richardson (rich.c:147)
>> PCApply (precon.c:384)
>> PCApply_HYPRE (hypre.c:151)
>>
>>
>>
>> By the way, regardless of the options I provide to the inner solver which we
>> are talking about, I cannot reproduce the small outer iteration number I see
>> when using the KSP monitor. Can this be due to the tolerances that are set
>> in PCApplyRichardson_HYPRE_BoomerAMG ? What's the reason you set the options
>> before calling PCApply_HYPRE, and than the same options are reseted before
>> leaving this function?
>>
>> Thomas
>>
>> Am 07.11.2012 22:28, schrieb Barry Smith:
>>> Thomas,
>>>
>>> I don't have a complete explanation why in this case it changes but I
>>> can point you in the right direction of how this happens. You may need to
>>> put breakpoints in the debugger to see exactly what goes different with and
>>> without that option.
>>>
>>> 1) When richardson is used and no monitoring is done then
>>> PCApplyRichardson_HYPRE_BoomerAMG() is called to apply the boomerAMG
>>> v-cycle. Note that it changes the tolerance and its before calling
>>> PCApply_HYPRE()
>>>
>>> 2) When monitoring is turned on we need to compute the residual norm
>>> at each iteration so PCApply_HYPRE() is instead called directly by
>>> KSPSolve_Richardson() once for each iteration.
>>>
>>> Now since you are trying to use just one smoothing step inside
>>> richardson the two approaches (I think) should be identical. Somehow when
>>> KSPSolve_Richardson() is used instead of PCApplyRichardson() more inner
>>> iterations (iterations on the monitored thing) must be happening, thus
>>> leading to a stronger preconditioner and hence less iterations on the
>>> entire thing.
>>>
>>> You can run (for example on one process but two is ok also) both cases
>>> with -start_in_debugger and put a breakpoint in PCApply_HYPRE() and then
>>> when it gets to that function do where to see how it is being called.
>>> Continue repeatedly to see why the one case triggers more (of these inner
>>> calls) then the other case.
>>>
>>> Barry
>>>
>>>
>>> Depending on the outcome (reason for the difference) I might call this
>>> issue a bug or a strange feature. I am leaning toward bug.
>>>
>>>
>>> On Nov 7, 2012, at 12:55 PM, Thomas Witkowski <thomas.witkowski at
>>> tu-dresden.de> wrote:
>>>
>>>> Okay, the outer KSP is as follows:
>>>>
>>>> KSP Object:(ns_) 2 MPI processes
>>>> type: fgmres
>>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>>>> Orthogonalization with no iterative refinement
>>>> GMRES: happy breakdown tolerance 1e-30
>>>> maximum iterations=100, initial guess is zero
>>>> tolerances: relative=1e-06, absolute=1e-08, divergence=10000
>>>> right preconditioning
>>>> has attached null space
>>>> using UNPRECONDITIONED norm type for convergence test
>>>> PC Object:(ns_) 2 MPI processes
>>>> type: fieldsplit
>>>> FieldSplit with Schur preconditioner, factorization FULL
>>>> Preconditioner for the Schur complement formed from the block diagonal
>>>> part of A11
>>>> Split info:
>>>> Split number 0 Defined by IS
>>>> Split number 1 Defined by IS
>>>> KSP solver for A00 block
>>>> KSP Object: (velocity_) 2 MPI processes
>>>> type: richardson
>>>> Richardson: damping factor=1
>>>> maximum iterations=1, initial guess is zero
>>>> tolerances: relative=0, absolute=1e-14, divergence=10000
>>>> left preconditioning
>>>> using PRECONDITIONED norm type for convergence test
>>>> PC Object: (velocity_) 2 MPI processes
>>>> type: hypre
>>>> HYPRE BoomerAMG preconditioning
>>>> HYPRE BoomerAMG: Cycle type V
>>>> HYPRE BoomerAMG: Maximum number of levels 25
>>>> HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>> HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>> HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>> HYPRE BoomerAMG: Interpolation truncation factor 0
>>>> HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>> HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>> HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>> HYPRE BoomerAMG: Maximum row sums 0.9
>>>> HYPRE BoomerAMG: Sweeps down 1
>>>> HYPRE BoomerAMG: Sweeps up 1
>>>> HYPRE BoomerAMG: Sweeps on coarse 1
>>>> HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>>>> HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>>>> HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>>>> HYPRE BoomerAMG: Relax weight (all) 1
>>>> HYPRE BoomerAMG: Outer relax weight (all) 1
>>>> HYPRE BoomerAMG: Using CF-relaxation
>>>> HYPRE BoomerAMG: Measure type local
>>>> HYPRE BoomerAMG: Coarsen type Falgout
>>>> HYPRE BoomerAMG: Interpolation type classical
>>>> linear system matrix = precond matrix:
>>>> Matrix Object: 2 MPI processes
>>>> type: mpiaij
>>>> rows=2754, cols=2754
>>>> total: nonzeros=25026, allocated nonzeros=25026
>>>> total number of mallocs used during MatSetValues calls =0
>>>> not using I-node (on process 0) routines
>>>> KSP solver for S = A11 - A10 inv(A00) A01
>>>> KSP Object: (ns_fieldsplit_pressure_) 2 MPI processes
>>>> type: preonly
>>>> maximum iterations=10000, initial guess is zero
>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>> left preconditioning
>>>> has attached null space
>>>> using NONE norm type for convergence test
>>>> PC Object: (ns_fieldsplit_pressure_) 2 MPI processes
>>>> type: shell
>>>> Shell: no name
>>>> linear system matrix followed by preconditioner matrix:
>>>> Matrix Object: 2 MPI processes
>>>> type: schurcomplement
>>>> rows=369, cols=369
>>>> Schur complement A11 - A10 inv(A00) A01
>>>> A11
>>>> Matrix Object: 2 MPI processes
>>>> type: mpiaij
>>>> rows=369, cols=369
>>>> total: nonzeros=0, allocated nonzeros=0
>>>> total number of mallocs used during MatSetValues calls =0
>>>> using I-node (on process 0) routines: found 33 nodes,
>>>> limit used is 5
>>>> A10
>>>> Matrix Object: 2 MPI processes
>>>> type: mpiaij
>>>> rows=369, cols=2754
>>>> total: nonzeros=8973, allocated nonzeros=8973
>>>> total number of mallocs used during MatSetValues calls =0
>>>> not using I-node (on process 0) routines
>>>> KSP of A00
>>>> KSP Object: (velocity_) 2 MPI processes
>>>> type: richardson
>>>> Richardson: damping factor=1
>>>> maximum iterations=1, initial guess is zero
>>>> tolerances: relative=0, absolute=1e-14, divergence=10000
>>>> left preconditioning
>>>> using PRECONDITIONED norm type for convergence test
>>>> PC Object: (velocity_) 2 MPI processes
>>>> type: hypre
>>>> HYPRE BoomerAMG preconditioning
>>>> HYPRE BoomerAMG: Cycle type V
>>>> HYPRE BoomerAMG: Maximum number of levels 25
>>>> HYPRE BoomerAMG: Maximum number of iterations PER hypre
>>>> call 1
>>>> HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>> HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>> HYPRE BoomerAMG: Interpolation truncation factor 0
>>>> HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>> HYPRE BoomerAMG: Number of levels of aggressive
>>>> coarsening 0
>>>> HYPRE BoomerAMG: Number of paths for aggressive
>>>> coarsening 1
>>>> HYPRE BoomerAMG: Maximum row sums 0.9
>>>> HYPRE BoomerAMG: Sweeps down 1
>>>> HYPRE BoomerAMG: Sweeps up 1
>>>> HYPRE BoomerAMG: Sweeps on coarse 1
>>>> HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>>>> HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>>>> HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>>>> HYPRE BoomerAMG: Relax weight (all) 1
>>>> HYPRE BoomerAMG: Outer relax weight (all) 1
>>>> HYPRE BoomerAMG: Using CF-relaxation
>>>> HYPRE BoomerAMG: Measure type local
>>>> HYPRE BoomerAMG: Coarsen type Falgout
>>>> HYPRE BoomerAMG: Interpolation type classical
>>>> linear system matrix = precond matrix:
>>>> Matrix Object: 2 MPI processes
>>>> type: mpiaij
>>>> rows=2754, cols=2754
>>>> total: nonzeros=25026, allocated nonzeros=25026
>>>> total number of mallocs used during MatSetValues calls =0
>>>> not using I-node (on process 0) routines
>>>> A01
>>>> Matrix Object: 2 MPI processes
>>>> type: mpiaij
>>>> rows=2754, cols=369
>>>> total: nonzeros=7883, allocated nonzeros=7883
>>>> total number of mallocs used during MatSetValues calls =0
>>>> not using I-node (on process 0) routines
>>>> Matrix Object: 2 MPI processes
>>>> type: mpiaij
>>>> rows=369, cols=369
>>>> total: nonzeros=0, allocated nonzeros=0
>>>> total number of mallocs used during MatSetValues calls =0
>>>> using I-node (on process 0) routines: found 33 nodes, limit
>>>> used is 5
>>>> linear system matrix = precond matrix:
>>>> Matrix Object: 2 MPI processes
>>>> type: mpiaij
>>>> rows=3123, cols=3123
>>>> total: nonzeros=41882, allocated nonzeros=52732
>>>> total number of mallocs used during MatSetValues calls =0
>>>> not using I-node (on process 0) routines
>>>>
>>>>
>>>>
>>>> Note that "ns_fieldsplit_pressure_" is a PCShell. This make again use of
>>>> two KSP objects "mass_" and "laplace_"
>>>>
>>>>
>>>>
>>>> KSP Object:(mass_) 2 MPI processes
>>>> type: cg
>>>> maximum iterations=2
>>>> tolerances: relative=0, absolute=1e-14, divergence=10000
>>>> left preconditioning
>>>> using nonzero initial guess
>>>> using PRECONDITIONED norm type for convergence test
>>>> PC Object:(mass_) 2 MPI processes
>>>> type: jacobi
>>>> linear system matrix = precond matrix:
>>>> Matrix Object: 2 MPI processes
>>>> type: mpiaij
>>>> rows=369, cols=369
>>>> total: nonzeros=2385, allocated nonzeros=2506
>>>> total number of mallocs used during MatSetValues calls =0
>>>> not using I-node (on process 0) routines
>>>>
>>>>
>>>>
>>>> AND
>>>>
>>>>
>>>>
>>>> KSP Object:(laplace_) 2 MPI processes
>>>> type: richardson
>>>> Richardson: damping factor=1
>>>> maximum iterations=1
>>>> tolerances: relative=0, absolute=1e-14, divergence=10000
>>>> left preconditioning
>>>> has attached null space
>>>> using nonzero initial guess
>>>> using PRECONDITIONED norm type for convergence test
>>>> PC Object:(laplace_) 2 MPI processes
>>>> type: hypre
>>>> HYPRE BoomerAMG preconditioning
>>>> HYPRE BoomerAMG: Cycle type V
>>>> HYPRE BoomerAMG: Maximum number of levels 25
>>>> HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>> HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>> HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>> HYPRE BoomerAMG: Interpolation truncation factor 0
>>>> HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>> HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>> HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>> HYPRE BoomerAMG: Maximum row sums 0.9
>>>> HYPRE BoomerAMG: Sweeps down 1
>>>> HYPRE BoomerAMG: Sweeps up 1
>>>> HYPRE BoomerAMG: Sweeps on coarse 1
>>>> HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>>>> HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>>>> HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>>>> HYPRE BoomerAMG: Relax weight (all) 1
>>>> HYPRE BoomerAMG: Outer relax weight (all) 1
>>>> HYPRE BoomerAMG: Using CF-relaxation
>>>> HYPRE BoomerAMG: Measure type local
>>>> HYPRE BoomerAMG: Coarsen type Falgout
>>>> HYPRE BoomerAMG: Interpolation type classical
>>>> linear system matrix = precond matrix:
>>>> Matrix Object: 2 MPI processes
>>>> type: mpiaij
>>>> rows=369, cols=369
>>>> total: nonzeros=1745, allocated nonzeros=2506
>>>> total number of mallocs used during MatSetValues calls =0
>>>> not using I-node (on process 0) routines
>>>>
>>>>
>>>> The outer iteration count is now influenced when adding
>>>> "-laplace_ksp_monitor" to the command line options.
>>>>
>>>> Thomas
>>>>
>>>>
>>>> Am 07.11.2012 19:49, schrieb Barry Smith:
>>>>> This is normally not expected but might happen under some combination
>>>>> of solver options. Please send the output of -ksp_view and the options
>>>>> you use and we'll try to understand the situation.
>>>>>
>>>>> Barry
>>>>>
>>>>>
>>>>> On Nov 7, 2012, at 12:12 PM, Thomas Witkowski <thomas.witkowski at
>>>>> tu-dresden.de> wrote:
>>>>>
>>>>>> I have a very curious behavior in one of my codes: Whenever I enable a
>>>>>> KSP Monitor for an inner solver, the outer iteration count goes down
>>>>>> from 25 to 18! Okay, this is great :) I like it so see iteration counts
>>>>>> decreasing, but I would like to know what's going on, and eventually, a
>>>>>> KSP monitor should not influence the whole game. An to answer your first
>>>>>> question, I run the code through valgrind and its free of any errors.
>>>>>> Any idea what to check next? Thanks for any advice.
>>>>>>
>>>>>> Thomas
>>>>>>
>>
>