Thank you guys for clarifying this!

Thomas
On 08.11.2012 21:21, Barry Smith wrote:
> On Nov 8, 2012, at 2:48 AM, Thomas Witkowski <thomas.witkowski at tu-dresden.de> wrote:
>
>> Maybe the following information is also useful to find the problem and/or to understand what's going on: The inner solver solves with a Laplace matrix (discretized with free boundaries), so I set the constant null space on the KSP object. This inner solver is used inside the Schur complement solver of a PCFIELDSPLIT object. The Schur complement solver (KSPPREONLY and PCSHELL) is also set to have a constant null space. But the vector to which the PCSHELL is applied is not orthogonal to the constant null space. Can you help me to understand why this is still the case?
>>
>> To go back to the problem of the influence of the KSP monitor on the solution process: When I project out the constant null space before calling the KSPRICHARDSON with PCHYPRE, the monitor no longer has any influence on the solution process.
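For reference, attaching the constant null space described above and projecting it out of the right-hand side before the Richardson/hypre solve looks roughly like the following minimal sketch. It assumes the 3.3-era KSPSetNullSpace/MatNullSpaceRemove interface; the function and variable names are placeholders, not the actual application code:

  #include <petscksp.h>

  /* Sketch: attach a constant null space to the inner (Laplace) KSP and
     remove the constant component from the right-hand side before solving.
     laplaceKsp, b and x stand for the application's own objects. */
  static PetscErrorCode SolveLaplaceWithConstNullSpace(KSP laplaceKsp, Vec b, Vec x)
  {
    MatNullSpace   nullsp;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    /* constant null space: has_cnst = PETSC_TRUE, no additional vectors */
    ierr = MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, PETSC_NULL, &nullsp);CHKERRQ(ierr);
    ierr = KSPSetNullSpace(laplaceKsp, nullsp);CHKERRQ(ierr);

    /* explicitly project the constant out of b; with a PETSC_NULL output
       vector the removal is done in place (the signature differs in later
       PETSc releases) */
    ierr = MatNullSpaceRemove(nullsp, b, PETSC_NULL);CHKERRQ(ierr);

    ierr = KSPSolve(laplaceKsp, b, x);CHKERRQ(ierr);
    ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }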
> Jed figured this out. Our KSP_PCApply() projects out the given null space; when hypre is given the job of applying the preconditioner, it doesn't know about the null space and hence does not remove it, so it is as if you didn't provide a null space.
>
> The fix is that when a null space has been attached to the KSP we should always do the standard richardson and not call the "short cut" routine.
>
> So now if you are using petsc-dev you should see no difference between monitoring or not.
>
> Barry
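The logic of that fix can be sketched like this (an illustration of the idea only, not the actual petsc-dev source of KSPSolve_Richardson(), and the variable names are assumed): the "short cut" PCApplyRichardson() path may only be taken when no null space is attached to the KSP, because only the standard loop goes through KSP_PCApply(), which projects the null space out of the preconditioned residual.

  /* Sketch of the guard described above -- not the real implementation. */
  if (pc->ops->applyrichardson && !ksp->nullsp) {
    /* short cut: hand the whole Richardson sweep to the PC
       (e.g. PCApplyRichardson_HYPRE_BoomerAMG); hypre never sees
       the attached null space */
    ierr = PCApplyRichardson(pc, b, x, w, rtol, abstol, dtol, maxit, zeroguess, &its, &reason);CHKERRQ(ierr);
  } else {
    /* standard path: one KSP_PCApply() per iteration, and KSP_PCApply()
       removes the null-space component every time */
    for (i = 0; i < maxit; i++) {
      ierr = KSP_MatMult(ksp, A, x, r);CHKERRQ(ierr);  /* r = A*x                              */
      ierr = VecAYPX(r, -1.0, b);CHKERRQ(ierr);        /* r = b - A*x                          */
      ierr = KSP_PCApply(ksp, r, z);CHKERRQ(ierr);     /* z = B*r, null space projected out    */
      ierr = VecAXPY(x, scale, z);CHKERRQ(ierr);       /* x = x + scale*z                      */
    }
  }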
>> Thomas
>>
>> On 08.11.2012 08:27, Thomas Witkowski wrote:
>>> I ran the code in the debugger, and you are right with your conjecture. For the case without monitoring the richardson KSP, PCApply_HYPRE is called once per outer iteration with the following call stack:
>>>
>>>   KSPSolve (itfunc.c:446)
>>>   KSPSolve_Richardson (rich.c:69)
>>>   PCApplyRichardson (precon.c:780)
>>>   PCApplyRichardson_HYPRE_BoomerAMG (hypre.c:597)
>>>   PCApply_HYPRE (hypre.c:151)
>>>
>>> When the monitor is enabled, PCApply_HYPRE is called twice per outer iteration from two different places in KSPSolve_Richardson:
>>>
>>>   KSPSolve (itfunc.c:446)
>>>   KSPSolve_Richardson (rich.c:124)
>>>   PCApply (precon.c:384)
>>>   PCApply_HYPRE (hypre.c:151)
>>>
>>> and
>>>
>>>   KSPSolve (itfunc.c:446)
>>>   KSPSolve_Richardson (rich.c:147)
>>>   PCApply (precon.c:384)
>>>   PCApply_HYPRE (hypre.c:151)
>>>
>>> By the way, regardless of the options I provide to the inner solver we are talking about, I cannot reproduce the small outer iteration count I see when using the KSP monitor. Can this be due to the tolerances that are set in PCApplyRichardson_HYPRE_BoomerAMG? What is the reason you set the options before calling PCApply_HYPRE, and then reset the same options before leaving this function?
>>>
>>> Thomas
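The set-and-reset pattern the question refers to looks roughly like the following. This is a sketch of the idea only: the hypre calls are real API, but the surrounding code and the field name jac->hsolver are assumptions, so check PCApplyRichardson_HYPRE_BoomerAMG in hypre.c for the exact code.

  /* Sketch: PCApplyRichardson_HYPRE_BoomerAMG lets hypre run the whole
     Richardson sweep itself, so the per-call limits are widened before the
     solve and restored afterwards, keeping a later plain PCApply_HYPRE()
     at exactly one V-cycle per application. */
  HYPRE_Solver hsolver = jac->hsolver;            /* assumed field name */

  HYPRE_BoomerAMGSetMaxIter(hsolver, its);        /* allow 'its' V-cycles       */
  HYPRE_BoomerAMGSetTol(hsolver, rtol);           /* and a real tolerance       */

  ierr = PCApply_HYPRE(pc, b, y);CHKERRQ(ierr);   /* one hypre call = whole sweep */

  HYPRE_BoomerAMGSetMaxIter(hsolver, 1);          /* restore: 1 cycle per apply   */
  HYPRE_BoomerAMGSetTol(hsolver, 0.0);            /* restore: no inner tolerance  */

The restored values match the "Maximum number of iterations PER hypre call 1" and "Convergence tolerance PER hypre call 0" entries in the -ksp_view output quoted further down.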
>>> On 07.11.2012 22:28, Barry Smith wrote:
>>>> Thomas,
>>>>
>>>> I don't have a complete explanation why it changes in this case, but I can point you in the right direction of how this happens. You may need to put breakpoints in the debugger to see exactly what goes differently with and without that option.
>>>>
>>>> 1) When richardson is used and no monitoring is done, then PCApplyRichardson_HYPRE_BoomerAMG() is called to apply the boomerAMG V-cycle. Note that it changes the tolerance and the iteration count (its) before calling PCApply_HYPRE().
>>>>
>>>> 2) When monitoring is turned on we need to compute the residual norm at each iteration, so PCApply_HYPRE() is instead called directly by KSPSolve_Richardson() once for each iteration.
>>>>
>>>> Now since you are trying to use just one smoothing step inside richardson, the two approaches (I think) should be identical. Somehow when KSPSolve_Richardson() is used instead of PCApplyRichardson(), more inner iterations (iterations on the monitored thing) must be happening, thus leading to a stronger preconditioner and hence fewer iterations on the entire thing.
>>>>
>>>> You can run both cases (for example on one process, but two is ok also) with -start_in_debugger, put a breakpoint in PCApply_HYPRE(), and when it gets to that function do "where" to see how it is being called. Continue repeatedly to see why the one case triggers more of these inner calls than the other case.
>>>>
>>>> Barry
>>>>
>>>> Depending on the outcome (the reason for the difference) I might call this issue a bug or a strange feature. I am leaning toward bug.
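The suggested debugging session would look something like this (the executable name is a placeholder and the debugger interaction is only illustrative):

  mpirun -np 2 ./app -start_in_debugger     # PETSc attaches a debugger to each rank
  (gdb) break PCApply_HYPRE
  (gdb) continue
  (gdb) where      # shows whether the call came via PCApplyRichardson_HYPRE_BoomerAMG or via PCApply
  (gdb) continue   # repeat to count how often PCApply_HYPRE is reached per outer iteration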
>>>> On Nov 7, 2012, at 12:55 PM, Thomas Witkowski <thomas.witkowski at tu-dresden.de> wrote:
>>>>
>>>>> Okay, the outer KSP is as follows:
>>>>>
>>>>> KSP Object:(ns_) 2 MPI processes
>>>>>   type: fgmres
>>>>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>>     GMRES: happy breakdown tolerance 1e-30
>>>>>   maximum iterations=100, initial guess is zero
>>>>>   tolerances: relative=1e-06, absolute=1e-08, divergence=10000
>>>>>   right preconditioning
>>>>>   has attached null space
>>>>>   using UNPRECONDITIONED norm type for convergence test
>>>>> PC Object:(ns_) 2 MPI processes
>>>>>   type: fieldsplit
>>>>>     FieldSplit with Schur preconditioner, factorization FULL
>>>>>     Preconditioner for the Schur complement formed from the block diagonal part of A11
>>>>>     Split info:
>>>>>     Split number 0 Defined by IS
>>>>>     Split number 1 Defined by IS
>>>>>     KSP solver for A00 block
>>>>>       KSP Object: (velocity_) 2 MPI processes
>>>>>         type: richardson
>>>>>           Richardson: damping factor=1
>>>>>         maximum iterations=1, initial guess is zero
>>>>>         tolerances: relative=0, absolute=1e-14, divergence=10000
>>>>>         left preconditioning
>>>>>         using PRECONDITIONED norm type for convergence test
>>>>>       PC Object: (velocity_) 2 MPI processes
>>>>>         type: hypre
>>>>>           HYPRE BoomerAMG preconditioning
>>>>>           HYPRE BoomerAMG: Cycle type V
>>>>>           HYPRE BoomerAMG: Maximum number of levels 25
>>>>>           HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>>>           HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>>>           HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>>>           HYPRE BoomerAMG: Interpolation truncation factor 0
>>>>>           HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>>>           HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>>>           HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>>>           HYPRE BoomerAMG: Maximum row sums 0.9
>>>>>           HYPRE BoomerAMG: Sweeps down 1
>>>>>           HYPRE BoomerAMG: Sweeps up 1
>>>>>           HYPRE BoomerAMG: Sweeps on coarse 1
>>>>>           HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>>>>>           HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>>>>>           HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>>>>>           HYPRE BoomerAMG: Relax weight (all) 1
>>>>>           HYPRE BoomerAMG: Outer relax weight (all) 1
>>>>>           HYPRE BoomerAMG: Using CF-relaxation
>>>>>           HYPRE BoomerAMG: Measure type local
>>>>>           HYPRE BoomerAMG: Coarsen type Falgout
>>>>>           HYPRE BoomerAMG: Interpolation type classical
>>>>>         linear system matrix = precond matrix:
>>>>>         Matrix Object: 2 MPI processes
>>>>>           type: mpiaij
>>>>>           rows=2754, cols=2754
>>>>>           total: nonzeros=25026, allocated nonzeros=25026
>>>>>           total number of mallocs used during MatSetValues calls =0
>>>>>             not using I-node (on process 0) routines
>>>>>     KSP solver for S = A11 - A10 inv(A00) A01
>>>>>       KSP Object: (ns_fieldsplit_pressure_) 2 MPI processes
>>>>>         type: preonly
>>>>>         maximum iterations=10000, initial guess is zero
>>>>>         tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>>>         left preconditioning
>>>>>         has attached null space
>>>>>         using NONE norm type for convergence test
>>>>>       PC Object: (ns_fieldsplit_pressure_) 2 MPI processes
>>>>>         type: shell
>>>>>           Shell: no name
>>>>>         linear system matrix followed by preconditioner matrix:
>>>>>         Matrix Object: 2 MPI processes
>>>>>           type: schurcomplement
>>>>>           rows=369, cols=369
>>>>>             Schur complement A11 - A10 inv(A00) A01
>>>>>             A11
>>>>>               Matrix Object: 2 MPI processes
>>>>>                 type: mpiaij
>>>>>                 rows=369, cols=369
>>>>>                 total: nonzeros=0, allocated nonzeros=0
>>>>>                 total number of mallocs used during MatSetValues calls =0
>>>>>                   using I-node (on process 0) routines: found 33 nodes, limit used is 5
>>>>>             A10
>>>>>               Matrix Object: 2 MPI processes
>>>>>                 type: mpiaij
>>>>>                 rows=369, cols=2754
>>>>>                 total: nonzeros=8973, allocated nonzeros=8973
>>>>>                 total number of mallocs used during MatSetValues calls =0
>>>>>                   not using I-node (on process 0) routines
>>>>>             KSP of A00
>>>>>               KSP Object: (velocity_) 2 MPI processes
>>>>>                 type: richardson
>>>>>                   Richardson: damping factor=1
>>>>>                 maximum iterations=1, initial guess is zero
>>>>>                 tolerances: relative=0, absolute=1e-14, divergence=10000
>>>>>                 left preconditioning
>>>>>                 using PRECONDITIONED norm type for convergence test
>>>>>               PC Object: (velocity_) 2 MPI processes
>>>>>                 type: hypre
>>>>>                   HYPRE BoomerAMG preconditioning
>>>>>                   HYPRE BoomerAMG: Cycle type V
>>>>>                   HYPRE BoomerAMG: Maximum number of levels 25
>>>>>                   HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>>>                   HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>>>                   HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>>>                   HYPRE BoomerAMG: Interpolation truncation factor 0
>>>>>                   HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>>>                   HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>>>                   HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>>>                   HYPRE BoomerAMG: Maximum row sums 0.9
>>>>>                   HYPRE BoomerAMG: Sweeps down 1
>>>>>                   HYPRE BoomerAMG: Sweeps up 1
>>>>>                   HYPRE BoomerAMG: Sweeps on coarse 1
>>>>>                   HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>>>>>                   HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>>>>>                   HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>>>>>                   HYPRE BoomerAMG: Relax weight (all) 1
>>>>>                   HYPRE BoomerAMG: Outer relax weight (all) 1
>>>>>                   HYPRE BoomerAMG: Using CF-relaxation
>>>>>                   HYPRE BoomerAMG: Measure type local
>>>>>                   HYPRE BoomerAMG: Coarsen type Falgout
>>>>>                   HYPRE BoomerAMG: Interpolation type classical
>>>>>                 linear system matrix = precond matrix:
>>>>>                 Matrix Object: 2 MPI processes
>>>>>                   type: mpiaij
>>>>>                   rows=2754, cols=2754
>>>>>                   total: nonzeros=25026, allocated nonzeros=25026
>>>>>                   total number of mallocs used during MatSetValues calls =0
>>>>>                     not using I-node (on process 0) routines
>>>>>             A01
>>>>>               Matrix Object: 2 MPI processes
>>>>>                 type: mpiaij
>>>>>                 rows=2754, cols=369
>>>>>                 total: nonzeros=7883, allocated nonzeros=7883
>>>>>                 total number of mallocs used during MatSetValues calls =0
>>>>>                   not using I-node (on process 0) routines
>>>>>         Matrix Object: 2 MPI processes
>>>>>           type: mpiaij
>>>>>           rows=369, cols=369
>>>>>           total: nonzeros=0, allocated nonzeros=0
>>>>>           total number of mallocs used during MatSetValues calls =0
>>>>>             using I-node (on process 0) routines: found 33 nodes, limit used is 5
>>>>>   linear system matrix = precond matrix:
>>>>>   Matrix Object: 2 MPI processes
>>>>>     type: mpiaij
>>>>>     rows=3123, cols=3123
>>>>>     total: nonzeros=41882, allocated nonzeros=52732
>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>       not using I-node (on process 0) routines
>>>>>
>>>>> Note that "ns_fieldsplit_pressure_" is a PCShell. It in turn makes use of two KSP objects, "mass_" and "laplace_":
>>>>>
>>>>> KSP Object:(mass_) 2 MPI processes
>>>>>   type: cg
>>>>>   maximum iterations=2
>>>>>   tolerances: relative=0, absolute=1e-14, divergence=10000
>>>>>   left preconditioning
>>>>>   using nonzero initial guess
>>>>>   using PRECONDITIONED norm type for convergence test
>>>>> PC Object:(mass_) 2 MPI processes
>>>>>   type: jacobi
>>>>>   linear system matrix = precond matrix:
>>>>>   Matrix Object: 2 MPI processes
>>>>>     type: mpiaij
>>>>>     rows=369, cols=369
>>>>>     total: nonzeros=2385, allocated nonzeros=2506
>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>       not using I-node (on process 0) routines
>>>>>
>>>>> and
>>>>>
>>>>> KSP Object:(laplace_) 2 MPI processes
>>>>>   type: richardson
>>>>>     Richardson: damping factor=1
>>>>>   maximum iterations=1
>>>>>   tolerances: relative=0, absolute=1e-14, divergence=10000
>>>>>   left preconditioning
>>>>>   has attached null space
>>>>>   using nonzero initial guess
>>>>>   using PRECONDITIONED norm type for convergence test
>>>>> PC Object:(laplace_) 2 MPI processes
>>>>>   type: hypre
>>>>>     HYPRE BoomerAMG preconditioning
>>>>>     HYPRE BoomerAMG: Cycle type V
>>>>>     HYPRE BoomerAMG: Maximum number of levels 25
>>>>>     HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>>>     HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>>>     HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>>>     HYPRE BoomerAMG: Interpolation truncation factor 0
>>>>>     HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>>>     HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>>>     HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>>>     HYPRE BoomerAMG: Maximum row sums 0.9
>>>>>     HYPRE BoomerAMG: Sweeps down 1
>>>>>     HYPRE BoomerAMG: Sweeps up 1
>>>>>     HYPRE BoomerAMG: Sweeps on coarse 1
>>>>>     HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>>>>>     HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>>>>>     HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>>>>>     HYPRE BoomerAMG: Relax weight (all) 1
>>>>>     HYPRE BoomerAMG: Outer relax weight (all) 1
>>>>>     HYPRE BoomerAMG: Using CF-relaxation
>>>>>     HYPRE BoomerAMG: Measure type local
>>>>>     HYPRE BoomerAMG: Coarsen type Falgout
>>>>>     HYPRE BoomerAMG: Interpolation type classical
>>>>>   linear system matrix = precond matrix:
>>>>>   Matrix Object: 2 MPI processes
>>>>>     type: mpiaij
>>>>>     rows=369, cols=369
>>>>>     total: nonzeros=1745, allocated nonzeros=2506
>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>       not using I-node (on process 0) routines
>>>>>
>>>>> The outer iteration count is now influenced when adding "-laplace_ksp_monitor" to the command line options.
>>>>>
>>>>> Thomas
>>>>>
>>>>> On 07.11.2012 19:49, Barry Smith wrote:
>>>>>> This is normally not expected but might happen under some combination of solver options. Please send the output of -ksp_view and the options you use and we'll try to understand the situation.
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>> On Nov 7, 2012, at 12:12 PM, Thomas Witkowski <thomas.witkowski at tu-dresden.de> wrote:
>>>>>>
>>>>>>> I have a very curious behavior in one of my codes: Whenever I enable a KSP monitor for an inner solver, the outer iteration count goes down from 25 to 18! Okay, this is great :) I like to see iteration counts decreasing, but I would like to know what's going on, and in the end a KSP monitor should not influence the whole game. And to answer your first question: I ran the code through valgrind and it is free of any errors. Any idea what to check next? Thanks for any advice.
>>>>>>>
>>>>>>> Thomas
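For completeness, the solver layout shown in the -ksp_view output above corresponds roughly to an option set like the following. This is a reconstruction for illustration only: the prefixes and values are read off the view output, the executable name is a placeholder, the actual command line is not known, and several settings (the nonzero initial guesses, the attached null spaces, the PCShell) are presumably set in code rather than on the command line:

  mpirun -np 2 ./app \
    -ns_ksp_type fgmres -ns_ksp_rtol 1e-6 -ns_ksp_atol 1e-8 \
    -ns_pc_type fieldsplit -ns_pc_fieldsplit_type schur \
    -velocity_ksp_type richardson -velocity_ksp_max_it 1 \
    -velocity_pc_type hypre -velocity_pc_hypre_type boomeramg \
    -mass_ksp_type cg -mass_ksp_max_it 2 -mass_pc_type jacobi \
    -laplace_ksp_type richardson -laplace_ksp_max_it 1 \
    -laplace_pc_type hypre -laplace_pc_hypre_type boomeramg \
    -laplace_ksp_monitor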
