I am sorry, I cannot reproduce what you describe. I am using
src/ksp/ksp/tutorials/ex45.c in the main branch (should be same as release for
this purpose).
With no change to the code, I get
$ ./ex45 -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 -pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits -mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits -mg_coarse_pc_type jacobi -ksp_monitor_true_residual -ksp_view
0 KSP preconditioned resid norm 1.851257578045e+01 true resid norm 1.476491378857e+01 ||r(i)||/||b|| 1.000000000000e+00
1 KSP preconditioned resid norm 3.720545622095e-01 true resid norm 5.171053311198e-02 ||r(i)||/||b|| 3.502257707188e-03
2 KSP preconditioned resid norm 1.339047557616e-02 true resid norm 1.866765310863e-03 ||r(i)||/||b|| 1.264325235890e-04
3 KSP preconditioned resid norm 4.833887599029e-04 true resid norm 6.867629264754e-05 ||r(i)||/||b|| 4.651316873974e-06
4 KSP preconditioned resid norm 1.748167886388e-05 true resid norm 3.398334857479e-06 ||r(i)||/||b|| 2.301628648933e-07
5 KSP preconditioned resid norm 6.570567424652e-07 true resid norm 4.304483984231e-07 ||r(i)||/||b|| 2.915346507180e-08
6 KSP preconditioned resid norm 4.013427896557e-08 true resid norm 7.502068698790e-08 ||r(i)||/||b|| 5.081010838410e-09
7 KSP preconditioned resid norm 5.934811016347e-09 true resid norm 1.333884145638e-08 ||r(i)||/||b|| 9.034147877457e-10
Linear solve converged due to CONVERGED_RTOL iterations 7
KSP Object: 1 MPI process
type: richardson
damping factor=1.
maximum iterations=10000, nonzero initial guess
tolerances: relative=1e-09, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI process
type: mg
type is MULTIPLICATIVE, levels=3 cycles=v
Cycles per PCApply=1
Not using Galerkin computed coarse grid matrices
Coarse grid solver -- level 0 -------------------------------
KSP Object: (mg_coarse_) 1 MPI process
type: richardson
damping factor=1.
maximum iterations=6, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: (mg_coarse_) 1 MPI process
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 1 MPI process
type: seqaij
rows=8, cols=8
total: nonzeros=32, allocated nonzeros=32
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 1 MPI process
type: richardson
damping factor=1.
maximum iterations=6, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 1 MPI process
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 1 MPI process
type: seqaij
rows=64, cols=64
total: nonzeros=352, allocated nonzeros=352
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 1 MPI process
type: richardson
damping factor=1.
maximum iterations=6, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 1 MPI process
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 1 MPI process
type: seqaij
rows=343, cols=343
total: nonzeros=2107, allocated nonzeros=2107
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 1 MPI process
type: seqaij
rows=343, cols=343
total: nonzeros=2107, allocated nonzeros=2107
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Residual norm 1.33388e-08
~/Src/petsc/src/ksp/ksp/tutorials (main=) arch-main
$
Now I change the code to

  if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
    barray[k][j][i] = 0; //2.0 * (HxHydHz + HxHzdHy + HyHzdHx);
  } else {
    barray[k][j][i] = 1; //Hx * Hy * Hz;
  }
I do not understand where I am supposed to change the dimension to 33, so I
ignore that statement (though see the sketch after the second run below for
where the grid size could be set). The same command line with the change above gives
$ ./ex45 -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 -pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits -mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits -mg_coarse_pc_type jacobi -ksp_monitor_true_residual -ksp_view
0 KSP preconditioned resid norm 7.292257119299e+01 true resid norm 1.118033988750e+01 ||r(i)||/||b|| 1.000000000000e+00
1 KSP preconditioned resid norm 2.534913491362e+00 true resid norm 3.528425353826e-01 ||r(i)||/||b|| 3.155919577875e-02
2 KSP preconditioned resid norm 9.145057509152e-02 true resid norm 1.279725352471e-02 ||r(i)||/||b|| 1.144621152262e-03
3 KSP preconditioned resid norm 3.302446009474e-03 true resid norm 5.122622088691e-04 ||r(i)||/||b|| 4.581812485342e-05
4 KSP preconditioned resid norm 1.204504429329e-04 true resid norm 4.370692051248e-05 ||r(i)||/||b|| 3.909265814124e-06
5 KSP preconditioned resid norm 5.339971695523e-06 true resid norm 7.229991776815e-06 ||r(i)||/||b|| 6.466701235889e-07
6 KSP preconditioned resid norm 5.856425044706e-07 true resid norm 1.282860114273e-06 ||r(i)||/||b|| 1.147424968455e-07
7 KSP preconditioned resid norm 1.007137752126e-07 true resid norm 2.283009757390e-07 ||r(i)||/||b|| 2.041986004328e-08
8 KSP preconditioned resid norm 1.790021892548e-08 true resid norm 4.063263596129e-08 ||r(i)||/||b|| 3.634293444578e-09
Linear solve converged due to CONVERGED_RTOL iterations 8
KSP Object: 1 MPI process
type: richardson
damping factor=1.
maximum iterations=10000, nonzero initial guess
tolerances: relative=1e-09, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI process
type: mg
type is MULTIPLICATIVE, levels=3 cycles=v
Cycles per PCApply=1
Not using Galerkin computed coarse grid matrices
Coarse grid solver -- level 0 -------------------------------
KSP Object: (mg_coarse_) 1 MPI process
type: richardson
damping factor=1.
maximum iterations=6, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: (mg_coarse_) 1 MPI process
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 1 MPI process
type: seqaij
rows=8, cols=8
total: nonzeros=32, allocated nonzeros=32
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 1 MPI process
type: richardson
damping factor=1.
maximum iterations=6, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 1 MPI process
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 1 MPI process
type: seqaij
rows=64, cols=64
total: nonzeros=352, allocated nonzeros=352
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 1 MPI process
type: richardson
damping factor=1.
maximum iterations=6, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 1 MPI process
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 1 MPI process
type: seqaij
rows=343, cols=343
total: nonzeros=2107, allocated nonzeros=2107
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 1 MPI process
type: seqaij
rows=343, cols=343
total: nonzeros=2107, allocated nonzeros=2107
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Residual norm 4.06326e-08
~/Src/petsc/src/ksp/ksp/tutorials (main *=) arch-main
$
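Coming back to the dimension question: if what is meant is the global grid size, then, assuming the stock ex45.c sets it in a DMDACreate3d() call in main() roughly like the sketch below (written from memory, so the argument list may not match the source exactly), that is where 7, 7, 7 would become 33, 33, 33:

  PetscCall(DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                         DMDA_STENCIL_STAR, 33, 33, 33, /* global grid size; 7, 7, 7 in the stock code */
                         PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, 1, 0, 0, 0, &da));

Alternatively, since ex45.c calls DMSetFromOptions() on the DMDA, the grid size can be changed without touching the code, e.g. with -da_grid_x 33 -da_grid_y 33 -da_grid_z 33, or by refining the default grid with -da_refine.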
In neither case is it taking 25 iterations. What am I doing wrong?
Normally one expects only trivial changes in the convergence of multigrid
methods when one changes the values in the right-hand side, as with the runs above.
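
Regarding the question below about what the diagonal entries at the boundary should be changed to "according to FE": in ex45.c the Dirichlet rows already get the same scale as the interior diagonal. Here is a sketch of a ComputeMatrix()-style routine; it is written from memory rather than copied from ex45.c, so names and details may differ, but it shows the scaling I mean: the Dirichlet diagonal is the same 2.0 * (HxHydHz + HxHzdHy + HyHzdHx) value that sits on the interior diagonal, not an unscaled 1.0.

  /* ex45.c already includes the needed headers; listed here so the sketch is self-contained */
  #include <petscdmda.h>
  #include <petscksp.h>

  static PetscErrorCode ComputeMatrixSketch(KSP ksp, Mat J, Mat B, void *ctx)
  {
    DM            da;
    DMDALocalInfo info;
    PetscInt      i, j, k, mx, my, mz;
    PetscScalar   Hx, Hy, Hz, HxHydHz, HxHzdHy, HyHzdHx, v[7];
    MatStencil    row, col[7];

    PetscFunctionBeginUser;
    PetscCall(KSPGetDM(ksp, &da));
    PetscCall(DMDAGetLocalInfo(da, &info));
    mx = info.mx; my = info.my; mz = info.mz;
    Hx = 1.0 / (PetscReal)(mx - 1);
    Hy = 1.0 / (PetscReal)(my - 1);
    Hz = 1.0 / (PetscReal)(mz - 1);
    HxHydHz = Hx * Hy / Hz; HxHzdHy = Hx * Hz / Hy; HyHzdHx = Hy * Hz / Hx;
    for (k = info.zs; k < info.zs + info.zm; k++) {
      for (j = info.ys; j < info.ys + info.ym; j++) {
        for (i = info.xs; i < info.xs + info.xm; i++) {
          row.i = i; row.j = j; row.k = k;
          if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
            /* Dirichlet row: diagonal gets the same value as an interior diagonal entry,
               not an unscaled 1.0 */
            v[0] = 2.0 * (HxHydHz + HxHzdHy + HyHzdHx);
            PetscCall(MatSetValuesStencil(B, 1, &row, 1, &row, v, INSERT_VALUES));
          } else {
            /* standard 7-point finite difference stencil scaled by the cell volume */
            v[0] = -HxHydHz; col[0].i = i; col[0].j = j; col[0].k = k - 1;
            v[1] = -HxHzdHy; col[1].i = i; col[1].j = j - 1; col[1].k = k;
            v[2] = -HyHzdHx; col[2].i = i - 1; col[2].j = j; col[2].k = k;
            v[3] = 2.0 * (HxHydHz + HxHzdHy + HyHzdHx); col[3].i = i; col[3].j = j; col[3].k = k;
            v[4] = -HyHzdHx; col[4].i = i + 1; col[4].j = j; col[4].k = k;
            v[5] = -HxHzdHy; col[5].i = i; col[5].j = j + 1; col[5].k = k;
            v[6] = -HxHydHz; col[6].i = i; col[6].j = j; col[6].k = k + 1;
            PetscCall(MatSetValuesStencil(B, 1, &row, 7, col, v, INSERT_VALUES));
          }
        }
      }
    }
    PetscCall(MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY));
    PetscCall(MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY));
    PetscFunctionReturn(0);
  }

If your code puts 1.0 on the Dirichlet diagonal while the interior diagonal carries this cell-volume scaling, that is exactly the kind of mismatch mentioned below that can hurt the convergence.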
Barry
> On Feb 27, 2023, at 7:16 PM, Paul Grosse-Bley
> <[email protected]> wrote:
>
> The scaling might be the problem, especially since I don't know what you mean
> by scaling it according to FE.
>
> For reproducing the issue with a smaller problem:
> Change the ComputeRHS function in ex45.c
>
> if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
> barray[k][j][i] = 0.0;
> } else {
> barray[k][j][i] = 1.0;
> }
>
> Change the dimensions to e.g. 33 (I scaled it down, so it goes quickly without
> a GPU) instead of 7 and then run with
>
> -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 -pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits -mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits -mg_coarse_pc_type jacobi
>
> You will find that it takes 145 iterations instead of 25 for the original
> ex45 RHS. My hpgmg-cuda implementation (using 32^3) takes 41 iterations.
>
> What do I have to change the diagonal entries of the matrix at the boundary
> to, according to FE? Right now the diagonal is completely constant.
>
> Paul
>
> On Tuesday, February 28, 2023 00:23 CET, Barry Smith <[email protected]> wrote:
>
>>
>>
>> I have not seen explicitly including, or excluding, the Dirichlet boundary
>> values in the system having a significant effect on the convergence, so long
>> as you SCALE the diagonal rows (of those Dirichlet points) by a value
>> similar to the other entries along the diagonal. If they are scaled
>> completely differently, that can screw up the convergence. For
>> src/ksp/ksp/ex45.c I see that the appropriate scaling is used (note the
>> scaling should come from a finite element view of the discretization even if
>> the discretization is finite differences, as is done in ex45.c).
>>
>> Are you willing to share the two codes so we can take a look with
>> experienced eyes to try to figure out the difference?
>>
>> Barry
>>
>>
>>
>>
>> > On Feb 27, 2023, at 5:48 PM, Paul Grosse-Bley
>> > <[email protected]> wrote:
>> >
>> > Hi Barry,
>> >
>> > the reason why I wanted to change to ghost boundaries is that I was
>> > worrying about the effect of PCMG's coarsening on these boundary values.
>> >
>> > As mentioned before, I am trying to reproduce results from the hpgmg-cuda
>> > benchmark (a modified version of it, e.g. using 2nd order instead of 4th
>> > etc.).
>> > I am trying to solve the Poisson equation -\nabla^2 u = 1 with u = 0 on
>> > the boundary with rtol=1e-9. My MG solver implemented in hpgmg solves this
>> > in 40 V-cycles (I weakened it a lot by only doing smooths at the coarse
>> > level instead of CG). When I run the "same" MG solver built in PETSc on
>> > this problem, it starts out reducing the residual norm as fast or even
>> > faster for the first 20-30 iterations. But for the last order of
>> > magnitude in the residual norm it needs more than 300 V-cycles, i.e. it
>> > gets very slow. At this point I am pretty much out of ideas about what is
>> > the cause, especially since e.g. adding back cg at the coarsest level
>> > doesn't seem to change the number of iterations at all. Therefore I am
>> > suspecting the discretization to be the problem. HPGMG uses an even number
>> > of points per dimension (e.g. 256), while PCMG wants an odd number (e.g.
>> > 257). So I also tried adding another layer of boundary values for the
>> > discretization to effectively use only 254 points per dimension. This
>> > caused the solver to get even slightly worse.
>> >
>> > So can the explicit boundary values screw with the coarsening, especially
>> > when they are not finite? Because with the problem as stated in ex45 with
>> > finite (i.e. non-zero) boundary values, the MG solver takes only 18
>> > V-cycles.
>> >
>> > Best,
>> > Paul
>> >
>> >
>> >
>> > On Monday, February 27, 2023 18:17 CET, Barry Smith <[email protected]>
>> > wrote:
>> >
>> >>
>> >> Paul,
>> >>
>> >> DM_BOUNDARY_GHOSTED would result in the extra ghost locations in the
>> >> local vectors (obtained with DMCreateLocalVector()), but they will not
>> >> appear in the global vectors obtained with DMCreateGlobalVector();
>> >> perhaps this is the issue? Since they do not appear in the global vector
>> >> they will not appear in the linear system so there will be no diagonal
>> >> entries for you to set since those rows/columns do not exist in the
>> >> linear system. In other words, using DM_BOUNDARY_GHOSTED is a way to
>> >> avoid needing to put the Dirichlet values explicitly into the system
>> >> being solved; DM_BOUNDARY_GHOSTED is generally more helpful for nonlinear
>> >> systems than linear systems.
>> >>
>> >> Barry
>> >>
>> >> > On Feb 27, 2023, at 12:08 PM, Paul Grosse-Bley
>> >> > <[email protected]> wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > I would like to modify src/ksp/ksp/tutorials/ex45.c to implement
>> >> > Dirichlet boundary conditions using DM_BOUNDARY_GHOSTED instead of
>> >> > using DM_BOUNDARY_NONE and explicitly implementing the boundary by
>> >> > adding diagonal-only rows.
>> >> >
>> >> > My assumption was that with DM_BOUNDARY_GHOSTED all vectors from that
>> >> > DM have the extra memory for the ghost entries and that I can basically
>> >> > use DMDAGetGhostCorners instead of DMDAGetCorners to access the array
>> >> > gotten via DMDAVecGetArray. But when I access (gxs, gys, gzs) =
>> >> > (-1,-1,-1) I get a segmentation fault. When looking at the
>> >> > implementation of DMDAVecGetArray it looked to me as if accessing (-1,
>> >> > -1, -1) should work as DMDAVecGetArray passes the ghost corners to
>> >> > VecGetArray3d which then adds the right offsets.
>> >> >
>> >> > I could not find any example using DM_BOUNDARY_GHOSTED and then
>> >> > actually accessing the ghost/boundary elements. Can I assume that they
>> >> > are set to zero for the solution vector, i.e. that u = 0 on \partial\Omega, and
>> >> > I do not need to access them at all?
>> >> >
>> >> > Best,
>> >> > Paul Große-Bley
>> >>
>>