I am sorry, I cannot reproduce what you describe. I am using
src/ksp/ksp/tutorials/ex45.c in the main branch (should be same as
release for this purpose).
With no change to the code, I get
$ ./ex45 -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09
-pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson
-mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits
-mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson
-mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits
-mg_coarse_pc_type jacobi -ksp_monitor_true_residual -ksp_view
0 KSP preconditioned resid norm 1.851257578045e+01 true resid norm 1.476491378857e+01 ||r(i)||/||b|| 1.000000000000e+00
1 KSP preconditioned resid norm 3.720545622095e-01 true resid norm 5.171053311198e-02 ||r(i)||/||b|| 3.502257707188e-03
2 KSP preconditioned resid norm 1.339047557616e-02 true resid norm 1.866765310863e-03 ||r(i)||/||b|| 1.264325235890e-04
3 KSP preconditioned resid norm 4.833887599029e-04 true resid norm 6.867629264754e-05 ||r(i)||/||b|| 4.651316873974e-06
4 KSP preconditioned resid norm 1.748167886388e-05 true resid norm 3.398334857479e-06 ||r(i)||/||b|| 2.301628648933e-07
5 KSP preconditioned resid norm 6.570567424652e-07 true resid norm 4.304483984231e-07 ||r(i)||/||b|| 2.915346507180e-08
6 KSP preconditioned resid norm 4.013427896557e-08 true resid norm 7.502068698790e-08 ||r(i)||/||b|| 5.081010838410e-09
7 KSP preconditioned resid norm 5.934811016347e-09 true resid norm 1.333884145638e-08 ||r(i)||/||b|| 9.034147877457e-10
Linear solve converged due to CONVERGED_RTOL iterations 7
KSP Object: 1 MPI process
type: richardson
damping factor=1.
maximum iterations=10000, nonzero initial guess
tolerances: relative=1e-09, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI process
type: mg
type is MULTIPLICATIVE, levels=3 cycles=v
Cycles per PCApply=1
Not using Galerkin computed coarse grid matrices
Coarse grid solver -- level 0 -------------------------------
KSP Object: (mg_coarse_) 1 MPI process
type: richardson
damping factor=1.
maximum iterations=6, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: (mg_coarse_) 1 MPI process
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 1 MPI process
type: seqaij
rows=8, cols=8
total: nonzeros=32, allocated nonzeros=32
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 1 MPI process
type: richardson
damping factor=1.
maximum iterations=6, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 1 MPI process
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 1 MPI process
type: seqaij
rows=64, cols=64
total: nonzeros=352, allocated nonzeros=352
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 1 MPI process
type: richardson
damping factor=1.
maximum iterations=6, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 1 MPI process
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 1 MPI process
type: seqaij
rows=343, cols=343
total: nonzeros=2107, allocated nonzeros=2107
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 1 MPI process
type: seqaij
rows=343, cols=343
total: nonzeros=2107, allocated nonzeros=2107
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Residual norm 1.33388e-08
~/Src/petsc/src/ksp/ksp/tutorials*(main=)*arch-main
$
Now I change the code to
if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
  barray[k][j][i] = 0; // was: 2.0 * (HxHydHz + HxHzdHy + HyHzdHx);
} else {
  barray[k][j][i] = 1; // was: Hx * Hy * Hz;
}
I do not understand where I am supposed to change the dimension to 33,
so I ignore that statement. The same command line with the change above gives
$ ./ex45 -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09
-pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson
-mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits
-mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson
-mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits
-mg_coarse_pc_type jacobi -ksp_monitor_true_residual -ksp_view
0 KSP preconditioned resid norm 7.292257119299e+01 true resid norm 1.118033988750e+01 ||r(i)||/||b|| 1.000000000000e+00
1 KSP preconditioned resid norm 2.534913491362e+00 true resid norm 3.528425353826e-01 ||r(i)||/||b|| 3.155919577875e-02
2 KSP preconditioned resid norm 9.145057509152e-02 true resid norm 1.279725352471e-02 ||r(i)||/||b|| 1.144621152262e-03
3 KSP preconditioned resid norm 3.302446009474e-03 true resid norm 5.122622088691e-04 ||r(i)||/||b|| 4.581812485342e-05
4 KSP preconditioned resid norm 1.204504429329e-04 true resid norm 4.370692051248e-05 ||r(i)||/||b|| 3.909265814124e-06
5 KSP preconditioned resid norm 5.339971695523e-06 true resid norm 7.229991776815e-06 ||r(i)||/||b|| 6.466701235889e-07
6 KSP preconditioned resid norm 5.856425044706e-07 true resid norm 1.282860114273e-06 ||r(i)||/||b|| 1.147424968455e-07
7 KSP preconditioned resid norm 1.007137752126e-07 true resid norm 2.283009757390e-07 ||r(i)||/||b|| 2.041986004328e-08
8 KSP preconditioned resid norm 1.790021892548e-08 true resid norm 4.063263596129e-08 ||r(i)||/||b|| 3.634293444578e-09
Linear solve converged due to CONVERGED_RTOL iterations 8
KSP Object: 1 MPI process
type: richardson
damping factor=1.
maximum iterations=10000, nonzero initial guess
tolerances: relative=1e-09, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI process
type: mg
type is MULTIPLICATIVE, levels=3 cycles=v
Cycles per PCApply=1
Not using Galerkin computed coarse grid matrices
Coarse grid solver -- level 0 -------------------------------
KSP Object: (mg_coarse_) 1 MPI process
type: richardson
damping factor=1.
maximum iterations=6, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: (mg_coarse_) 1 MPI process
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 1 MPI process
type: seqaij
rows=8, cols=8
total: nonzeros=32, allocated nonzeros=32
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 1 MPI process
type: richardson
damping factor=1.
maximum iterations=6, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 1 MPI process
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 1 MPI process
type: seqaij
rows=64, cols=64
total: nonzeros=352, allocated nonzeros=352
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 1 MPI process
type: richardson
damping factor=1.
maximum iterations=6, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 1 MPI process
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 1 MPI process
type: seqaij
rows=343, cols=343
total: nonzeros=2107, allocated nonzeros=2107
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 1 MPI process
type: seqaij
rows=343, cols=343
total: nonzeros=2107, allocated nonzeros=2107
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Residual norm 4.06326e-08
~/Src/petsc/src/ksp/ksp/tutorials*(main *=)*arch-main
$
In neither case is it taking 25 iterations. What am I doing wrong?
Normally one expects only trivial changes in the convergence of
multigrid methods when one changes values in the right hand side as
with the run above.
Barry
On Feb 27, 2023, at 7:16 PM, Paul Grosse-Bley
<[email protected]> wrote:
The scaling might be the problem, especially since I don't know what
you mean by scaling it according to FE.
To reproduce the issue with a smaller problem:
Change the ComputeRHS function in ex45.c to
if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
  barray[k][j][i] = 0.0;
} else {
  barray[k][j][i] = 1.0;
}
change the dimensions to e.g. 33 instead of 7 (I scaled it down so it runs
quickly without a GPU; see the sketch after the options below) and then run with
-ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 -pc_type
mg -pc_mg_levels 3 -mg_levels_ksp_type richardson
-mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits
-mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson
-mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits
-mg_coarse_pc_type jacobi
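(To be concrete about the dimension change: I mean the global grid size
passed to DMDACreate3d() in main(), which is 7 points per dimension by
default. Below is a hedged sketch of that change, assuming the standard
ex45.c call; passing -da_grid_x 33 -da_grid_y 33 -da_grid_z 33 on the
command line should have the same effect if the example calls
DMSetFromOptions() on the DMDA.)

  /* sketch: 33^3 global grid instead of the default 7^3 */
  PetscCall(DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                         DMDA_STENCIL_STAR, 33, 33, 33, /* was 7, 7, 7 */
                         PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, 1, NULL, NULL, NULL, &da));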
You will find that it takes 145 iterations instead of 25 for the
original ex45 RHS. My hpgmg-cuda implementation (using 32^3) takes 41
iterations.
To what value do I have to change the diagonal entries of the matrix in
the boundary rows, according to FE? Right now the diagonal is completely
constant.
Paul
On Tuesday, February 28, 2023 00:23 CET, Barry Smith
<[email protected]> wrote:
I have not seen explicitly including, or excluding, the Dirichlet
boundary values in the system having a significant effect on the
convergence, so long as you SCALE the diagonal rows (of those Dirichlet
points) by a value similar to the other entries along the diagonal. If
they are scaled completely differently, that can screw up the
convergence. For src/ksp/ksp/tutorials/ex45.c I see that the appropriate
scaling is used (note the scaling should come from a finite element view
of the discretization, even if the discretization is finite differences,
as is done in ex45.c).
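To make "appropriate scaling" concrete, here is a rough sketch of the kind
of assembly I mean, paraphrasing from memory the ComputeMatrix style used
in ex45.c (with HxHydHz = Hx*Hy/Hz and so on); check the actual source for
the exact code. The point is that the Dirichlet rows get the same
2.0 * (HxHydHz + HxHzdHy + HyHzdHx) diagonal value as the interior rows,
not 1.0:

  PetscScalar v[7];
  MatStencil  row, col[7];

  row.i = i; row.j = j; row.k = k;
  if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
    /* Dirichlet row: diagonal only, but scaled like an interior diagonal entry */
    v[0] = 2.0 * (HxHydHz + HxHzdHy + HyHzdHx);
    PetscCall(MatSetValuesStencil(B, 1, &row, 1, &row, v, INSERT_VALUES));
  } else {
    /* interior row: 7-point stencil whose diagonal is that same value */
    v[0] = -HxHydHz;                            col[0].i = i;     col[0].j = j;     col[0].k = k - 1;
    v[1] = -HxHzdHy;                            col[1].i = i;     col[1].j = j - 1; col[1].k = k;
    v[2] = -HyHzdHx;                            col[2].i = i - 1; col[2].j = j;     col[2].k = k;
    v[3] = 2.0 * (HxHydHz + HxHzdHy + HyHzdHx); col[3].i = i;     col[3].j = j;     col[3].k = k;
    v[4] = -HyHzdHx;                            col[4].i = i + 1; col[4].j = j;     col[4].k = k;
    v[5] = -HxHzdHy;                            col[5].i = i;     col[5].j = j + 1; col[5].k = k;
    v[6] = -HxHydHz;                            col[6].i = i;     col[6].j = j;     col[6].k = k + 1;
    PetscCall(MatSetValuesStencil(B, 1, &row, 7, col, v, INSERT_VALUES));
  }

If the Dirichlet diagonal were instead left at a constant such as 1.0 while
the interior diagonal carries the Hx*Hy*Hz-type scaling, Jacobi would treat
the boundary and interior rows very differently, and that mismatch is what
can hurt the multigrid convergence.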
Are you willing to share the two codes so we can take a look with
experienced eyes to try to figure out the difference?
Barry
> On Feb 27, 2023, at 5:48 PM, Paul Grosse-Bley
<[email protected]> wrote:
>
> Hi Barry,
>
> the reason why I wanted to change to ghost boundaries is that I was
worried about the effect of PCMG's coarsening on these boundary values.
>
> As mentioned before, I am trying to reproduce results from the
hpgmg-cuda benchmark (a modified version of it, e.g. using 2nd order
instead of 4th etc.).
> I am trying to solve the Poisson equation -\nabla^2 u = 1 with u = 0
on the boundary with rtol=1e-9. My MG solver implemented in hpgmg solves
this in 40 V-cycles (I weakened it a lot by doing only smooths at the
coarse level instead of CG). When I run the "same" MG solver built in
PETSc on this problem, it starts out reducing the residual norm as fast
or even faster for the first 20-30 iterations, but for the last order of
magnitude in the residual norm it needs more than 300 V-cycles, i.e. it
gets very slow. At this point I am pretty much out of ideas about the
cause, especially since e.g. adding back CG at the coarsest level doesn't
seem to change the number of iterations at all. Therefore I suspect the
discretization is the problem. HPGMG uses an even number of points per
dimension (e.g. 256), while PCMG wants an odd number (e.g. 257). So I
also tried adding another layer of boundary values to the discretization,
effectively using only 254 points per dimension. This made the solver
even slightly worse.
>
> So can the explicit boundary values screw with the coarsening,
especially when they are zero? Because with the problem as stated in
ex45, with finite (i.e. non-zero) boundary values, the MG solver takes
only 18 V-cycles.
>
> Best,
> Paul
>
>
>
> On Monday, February 27, 2023 18:17 CET, Barry Smith
<[email protected]> wrote:
>
>>
>> Paul,
>>
>> DM_BOUNDARY_GHOSTED would result in the extra ghost locations in the
local vectors (obtained with DMCreateLocalVector()), but they will not
appear in the global vectors obtained with DMCreateGlobalVector();
perhaps this is the issue? Since they do not appear in the global vector,
they will not appear in the linear system, so there are no diagonal
entries for you to set, since those rows/columns do not exist in the
linear system. In other words, using DM_BOUNDARY_GHOSTED is a way to
avoid needing to put the Dirichlet values explicitly into the system
being solved; DM_BOUNDARY_GHOSTED is generally more helpful for nonlinear
systems than for linear systems.
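>> To make that concrete, here is a minimal sketch (untested, written
from memory of the DMDA API; the variable names are only for
illustration) of where the ghost slots do and do not exist:

  DM          da;
  Vec         g, l;
  PetscScalar ***a;
  PetscInt    gxs, gys, gzs, gxm, gym, gzm;

  /* 7x7x7 global grid, ghosted boundaries, stencil width 1 */
  PetscCall(DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_GHOSTED, DM_BOUNDARY_GHOSTED, DM_BOUNDARY_GHOSTED,
                         DMDA_STENCIL_STAR, 7, 7, 7, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                         1, 1, NULL, NULL, NULL, &da));
  PetscCall(DMSetUp(da));
  PetscCall(DMCreateGlobalVector(da, &g)); /* 7^3 = 343 entries, no ghost slots: index -1 is invalid here */
  PetscCall(DMCreateLocalVector(da, &l));  /* 9^3 = 729 entries on one process, ghost slots included */
  PetscCall(DMDAVecGetArray(da, l, &a));   /* only with the local vector ... */
  PetscCall(DMDAGetGhostCorners(da, &gxs, &gys, &gzs, &gxm, &gym, &gzm)); /* gxs = gys = gzs = -1 */
  a[gzs][gys][gxs] = 0.0;                  /* ... is indexing the ghost corner (-1,-1,-1) legal */
  PetscCall(DMDAVecRestoreArray(da, l, &a));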
>>
>> Barry
>>
>> > On Feb 27, 2023, at 12:08 PM, Paul Grosse-Bley
<[email protected]> wrote:
>> >
>> > Hi,
>> >
>> > I would like to modify src/ksp/ksp/tutorials/ex45.c to implement
Dirichlet boundary conditions using DM_BOUNDARY_GHOSTED instead of using
DM_BOUNDARY_NONE and explicitly implementing the boundary by adding
diagonal-only rows.
>> >
>> > My assumption was that with DM_BOUNDARY_GHOSTED all vectors from
that DM have the extra memory for the ghost entries and that I can
basically use DMDAGetGhostCorners instead of DMDAGetCorners to access
the array obtained via DMDAVecGetArray. But when I access (gxs, gys,
gzs) = (-1, -1, -1) I get a segmentation fault. Looking at the
implementation of DMDAVecGetArray, it seemed to me that accessing
(-1, -1, -1) should work, since DMDAVecGetArray passes the ghost corners
to VecGetArray3d, which then adds the right offsets.
>> >
>> > I could not find any example using DM_BOUNDARY_GHOSTED that then
actually accesses the ghost/boundary elements. Can I assume that they are
set to zero for the solution vector, i.e. that u = 0 on \partial\Omega,
and that I do not need to access them at all?
>> >
>> > Best,
>> > Paul Große-Bley
>>