Re: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions

Große-Bley , Paul Tue, 28 Feb 2023 11:05:54 -0800

Sorry, I should have made myself clearer. I changed the three 7s passed to DMDACreate3d to 33 to make the example a bit more realistic, as I also use "U-cycles", i.e. my coarsest level is still big enough to make use of some GPU parallelism. I should have just put that into the given command line argument string with -da_grid_x 33 -da_grid_y 33 -da_grid_z 33.
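For reference, here is a minimal sketch of the per-level grid sizes these options imply. It assumes that each multigrid level coarsens a non-periodic grid by a factor of 2 using m_coarse = (m_fine - 1) / 2 + 1 with integer division. That rule is an assumption on my part, but it reproduces the rows=343, 64 and 8 reported by -ksp_view in the quoted run with 7 points per dimension. The helper names are made up for illustration and are not PETSc functions.

```c
#include <assert.h>

/* Points per dimension after `steps` coarsenings by a factor of 2.
   ASSUMPTION: non-periodic rule m_coarse = (m_fine - 1) / 2 + 1
   (integer division); hypothetical helper, not a PETSc call. */
int coarsened_points(int m_fine, int steps)
{
  assert(m_fine >= 2 && steps >= 0);
  int m = m_fine;
  for (int s = 0; s < steps; s++) m = (m - 1) / 2 + 1;
  return m;
}

/* Number of unknowns on a cubic grid with m points per dimension. */
long unknowns_3d(int m)
{
  return (long)m * m * m;
}
```

Under that assumption, 33 points per dimension with -pc_mg_levels 3 gives levels of 33, 17 and 9 points per dimension, so the coarsest level has 729 unknowns instead of the 8 obtained when starting from 7.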


On 2/28/23 18:43, Barry Smith wrote:

I am sorry, I cannot reproduce what you describe. I am using src/ksp/ksp/tutorials/ex45.c in the main branch (should be the same as release for this purpose).
With no change to the code I get
$ ./ex45 -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 -pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits -mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits -mg_coarse_pc_type jacobi -ksp_monitor_true_residual -ksp_view
  0 KSP preconditioned resid norm 1.851257578045e+01 true resid norm 1.476491378857e+01 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 3.720545622095e-01 true resid norm 5.171053311198e-02 ||r(i)||/||b|| 3.502257707188e-03
  2 KSP preconditioned resid norm 1.339047557616e-02 true resid norm 1.866765310863e-03 ||r(i)||/||b|| 1.264325235890e-04
  3 KSP preconditioned resid norm 4.833887599029e-04 true resid norm 6.867629264754e-05 ||r(i)||/||b|| 4.651316873974e-06
  4 KSP preconditioned resid norm 1.748167886388e-05 true resid norm 3.398334857479e-06 ||r(i)||/||b|| 2.301628648933e-07
  5 KSP preconditioned resid norm 6.570567424652e-07 true resid norm 4.304483984231e-07 ||r(i)||/||b|| 2.915346507180e-08
  6 KSP preconditioned resid norm 4.013427896557e-08 true resid norm 7.502068698790e-08 ||r(i)||/||b|| 5.081010838410e-09
  7 KSP preconditioned resid norm 5.934811016347e-09 true resid norm 1.333884145638e-08 ||r(i)||/||b|| 9.034147877457e-10
Linear solve converged due to CONVERGED_RTOL iterations 7

KSP Object: 1 MPI process
  type: richardson
    damping factor=1.
  maximum iterations=10000, nonzero initial guess
  tolerances:  relative=1e-09, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI process
  type: mg
    type is MULTIPLICATIVE, levels=3 cycles=v
      Cycles per PCApply=1
      Not using Galerkin computed coarse grid matrices
  Coarse grid solver -- level 0 -------------------------------
    KSP Object: (mg_coarse_) 1 MPI process
      type: richardson
        damping factor=1.
      maximum iterations=6, nonzero initial guess
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using PRECONDITIONED norm type for convergence test
    PC Object: (mg_coarse_) 1 MPI process
      type: jacobi
        type DIAGONAL
      linear system matrix = precond matrix:
      Mat Object: 1 MPI process
        type: seqaij
        rows=8, cols=8
        total: nonzeros=32, allocated nonzeros=32
        total number of mallocs used during MatSetValues calls=0
          not using I-node routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object: (mg_levels_1_) 1 MPI process
      type: richardson
        damping factor=1.
      maximum iterations=6, nonzero initial guess
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_1_) 1 MPI process
      type: jacobi
        type DIAGONAL
      linear system matrix = precond matrix:
      Mat Object: 1 MPI process
        type: seqaij
        rows=64, cols=64
        total: nonzeros=352, allocated nonzeros=352
        total number of mallocs used during MatSetValues calls=0
          not using I-node routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object: (mg_levels_2_) 1 MPI process
      type: richardson
        damping factor=1.
      maximum iterations=6, nonzero initial guess
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_2_) 1 MPI process
      type: jacobi
        type DIAGONAL
      linear system matrix = precond matrix:
      Mat Object: 1 MPI process
        type: seqaij
        rows=343, cols=343
        total: nonzeros=2107, allocated nonzeros=2107
        total number of mallocs used during MatSetValues calls=0
          not using I-node routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object: 1 MPI process
    type: seqaij
    rows=343, cols=343
    total: nonzeros=2107, allocated nonzeros=2107
    total number of mallocs used during MatSetValues calls=0
      not using I-node routines
Residual norm 1.33388e-08

~/Src/petsc/src/ksp/ksp/tutorials*(main=)*arch-main

$



Now change the code with
        if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
          barray[k][j][i] = 0; //2.0 * (HxHydHz + HxHzdHy + HyHzdHx);
        } else {
          barray[k][j][i] = 1; //Hx * Hy * Hz;
        }
I do not understand where I am supposed to change the dimension to 33, so I ignored that statement. The same command line with the change above gives
$ ./ex45 -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 -pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits -mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits -mg_coarse_pc_type jacobi -ksp_monitor_true_residual -ksp_view
  0 KSP preconditioned resid norm 7.292257119299e+01 true resid norm 1.118033988750e+01 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 2.534913491362e+00 true resid norm 3.528425353826e-01 ||r(i)||/||b|| 3.155919577875e-02
  2 KSP preconditioned resid norm 9.145057509152e-02 true resid norm 1.279725352471e-02 ||r(i)||/||b|| 1.144621152262e-03
  3 KSP preconditioned resid norm 3.302446009474e-03 true resid norm 5.122622088691e-04 ||r(i)||/||b|| 4.581812485342e-05
  4 KSP preconditioned resid norm 1.204504429329e-04 true resid norm 4.370692051248e-05 ||r(i)||/||b|| 3.909265814124e-06
  5 KSP preconditioned resid norm 5.339971695523e-06 true resid norm 7.229991776815e-06 ||r(i)||/||b|| 6.466701235889e-07
  6 KSP preconditioned resid norm 5.856425044706e-07 true resid norm 1.282860114273e-06 ||r(i)||/||b|| 1.147424968455e-07
  7 KSP preconditioned resid norm 1.007137752126e-07 true resid norm 2.283009757390e-07 ||r(i)||/||b|| 2.041986004328e-08
  8 KSP preconditioned resid norm 1.790021892548e-08 true resid norm 4.063263596129e-08 ||r(i)||/||b|| 3.634293444578e-09
Linear solve converged due to CONVERGED_RTOL iterations 8

KSP Object: 1 MPI process
  type: richardson
    damping factor=1.
  maximum iterations=10000, nonzero initial guess
  tolerances:  relative=1e-09, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI process
  type: mg
    type is MULTIPLICATIVE, levels=3 cycles=v
      Cycles per PCApply=1
      Not using Galerkin computed coarse grid matrices
  Coarse grid solver -- level 0 -------------------------------
    KSP Object: (mg_coarse_) 1 MPI process
      type: richardson
        damping factor=1.
      maximum iterations=6, nonzero initial guess
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using PRECONDITIONED norm type for convergence test
    PC Object: (mg_coarse_) 1 MPI process
      type: jacobi
        type DIAGONAL
      linear system matrix = precond matrix:
      Mat Object: 1 MPI process
        type: seqaij
        rows=8, cols=8
        total: nonzeros=32, allocated nonzeros=32
        total number of mallocs used during MatSetValues calls=0
          not using I-node routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object: (mg_levels_1_) 1 MPI process
      type: richardson
        damping factor=1.
      maximum iterations=6, nonzero initial guess
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_1_) 1 MPI process
      type: jacobi
        type DIAGONAL
      linear system matrix = precond matrix:
      Mat Object: 1 MPI process
        type: seqaij
        rows=64, cols=64
        total: nonzeros=352, allocated nonzeros=352
        total number of mallocs used during MatSetValues calls=0
          not using I-node routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object: (mg_levels_2_) 1 MPI process
      type: richardson
        damping factor=1.
      maximum iterations=6, nonzero initial guess
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_2_) 1 MPI process
      type: jacobi
        type DIAGONAL
      linear system matrix = precond matrix:
      Mat Object: 1 MPI process
        type: seqaij
        rows=343, cols=343
        total: nonzeros=2107, allocated nonzeros=2107
        total number of mallocs used during MatSetValues calls=0
          not using I-node routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object: 1 MPI process
    type: seqaij
    rows=343, cols=343
    total: nonzeros=2107, allocated nonzeros=2107
    total number of mallocs used during MatSetValues calls=0
      not using I-node routines
Residual norm 4.06326e-08

~/Src/petsc/src/ksp/ksp/tutorials*(main *=)*arch-main

$
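A quick sanity check that the modified right-hand side was really used in this run: with b = 1 on interior nodes and b = 0 on boundary nodes, ||b||^2 is simply the number of interior nodes. The sketch below is plain C, independent of PETSc, and the function name is made up for illustration. It counts those nodes with the same boundary test as the snippet above.

```c
#include <assert.h>

/* Squared Euclidean norm of the modified right-hand side on an
   mx * my * mz grid: b = 0 on boundary nodes, b = 1 on interior nodes.
   Hypothetical helper for illustration only, not part of ex45.c. */
double rhs_norm_squared(int mx, int my, int mz)
{
  double sum = 0.0;
  assert(mx > 0 && my > 0 && mz > 0);
  for (int k = 0; k < mz; k++) {
    for (int j = 0; j < my; j++) {
      for (int i = 0; i < mx; i++) {
        int on_boundary = (i == 0 || j == 0 || k == 0 ||
                           i == mx - 1 || j == my - 1 || k == mz - 1);
        double b = on_boundary ? 0.0 : 1.0;
        sum += b * b;
      }
    }
  }
  return sum;
}
```

For the default 7 x 7 x 7 grid this gives 5^3 = 125, and sqrt(125) is about 11.1803, which agrees with the iteration-0 true residual norm 1.118033988750e+01 printed above (the ratio ||r(i)||/||b|| is 1 there, so that value is ||b||). On a 33 x 33 x 33 grid the same count would be 31^3 = 29791.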


In neither case is it taking 25 iterations. What am I doing wrong?
Normally one expects only trivial changes in the convergence of multigrid methods when one changes values in the right hand side as with the run above.
Barry
On Feb 27, 2023, at 7:16 PM, Paul Grosse-Bley <[email protected]> wrote:
The scaling might be the problem, especially since I don't know what you mean by scaling it according to FE.
For reproducing the issue with a smaller problem:
Change the ComputeRHS function in ex45.c
if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
  barray[k][j][i] = 0.0;
} else {
  barray[k][j][i] = 1.0;
}
Change the dimensions to e.g. 33 (I scaled it down, so it runs quickly without a GPU) instead of 7 and then run with
-ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 -pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits -mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits -mg_coarse_pc_type jacobi
You will find that it takes 145 iterations instead of 25 for the original ex45 RHS. My hpgmg-cuda implementation (using 32^3) takes 41 iterations.
To what do I have to change the diagonal entries of the matrix for the boundary according to FE? Right now the diagonal is completely constant.
Paul
On Tuesday, February 28, 2023 00:23 CET, Barry Smith <[email protected]> wrote:
I have not seen explicitly including, or excluding, the Dirichlet boundary values in the system having a significant effect on the convergence so long as you SCALE the diagonal rows (of those Dirichlet points) by a value similar to the other entries along the diagonal. If they are scaled completely differently, that can screw up the convergence. For src/ksp/ksp/ex45.c I see that the appropriate scaling is used (note the scaling should come from a finite element view of the discretization even if the discretization is finite differences as is done in ex45.c).
Are you willing to share the two codes so we can take a look with experienced eyes to try to figure out the difference?
Barry
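To make the scaling remark concrete, here is a minimal plain-C sketch (no PETSc calls) of the diagonal value that an h-scaled 7-point Laplacian would carry, compared with a Dirichlet row whose diagonal is simply 1. It assumes Hx = 1/(mx-1) and the expression 2 * (Hx*Hy/Hz + Hx*Hz/Hy + Hy*Hz/Hx), which is taken from the commented-out term in the ex45 snippet quoted in this thread. Whether ex45.c uses exactly this value for every row is an assumption here, and the function names are made up for illustration.

```c
#include <assert.h>

/* Diagonal entry of an h-scaled 7-point Laplacian on an mx*my*mz grid.
   ASSUMPTION: Hx = 1/(mx-1) etc. and the formula from the commented-out
   expression in the quoted snippet; hypothetical helper, not PETSc API. */
double scaled_diagonal(int mx, int my, int mz)
{
  assert(mx > 1 && my > 1 && mz > 1);
  double Hx = 1.0 / (mx - 1), Hy = 1.0 / (my - 1), Hz = 1.0 / (mz - 1);
  double HxHydHz = Hx * Hy / Hz;
  double HxHzdHy = Hx * Hz / Hy;
  double HyHzdHx = Hy * Hz / Hx;
  return 2.0 * (HxHydHz + HxHzdHy + HyHzdHx);
}

/* How much larger an unscaled Dirichlet diagonal of 1.0 would be than
   the interior diagonal on the same grid. */
double unscaled_dirichlet_ratio(int mx, int my, int mz)
{
  return 1.0 / scaled_diagonal(mx, my, mz);
}
```

Under these assumptions a 33^3 grid has an interior diagonal of 6/32 = 0.1875, so a boundary row with diagonal 1 would be about 5.3 times larger, and the mismatch grows as the grid is refined. Giving the Dirichlet rows the same value as the interior diagonal keeps the rows comparably scaled.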
> On Feb 27, 2023, at 5:48 PM, Paul Grosse-Bley <[email protected]> wrote:
>
> Hi Barry,
>
> the reason why I wanted to change to ghost boundaries is that I was worrying about the effect of PCMG's coarsening on these boundary values.
>
> As mentioned before, I am trying to reproduce results from the hpgmg-cuda benchmark (a modified version of it, e.g. using 2nd order instead of 4th etc.).
> I am trying to solve the Poisson equation -\nabla^2 u = 1 with u = 0 on the boundary with rtol=1e-9. My MG solver implemented in hpgmg solves this in 40 V-cycles (I weakened it a lot by only doing smooths at the coarse level instead of CG). When I run the "same" MG solver built in PETSc on this problem, it starts out reducing the residual norm as fast or even faster for the first 20-30 iterations. But for the last order of magnitude in the residual norm it needs more than 300 V-cycles, i.e. it gets very slow. At this point I am pretty much out of ideas about what is the cause, especially since e.g. adding back CG at the coarsest level doesn't seem to change the number of iterations at all. Therefore I suspect the discretization to be the problem. HPGMG uses an even number of points per dimension (e.g. 256), while PCMG wants an odd number (e.g. 257). So I also tried adding another layer of boundary values for the discretization to effectively use only 254 points per dimension. This made the solver slightly worse.
>
> So can the explicit boundary values screw with the coarsening, especially when they are zero? Because with the problem as stated in ex45, with non-zero boundary values, the MG solver takes only 18 V-cycles.
>
> Best,
> Paul
>
>
>
> On Monday, February 27, 2023 18:17 CET, Barry Smith <[email protected]> wrote:
>
>>
>> Paul,
>>
>> DM_BOUNDARY_GHOSTED would result in the extra ghost locations in the local vectors (obtained with DMCreateLocalVector()), but they will not appear in the global vectors obtained with DMCreateGlobalVector(); perhaps this is the issue? Since they do not appear in the global vector they will not appear in the linear system, so there will be no diagonal entries for you to set since those rows/columns do not exist in the linear system. In other words, using DM_BOUNDARY_GHOSTED is a way to avoid needing to put the Dirichlet values explicitly into the system being solved; DM_BOUNDARY_GHOSTED is generally more helpful for nonlinear systems than linear systems.
>>
>> Barry
>>
>> > On Feb 27, 2023, at 12:08 PM, Paul Grosse-Bley <[email protected]> wrote:
>> >
>> > Hi,
>> >
>> > I would like to modify src/ksp/ksp/tutorials/ex45.c to implement Dirichlet boundary conditions using DM_BOUNDARY_GHOSTED instead of using DM_BOUNDARY_NONE and explicitly implementing the boundary by adding diagonal-only rows.
>> >
>> > My assumption was that with DM_BOUNDARY_GHOSTED all vectors from that DM have the extra memory for the ghost entries and that I can basically use DMDAGetGhostCorners instead of DMDAGetCorners to access the array obtained via DMDAVecGetArray. But when I access (gxs, gys, gzs) = (-1,-1,-1) I get a segmentation fault. When looking at the implementation of DMDAVecGetArray it looked to me as if accessing (-1, -1, -1) should work, as DMDAVecGetArray passes the ghost corners to VecGetArray3d which then adds the right offsets.
>> >
>> > I could not find any example using DM_BOUNDARY_GHOSTED and then actually accessing the ghost/boundary elements. Can I assume that they are set to zero for the solution vector, i.e. that u = 0 on \partial\Omega, and I do not need to access them at all?
>> >
>> > Best,
>> > Paul Große-Bley
>>
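
The difference between local and global vectors described in the oldest reply above can be illustrated with plain index arithmetic. The sketch below is not PETSc code: it assumes a single process, an m^3 grid, a ghost layer of width s on every side, and a ghosted array whose first entry corresponds to grid index (-s,-s,-s). All function names are hypothetical.

```c
#include <assert.h>

/* Entries of a vector without ghost points on an m^3 grid (single process). */
long global_entries(int m)
{
  return (long)m * m * m;
}

/* Entries of a ghosted local vector with ghost width s on every side. */
long local_entries(int m, int s)
{
  long g = m + 2L * s;
  return g * g * g;
}

/* Flat offset of grid index (i,j,k) in an array that starts at
   (-s,-s,-s); returns -1 when the index lies outside the stored range.
   Hypothetical helper for illustration, not a PETSc function. */
long ghosted_offset(int i, int j, int k, int m, int s)
{
  assert(m > 0 && s >= 0);
  long g = m + 2L * s;
  if (i < -s || j < -s || k < -s || i >= m + s || j >= m + s || k >= m + s) return -1;
  return ((long)(k + s) * g + (j + s)) * g + (i + s);
}
```

With m = 7 and s = 1 the ghosted array has 729 entries and index (-1,-1,-1) maps to offset 0, while an array without ghost points (s = 0) has only 343 entries and no storage for (-1,-1,-1). That matches the explanation that ghost locations exist only in local vectors, so reading them through an array obtained from a global vector goes out of bounds.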
