Re: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions

2023-03-01 Thread Paul Grosse-Bley

I previously thought that a custom Red-Black GS solution would be too specific 
for PETSc, as it would only work for star-shaped stencils of width 1.

For that specific operation, a custom kernel could probably achieve 
significantly better performance, as it would not have to load a mask from 
memory and could exploit the very regular structure of the problem. We still 
wanted to use cuSPARSE here because it seemed more in line with the present 
GPU support of PETSc, and using custom kernels in one library might be seen as 
unfair when comparing it to others.
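
For concreteness, here is roughly what such a kernel boils down to in plain C (a sketch only, assuming a cube of n^3 points stored in a flat row-major array; the cuSPARSE-based variant would instead use the masked matrix-vector products mentioned below):

static void RedBlackGSSweep(double *u, const double *f, int n, double h2)
{
  /* One red-black Gauss-Seidel sweep for -laplace(u) = f on an n^3 grid with
     spacing h (h2 = h*h) and homogeneous Dirichlet boundaries (u = 0 kept fixed).
     The 7-point star stencil of width 1 is exactly the case discussed above. */
  for (int color = 0; color < 2; color++) {          /* red = 0, black = 1 */
    for (int k = 1; k < n - 1; k++)
      for (int j = 1; j < n - 1; j++)
        for (int i = 1; i < n - 1; i++) {
          if (((i + j + k) & 1) != color) continue;  /* update only this color */
          const int idx = (k * n + j) * n + i;
          u[idx] = (h2 * f[idx]
                    + u[idx - 1] + u[idx + 1]
                    + u[idx - n] + u[idx + n]
                    + u[idx - n * n] + u[idx + n * n]) / 6.0;
        }
  }
}

A CUDA version would assign one thread per point of the current color; within one color all updates are independent, which is what makes red-black ordering attractive on GPUs.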

But if you are interested anyway, I will share my implementation if/when I get 
to implementing that.

Best,
Paul Grosse-Bley

On Wednesday, March 01, 2023 16:51 CET, Barry Smith  wrote:
On Mar 1, 2023, at 10:30 AM, Paul Grosse-Bley wrote:

> Thank you for the detailed answer, Barry. I had hit a dead end on my side.
>
> If you wish to compare, for example, ex45.c with a code that does not incorporate the Dirichlet boundary nodes in the linear system you can just use 0 boundary conditions for both codes.
>
> Do you mean to implement the boundary conditions explicitly in e.g. hpgmg-cuda instead of using the ghosted cells for them?

   I don't know anything about hpgmg-cuda and what it means by "ghosted cells". I am just saying I think it is reasonable to use the style of ex45.c that retains Dirichlet unknowns in the global matrix (with zero on those points) in PETSc to compare with other codes that may or may not do something different. But you need to use zero for Dirichlet points to ensure that the "funny" convergence rates do not happen at the beginning, making the comparison unbalanced between the two codes.

> Do I go right in the assumption that the PCMG coarsening (using the DMDA's geometric information) will cause the boundary condition on the coarser grids to be finite (>0)?

   In general, even if the Dirichlet points on the fine grid are zero, during PCMG with DMDA as in ex45.c those "boundary" values on the coarser grids may end up non-zero during the iterative process. But this is "harmless", just part of the algorithm; it does mean the convergence with a code that does not include those points will be different (not necessarily better or worse, just different).

> Ideally I would like to just use some kind of GPU-parallel (colored) SOR/Gauss-Seidel instead of Jacobi. One can relatively easily implement Red-Black GS using cuSPARSE's masked matrix vector products, but I have not found any information on implementing a custom preconditioner in PETSc.

   You can use https://petsc.org/release/docs/manualpages/PC/PCSHELL/ and look at src/ksp/pc/impls/jacobi.c for detailed comments on what goes into a preconditioner object. If you write such a GPU-parallel (colored) SOR/Gauss-Seidel, we would love to include it in PETSc. Note also https://petsc.org/release/docs/manualpages/KSP/KSPCHEBYSHEV/ and potentially other "polynomial preconditioners" as an alternative approach for having more powerful parallel smoothers.
Best,
Paul Grosse-Bley

On Wednesday, March 01, 2023 05:38 CET, Barry Smith  wrote:
  Ok, here is the situation. The command line options as given do not result in multigrid-quality convergence in any of the runs; the error contraction factor is around .94 (meaning that for the modes the multigrid algorithm does the worst on, it only removes about 6 percent of them per iteration).

  But this is hidden by the initial right-hand side for the linear system as written in ex45.c, which has O(h) values on the boundary nodes and O(h^3) values on the interior nodes. The first iterations largely work on the boundary residual and make great progress attacking it, so it looks like one has a good error contraction factor. One then sees the error contraction factor get worse and worse in the later iterations. With 0 on the boundary, the iterations quickly get to the bad regime where the error contraction factor is near one. One can see this by using -ksp_rtol 1.e-12 and having the MG code print the residual decrease for each iteration. Though it appears the 0-boundary-condition case converges much slower (since it requires many more iterations), if you factor out the huge advantage the nonzero-boundary-condition case has at the beginning (in terms of decreasing the residual), you see they both have an asymptotic error contraction factor of around .94 (which is horrible for multigrid).

  I now add -mg_levels_ksp_richardson_scale .9 -mg_coarse_ksp_richardson_scale .9 and rerun the two cases (nonzero and zero boundary right-hand side); they take 35 and 41 iterations (much better):

initial residual norm 14.6993
next residual norm 0.84167 0.0572591
next residual norm 0.0665392 0.00452668
next residual norm 0.0307273 0.00209039
next residual norm 0.0158949 0.00108134
next residual norm 0.00825189 0.000561378
next residual norm 0.00428474

Re: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions

2023-03-01 Thread Barry Smith


> On Mar 1, 2023, at 10:30 AM, Paul Grosse-Bley 
>  wrote:
> 
> Thank you for the detailed answer, Barry. I had hit a dead end on my side.
> If you wish to compare, for example, ex45.c with a code that does not 
> incorporate the Dirichlet boundary nodes in the linear system you can just 
> use 0 boundary conditions for both codes.
> 
> Do you mean to implement the boundary conditions explicitly in e.g. 
> hpgmg-cuda instead of using the ghosted cells for them?

   I don't know anything about hpgmg-cuda and what it means by "ghosted cells". 
I am just saying I think it is reasonable to use the style of ex45.c that 
retains Dirichlet unknowns in the global matrix (with zero on those points) 
in PETSc to compare with other codes that may or may not do something 
different. But you need to use zero for Dirichlet points to ensure that the 
"funny" convergence rates do not happen at the beginning, making the comparison 
unbalanced between the two codes.

> 
> Do I go right in the assumption that the PCMG coarsening (using DMDAs 
> geometric information) will cause the boundary condition on the coarser grids 
> to be finite (>0)?

  In general, even if the Dirichlet points on the fine grid are zero, during 
PCMG with DMDA as in ex45.c those "boundary" values on the coarser grids may 
end up non-zero during the iterative process. But this is "harmless", just 
part of the algorithm; it does mean the convergence with a code that does not 
include those points will be different (not necessarily better or worse, just 
different).
> 
> Ideally I would like to just use some kind of GPU-parallel (colored) 
> SOR/Gauss-Seidel instead of Jacobi. One can relatively easily implement 
> Red-Black GS using cuSPARSE's masked matrix vector products, but I have not 
> found any information on implementing a custom preconditioner in PETSc.

  You can use https://petsc.org/release/docs/manualpages/PC/PCSHELL/ and look 
at src/ksp/pc/impls/jacobi.c for detailed comments on what goes into a 
preconditioner object.

 If you write such a GPU-parallel (colored) SOR/Gauss-Seidel, we would love to 
include it in PETSc. Note also 
https://petsc.org/release/docs/manualpages/KSP/KSPCHEBYSHEV/ and potentially 
other "polynomial preconditioners" as an alternative approach for having more 
powerful parallel smoothers.
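
A minimal PCSHELL skeleton for reference; the apply callback below is only a placeholder identity, where a red-black GS sweep would go, and all names are illustrative:

#include <petscksp.h>

/* Placeholder apply callback: y = M^{-1} x; here simply y = x. */
static PetscErrorCode MyShellPCApply(PC pc, Vec x, Vec y)
{
  PetscFunctionBeginUser;
  PetscCall(VecCopy(x, y)); /* replace with e.g. one red-black GS sweep */
  PetscFunctionReturn(0);
}

/* Hook the shell PC into an existing KSP (e.g. an MG level smoother). */
static PetscErrorCode UseShellPC(KSP ksp)
{
  PC pc;

  PetscFunctionBeginUser;
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCSHELL));
  PetscCall(PCShellSetApply(pc, MyShellPCApply));
  PetscCall(PCShellSetName(pc, "red-black-gs-placeholder"));
  PetscFunctionReturn(0);
}

For an MG smoother, one would get the level KSP with PCMGGetSmoother() and attach the shell PC there; PCShellSetContext()/PCShellGetContext() can carry whatever GPU buffers the sweep needs. For the Chebyshev route, -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi is a natural starting point.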




> 
> Best,
> Paul Grosse-Bley
> 
> On Wednesday, March 01, 2023 05:38 CET, Barry Smith  wrote:
>  
>> 
>>  
>  
>Ok, here is the situation. The command line options as given do not result 
> in multigrid quality convergence in any of the runs; the error contraction 
> factor is around .94 (meaning that for the modes that the multigrid algorithm 
> does the worst on it only removes about 6 percent of them per iteration). 
>  
>But this is hidden by the initial right hand side for the linear system as 
> written in ex45.c which has O(h) values on the boundary nodes and O(h^3) 
> values on the interior nodes. The first iterations are largely working on the 
> boundary residual and making great progress attacking that so that it looks 
> like the one has a good error contraction factor. One then sees the error 
> contraction factor start to get worse and worse for the later iterations. 
> With the 0 on the boundary the iterations quickly get to the bad regime where 
> the error contraction factor is near one. One can see this by using a 
> -ksp_rtol 1.e-12 and having the MG code print the residual decrease for each 
> iteration. Though it appears the 0 boundary condition one converges much 
> slower (since it requires many more iterations) if you factor out the huge 
> advantage of the nonzero boundary condition case at the beginning (in terms 
> of decreasing the residual) you see they both have an asymptotic error 
> contraction factor of around .94 (which is horrible for multigrid).
>  
>I now add -mg_levels_ksp_richardson_scale .9 
> -mg_coarse_ksp_richardson_scale .9 and rerun the two cases (nonzero and zero 
> boundary right hand side) they take 35 and 41 iterations (much better)
>  
> initial residual norm 14.6993
> next residual norm 0.84167 0.0572591
> next residual norm 0.0665392 0.00452668
> next residual norm 0.0307273 0.00209039
> next residual norm 0.0158949 0.00108134
> next residual norm 0.00825189 0.000561378
> next residual norm 0.00428474 0.000291492
> next residual norm 0.00222482 0.000151355
> next residual norm 0.00115522 7.85898e-05
> next residual norm 0.000599836 4.0807e-05
> next residual norm 0.000311459 2.11887e-05
> next residual norm 0.000161722 1.1002e-05
> next residual norm 8.39727e-05 5.71269e-06
> next residual norm 4.3602e-05 2.96626e-06
> next residual norm 2.26399e-05 1.5402e-06
> next residual norm 1.17556e-05 7.99735e-07
> next residual norm 6.10397e-06 4.15255e-07
> next residual norm 3.16943e-06 2.15617e-07
> next residual norm 1.64569e-06 1.11957e-07
> next residual norm 8.54511e-07 5.81326e-08
> next residual norm 4.43697e-07 3.01848e-08
> next residual 

Re: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions

2023-03-01 Thread Paul Grosse-Bley

Thank you for the detailed answer, Barry. I had hit a dead end on my side.
> If you wish to compare, for example, ex45.c with a code that does not 
> incorporate the Dirichlet boundary nodes in the linear system you can just use 
> 0 boundary conditions for both codes.

Do you mean to implement the boundary conditions explicitly in e.g. hpgmg-cuda 
instead of using the ghosted cells for them?

Do I go right in the assumption that the PCMG coarsening (using the DMDA's geometric 
information) will cause the boundary condition on the coarser grids to be 
finite (>0)?

Ideally I would like to just use some kind of GPU-parallel (colored) 
SOR/Gauss-Seidel instead of Jacobi. One can relatively easily implement 
Red-Black GS using cuSPARSE's masked matrix vector products, but I have not 
found any information on implementing a custom preconditioner in PETSc.

Best,
Paul Grosse-Bley

On Wednesday, March 01, 2023 05:38 CET, Barry Smith  wrote:
  Ok, here is the situation. The command line options as given do not result in multigrid-quality convergence in any of the runs; the error contraction factor is around .94 (meaning that for the modes the multigrid algorithm does the worst on, it only removes about 6 percent of them per iteration).

  But this is hidden by the initial right-hand side for the linear system as written in ex45.c, which has O(h) values on the boundary nodes and O(h^3) values on the interior nodes. The first iterations largely work on the boundary residual and make great progress attacking it, so it looks like one has a good error contraction factor. One then sees the error contraction factor get worse and worse in the later iterations. With 0 on the boundary, the iterations quickly get to the bad regime where the error contraction factor is near one. One can see this by using -ksp_rtol 1.e-12 and having the MG code print the residual decrease for each iteration. Though it appears the 0-boundary-condition case converges much slower (since it requires many more iterations), if you factor out the huge advantage the nonzero-boundary-condition case has at the beginning (in terms of decreasing the residual), you see they both have an asymptotic error contraction factor of around .94 (which is horrible for multigrid).

  I now add -mg_levels_ksp_richardson_scale .9 -mg_coarse_ksp_richardson_scale .9 and rerun the two cases (nonzero and zero boundary right-hand side); they take 35 and 41 iterations (much better):
initial residual norm 14.6993
next residual norm 0.84167 0.0572591
next residual norm 0.0665392 0.00452668
next residual norm 0.0307273 0.00209039
next residual norm 0.0158949 0.00108134
next residual norm 0.00825189 0.000561378
next residual norm 0.00428474 0.000291492
next residual norm 0.00222482 0.000151355
next residual norm 0.00115522 7.85898e-05
next residual norm 0.000599836 4.0807e-05
next residual norm 0.000311459 2.11887e-05
next residual norm 0.000161722 1.1002e-05
next residual norm 8.39727e-05 5.71269e-06
next residual norm 4.3602e-05 2.96626e-06
next residual norm 2.26399e-05 1.5402e-06
next residual norm 1.17556e-05 7.99735e-07
next residual norm 6.10397e-06 4.15255e-07
next residual norm 3.16943e-06 2.15617e-07
next residual norm 1.64569e-06 1.11957e-07
next residual norm 8.54511e-07 5.81326e-08
next residual norm 4.43697e-07 3.01848e-08
next residual norm 2.30385e-07 1.56732e-08
next residual norm 1.19625e-07 8.13815e-09
next residual norm 6.21143e-08 4.22566e-09
next residual norm 3.22523e-08 2.19413e-09
next residual norm 1.67467e-08 1.13928e-09
next residual norm 8.69555e-09 5.91561e-10
next residual norm 4.51508e-09 3.07162e-10
next residual norm 2.34441e-09 1.59491e-10
next residual norm 1.21731e-09 8.28143e-11
next residual norm 6.32079e-10 4.30005e-11
next residual norm 3.28201e-10 2.23276e-11
next residual norm 1.70415e-10 1.15934e-11
next residual norm 8.84865e-11 6.01976e-12
next residual norm 4.59457e-11 3.1257e-12
next residual norm 2.38569e-11 1.62299e-12
next residual norm 1.23875e-11 8.42724e-13
Linear solve converged due to CONVERGED_RTOL iterations 35
Residual norm 1.23875e-11 
initial residual norm 172.601
next residual norm 154.803 0.896887
next residual norm 66.9409 0.387837
next residual norm 34.4572 0.199636
next residual norm 17.8836 0.103612
next residual norm 9.28582 0.0537995
next residual norm 4.82161 0.027935
next residual norm 2.50358 0.014505
next residual norm 1.29996 0.0075316
next residual norm 0.674992 0.00391071
next residual norm 0.350483 0.0020306
next residual norm 0.181985 0.00105437
next residual norm 0.094494 0.000547472
next residual norm 0.0490651 0.000284269
next residual norm 0.0254766 0.000147604
next residual norm 0.0132285 7.6642e-05
next residual norm 0.00686876 3.97956e-05
next residual norm 0.00356654 2.06635e-05
next residual norm 0.00185189 1.07293e-05
next residual norm 0.000961576 5.5711e-06
next residual norm 0.000499289 2.89274e-06
next residual norm 0.000259251 1.50203e-06
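
For reference, the full command line for these runs is the one quoted further down in this thread plus the two Richardson scale options; a sketch assembled from the options discussed here (grid-size flags as suggested below), not necessarily the exact invocation:

./ex45 -da_grid_x 33 -da_grid_y 33 -da_grid_z 33 -ksp_type richardson -ksp_rtol 1e-12 -ksp_monitor -ksp_converged_reason -pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits -mg_levels_pc_type jacobi -mg_levels_ksp_richardson_scale .9 -mg_coarse_ksp_type richardson -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits -mg_coarse_pc_type jacobi -mg_coarse_ksp_richardson_scale .9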

Re: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions

2023-02-28 Thread Große-Bley, Paul
Sorry, I should have made myself clearer. I changed the three 7s 
passed to DMDACreate3d to 33 to make the example a bit more realistic, 
as I also use "U-cycles", i.e. my coarsest level is still big enough to 
make use of some GPU parallelism. I should have just put that into the 
given command line argument string with -da_grid_x 33 -da_grid_y 33 
-da_grid_z 33.


On 2/28/23 18:43, Barry Smith wrote:


   I am sorry, I cannot reproduce what you describe. I am using 
src/ksp/ksp/tutorials/ex45.c in the main branch (should be same as 
release for this purpose).


   No change to the code I get

$ ./ex45 -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 
-pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson 
-mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits 
-mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson 
-mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits 
-mg_coarse_pc_type jacobi -ksp_monitor_true_residual -ksp_view


  0 KSP preconditioned resid norm 1.851257578045e+01 true resid norm 
1.476491378857e+01 ||r(i)||/||b|| 1.e+00


  1 KSP preconditioned resid norm 3.720545622095e-01 true resid norm 
5.171053311198e-02 ||r(i)||/||b|| 3.502257707188e-03


  2 KSP preconditioned resid norm 1.339047557616e-02 true resid norm 
1.866765310863e-03 ||r(i)||/||b|| 1.264325235890e-04


  3 KSP preconditioned resid norm 4.833887599029e-04 true resid norm 
6.867629264754e-05 ||r(i)||/||b|| 4.651316873974e-06


  4 KSP preconditioned resid norm 1.748167886388e-05 true resid norm 
3.398334857479e-06 ||r(i)||/||b|| 2.301628648933e-07


  5 KSP preconditioned resid norm 6.570567424652e-07 true resid norm 
4.304483984231e-07 ||r(i)||/||b|| 2.915346507180e-08


  6 KSP preconditioned resid norm 4.013427896557e-08 true resid norm 
7.502068698790e-08 ||r(i)||/||b|| 5.081010838410e-09


  7 KSP preconditioned resid norm 5.934811016347e-09 true resid norm 
1.333884145638e-08 ||r(i)||/||b|| 9.034147877457e-10


Linear solve converged due to CONVERGED_RTOL iterations 7

KSP Object: 1 MPI process

  type: richardson

damping factor=1.

maximum iterations=1, nonzero initial guess

tolerances:  relative=1e-09, absolute=1e-50, divergence=1.

  left preconditioning

  using PRECONDITIONED norm type for convergence test

PC Object: 1 MPI process

  type: mg

    type is MULTIPLICATIVE, levels=3 cycles=v

Cycles per PCApply=1

Not using Galerkin computed coarse grid matrices

  Coarse grid solver -- level 0 ---

    KSP Object: (mg_coarse_) 1 MPI process

type: richardson

damping factor=1.

maximum iterations=6, nonzero initial guess

tolerances:  relative=1e-05, absolute=1e-50, divergence=1.

left preconditioning

using PRECONDITIONED norm type for convergence test

    PC Object: (mg_coarse_) 1 MPI process

type: jacobi

type DIAGONAL

linear system matrix = precond matrix:

Mat Object: 1 MPI process

type: seqaij

rows=8, cols=8

total: nonzeros=32, allocated nonzeros=32

total number of mallocs used during MatSetValues calls=0

  not using I-node routines

  Down solver (pre-smoother) on level 1 ---

    KSP Object: (mg_levels_1_) 1 MPI process

type: richardson

damping factor=1.

maximum iterations=6, nonzero initial guess

tolerances:  relative=1e-05, absolute=1e-50, divergence=1.

left preconditioning

using NONE norm type for convergence test

    PC Object: (mg_levels_1_) 1 MPI process

type: jacobi

type DIAGONAL

linear system matrix = precond matrix:

Mat Object: 1 MPI process

type: seqaij

rows=64, cols=64

total: nonzeros=352, allocated nonzeros=352

total number of mallocs used during MatSetValues calls=0

  not using I-node routines

  Up solver (post-smoother) same as down solver (pre-smoother)

  Down solver (pre-smoother) on level 2 ---

    KSP Object: (mg_levels_2_) 1 MPI process

type: richardson

damping factor=1.

maximum iterations=6, nonzero initial guess

tolerances:  relative=1e-05, absolute=1e-50, divergence=1.

left preconditioning

using NONE norm type for convergence test

    PC Object: (mg_levels_2_) 1 MPI process

type: jacobi

type DIAGONAL

linear system matrix = precond matrix:

Mat Object: 1 MPI process

type: seqaij

rows=343, cols=343

total: nonzeros=2107, allocated nonzeros=2107

total number of mallocs used during MatSetValues calls=0

  not using I-node routines

  Up solver (post-smoother) same as down solver (pre-smoother)

  linear system matrix = precond matrix:

  Mat Object: 1 MPI process

type: seqaij

rows=343, cols=343

total: nonzeros=2107, allocated nonzeros=2107

total number of mallocs used during MatSetValues calls=0

not using I-node routines

Residual norm 1.33388e-08

~/Src/petsc/src/ksp/ksp/tutorials (main=) arch-main

$



   Now change code with

        if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
          barray[k][j][i] = 0; //2.0 * (HxHydHz + HxHzdHy + HyHzdHx);
        } else {
          barray[k][j][i] = 1; //Hx * Hy * Hz;
        }

Re: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions

2023-02-28 Thread Barry Smith

   I am sorry, I cannot reproduce what you describe. I am using 
src/ksp/ksp/tutorials/ex45.c in the main branch (should be same as release for 
this purpose).

   No change to the code I get 

$ ./ex45 -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 -pc_type mg 
-pc_mg_levels 3 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 6 
-mg_levels_ksp_converged_maxits -mg_levels_pc_type jacobi -mg_coarse_ksp_type 
richardson -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits 
-mg_coarse_pc_type jacobi -ksp_monitor_true_residual -ksp_view
  0 KSP preconditioned resid norm 1.851257578045e+01 true resid norm 
1.476491378857e+01 ||r(i)||/||b|| 1.e+00
  1 KSP preconditioned resid norm 3.720545622095e-01 true resid norm 
5.171053311198e-02 ||r(i)||/||b|| 3.502257707188e-03
  2 KSP preconditioned resid norm 1.339047557616e-02 true resid norm 
1.866765310863e-03 ||r(i)||/||b|| 1.264325235890e-04
  3 KSP preconditioned resid norm 4.833887599029e-04 true resid norm 
6.867629264754e-05 ||r(i)||/||b|| 4.651316873974e-06
  4 KSP preconditioned resid norm 1.748167886388e-05 true resid norm 
3.398334857479e-06 ||r(i)||/||b|| 2.301628648933e-07
  5 KSP preconditioned resid norm 6.570567424652e-07 true resid norm 
4.304483984231e-07 ||r(i)||/||b|| 2.915346507180e-08
  6 KSP preconditioned resid norm 4.013427896557e-08 true resid norm 
7.502068698790e-08 ||r(i)||/||b|| 5.081010838410e-09
  7 KSP preconditioned resid norm 5.934811016347e-09 true resid norm 
1.333884145638e-08 ||r(i)||/||b|| 9.034147877457e-10
Linear solve converged due to CONVERGED_RTOL iterations 7
KSP Object: 1 MPI process
  type: richardson
damping factor=1.
  maximum iterations=1, nonzero initial guess
  tolerances:  relative=1e-09, absolute=1e-50, divergence=1.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI process
  type: mg
type is MULTIPLICATIVE, levels=3 cycles=v
  Cycles per PCApply=1
  Not using Galerkin computed coarse grid matrices
  Coarse grid solver -- level 0 ---
KSP Object: (mg_coarse_) 1 MPI process
  type: richardson
damping factor=1.
  maximum iterations=6, nonzero initial guess
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: (mg_coarse_) 1 MPI process
  type: jacobi
type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 1 MPI process
type: seqaij
rows=8, cols=8
total: nonzeros=32, allocated nonzeros=32
total number of mallocs used during MatSetValues calls=0
  not using I-node routines
  Down solver (pre-smoother) on level 1 ---
KSP Object: (mg_levels_1_) 1 MPI process
  type: richardson
damping factor=1.
  maximum iterations=6, nonzero initial guess
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using NONE norm type for convergence test
PC Object: (mg_levels_1_) 1 MPI process
  type: jacobi
type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 1 MPI process
type: seqaij
rows=64, cols=64
total: nonzeros=352, allocated nonzeros=352
total number of mallocs used during MatSetValues calls=0
  not using I-node routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 ---
KSP Object: (mg_levels_2_) 1 MPI process
  type: richardson
damping factor=1.
  maximum iterations=6, nonzero initial guess
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using NONE norm type for convergence test
PC Object: (mg_levels_2_) 1 MPI process
  type: jacobi
type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 1 MPI process
type: seqaij
rows=343, cols=343
total: nonzeros=2107, allocated nonzeros=2107
total number of mallocs used during MatSetValues calls=0
  not using I-node routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object: 1 MPI process
type: seqaij
rows=343, cols=343
total: nonzeros=2107, allocated nonzeros=2107
total number of mallocs used during MatSetValues calls=0
  not using I-node routines
Residual norm 1.33388e-08
~/Src/petsc/src/ksp/ksp/tutorials (main=) arch-main
$ 


   Now change code with 

if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
  barray[k][j][i] = 0; //2.0 * (HxHydHz + HxHzdHy + HyHzdHx);
} else {
  barray[k][j][i] = 1; //Hx * Hy * Hz;
}

  I do not understand where I am supposed to change the dimension to 33, so I 
am ignoring that.

Re: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions

2023-02-27 Thread Paul Grosse-Bley

The scaling might be the problem, especially since I don't know what you mean 
by scaling it according to FE.

For reproducing the issue with a smaller problem:
Change the ComputeRHS function in ex45.c

if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
  barray[k][j][i] = 0.0;
} else {
  barray[k][j][i] = 1.0;
}

Change the dimensions to e.g. 33 instead of 7 (I scaled it down so it runs 
quickly without a GPU) and then run with

-ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 -pc_type mg 
-pc_mg_levels 3 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 6 
-mg_levels_ksp_converged_maxits -mg_levels_pc_type jacobi -mg_coarse_ksp_type 
richardson -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits 
-mg_coarse_pc_type jacobi

You will find that it takes 145 iterations instead of 25 for the original ex45 
RHS. My hpgmg-cuda implementation (using 32^3) takes 41 iterations.

To what do I have to change the diagonal entries of the matrix for the boundary 
according to FE? Right now the diagonal is completely constant.

Paul

On Tuesday, February 28, 2023 00:23 CET, Barry Smith  wrote:
 
I have not seen explicitly including, or excluding, the Dirichlet boundary 
values in the system having a significant effect on the convergence so long as 
you SCALE the diagonal rows (of those Dirichlet points) by a value similar to 
the other entries along the diagonal. If they are scaled completely 
differently, that can screw up the convergence. For src/ksp/ksp/ex45.c I see 
that the appropriate scaling is used (note the scaling should come from a 
finite element view of the discretization even if the discretization is finite 
differences as is done in ex45.c)

Are you willing to share the two codes so we can take a look with experienced 
eyes to try to figure out the difference?

Barry




> On Feb 27, 2023, at 5:48 PM, Paul Grosse-Bley 
>  wrote:
>
> Hi Barry,
>
> the reason why I wanted to change to ghost boundaries is that I was worrying 
> about the effect of PCMG's coarsening on these boundary values.
>
> As mentioned before, I am trying to reproduce results from the hpgmg-cuda 
> benchmark (a modified version of it, e.g. using 2nd order instead of 4th 
> etc.).
> I am trying to solve the Poisson equation -\nabla^2 u = 1 with u = 0 on the 
> boundary with rtol=1e-9. While my MG solver implemented in hpgmg solves this 
> in 40 V-cycles (I weakened it a lot by only doing smooths at the coarse level 
> instead of CG). When I run the "same" MG solver built in PETSc on this 
> problem, it starts out reducing the residual norm as fast or even faster for 
> the first 20-30 iterations. But for the last order of magnitude in the 
> residual norm it needs more than 300 V-cycles, i.e. it gets very slow. At 
> this point I am pretty much out of ideas about what is the cause, especially 
> since e.g. adding back cg at the coarsest level doesn't seem to change the 
> number of iterations at all. Therefore I am suspecting the discretization to 
> be the problem. HPGMG uses an even number of points per dimension (e.g. 256), 
> while PCMG wants an odd number (e.g. 257). So I also tried adding another 
> layer of boundary values for the discretization to effectively use only 254 
> points per dimension. This caused the solver to get even slightly worse.
>
> So can the explicit boundary values screw with the coarsening, especially 
> when they are not finite? Because with the problem as stated in ex45 with 
> finite (i.e. non-zero) boundary values, the MG solver takes only 18 V-cycles.
>
> Best,
> Paul
>
>
>
> On Monday, February 27, 2023 18:17 CET, Barry Smith  wrote:
>
>>
>> Paul,
>>
>> DM_BOUNDARY_GHOSTED would result in the extra ghost locations in the local 
>> vectors (obtained with DMCreateLocalVector()), but they will not appear in the 
>> global vectors obtained with DMCreateGlobalVector(); perhaps this is the 
>> issue? Since they do not appear in the global vector they will not appear in 
>> the linear system so there will be no diagonal entries for you to set since 
>> those rows/columns do not exist in the linear system. In other words, using 
>> DM_BOUNDARY_GHOSTED is a way to avoid needing to put the Dirichlet values 
>> explicitly into the system being solved; DM_BOUNDARY_GHOSTED is generally 
>> more helpful for nonlinear systems than linear systems.
>>
>> Barry
>>
>> > On Feb 27, 2023, at 12:08 PM, Paul Grosse-Bley 
>> >  wrote:
>> >
>> > Hi,
>> >
>> > I would like to modify src/ksp/ksp/tutorials/ex45.c to implement Dirichlet 
>> > boundary conditions using DM_BOUNDARY_GHOSTED instead of using 
>> > DM_BOUNDARY_NONE and explicitly implementing the boundary by adding 
>> > diagonal-only rows.
>> >
>> > My assumption was that with DM_BOUNDARY_GHOSTED all vectors from that DM 
>> > have the extra memory for the ghost entries and that I can basically use 
>> > DMDAGetGhostCorners instead of DMDAGetCorners to access the array gotten 
>> > via DMDAVecGetArray. But when I access 

Re: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions

2023-02-27 Thread Barry Smith


  I have not seen explicitly including, or excluding, the Dirichlet boundary 
values in the system having a significant effect on the convergence so long as 
you SCALE the diagonal rows (of those Dirichlet points) by a value similar to 
the other entries along the diagonal. If they are scaled completely 
differently, that can screw up the convergence. For src/ksp/ksp/ex45.c I see 
that the appropriate scaling is used (note the scaling should come from a 
finite element view of the discretization even if the discretization is finite 
differences as is done in ex45.c)
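
Concretely, this is what "scaled like the other diagonal entries" looks like for the 7-point finite-difference Laplacian, in an ex45.c-style assembly sketch (names and the exact spacing convention are illustrative assumptions, not verbatim ex45.c code):

#include <petscdmda.h>
#include <petscmat.h>

/* Sketch of an ex45.c-style 7-point Laplacian assembly, to illustrate what
   "similarly scaled" Dirichlet rows mean: the boundary diagonal gets the same
   magnitude as an interior diagonal entry instead of, say, 1.0. */
static PetscErrorCode AssembleLaplacianSketch(DM da, Mat B)
{
  PetscInt    i, j, k, xs, ys, zs, xm, ym, zm, mx, my, mz;
  PetscScalar v[7], Hx, Hy, Hz, HxHydHz, HxHzdHy, HyHzdHx;
  MatStencil  row, col[7];

  PetscFunctionBeginUser;
  PetscCall(DMDAGetInfo(da, NULL, &mx, &my, &mz, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL));
  Hx = 1.0 / (PetscReal)(mx - 1);
  Hy = 1.0 / (PetscReal)(my - 1);
  Hz = 1.0 / (PetscReal)(mz - 1);
  HxHydHz = Hx * Hy / Hz;
  HxHzdHy = Hx * Hz / Hy;
  HyHzdHx = Hy * Hz / Hx;
  PetscCall(DMDAGetCorners(da, &xs, &ys, &zs, &xm, &ym, &zm));
  for (k = zs; k < zs + zm; k++) {
    for (j = ys; j < ys + ym; j++) {
      for (i = xs; i < xs + xm; i++) {
        row.i = i; row.j = j; row.k = k;
        if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
          /* Dirichlet row: diagonal only, but scaled like an interior diagonal entry */
          v[0] = 2.0 * (HxHydHz + HxHzdHy + HyHzdHx);
          PetscCall(MatSetValuesStencil(B, 1, &row, 1, &row, v, INSERT_VALUES));
        } else {
          /* interior row of the 7-point stencil; same diagonal magnitude */
          v[0] = -HxHydHz; col[0].i = i;     col[0].j = j;     col[0].k = k - 1;
          v[1] = -HxHzdHy; col[1].i = i;     col[1].j = j - 1; col[1].k = k;
          v[2] = -HyHzdHx; col[2].i = i - 1; col[2].j = j;     col[2].k = k;
          v[3] = 2.0 * (HxHydHz + HxHzdHy + HyHzdHx);
                           col[3].i = i;     col[3].j = j;     col[3].k = k;
          v[4] = -HyHzdHx; col[4].i = i + 1; col[4].j = j;     col[4].k = k;
          v[5] = -HxHzdHy; col[5].i = i;     col[5].j = j + 1; col[5].k = k;
          v[6] = -HxHydHz; col[6].i = i;     col[6].j = j;     col[6].k = k + 1;
          PetscCall(MatSetValuesStencil(B, 1, &row, 7, col, v, INSERT_VALUES));
        }
      }
    }
  }
  PetscCall(MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY));
  PetscFunctionReturn(0);
}

With this, a Dirichlet row has the single diagonal value 2.0 * (HxHydHz + HxHzdHy + HyHzdHx), the same magnitude as an interior diagonal entry, so those rows are not scaled "completely differently" from the rest of the system.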

   Are you willing to share the two codes so we can take a look with 
experienced eyes to try to figure out the difference?

  Barry




> On Feb 27, 2023, at 5:48 PM, Paul Grosse-Bley 
>  wrote:
> 
> Hi Barry,
> 
> the reason why I wanted to change to ghost boundaries is that I was worrying 
> about the effect of PCMG's coarsening on these boundary values.
> 
> As mentioned before, I am trying to reproduce results from the hpgmg-cuda 
> benchmark (a modified version of it, e.g. using 2nd order instead of 4th 
> etc.).
> I am trying to solve the Poisson equation -\nabla^2 u = 1 with u = 0 on the 
> boundary with rtol=1e-9. While my MG solver implemented in hpgmg solves this 
> in 40 V-cycles (I weakened it a lot by only doing smooths at the coarse level 
> instead of CG). When I run the "same" MG solver built in PETSc on this 
> problem, it starts out reducing the residual norm as fast or even faster for 
> the first 20-30 iterations. But for the last order of magnitude in the 
> residual norm it needs more than 300 V-cycles, i.e. it gets very slow. At 
> this point I am pretty much out of ideas about what is the cause, especially 
> since e.g. adding back cg at the coarsest level doesn't seem to change the 
> number of iterations at all. Therefore I am suspecting the discretization to 
> be the problem. HPGMG uses an even number of points per dimension (e.g. 256), 
> while PCMG wants an odd number (e.g. 257). So I also tried adding another 
> layer of boundary values for the discretization to effectively use only 254 
> points per dimension. This caused the solver to get even slightly worse.
> 
> So can the explicit boundary values screw with the coarsening, especially 
> when they are not finite? Because with the problem as stated in ex45 with 
> finite (i.e. non-zero) boundary values, the MG solver takes only 18 V-cycles.
> 
> Best,
> Paul
> 
> 
> 
> On Monday, February 27, 2023 18:17 CET, Barry Smith  wrote:
>  
>> 
>> Paul,
>> 
>> DM_BOUNDARY_GHOSTED would result in the extra ghost locations in the local 
>> vectors (obtained with DMCreateLocalVector()), but they will not appear in the 
>> global vectors obtained with DMCreateGlobalVector(); perhaps this is the 
>> issue? Since they do not appear in the global vector they will not appear in 
>> the linear system so there will be no diagonal entries for you to set since 
>> those rows/columns do not exist in the linear system. In other words, using 
>> DM_BOUNDARY_GHOSTED is a way to avoid needing to put the Dirichlet values 
>> explicitly into the system being solved; DM_BOUNDARY_GHOSTED is generally 
>> more helpful for nonlinear systems than linear systems.
>> 
>> Barry
>> 
>> > On Feb 27, 2023, at 12:08 PM, Paul Grosse-Bley 
>> >  wrote:
>> >
>> > Hi,
>> >
>> > I would like to modify src/ksp/ksp/tutorials/ex45.c to implement Dirichlet 
>> > boundary conditions using DM_BOUNDARY_GHOSTED instead of using 
>> > DM_BOUNDARY_NONE and explicitly implementing the boundary by adding 
>> > diagonal-only rows.
>> >
>> > My assumption was that with DM_BOUNDARY_GHOSTED all vectors from that DM 
>> > have the extra memory for the ghost entries and that I can basically use 
>> > DMDAGetGhostCorners instead of DMDAGetCorners to access the array gotten 
>> > via DMDAVecGetArray. But when I access (gxs, gys, gzs) = (-1,-1,-1) I get 
>> > a segmentation fault. When looking at the implementation of 
>> > DMDAVecGetArray it looked to me as if accessing (-1, -1, -1) should work 
>> > as DMDAVecGetArray passes the ghost corners to VecGetArray3d which then 
>> > adds the right offsets.
>> >
>> > I could not find any example using DM_BOUNDARY_GHOSTED and then actually 
>> > accessing the ghost/boundary elements. Can I assume that they are set to 
>> > zero for the solution vector, i.e. the u=0 on \del\Omega and I do not need 
>> > to access them at all?
>> >
>> > Best,
>> > Paul Große-Bley
>>  



Re: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions

2023-02-27 Thread Paul Grosse-Bley

Hi Barry,

the reason why I wanted to change to ghost boundaries is that I was worrying 
about the effect of PCMG's coarsening on these boundary values.

As mentioned before, I am trying to reproduce results from the hpgmg-cuda 
benchmark (a modified version of it, e.g. using 2nd order instead of 4th etc.).
I am trying to solve the Poisson equation -\nabla^2 u = 1 with u = 0 on the 
boundary with rtol=1e-9. While my MG solver implemented in hpgmg solves this in 
40 V-cycles (I weakened it a lot by only doing smooths at the coarse level 
instead of CG), the "same" MG solver built in PETSc starts out reducing the 
residual norm as fast or even faster for the first 20-30 iterations on this 
problem. But for the last order of magnitude in the residual norm it needs 
more than 300 V-cycles, i.e. it gets very slow. At this point I am pretty 
much out of ideas about what is the cause, especially since e.g. adding back cg 
at the coarsest level doesn't seem to change the number of iterations at all. 
Therefore I am suspecting the discretization to be the problem. HPGMG uses an 
even number of points per dimension (e.g. 256), while PCMG wants an odd number 
(e.g. 257). So I also tried adding another layer of boundary values for the 
discretization to effectively use only 254 points per dimension. This caused 
the solver to get even slightly worse.

So can the explicit boundary values screw with the coarsening, especially when 
they are not finite? Because with the problem as stated in ex45 with finite 
(i.e. non-zero) boundary values, the MG solver takes only 18 V-cycles.

Best,
Paul



On Monday, February 27, 2023 18:17 CET, Barry Smith  wrote:
 Paul,

DM_BOUNDARY_GHOSTED would result in the extra ghost locations in the local 
vectors (obtained with DMCreateLocalVector()), but they will not appear in the 
global vectors obtained with DMCreateGlobalVector(); perhaps this is the issue? 
Since they do not appear in the global vector they will not appear in the 
linear system so there will be no diagonal entries for you to set since those 
rows/columns do not exist in the linear system. In other words, using 
DM_BOUNDARY_GHOSTED is a way to avoid needing to put the Dirichlet values 
explicitly into the system being solved; DM_BOUNDARY_GHOSTED is generally more 
helpful for nonlinear systems than linear systems.

Barry

> On Feb 27, 2023, at 12:08 PM, Paul Grosse-Bley 
>  wrote:
>
> Hi,
>
> I would like to modify src/ksp/ksp/tutorials/ex45.c to implement Dirichlet 
> boundary conditions using DM_BOUNDARY_GHOSTED instead of using 
> DM_BOUNDARY_NONE and explicitly implementing the boundary by adding 
> diagonal-only rows.
>
> My assumption was that with DM_BOUNDARY_GHOSTED all vectors from that DM have 
> the extra memory for the ghost entries and that I can basically use 
> DMDAGetGhostCorners instead of DMDAGetCorners to access the array gotten via 
> DMDAVecGetArray. But when I access (gxs, gys, gzs) = (-1,-1,-1) I get a 
> segmentation fault. When looking at the implementation of DMDAVecGetArray it 
> looked to me as if accessing (-1, -1, -1) should work as DMDAVecGetArray 
> passes the ghost corners to VecGetArray3d which then adds the right offsets.
>
> I could not find any example using DM_BOUNDARY_GHOSTED and then actually 
> accessing the ghost/boundary elements. Can I assume that they are set to zero 
> for the solution vector, i.e. the u=0 on \del\Omega and I do not need to 
> access them at all?
>
> Best,
> Paul Große-Bley
 


Re: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions

2023-02-27 Thread Barry Smith
Paul,

DM_BOUNDARY_GHOSTED  would result in the extra ghost locations in the local 
vectors (obtained with DMCreateLocalVector()), but they will not appear in the 
global vectors obtained with DMCreateGlobalVector(); perhaps this is the issue? 
Since they do not appear in the global vector they will not appear in the 
linear system so there will be no diagonal entries for you to set since those 
rows/columns do not exist in the linear system.  In other words, using 
DM_BOUNDARY_GHOSTED is a way to avoid needing to put the Dirichlet values 
explicitly into the system being solved; DM_BOUNDARY_GHOSTED is generally more 
helpful for nonlinear systems than linear systems.
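
A small sketch that makes the local-versus-global distinction concrete (grid size and creation options here are illustrative, not taken from ex45.c):

#include <petscdmda.h>

int main(int argc, char **argv)
{
  DM       da;
  Vec      gvec, lvec;
  PetscInt n, nloc, xs, ys, zs, xm, ym, zm, gxs, gys, gzs, gxm, gym, gzm;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  /* 7x7x7 unknowns; with DM_BOUNDARY_GHOSTED the ghost layer exists only locally */
  PetscCall(DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_GHOSTED, DM_BOUNDARY_GHOSTED, DM_BOUNDARY_GHOSTED,
                         DMDA_STENCIL_STAR, 7, 7, 7, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                         1, 1, NULL, NULL, NULL, &da));
  PetscCall(DMSetUp(da));
  PetscCall(DMCreateGlobalVector(da, &gvec)); /* 7*7*7 entries: no rows for ghost points */
  PetscCall(DMCreateLocalVector(da, &lvec));  /* includes the ghost layer */
  PetscCall(VecGetSize(gvec, &n));
  PetscCall(VecGetLocalSize(lvec, &nloc));
  PetscCall(DMDAGetCorners(da, &xs, &ys, &zs, &xm, &ym, &zm));            /* owned part, starts at 0 */
  PetscCall(DMDAGetGhostCorners(da, &gxs, &gys, &gzs, &gxm, &gym, &gzm)); /* starts at -1 on one rank */
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "global vector size %" PetscInt_FMT ", local vector size %" PetscInt_FMT "\n", n, nloc));
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "owned corner (%" PetscInt_FMT ",%" PetscInt_FMT ",%" PetscInt_FMT "), ghost corner (%" PetscInt_FMT ",%" PetscInt_FMT ",%" PetscInt_FMT ")\n", xs, ys, zs, gxs, gys, gzs));
  PetscCall(VecDestroy(&gvec));
  PetscCall(VecDestroy(&lvec));
  PetscCall(DMDestroy(&da));
  PetscCall(PetscFinalize());
  return 0;
}

Since the ghost layer exists only in the local vector, index -1 is only meaningful on an array obtained from a local vector via DMDAVecGetArray(); the global vector, and hence the assembled linear system, has no rows or columns for those points.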

Barry

> On Feb 27, 2023, at 12:08 PM, Paul Grosse-Bley 
>  wrote:
> 
> Hi,
> 
> I would like to modify src/ksp/ksp/tutorials/ex45.c to implement Dirichlet 
> boundary conditions using DM_BOUNDARY_GHOSTED instead of using 
> DM_BOUNDARY_NONE and explicitly implementing the boundary by adding 
> diagonal-only rows.
> 
> My assumption was that with DM_BOUNDARY_GHOSTED all vectors from that DM have 
> the extra memory for the ghost entries and that I can basically use 
> DMDAGetGhostCorners instead of DMDAGetCorners to access the array gotten via 
> DMDAVecGetArray. But when I access (gxs, gys, gzs) = (-1,-1,-1) I get a 
> segmentation fault. When looking at the implementation of DMDAVecGetArray it 
> looked to me as if accessing (-1, -1, -1) should work as DMDAVecGetArray 
> passes the ghost corners to VecGetArray3d which then adds the right offsets.
> 
> I could not find any example using DM_BOUNDARY_GHOSTED and then actually 
> accessing the ghost/boundary elements. Can I assume that they are set to zero 
> for the solution vector, i.e. the u=0 on \del\Omega and I do not need to 
> access them at all?
> 
> Best,
> Paul Große-Bley



[petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions

2023-02-27 Thread Paul Grosse-Bley

Hi,

I would like to modify src/ksp/ksp/tutorials/ex45.c to implement Dirichlet 
boundary conditions using DM_BOUNDARY_GHOSTED instead of using DM_BOUNDARY_NONE 
and explicitly implementing the boundary by adding diagonal-only rows.

My assumption was that with DM_BOUNDARY_GHOSTED all vectors from that DM have 
the extra memory for the ghost entries and that I can basically use 
DMDAGetGhostCorners instead of DMDAGetCorners to access the array obtained via 
DMDAVecGetArray. But when I access (gxs, gys, gzs) = (-1,-1,-1) I get a 
segmentation fault. When looking at the implementation of DMDAVecGetArray, it 
looked to me as if accessing (-1, -1, -1) should work, since DMDAVecGetArray 
passes the ghost corners to VecGetArray3d, which then adds the right offsets.

I could not find any example using DM_BOUNDARY_GHOSTED and then actually 
accessing the ghost/boundary elements. Can I assume that they are set to zero 
for the solution vector, i.e. that u = 0 on \partial\Omega, and that I do not 
need to access them at all?

Best,
Paul Große-Bley