> On Jun 22, 2017, at 3:52 PM, Barry Smith <[email protected]> wrote:
> 
> 
>   Try running the -pc_type mg   case with the additional argument 
> -pc_mg_galerkin does that decrease the number of iterations?

Yes, adding that option yields a considerable improvement:

$ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 
-ksp_view -ksp_monitor_true_residual -pc_type mg -ksp_type cg -pc_mg_levels 5 
-mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale 
-mg_levels_ksp_max_it 5 -pc_mg_galerkin
right hand side 2 norm: 512.
right hand side infinity norm: 0.999097
building operator with Dirichlet boundary conditions, global grid size: 128 x 
128 x 128
  0 KSP preconditioned resid norm 9.822067063977e-01 true resid norm 
5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 2.212533971477e-01 true resid norm 
3.294699127279e+02 ||r(i)||/||b|| 6.434959232968e-01
  2 KSP preconditioned resid norm 9.110625065462e-02 true resid norm 
1.674425012051e+02 ||r(i)||/||b|| 3.270361351662e-01
  3 KSP preconditioned resid norm 3.418653611088e-02 true resid norm 
7.052946665869e+01 ||r(i)||/||b|| 1.377528645678e-01
  4 KSP preconditioned resid norm 1.120326622247e-02 true resid norm 
2.513164585178e+01 ||r(i)||/||b|| 4.908524580425e-02
  5 KSP preconditioned resid norm 3.171704458717e-03 true resid norm 
7.887015942877e+00 ||r(i)||/||b|| 1.540432801343e-02
  6 KSP preconditioned resid norm 7.376165339467e-04 true resid norm 
2.126203120360e+00 ||r(i)||/||b|| 4.152740469453e-03
  7 KSP preconditioned resid norm 1.599952443901e-04 true resid norm 
4.455888786560e-01 ||r(i)||/||b|| 8.702907786250e-04
  8 KSP preconditioned resid norm 5.280374315084e-05 true resid norm 
9.902193817384e-02 ||r(i)||/||b|| 1.934022229958e-04
  9 KSP preconditioned resid norm 3.284982207258e-05 true resid norm 
3.046564806731e-02 ||r(i)||/||b|| 5.950321888146e-05
 10 KSP preconditioned resid norm 1.630367968481e-05 true resid norm 
3.157470827473e-02 ||r(i)||/||b|| 6.166935209908e-05
 11 KSP preconditioned resid norm 4.546559816411e-06 true resid norm 
9.115958407015e-03 ||r(i)||/||b|| 1.780460626370e-05
KSP Object: 16 MPI processes
  type: cg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 16 MPI processes
  type: mg
    MG: type is MULTIPLICATIVE, levels=5 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object:    (mg_coarse_)     16 MPI processes
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_coarse_)     16 MPI processes
      type: redundant
        Redundant preconditioner: First (color=0) of 16 PCs follows
        KSP Object:        (mg_coarse_redundant_)         1 MPI processes
          type: preonly
          maximum iterations=10000, initial guess is zero
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
          left preconditioning
          using NONE norm type for convergence test
        PC Object:        (mg_coarse_redundant_)         1 MPI processes
          type: lu
            LU: out-of-place factorization
            tolerance for zero pivot 2.22045e-14
            using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
            matrix ordering: nd
            factor fill ratio given 5., needed 7.56438
              Factored matrix follows:
                Mat Object:                 1 MPI processes
                  type: seqaij
                  rows=512, cols=512
                  package used to perform factorization: petsc
                  total: nonzeros=24206, allocated nonzeros=24206
                  total number of mallocs used during MatSetValues calls =0
                    not using I-node routines
          linear system matrix = precond matrix:
          Mat Object:           1 MPI processes
            type: seqaij
            rows=512, cols=512
            total: nonzeros=3200, allocated nonzeros=3200
            total number of mallocs used during MatSetValues calls =0
              not using I-node routines
      linear system matrix = precond matrix:
      Mat Object:       16 MPI processes
        type: mpiaij
        rows=512, cols=512
        total: nonzeros=3200, allocated nonzeros=3200
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object:    (mg_levels_1_)     16 MPI processes
      type: richardson
        Richardson: using self-scale best computed damping factor
      maximum iterations=5
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_1_)     16 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
omega = 1.
      linear system matrix = precond matrix:
      Mat Object:       16 MPI processes
        type: mpiaij
        rows=4096, cols=4096
        total: nonzeros=27136, allocated nonzeros=27136
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object:    (mg_levels_2_)     16 MPI processes
      type: richardson
        Richardson: using self-scale best computed damping factor
      maximum iterations=5
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_2_)     16 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
omega = 1.
      linear system matrix = precond matrix:
      Mat Object:       16 MPI processes
        type: mpiaij
        rows=32768, cols=32768
        total: nonzeros=223232, allocated nonzeros=223232
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object:    (mg_levels_3_)     16 MPI processes
      type: richardson
        Richardson: using self-scale best computed damping factor
      maximum iterations=5
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_3_)     16 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
omega = 1.
      linear system matrix = precond matrix:
      Mat Object:       16 MPI processes
        type: mpiaij
        rows=262144, cols=262144
        total: nonzeros=1810432, allocated nonzeros=1810432
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 4 -------------------------------
    KSP Object:    (mg_levels_4_)     16 MPI processes
      type: richardson
        Richardson: using self-scale best computed damping factor
      maximum iterations=5
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_4_)     16 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
omega = 1.
      linear system matrix = precond matrix:
      Mat Object:       16 MPI processes
        type: mpiaij
        rows=2097152, cols=2097152
        total: nonzeros=14581760, allocated nonzeros=14581760
        total number of mallocs used during MatSetValues calls =0
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object:   16 MPI processes
    type: mpiaij
    rows=2097152, cols=2097152
    total: nonzeros=14581760, allocated nonzeros=14581760
    total number of mallocs used during MatSetValues calls =0
Residual 2 norm 0.00911596
Residual infinity norm 4.90052e-05

> 
>> On Jun 22, 2017, at 3:20 PM, Jason Lefley <[email protected]> wrote:
>> 
>> Thanks for the prompt replies. I ran with gamg and the results look more 
>> promising. I tried the suggested -mg_* options and did not see improvement.
> 
>   Yes, the 5 iterations you get below is pretty much the best you can expect. 
> No reasonably tuning of smoother options is likely to have much affect (it is 
> very difficult to improve from 5).
> 

We are quite happy with 5 iterations. The suggested mg options did not improve 
the runs involving -pc_type mg. I’d like to know how to get the same 
convergence we observe with gamg when using mg, if possible.

>   Barry
> 
>> The -ksp_view and -ksp_monitor_true_residual output from those tests and the 
>> solver_test source (modified ex34.c) follow:
>> 
>> $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 
>> -ksp_view -ksp_monitor_true_residual -pc_type gamg -ksp_type cg
>> right hand side 2 norm: 512.
>> right hand side infinity norm: 0.999097
>> building operator with Dirichlet boundary conditions, global grid size: 128 
>> x 128 x 128
>>  0 KSP preconditioned resid norm 2.600515167901e+00 true resid norm 
>> 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00
>>  1 KSP preconditioned resid norm 6.715532962879e-02 true resid norm 
>> 7.578946422553e+02 ||r(i)||/||b|| 1.480262973155e+00
>>  2 KSP preconditioned resid norm 1.127682308441e-02 true resid norm 
>> 3.247852182315e+01 ||r(i)||/||b|| 6.343461293584e-02
>>  3 KSP preconditioned resid norm 7.760468503025e-04 true resid norm 
>> 3.304142895659e+00 ||r(i)||/||b|| 6.453404093085e-03
>>  4 KSP preconditioned resid norm 6.419777870067e-05 true resid norm 
>> 2.662993775521e-01 ||r(i)||/||b|| 5.201159717815e-04
>>  5 KSP preconditioned resid norm 5.107540549482e-06 true resid norm 
>> 2.309528369351e-02 ||r(i)||/||b|| 4.510797596388e-05
>> KSP Object: 16 MPI processes
> 
>    Y
>>  type: cg
>>  maximum iterations=10000, initial guess is zero
>>  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>  left preconditioning
>>  using PRECONDITIONED norm type for convergence test
>> PC Object: 16 MPI processes
>>  type: gamg
>>    MG: type is MULTIPLICATIVE, levels=5 cycles=v
>>      Cycles per PCApply=1
>>      Using Galerkin computed coarse grid matrices
>>      GAMG specific options
>>        Threshold for dropping small values from graph 0.
>>        AGG specific options
>>          Symmetric graph false
>>  Coarse grid solver -- level -------------------------------
>>    KSP Object:    (mg_coarse_)     16 MPI processes
>>      type: preonly
>>      maximum iterations=10000, initial guess is zero
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_coarse_)     16 MPI processes
>>      type: bjacobi
>>        block Jacobi: number of blocks = 16
>>        Local solve is same for all blocks, in the following KSP and PC 
>> objects:
>>      KSP Object:      (mg_coarse_sub_)       1 MPI processes
>>        type: preonly
>>        maximum iterations=1, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object:      (mg_coarse_sub_)       1 MPI processes
>>        type: lu
>>          LU: out-of-place factorization
>>          tolerance for zero pivot 2.22045e-14
>>          using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>>          matrix ordering: nd
>>          factor fill ratio given 5., needed 1.
>>            Factored matrix follows:
>>              Mat Object:               1 MPI processes
>>                type: seqaij
>>                rows=13, cols=13
>>                package used to perform factorization: petsc
>>                total: nonzeros=169, allocated nonzeros=169
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 3 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object:         1 MPI processes
>>          type: seqaij
>>          rows=13, cols=13
>>          total: nonzeros=169, allocated nonzeros=169
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 3 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=13, cols=13
>>        total: nonzeros=169, allocated nonzeros=169
>>        total number of mallocs used during MatSetValues calls =0
>>          using I-node (on process 0) routines: found 3 nodes, limit used is 5
>>  Down solver (pre-smoother) on level 1 -------------------------------
>>    KSP Object:    (mg_levels_1_)     16 MPI processes
>>      type: chebyshev
>>        Chebyshev: eigenvalue estimates:  min = 0.136516, max = 1.50168
>>        Chebyshev: eigenvalues estimated using gmres with translations  [0. 
>> 0.1; 0. 1.1]
>>        KSP Object:        (mg_levels_1_esteig_)         16 MPI processes
>>          type: gmres
>>            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt 
>> Orthogonalization with no iterative refinement
>>            GMRES: happy breakdown tolerance 1e-30
>>          maximum iterations=10, initial guess is zero
>>          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
>>          left preconditioning
>>          using PRECONDITIONED norm type for convergence test
>>      maximum iterations=2
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using nonzero initial guess
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_levels_1_)     16 MPI processes
>>      type: sor
>>        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
>> omega = 1.
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=467, cols=467
>>        total: nonzeros=68689, allocated nonzeros=68689
>>        total number of mallocs used during MatSetValues calls =0
>>          not using I-node (on process 0) routines
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 2 -------------------------------
>>    KSP Object:    (mg_levels_2_)     16 MPI processes
>>      type: chebyshev
>>        Chebyshev: eigenvalue estimates:  min = 0.148872, max = 1.63759
>>        Chebyshev: eigenvalues estimated using gmres with translations  [0. 
>> 0.1; 0. 1.1]
>>        KSP Object:        (mg_levels_2_esteig_)         16 MPI processes
>>          type: gmres
>>            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt 
>> Orthogonalization with no iterative refinement
>>            GMRES: happy breakdown tolerance 1e-30
>>          maximum iterations=10, initial guess is zero
>>          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
>>          left preconditioning
>>          using PRECONDITIONED norm type for convergence test
>>      maximum iterations=2
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using nonzero initial guess
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_levels_2_)     16 MPI processes
>>      type: sor
>>        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
>> omega = 1.
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=14893, cols=14893
>>        total: nonzeros=1856839, allocated nonzeros=1856839
>>        total number of mallocs used during MatSetValues calls =0
>>          not using I-node (on process 0) routines
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 3 -------------------------------
>>    KSP Object:    (mg_levels_3_)     16 MPI processes
>>      type: chebyshev
>>        Chebyshev: eigenvalue estimates:  min = 0.135736, max = 1.49309
>>        Chebyshev: eigenvalues estimated using gmres with translations  [0. 
>> 0.1; 0. 1.1]
>>        KSP Object:        (mg_levels_3_esteig_)         16 MPI processes
>>          type: gmres
>>            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt 
>> Orthogonalization with no iterative refinement
>>            GMRES: happy breakdown tolerance 1e-30
>>          maximum iterations=10, initial guess is zero
>>          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
>>          left preconditioning
>>          using PRECONDITIONED norm type for convergence test
>>      maximum iterations=2
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using nonzero initial guess
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_levels_3_)     16 MPI processes
>>      type: sor
>>        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
>> omega = 1.
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=190701, cols=190701
>>        total: nonzeros=6209261, allocated nonzeros=6209261
>>        total number of mallocs used during MatSetValues calls =0
>>          not using I-node (on process 0) routines
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 4 -------------------------------
>>    KSP Object:    (mg_levels_4_)     16 MPI processes
>>      type: chebyshev
>>        Chebyshev: eigenvalue estimates:  min = 0.140039, max = 1.54043
>>        Chebyshev: eigenvalues estimated using gmres with translations  [0. 
>> 0.1; 0. 1.1]
>>        KSP Object:        (mg_levels_4_esteig_)         16 MPI processes
>>          type: gmres
>>            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt 
>> Orthogonalization with no iterative refinement
>>            GMRES: happy breakdown tolerance 1e-30
>>          maximum iterations=10, initial guess is zero
>>          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
>>          left preconditioning
>>          using PRECONDITIONED norm type for convergence test
>>      maximum iterations=2
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using nonzero initial guess
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_levels_4_)     16 MPI processes
>>      type: sor
>>        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
>> omega = 1.
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=2097152, cols=2097152
>>        total: nonzeros=14581760, allocated nonzeros=14581760
>>        total number of mallocs used during MatSetValues calls =0
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  linear system matrix = precond matrix:
>>  Mat Object:   16 MPI processes
>>    type: mpiaij
>>    rows=2097152, cols=2097152
>>    total: nonzeros=14581760, allocated nonzeros=14581760
>>    total number of mallocs used during MatSetValues calls =0
>> Residual 2 norm 0.0230953
>> Residual infinity norm 0.000240174
>> 
>> 
>> 
>> $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 
>> -ksp_view -ksp_monitor_true_residual -pc_type mg -ksp_type cg -pc_mg_levels 
>> 5 -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale 
>> -mg_levels_ksp_max_it 5
>> right hand side 2 norm: 512.
>> right hand side infinity norm: 0.999097
>> building operator with Dirichlet boundary conditions, global grid size: 128 
>> x 128 x 128
>> building operator with Dirichlet boundary conditions, global grid size: 16 x 
>> 16 x 16
>> building operator with Dirichlet boundary conditions, global grid size: 32 x 
>> 32 x 32
>> building operator with Dirichlet boundary conditions, global grid size: 64 x 
>> 64 x 64
>> building operator with Dirichlet boundary conditions, global grid size: 8 x 
>> 8 x 8
>>  0 KSP preconditioned resid norm 1.957390963372e+03 true resid norm 
>> 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00
>>  1 KSP preconditioned resid norm 7.501162328351e+02 true resid norm 
>> 3.373318498950e+02 ||r(i)||/||b|| 6.588512693262e-01
>>  2 KSP preconditioned resid norm 7.658993705113e+01 true resid norm 
>> 1.827365322620e+02 ||r(i)||/||b|| 3.569072895742e-01
>>  3 KSP preconditioned resid norm 9.059824824329e+02 true resid norm 
>> 1.426474831278e+02 ||r(i)||/||b|| 2.786083654840e-01
>>  4 KSP preconditioned resid norm 4.091168582134e+02 true resid norm 
>> 1.292495057977e+02 ||r(i)||/||b|| 2.524404410112e-01
>>  5 KSP preconditioned resid norm 7.422110759274e+01 true resid norm 
>> 1.258028404461e+02 ||r(i)||/||b|| 2.457086727463e-01
>>  6 KSP preconditioned resid norm 4.619015396949e+01 true resid norm 
>> 1.213792421102e+02 ||r(i)||/||b|| 2.370688322464e-01
>>  7 KSP preconditioned resid norm 6.391009527793e+01 true resid norm 
>> 1.124510270422e+02 ||r(i)||/||b|| 2.196309121917e-01
>>  8 KSP preconditioned resid norm 7.446926604265e+01 true resid norm 
>> 1.077567310933e+02 ||r(i)||/||b|| 2.104623654166e-01
>>  9 KSP preconditioned resid norm 4.220904319642e+01 true resid norm 
>> 9.988181971539e+01 ||r(i)||/||b|| 1.950816791316e-01
>> 10 KSP preconditioned resid norm 2.394387980018e+01 true resid norm 
>> 9.127579669592e+01 ||r(i)||/||b|| 1.782730404217e-01
>> 11 KSP preconditioned resid norm 1.360843954226e+01 true resid norm 
>> 8.771762326371e+01 ||r(i)||/||b|| 1.713234829369e-01
>> 12 KSP preconditioned resid norm 4.128223286694e+01 true resid norm 
>> 8.529182941649e+01 ||r(i)||/||b|| 1.665856043291e-01
>> 13 KSP preconditioned resid norm 2.183532094447e+01 true resid norm 
>> 8.263211340769e+01 ||r(i)||/||b|| 1.613908464994e-01
>> 14 KSP preconditioned resid norm 1.304178992338e+01 true resid norm 
>> 7.971822602122e+01 ||r(i)||/||b|| 1.556996601977e-01
>> 15 KSP preconditioned resid norm 7.573349141411e+00 true resid norm 
>> 7.520975377445e+01 ||r(i)||/||b|| 1.468940503407e-01
>> 16 KSP preconditioned resid norm 9.314890793459e+00 true resid norm 
>> 7.304954328407e+01 ||r(i)||/||b|| 1.426748892267e-01
>> 17 KSP preconditioned resid norm 4.445933446231e+00 true resid norm 
>> 6.978356031428e+01 ||r(i)||/||b|| 1.362960162388e-01
>> 18 KSP preconditioned resid norm 5.349719054065e+00 true resid norm 
>> 6.667516877214e+01 ||r(i)||/||b|| 1.302249390081e-01
>> 19 KSP preconditioned resid norm 3.295861671942e+00 true resid norm 
>> 6.182140339659e+01 ||r(i)||/||b|| 1.207449285090e-01
>> 20 KSP preconditioned resid norm 1.035616277789e+01 true resid norm 
>> 5.734720030036e+01 ||r(i)||/||b|| 1.120062505866e-01
>> 21 KSP preconditioned resid norm 3.211186072853e+01 true resid norm 
>> 5.552393909940e+01 ||r(i)||/||b|| 1.084451935535e-01
>> 22 KSP preconditioned resid norm 1.305589450595e+01 true resid norm 
>> 5.499062776214e+01 ||r(i)||/||b|| 1.074035698479e-01
>> 23 KSP preconditioned resid norm 2.686432456763e+00 true resid norm 
>> 5.207613218582e+01 ||r(i)||/||b|| 1.017111956754e-01
>> 24 KSP preconditioned resid norm 2.824784197849e+00 true resid norm 
>> 4.838619801451e+01 ||r(i)||/||b|| 9.450429299708e-02
>> 25 KSP preconditioned resid norm 1.071690618667e+00 true resid norm 
>> 4.607851421273e+01 ||r(i)||/||b|| 8.999709807174e-02
>> 26 KSP preconditioned resid norm 1.881879145107e+00 true resid norm 
>> 4.001593265961e+01 ||r(i)||/||b|| 7.815611847581e-02
>> 27 KSP preconditioned resid norm 1.572862295402e+00 true resid norm 
>> 3.838282973517e+01 ||r(i)||/||b|| 7.496646432650e-02
>> 28 KSP preconditioned resid norm 1.470751639074e+00 true resid norm 
>> 3.480847634691e+01 ||r(i)||/||b|| 6.798530536506e-02
>> 29 KSP preconditioned resid norm 1.024975253805e+01 true resid norm 
>> 3.242161363347e+01 ||r(i)||/||b|| 6.332346412788e-02
>> 30 KSP preconditioned resid norm 2.548780607710e+00 true resid norm 
>> 3.146609403253e+01 ||r(i)||/||b|| 6.145721490728e-02
>> 31 KSP preconditioned resid norm 1.560691471465e+00 true resid norm 
>> 2.970265802267e+01 ||r(i)||/||b|| 5.801300395052e-02
>> 32 KSP preconditioned resid norm 2.596714997356e+00 true resid norm 
>> 2.766969046763e+01 ||r(i)||/||b|| 5.404236419458e-02
>> 33 KSP preconditioned resid norm 7.034818331385e+00 true resid norm 
>> 2.684572557056e+01 ||r(i)||/||b|| 5.243305775501e-02
>> 34 KSP preconditioned resid norm 1.494072683898e+00 true resid norm 
>> 2.475430030960e+01 ||r(i)||/||b|| 4.834824279219e-02
>> 35 KSP preconditioned resid norm 2.080781323538e+01 true resid norm 
>> 2.334859550417e+01 ||r(i)||/||b|| 4.560272559409e-02
>> 36 KSP preconditioned resid norm 2.046655096031e+00 true resid norm 
>> 2.240354154839e+01 ||r(i)||/||b|| 4.375691708669e-02
>> 37 KSP preconditioned resid norm 7.606846976760e-01 true resid norm 
>> 2.109556419574e+01 ||r(i)||/||b|| 4.120227381981e-02
>> 38 KSP preconditioned resid norm 2.521301363193e+00 true resid norm 
>> 1.843497075964e+01 ||r(i)||/||b|| 3.600580226493e-02
>> 39 KSP preconditioned resid norm 3.726976470079e+00 true resid norm 
>> 1.794209917279e+01 ||r(i)||/||b|| 3.504316244686e-02
>> 40 KSP preconditioned resid norm 8.959884762705e-01 true resid norm 
>> 1.573137783532e+01 ||r(i)||/||b|| 3.072534733461e-02
>> 41 KSP preconditioned resid norm 1.227682448888e+00 true resid norm 
>> 1.501346415860e+01 ||r(i)||/||b|| 2.932317218476e-02
>> 42 KSP preconditioned resid norm 1.452770736534e+00 true resid norm 
>> 1.433942919922e+01 ||r(i)||/||b|| 2.800669765473e-02
>> 43 KSP preconditioned resid norm 5.675352390898e-01 true resid norm 
>> 1.216437815936e+01 ||r(i)||/||b|| 2.375855109250e-02
>> 44 KSP preconditioned resid norm 4.949409351772e-01 true resid norm 
>> 1.042812110399e+01 ||r(i)||/||b|| 2.036742403123e-02
>> 45 KSP preconditioned resid norm 2.002853875915e+00 true resid norm 
>> 9.309183650084e+00 ||r(i)||/||b|| 1.818199931657e-02
>> 46 KSP preconditioned resid norm 3.745525627399e-01 true resid norm 
>> 8.522457639380e+00 ||r(i)||/||b|| 1.664542507691e-02
>> 47 KSP preconditioned resid norm 1.811694613170e-01 true resid norm 
>> 7.531206553361e+00 ||r(i)||/||b|| 1.470938779953e-02
>> 48 KSP preconditioned resid norm 1.782171623447e+00 true resid norm 
>> 6.764441307706e+00 ||r(i)||/||b|| 1.321179942911e-02
>> 49 KSP preconditioned resid norm 2.299828236176e+00 true resid norm 
>> 6.702407994976e+00 ||r(i)||/||b|| 1.309064061519e-02
>> 50 KSP preconditioned resid norm 1.273834849543e+00 true resid norm 
>> 6.053797247633e+00 ||r(i)||/||b|| 1.182382274928e-02
>> 51 KSP preconditioned resid norm 2.719578737249e-01 true resid norm 
>> 5.470925517497e+00 ||r(i)||/||b|| 1.068540140136e-02
>> 52 KSP preconditioned resid norm 4.663757145206e-01 true resid norm 
>> 5.005785517882e+00 ||r(i)||/||b|| 9.776924839614e-03
>> 53 KSP preconditioned resid norm 1.292565284376e+00 true resid norm 
>> 4.881780753946e+00 ||r(i)||/||b|| 9.534728035050e-03
>> 54 KSP preconditioned resid norm 1.867369610632e-01 true resid norm 
>> 4.496564950399e+00 ||r(i)||/||b|| 8.782353418749e-03
>> 55 KSP preconditioned resid norm 5.249392115789e-01 true resid norm 
>> 4.092757959067e+00 ||r(i)||/||b|| 7.993667888803e-03
>> 56 KSP preconditioned resid norm 1.924525961621e-01 true resid norm 
>> 3.780501481010e+00 ||r(i)||/||b|| 7.383791955098e-03
>> 57 KSP preconditioned resid norm 5.779420386829e-01 true resid norm 
>> 3.213189014725e+00 ||r(i)||/||b|| 6.275759794385e-03
>> 58 KSP preconditioned resid norm 5.955339076981e-01 true resid norm 
>> 3.112032435949e+00 ||r(i)||/||b|| 6.078188351463e-03
>> 59 KSP preconditioned resid norm 3.750139060970e-01 true resid norm 
>> 2.999193364090e+00 ||r(i)||/||b|| 5.857799539239e-03
>> 60 KSP preconditioned resid norm 1.384679712935e-01 true resid norm 
>> 2.745891157615e+00 ||r(i)||/||b|| 5.363068667216e-03
>> 61 KSP preconditioned resid norm 7.632834890339e-02 true resid norm 
>> 2.176299405671e+00 ||r(i)||/||b|| 4.250584776702e-03
>> 62 KSP preconditioned resid norm 3.147491994853e-01 true resid norm 
>> 1.832893972188e+00 ||r(i)||/||b|| 3.579871039430e-03
>> 63 KSP preconditioned resid norm 5.052243308649e-01 true resid norm 
>> 1.775115122392e+00 ||r(i)||/||b|| 3.467021723421e-03
>> 64 KSP preconditioned resid norm 8.956523831283e-01 true resid norm 
>> 1.731441975933e+00 ||r(i)||/||b|| 3.381722609244e-03
>> 65 KSP preconditioned resid norm 7.897527588669e-01 true resid norm 
>> 1.682654829619e+00 ||r(i)||/||b|| 3.286435214100e-03
>> 66 KSP preconditioned resid norm 5.770941160165e-02 true resid norm 
>> 1.560734518349e+00 ||r(i)||/||b|| 3.048309606150e-03
>> 67 KSP preconditioned resid norm 3.553024960194e-02 true resid norm 
>> 1.389699750667e+00 ||r(i)||/||b|| 2.714257325521e-03
>> 68 KSP preconditioned resid norm 4.316233667769e-02 true resid norm 
>> 1.147051776028e+00 ||r(i)||/||b|| 2.240335500054e-03
>> 69 KSP preconditioned resid norm 3.793691994632e-02 true resid norm 
>> 1.012385825627e+00 ||r(i)||/||b|| 1.977316065678e-03
>> 70 KSP preconditioned resid norm 2.383460701011e-02 true resid norm 
>> 8.696480161436e-01 ||r(i)||/||b|| 1.698531281530e-03
>> 71 KSP preconditioned resid norm 6.376655007996e-02 true resid norm 
>> 7.779779636534e-01 ||r(i)||/||b|| 1.519488210261e-03
>> 72 KSP preconditioned resid norm 5.714768085413e-02 true resid norm 
>> 7.153671793501e-01 ||r(i)||/||b|| 1.397201522168e-03
>> 73 KSP preconditioned resid norm 1.708395350387e-01 true resid norm 
>> 6.312992319936e-01 ||r(i)||/||b|| 1.233006312487e-03
>> 74 KSP preconditioned resid norm 1.498516783452e-01 true resid norm 
>> 6.006527781743e-01 ||r(i)||/||b|| 1.173149957372e-03
>> 75 KSP preconditioned resid norm 1.218071938641e-01 true resid norm 
>> 5.769463903876e-01 ||r(i)||/||b|| 1.126848418726e-03
>> 76 KSP preconditioned resid norm 2.682030144251e-02 true resid norm 
>> 5.214035118381e-01 ||r(i)||/||b|| 1.018366234059e-03
>> 77 KSP preconditioned resid norm 9.794744927328e-02 true resid norm 
>> 4.660318995939e-01 ||r(i)||/||b|| 9.102185538943e-04
>> 78 KSP preconditioned resid norm 3.311394355245e-01 true resid norm 
>> 4.581129176231e-01 ||r(i)||/||b|| 8.947517922325e-04
>> 79 KSP preconditioned resid norm 7.771705063438e-02 true resid norm 
>> 4.103510898511e-01 ||r(i)||/||b|| 8.014669723654e-04
>> 80 KSP preconditioned resid norm 3.078123608908e-02 true resid norm 
>> 3.918493012988e-01 ||r(i)||/||b|| 7.653306665991e-04
>> 81 KSP preconditioned resid norm 2.759088686744e-02 true resid norm 
>> 3.289360804743e-01 ||r(i)||/||b|| 6.424532821763e-04
>> 82 KSP preconditioned resid norm 1.147671489846e-01 true resid norm 
>> 3.190902200515e-01 ||r(i)||/||b|| 6.232230860381e-04
>> 83 KSP preconditioned resid norm 1.101306468440e-02 true resid norm 
>> 2.900815313985e-01 ||r(i)||/||b|| 5.665654910126e-04
>> KSP Object: 16 MPI processes
>>  type: cg
>>  maximum iterations=10000, initial guess is zero
>>  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>  left preconditioning
>>  using PRECONDITIONED norm type for convergence test
>> PC Object: 16 MPI processes
>>  type: mg
>>    MG: type is MULTIPLICATIVE, levels=5 cycles=v
>>      Cycles per PCApply=1
>>      Not using Galerkin computed coarse grid matrices
>>  Coarse grid solver -- level -------------------------------
>>    KSP Object:    (mg_coarse_)     16 MPI processes
>>      type: preonly
>>      maximum iterations=10000, initial guess is zero
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_coarse_)     16 MPI processes
>>      type: redundant
>>        Redundant preconditioner: First (color=0) of 16 PCs follows
>>        KSP Object:        (mg_coarse_redundant_)         1 MPI processes
>>          type: preonly
>>          maximum iterations=10000, initial guess is zero
>>          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>          left preconditioning
>>          using NONE norm type for convergence test
>>        PC Object:        (mg_coarse_redundant_)         1 MPI processes
>>          type: lu
>>            LU: out-of-place factorization
>>            tolerance for zero pivot 2.22045e-14
>>            using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>>            matrix ordering: nd
>>            factor fill ratio given 5., needed 7.56438
>>              Factored matrix follows:
>>                Mat Object:                 1 MPI processes
>>                  type: seqaij
>>                  rows=512, cols=512
>>                  package used to perform factorization: petsc
>>                  total: nonzeros=24206, allocated nonzeros=24206
>>                  total number of mallocs used during MatSetValues calls =0
>>                    not using I-node routines
>>          linear system matrix = precond matrix:
>>          Mat Object:           1 MPI processes
>>            type: seqaij
>>            rows=512, cols=512
>>            total: nonzeros=3200, allocated nonzeros=3200
>>            total number of mallocs used during MatSetValues calls =0
>>              not using I-node routines
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=512, cols=512
>>        total: nonzeros=3200, allocated nonzeros=3200
>>        total number of mallocs used during MatSetValues calls =0
>>  Down solver (pre-smoother) on level 1 -------------------------------
>>    KSP Object:    (mg_levels_1_)     16 MPI processes
>>      type: richardson
>>        Richardson: using self-scale best computed damping factor
>>      maximum iterations=5
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using nonzero initial guess
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_levels_1_)     16 MPI processes
>>      type: sor
>>        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
>> omega = 1.
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=4096, cols=4096
>>        total: nonzeros=27136, allocated nonzeros=27136
>>        total number of mallocs used during MatSetValues calls =0
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 2 -------------------------------
>>    KSP Object:    (mg_levels_2_)     16 MPI processes
>>      type: richardson
>>        Richardson: using self-scale best computed damping factor
>>      maximum iterations=5
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using nonzero initial guess
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_levels_2_)     16 MPI processes
>>      type: sor
>>        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
>> omega = 1.
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=32768, cols=32768
>>        total: nonzeros=223232, allocated nonzeros=223232
>>        total number of mallocs used during MatSetValues calls =0
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 3 -------------------------------
>>    KSP Object:    (mg_levels_3_)     16 MPI processes
>>      type: richardson
>>        Richardson: using self-scale best computed damping factor
>>      maximum iterations=5
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using nonzero initial guess
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_levels_3_)     16 MPI processes
>>      type: sor
>>        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
>> omega = 1.
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=262144, cols=262144
>>        total: nonzeros=1810432, allocated nonzeros=1810432
>>        total number of mallocs used during MatSetValues calls =0
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 4 -------------------------------
>>    KSP Object:    (mg_levels_4_)     16 MPI processes
>>      type: richardson
>>        Richardson: using self-scale best computed damping factor
>>      maximum iterations=5
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using nonzero initial guess
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_levels_4_)     16 MPI processes
>>      type: sor
>>        SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
>> omega = 1.
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=2097152, cols=2097152
>>        total: nonzeros=14581760, allocated nonzeros=14581760
>>        total number of mallocs used during MatSetValues calls =0
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  linear system matrix = precond matrix:
>>  Mat Object:   16 MPI processes
>>    type: mpiaij
>>    rows=2097152, cols=2097152
>>    total: nonzeros=14581760, allocated nonzeros=14581760
>>    total number of mallocs used during MatSetValues calls =0
>> Residual 2 norm 0.290082
>> Residual infinity norm 0.00192869
>> 
>> 
>> 
>> 
>> 
>> solver_test.c:
>> 
>> // modified version of ksp/ksp/examples/tutorials/ex34.c
>> // related: ksp/ksp/examples/tutorials/ex29.c
>> //          ksp/ksp/examples/tutorials/ex32.c
>> //          ksp/ksp/examples/tutorials/ex50.c
>> 
>> #include <petscdm.h>
>> #include <petscdmda.h>
>> #include <petscksp.h>
>> 
>> extern PetscErrorCode ComputeMatrix(KSP,Mat,Mat,void*);
>> extern PetscErrorCode ComputeRHS(KSP,Vec,void*);
>> 
>> typedef enum
>> {
>>    DIRICHLET,
>>    NEUMANN
>> } BCType;
>> 
>> #undef __FUNCT__
>> #define __FUNCT__ "main"
>> int main(int argc,char **argv)
>> {
>>    KSP            ksp;
>>    DM             da;
>>    PetscReal      norm;
>>    PetscErrorCode ierr;
>> 
>>    PetscInt    i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs;
>>    PetscScalar Hx,Hy,Hz;
>>    PetscScalar ***array;
>>    Vec         x,b,r;
>>    Mat         J;
>>    const char* bcTypes[2] = { "dirichlet", "neumann" };
>>    PetscInt    bcType = (PetscInt)DIRICHLET;
>> 
>>    PetscInitialize(&argc,&argv,(char*)0,0);
>> 
>>    ierr = PetscOptionsBegin(PETSC_COMM_WORLD, "", "", "");CHKERRQ(ierr);
>>    ierr = PetscOptionsEList("-bc_type", "Type of boundary condition", "", 
>> bcTypes, 2, bcTypes[0], &bcType, NULL);CHKERRQ(ierr);
>>    ierr = PetscOptionsEnd();CHKERRQ(ierr);
>> 
>>    ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
>>    ierr = 
>> DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DMDA_STENCIL_STAR,-12,-12,-12,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,1,0,0,0,&da);CHKERRQ(ierr);
>>    ierr = DMDASetInterpolationType(da, DMDA_Q0);CHKERRQ(ierr);
>> 
>>    ierr = KSPSetDM(ksp,da);CHKERRQ(ierr);
>> 
>>    ierr = KSPSetComputeRHS(ksp,ComputeRHS,&bcType);CHKERRQ(ierr);
>>    ierr = KSPSetComputeOperators(ksp,ComputeMatrix,&bcType);CHKERRQ(ierr);
>>    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
>>    ierr = KSPSolve(ksp,NULL,NULL);CHKERRQ(ierr);
>>    ierr = KSPGetSolution(ksp,&x);CHKERRQ(ierr);
>>    ierr = KSPGetRhs(ksp,&b);CHKERRQ(ierr);
>>    ierr = KSPGetOperators(ksp,NULL,&J);CHKERRQ(ierr);
>>    ierr = VecDuplicate(b,&r);CHKERRQ(ierr);
>> 
>>    ierr = MatMult(J,x,r);CHKERRQ(ierr);
>>    ierr = VecAYPX(r,-1.0,b);CHKERRQ(ierr);
>>    ierr = VecNorm(r,NORM_2,&norm);CHKERRQ(ierr);
>>    ierr = PetscPrintf(PETSC_COMM_WORLD,"Residual 2 norm 
>> %g\n",(double)norm);CHKERRQ(ierr);
>>    ierr = VecNorm(r,NORM_INFINITY,&norm);CHKERRQ(ierr);
>>    ierr = PetscPrintf(PETSC_COMM_WORLD,"Residual infinity norm 
>> %g\n",(double)norm);CHKERRQ(ierr);
>> 
>>    ierr = VecDestroy(&r);CHKERRQ(ierr);
>>    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
>>    ierr = DMDestroy(&da);CHKERRQ(ierr);
>>    ierr = PetscFinalize();
>>    return 0;
>> }
>> 
>> #undef __FUNCT__
>> #define __FUNCT__ "ComputeRHS"
>> PetscErrorCode ComputeRHS(KSP ksp,Vec b,void *ctx)
>> {
>>    PetscErrorCode ierr;
>>    PetscInt       i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs;
>>    PetscScalar    Hx,Hy,Hz;
>>    PetscScalar    ***array;
>>    DM             da;
>>    BCType bcType = *(BCType*)ctx;
>> 
>>    PetscFunctionBeginUser;
>>    ierr = KSPGetDM(ksp,&da);CHKERRQ(ierr);
>>    ierr = DMDAGetInfo(da, 0, &mx, &my, &mz, 0,0,0,0,0,0,0,0,0);CHKERRQ(ierr);
>>    Hx   = 1.0 / (PetscReal)(mx);
>>    Hy   = 1.0 / (PetscReal)(my);
>>    Hz   = 1.0 / (PetscReal)(mz);
>>    ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr);
>>    ierr = DMDAVecGetArray(da, b, &array);CHKERRQ(ierr);
>>    for (k = zs; k < zs + zm; k++)
>>    {
>>        for (j = ys; j < ys + ym; j++)
>>        {
>>            for (i = xs; i < xs + xm; i++)
>>            {
>>                PetscReal x = ((PetscReal)i + 0.5) * Hx;
>>                PetscReal y = ((PetscReal)j + 0.5) * Hy;
>>                PetscReal z = ((PetscReal)k + 0.5) * Hz;
>>                array[k][j][i] = PetscSinReal(x * 2.0 * PETSC_PI) * 
>> PetscCosReal(y * 2.0 * PETSC_PI) * PetscSinReal(z * 2.0 * PETSC_PI);
>>            }
>>        }
>>    }
>>    ierr = DMDAVecRestoreArray(da, b, &array);CHKERRQ(ierr);
>>    ierr = VecAssemblyBegin(b);CHKERRQ(ierr);
>>    ierr = VecAssemblyEnd(b);CHKERRQ(ierr);
>> 
>>    PetscReal norm;
>>    VecNorm(b, NORM_2, &norm);
>>    PetscPrintf(PETSC_COMM_WORLD, "right hand side 2 norm: %g\n", 
>> (double)norm);
>>    VecNorm(b, NORM_INFINITY, &norm);
>>    PetscPrintf(PETSC_COMM_WORLD, "right hand side infinity norm: %g\n", 
>> (double)norm);
>> 
>>    /* force right hand side to be consistent for singular matrix */
>>    /* note this is really a hack, normally the model would provide you with 
>> a consistent right handside */
>> 
>>    if (bcType == NEUMANN)
>>    {
>>        MatNullSpace nullspace;
>>        ierr = 
>> MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,0,&nullspace);CHKERRQ(ierr);
>>        ierr = MatNullSpaceRemove(nullspace,b);CHKERRQ(ierr);
>>        ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr);
>>    }
>>    PetscFunctionReturn(0);
>> }
>> 
>> 
>> #undef __FUNCT__
>> #define __FUNCT__ "ComputeMatrix"
>> PetscErrorCode ComputeMatrix(KSP ksp, Mat J,Mat jac, void *ctx)
>> {
>>    PetscErrorCode ierr;
>>    PetscInt       i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs,num, numi, numj, numk;
>>    PetscScalar    v[7],Hx,Hy,Hz;
>>    MatStencil     row, col[7];
>>    DM             da;
>>    BCType bcType = *(BCType*)ctx;
>> 
>>    PetscFunctionBeginUser;
>> 
>>    if (bcType == DIRICHLET)
>>        PetscPrintf(PETSC_COMM_WORLD, "building operator with Dirichlet 
>> boundary conditions, ");
>>    else if (bcType == NEUMANN)
>>        PetscPrintf(PETSC_COMM_WORLD, "building operator with Neumann 
>> boundary conditions, ");
>>    else
>>        SETERRQ(PETSC_COMM_WORLD, PETSC_ERR_SUP, "unrecognized boundary 
>> condition type\n");
>> 
>>    ierr    = KSPGetDM(ksp,&da);CHKERRQ(ierr);
>>    ierr    = DMDAGetInfo(da,0,&mx,&my,&mz,0,0,0,0,0,0,0,0,0);CHKERRQ(ierr);
>> 
>>    PetscPrintf(PETSC_COMM_WORLD, "global grid size: %d x %d x %d\n", mx, my, 
>> mz);
>> 
>>    Hx      = 1.0 / (PetscReal)(mx);
>>    Hy      = 1.0 / (PetscReal)(my);
>>    Hz      = 1.0 / (PetscReal)(mz);
>> 
>>    PetscReal Hx2 = Hx * Hx;
>>    PetscReal Hy2 = Hy * Hy;
>>    PetscReal Hz2 = Hz * Hz;
>> 
>>    PetscReal scaleX = 1.0 / Hx2;
>>    PetscReal scaleY = 1.0 / Hy2;
>>    PetscReal scaleZ = 1.0 / Hz2;
>> 
>>    ierr    = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr);
>>    for (k = zs; k < zs + zm; k++)
>>    {
>>        for (j = ys; j < ys + ym; j++)
>>        {
>>            for (i = xs; i < xs + xm; i++)
>>            {
>>                row.i = i;
>>                row.j = j;
>>                row.k = k;
>>                if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 
>> || k == mz - 1)
>>                {
>>                    num = 0;
>>                    numi = 0;
>>                    numj = 0;
>>                    numk = 0;
>>                    if (k != 0)
>>                    {
>>                        v[num] = -scaleZ;
>>                        col[num].i = i;
>>                        col[num].j = j;
>>                        col[num].k = k - 1;
>>                        num++;
>>                        numk++;
>>                    }
>>                    if (j != 0)
>>                    {
>>                        v[num] = -scaleY;
>>                        col[num].i = i;
>>                        col[num].j = j - 1;
>>                        col[num].k = k;
>>                        num++;
>>                        numj++;
>>                    }
>>                    if (i != 0)
>>                    {
>>                        v[num] = -scaleX;
>>                        col[num].i = i - 1;
>>                        col[num].j = j;
>>                        col[num].k = k;
>>                        num++;
>>                        numi++;
>>                    }
>>                    if (i != mx - 1)
>>                    {
>>                        v[num] = -scaleX;
>>                        col[num].i = i + 1;
>>                        col[num].j = j;
>>                        col[num].k = k;
>>                        num++;
>>                        numi++;
>>                    }
>>                    if (j != my - 1)
>>                    {
>>                        v[num] = -scaleY;
>>                        col[num].i = i;
>>                        col[num].j = j + 1;
>>                        col[num].k = k;
>>                        num++;
>>                        numj++;
>>                    }
>>                    if (k != mz - 1)
>>                    {
>>                        v[num] = -scaleZ;
>>                        col[num].i = i;
>>                        col[num].j = j;
>>                        col[num].k = k + 1;
>>                        num++;
>>                        numk++;
>>                    }
>> 
>>                    if (bcType == NEUMANN)
>>                    {
>>                        v[num] = (PetscReal) (numk) * scaleZ + (PetscReal) 
>> (numj) * scaleY + (PetscReal) (numi) * scaleX;
>>                    }
>>                    else if (bcType == DIRICHLET)
>>                    {
>>                        v[num] = 2.0 * (scaleX + scaleY + scaleZ);
>>                    }
>> 
>>                    col[num].i = i;
>>                    col[num].j = j;
>>                    col[num].k = k;
>>                    num++;
>>                    ierr = MatSetValuesStencil(jac, 1, &row, num, col, v, 
>> INSERT_VALUES);
>>                    CHKERRQ(ierr);
>>                }
>>                else
>>                {
>>                    v[0] = -scaleZ;
>>                    col[0].i = i;
>>                    col[0].j = j;
>>                    col[0].k = k - 1;
>>                    v[1] = -scaleY;
>>                    col[1].i = i;
>>                    col[1].j = j - 1;
>>                    col[1].k = k;
>>                    v[2] = -scaleX;
>>                    col[2].i = i - 1;
>>                    col[2].j = j;
>>                    col[2].k = k;
>>                    v[3] = 2.0 * (scaleX + scaleY + scaleZ);
>>                    col[3].i = i;
>>                    col[3].j = j;
>>                    col[3].k = k;
>>                    v[4] = -scaleX;
>>                    col[4].i = i + 1;
>>                    col[4].j = j;
>>                    col[4].k = k;
>>                    v[5] = -scaleY;
>>                    col[5].i = i;
>>                    col[5].j = j + 1;
>>                    col[5].k = k;
>>                    v[6] = -scaleZ;
>>                    col[6].i = i;
>>                    col[6].j = j;
>>                    col[6].k = k + 1;
>>                    ierr = MatSetValuesStencil(jac, 1, &row, 7, col, v, 
>> INSERT_VALUES);
>>                    CHKERRQ(ierr);
>>                }
>>            }
>>        }
>>    }
>>    ierr = MatAssemblyBegin(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>    ierr = MatAssemblyEnd(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>    if (bcType == NEUMANN)
>>    {
>>        MatNullSpace   nullspace;
>>        ierr = 
>> MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,0,&nullspace);CHKERRQ(ierr);
>>        ierr = MatSetNullSpace(J,nullspace);CHKERRQ(ierr);
>>        ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr);
>>    }
>>    PetscFunctionReturn(0);
>> }
>> 
>> 
>>> On Jun 22, 2017, at 9:23 AM, Matthew Knepley <[email protected]> wrote:
>>> 
>>> On Wed, Jun 21, 2017 at 8:12 PM, Jason Lefley <[email protected]> 
>>> wrote:
>>> Hello,
>>> 
>>> We are attempting to use the PETSc KSP solver framework in a fluid dynamics 
>>> simulation we developed. The solution is part of a pressure projection and 
>>> solves a Poisson problem. We use a cell-centered layout with a regular grid 
>>> in 3d. We started with ex34.c from the KSP tutorials since it has the 
>>> correct calls for the 3d DMDA, uses a cell-centered layout, and states that 
>>> it works with multi-grid. We modified the operator construction function to 
>>> match the coefficients and Dirichlet boundary conditions used in our 
>>> problem (we’d also like to support Neumann but left those out for now to 
>>> keep things simple). As a result of the modified boundary conditions, our 
>>> version does not perform a null space removal on the right hand side or 
>>> operator as the original did. We also modified the right hand side to 
>>> contain a sinusoidal pattern for testing. Other than these changes, our 
>>> code is the same as the original ex34.c
>>> 
>>> With the default KSP options and using CG with the default pre-conditioner 
>>> and without a pre-conditioner, we see good convergence. However, we’d like 
>>> to accelerate the time to solution further and target larger problem sizes 
>>> (>= 1024^3) if possible. Given these objectives, multi-grid as a 
>>> pre-conditioner interests us. To understand the improvement that multi-grid 
>>> provides, we ran ex45 from the KSP tutorials. ex34 with CG and no 
>>> pre-conditioner appears to converge in a single iteration and we wanted to 
>>> compare against a problem that has similar convergence patterns to our 
>>> problem. Here’s the tests we ran with ex45:
>>> 
>>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129
>>>        time in KSPSolve(): 7.0178e+00
>>>        solver iterations: 157
>>>        KSP final norm of residual: 3.16874e-05
>>> 
>>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type 
>>> cg -pc_type none
>>>        time in KSPSolve(): 4.1072e+00
>>>        solver iterations: 213
>>>        KSP final norm of residual: 0.000138866
>>> 
>>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type 
>>> cg
>>>        time in KSPSolve(): 3.3962e+00
>>>        solver iterations: 88
>>>        KSP final norm of residual: 6.46242e-05
>>> 
>>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type 
>>> mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 
>>> -mg_levels_pc_type bjacobi
>>>        time in KSPSolve(): 1.3201e+00
>>>        solver iterations: 4
>>>        KSP final norm of residual: 8.13339e-05
>>> 
>>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type 
>>> mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 
>>> -mg_levels_pc_type bjacobi -ksp_type cg
>>>        time in KSPSolve(): 1.3008e+00
>>>        solver iterations: 4
>>>        KSP final norm of residual: 2.21474e-05
>>> 
>>> We found the multi-grid pre-conditioner options in the KSP tutorials 
>>> makefile. These results make sense; both the default GMRES and CG solvers 
>>> converge and CG without a pre-conditioner takes more iterations. The 
>>> multi-grid pre-conditioned runs are pretty dramatically accelerated and 
>>> require only a handful of iterations.
>>> 
>>> We ran our code (modified ex34.c as described above) with the same 
>>> parameters:
>>> 
>>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128
>>>        time in KSPSolve(): 5.3729e+00
>>>        solver iterations: 123
>>>        KSP final norm of residual: 0.00595066
>>> 
>>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 
>>> -ksp_type cg -pc_type none
>>>        time in KSPSolve(): 3.6154e+00
>>>        solver iterations: 188
>>>        KSP final norm of residual: 0.00505943
>>> 
>>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 
>>> -ksp_type cg
>>>        time in KSPSolve(): 3.5661e+00
>>>        solver iterations: 98
>>>        KSP final norm of residual: 0.00967462
>>> 
>>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 
>>> -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson 
>>> -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi
>>>        time in KSPSolve(): 4.5606e+00
>>>        solver iterations: 44
>>>        KSP final norm of residual: 949.553
>>> 
>>> 1) Dave is right
>>> 
>>> 2) In order to see how many iterates to expect, first try using algebraic 
>>> multigrid
>>> 
>>>  -pc_type gamg
>>> 
>>> This should work out of the box for Poisson
>>> 
>>> 3) For questions like this, we really need to see
>>> 
>>>  -ksp_view -ksp_monitor_true_residual
>>> 
>>> 4) It sounds like you smoother is not strong enough. You could try
>>> 
>>>  -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale 
>>> -mg_levels_ksp_max_it 5
>>> 
>>> or maybe GMRES until it works.
>>> 
>>> Thanks,
>>> 
>>>    Matt
>>> 
>>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 
>>> -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson 
>>> -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg
>>>        time in KSPSolve(): 1.5481e+01
>>>        solver iterations: 198
>>>        KSP final norm of residual: 0.916558
>>> 
>>> We performed all tests with petsc-3.7.6.
>>> 
>>> The trends with CG and GMRES seem consistent with the results from ex45. 
>>> However, with multi-grid, something doesn’t seem right. Convergence seems 
>>> poor and the solves run for many more iterations than ex45 with multi-grid 
>>> as a pre-conditioner. I extensively validated the code that builds the 
>>> matrix and also confirmed that the solution produced by CG, when evaluated 
>>> with the system of equations elsewhere in our simulation, produces the same 
>>> residual as indicated by PETSc. Given that we only made minimal 
>>> modifications to the original example code, it seems likely that the 
>>> operators constructed for the multi-grid levels are ok.
>>> 
>>> We also tried a variety of other suggested parameters for the multi-grid 
>>> pre-conditioner as suggested in related mailing list posts but we didn’t 
>>> observe any significant improvements over the results above.
>>> 
>>> Is there anything we can do to check the validity of the coefficient 
>>> matrices built for the different multi-grid levels? Does it look like there 
>>> could be problems there? Or any other suggestions to achieve better results 
>>> with multi-grid? I have the -log_view, -ksp_view, and convergence monitor 
>>> output from the above tests and can post any of it if it would assist.
>>> 
>>> Thanks
>>> 
>>> 
>>> 
>>> -- 
>>> What most experimenters take for granted before they begin their 
>>> experiments is infinitely more interesting than any results to which their 
>>> experiments lead.
>>> -- Norbert Wiener
>>> 
>>> http://www.caam.rice.edu/~mk51/
>> 
> 

Reply via email to