On 21 Jul 2014, at 18:29, Jed Brown <j...@jedbrown.org> wrote:

> Lawrence Mitchell <lawrence.mitch...@imperial.ac.uk> writes:
> 
>> Below I show output from a run on 1 process and then two (along with 
>> ksp_view) for the following options:
>> 
>> -pc_type mg -ksp_rtol 1e-8 -ksp_max_it 6 -pc_mg_levels 2 
>> -mg_levels_pc_type sor -ksp_monitor
>> 
>> On 1 process:
>>  0 KSP Residual norm 5.865090856053e+02 
>>  1 KSP Residual norm 1.293159126247e+01 
>>  2 KSP Residual norm 5.181199296299e-01 
>>  3 KSP Residual norm 1.268870802643e-02 
>>  4 KSP Residual norm 5.116058930806e-04 
>>  5 KSP Residual norm 3.735036960550e-05 
>>  6 KSP Residual norm 1.755288530515e-06 
>> KSP Object: 1 MPI processes
>>  type: gmres
>>    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>    GMRES: happy breakdown tolerance 1e-30
>>  maximum iterations=6, initial guess is zero
>>  tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
>>  left preconditioning
>>  using PRECONDITIONED norm type for convergence test
>> PC Object: 1 MPI processes
>>  type: mg
>>    MG: type is MULTIPLICATIVE, levels=2 cycles=v
>>      Cycles per PCApply=1
>>      Not using Galerkin computed coarse grid matrices
> 
> How are you sure the rediscretized matrices are correct in parallel?

I computed the leading few (10 or so) largest and smallest eigenvalues of the 
operators on each level, which agree in serial and parallel, so I'm reasonably 
happy that I'm solving the same problem.
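
A minimal sketch of this kind of spectral sanity check, in NumPy rather than PETSc. The operator (a 1D Laplacian), the helper names, and the random permutation standing in for a parallel renumbering are all illustrative assumptions, not the setup used for the runs in this thread.

```python
import numpy as np

def laplacian_1d(n):
    """Dense 1D Dirichlet Laplacian, stencil [-1, 2, -1] (illustrative operator)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def extreme_eigenvalues(A, k=10):
    """Return the k smallest and k largest eigenvalues of a symmetric matrix."""
    w = np.linalg.eigvalsh(A)  # eigenvalues in ascending order
    return w[:k], w[-k:]

n = 64
A_serial = laplacian_1d(n)

# Hypothetical stand-in for a parallel renumbering: a symmetric permutation
# of rows and columns, which leaves the spectrum unchanged.
perm = np.random.default_rng(0).permutation(n)
A_parallel = A_serial[np.ix_(perm, perm)]

lo_s, hi_s = extreme_eigenvalues(A_serial)
lo_p, hi_p = extreme_eigenvalues(A_parallel)
spectra_match = np.allclose(lo_s, lo_p) and np.allclose(hi_s, hi_p)
print("extreme eigenvalues agree:", spectra_match)
```

Agreement of the extreme eigenvalues is a necessary condition for the two operators to be the same up to renumbering, not a sufficient one.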

> I would stick with the redundant coarse solve and use
> 
>  -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi 
> -ksp_monitor_true_residual

Chebyshev + Jacobi does not appear to be an effective smoother at all, in 
serial or in parallel.  For example, for a two-level cycle:

-pc_type mg -ksp_rtol 1e-10 -ksp_max_it 2 -pc_mg_levels 2 
-ksp_monitor_true_residual -mg_levels_ksp_type chebyshev 
-mg_levels_pc_type jacobi -mg_levels_ksp_max_it 3 
-mg_levels_ksp_monitor_true_residual

    Residual norms for mg_levels_1_ solve.
    0 KSP none resid norm 1.362115349221e+02 true resid norm 1.602648950381e+02 
||r(i)||/||b|| 5.718551585231e-01
    1 KSP none resid norm 3.635392745636e+01 true resid norm 8.483271949491e+01 
||r(i)||/||b|| 3.026990298979e-01
    2 KSP none resid norm 2.480718743635e+01 true resid norm 5.297234693113e+01 
||r(i)||/||b|| 1.890152540545e-01
    Residual norms for mg_levels_1_ solve.
    0 KSP none resid norm 8.050829954680e+00 true resid norm 2.187812818821e+01 
||r(i)||/||b|| 7.806525852269e-02
    1 KSP none resid norm 9.600041408511e+00 true resid norm 3.264655957783e+01 
||r(i)||/||b|| 1.164890383398e-01
    2 KSP none resid norm 2.246360204969e+01 true resid norm 7.338512212979e+01 
||r(i)||/||b|| 2.618518586918e-01
  0 KSP preconditioned resid norm 5.699274467568e+02 true resid norm 
2.802543487620e+02 ||r(i)||/||b|| 1.000000000000e+00
    Residual norms for mg_levels_1_ solve.
    0 KSP none resid norm 3.268903240705e-01 true resid norm 7.956741597771e-01 
||r(i)||/||b|| 1.322008721235e+00
    1 KSP none resid norm 6.842996420984e-01 true resid norm 2.304432657016e+00 
||r(i)||/||b|| 3.828803578247e+00
    2 KSP none resid norm 1.941611552825e+00 true resid norm 6.611568865604e+00 
||r(i)||/||b|| 1.098508930317e+01
    Residual norms for mg_levels_1_ solve.
    0 KSP none resid norm 1.063939051808e+01 true resid norm 3.664397107500e+01 
||r(i)||/||b|| 6.088377855000e+01
    1 KSP none resid norm 3.311047978681e+01 true resid norm 1.157817230412e+02 
||r(i)||/||b|| 1.923707660218e+02
    2 KSP none resid norm 9.601250167498e+01 true resid norm 3.374117934071e+02 
||r(i)||/||b|| 5.606080429414e+02
  1 KSP preconditioned resid norm 5.693881869205e+02 true resid norm 
2.803424438322e+02 ||r(i)||/||b|| 1.000314339708e+00
    Residual norms for mg_levels_1_ solve.
    0 KSP none resid norm 4.296484723989e+00 true resid norm 1.520508908495e+01 
||r(i)||/||b|| 2.232562919535e+00
    1 KSP none resid norm 1.365259124519e+01 true resid norm 4.847930361817e+01 
||r(i)||/||b|| 7.118215159284e+00
    2 KSP none resid norm 4.000954569964e+01 true resid norm 1.424890347852e+02 
||r(i)||/||b|| 2.092166206488e+01
    Residual norms for mg_levels_1_ solve.
    0 KSP none resid norm 2.315970399809e+02 true resid norm 8.307550866976e+02 
||r(i)||/||b|| 1.219797523982e+02
    1 KSP none resid norm 7.371848722680e+02 true resid norm 2.669017652749e+03 
||r(i)||/||b|| 3.918918073954e+02
    2 KSP none resid norm 2.174699749867e+03 true resid norm 7.893909006175e+03 
||r(i)||/||b|| 1.159062498016e+03
  2 KSP preconditioned resid norm 5.682205494231e+02 true resid norm 
2.798247056626e+02 ||r(i)||/||b|| 9.984669529615e-01
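
For reference, a Chebyshev iteration damps only the eigenvalue interval it is given. Components of the Jacobi-preconditioned operator with eigenvalues above the assumed upper bound are amplified instead. The sketch below illustrates that mechanism on a 1D Laplacian. It is not a diagnosis of the log above, and the test matrix, the function name `chebyshev_jacobi`, and the chosen bounds are assumptions made for illustration.

```python
import numpy as np

def chebyshev_jacobi(A, b, x, lmin, lmax, its=3):
    """Chebyshev iteration preconditioned with Jacobi (diagonal scaling).

    lmin, lmax: assumed bounds on the eigenvalues of D^{-1} A to be damped.
    Returns the final iterate and the list of residual 2-norms.
    """
    dinv = 1.0 / np.diag(A)
    theta = 0.5 * (lmax + lmin)   # centre of the target interval
    delta = 0.5 * (lmax - lmin)   # half-width of the target interval
    sigma = theta / delta
    rho = 1.0 / sigma
    r = b - A @ x
    d = (dinv * r) / theta
    norms = [np.linalg.norm(r)]
    for _ in range(its):
        x = x + d
        r = r - A @ d
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * (dinv * r)
        rho = rho_new
        norms.append(np.linalg.norm(r))
    return x, norms

n = 64
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.random.default_rng(0).standard_normal(n)
x0 = np.zeros(n)

# The diagonal is 2*I here, so D^{-1} A = A / 2.
lam_max = np.linalg.eigvalsh(A / 2.0)[-1]

# Upper bound safely above the true largest eigenvalue: residual shrinks.
_, good = chebyshev_jacobi(A, b, x0, 0.1 * lam_max, 1.1 * lam_max)
# Upper bound well below the true largest eigenvalue: residual grows.
_, bad = chebyshev_jacobi(A, b, x0, 0.05 * lam_max, 0.5 * lam_max)

print("good bounds:", ["%.3e" % v for v in good])
print("bad bounds: ", ["%.3e" % v for v in bad])
```

Residual norms that grow within each smoother application, as in the log above, are at least consistent with eigenvalue bounds that miss part of the spectrum, but other causes are possible.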


> Use of Jacobi here is to make the smoother the same in parallel as
> serial.

If I run the above in parallel I get the same behaviour (as expected, I guess).

>  (Usually SOR is a bit stronger, though I think the Cheby/SOR
> combination is somewhat peculiar and usually overkill.)

So I have noticed that I only see this problem of poor convergence on some 
meshes/decompositions.  It also goes away if I apply more than one SOR sweep at 
each level.

For example:

-pc_type mg -ksp_rtol 1e-10 -ksp_max_it 6 -pc_mg_levels 2 
-ksp_monitor_true_residual -mg_levels_ksp_type chebyshev 
-mg_levels_pc_type sor -mg_levels_pc_sor_omega 1 -mg_levels_pc_sor_its 2

produces on 1 process:
  0 KSP preconditioned resid norm 5.883693224294e+02 true resid norm 
2.802543487620e+02 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 1.309534073571e+01 true resid norm 
6.584338590739e+00 ||r(i)||/||b|| 2.349415314990e-02
  2 KSP preconditioned resid norm 1.910365687382e-01 true resid norm 
1.577734674543e-01 ||r(i)||/||b|| 5.629652783310e-04
  3 KSP preconditioned resid norm 3.277350963687e-03 true resid norm 
4.094543394403e-03 ||r(i)||/||b|| 1.461009762200e-05
  4 KSP preconditioned resid norm 7.348080207899e-05 true resid norm 
1.069159047866e-04 ||r(i)||/||b|| 3.814959705670e-07
  5 KSP preconditioned resid norm 2.830825689894e-06 true resid norm 
4.561049005693e-06 ||r(i)||/||b|| 1.627467700623e-08
  6 KSP preconditioned resid norm 4.363171978244e-08 true resid norm 
7.242601519032e-08 ||r(i)||/||b|| 2.584295855185e-10

and on 2:
  0 KSP preconditioned resid norm 5.836547633061e+02 true resid norm 
2.802543487620e+02 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 1.096256903154e+01 true resid norm 
6.241073205209e+00 ||r(i)||/||b|| 2.226931797055e-02
  2 KSP preconditioned resid norm 1.417636317296e-01 true resid norm 
2.236535290250e-01 ||r(i)||/||b|| 7.980376754651e-04
  3 KSP preconditioned resid norm 7.600523523911e-03 true resid norm 
9.340981660469e-03 ||r(i)||/||b|| 3.333037186303e-05
  4 KSP preconditioned resid norm 2.109594208660e-04 true resid norm 
3.284856206888e-04 ||r(i)||/||b|| 1.172098210571e-06
  5 KSP preconditioned resid norm 4.640884789807e-06 true resid norm 
1.425912597007e-05 ||r(i)||/||b|| 5.087923178735e-08
  6 KSP preconditioned resid norm 2.250186110144e-07 true resid norm 
5.621190025947e-07 ||r(i)||/||b|| 2.005745870056e-09


If I drop the number of SOR iterations to 1, I reduce the residual by 10^8 in 
serial but only by 10^3 in parallel.  This happens only on two processes: on 
three or more, I see reductions of around 10^8 as well.
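
As a point of reference for what the sweep count changes, here is a serial sketch of repeated SOR sweeps. With `omega=1` this is Gauss-Seidel. The forward sweep direction, the 1D Laplacian test matrix, and the helper names are assumptions made for illustration. The sketch does not model the parallel case, where the smoother depends on the decomposition.

```python
import numpy as np

def sor_sweeps(A, b, x, omega=1.0, its=1):
    """Apply `its` forward SOR sweeps to A x = b, starting from x."""
    x = x.copy()
    for _ in range(its):
        for i in range(b.size):
            # Row sum excluding the diagonal, using already-updated entries.
            off_diag = A[i] @ x - A[i, i] * x[i]
            x[i] = (1.0 - omega) * x[i] + omega * (b[i] - off_diag) / A[i, i]
    return x

def energy_norm(A, e):
    """A-norm of e, the norm in which SOR on an SPD matrix is contractive."""
    return float(np.sqrt(e @ A @ e))

n = 32
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.random.default_rng(0).standard_normal(n)
x_exact = np.linalg.solve(A, b)
x0 = np.zeros(n)

# Energy-norm error after 0, 1 and 2 sweeps.
errs = [energy_norm(A, x_exact - sor_sweeps(A, b, x0, omega=1.0, its=k))
        for k in (0, 1, 2)]
print(["%.3e" % e for e in errs])
```

Each extra sweep reduces the energy-norm error for a symmetric positive definite matrix, which is the sense in which two sweeps per level give a stronger smoother than one.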

> Compare convergence with and without -pc_mg_galerkin.

This was a little tricky, since I only have the action of the interpolation and 
restriction matrices.  I coded up a "default" MatMatMatMult using a shell that 
can compute the appropriate matrix-vector multiply.  Since I then don't have an 
explicit operator on each level, I restricted to a two-level method (where the 
fine grid operator is assembled).
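
The idea of such a shell can be sketched as follows: the coarse operator is never formed, and its action is computed as restriction of the fine operator applied to the prolonged vector. This NumPy sketch is not the PETSc shell used for the runs below. The class name `GalerkinShell`, the 1D linear interpolation, and the choice of restriction as its transpose are assumptions made for illustration.

```python
import numpy as np

def prolong(xc):
    """Linear interpolation from n_c coarse points to 2*n_c + 1 fine points."""
    padded = np.concatenate(([0.0], xc, [0.0]))  # homogeneous Dirichlet ends
    xf = np.empty(2 * xc.size + 1)
    xf[1::2] = xc                                # coincident points
    xf[0::2] = 0.5 * (padded[:-1] + padded[1:])  # midpoints
    return xf

def restrict(xf):
    """Transpose of `prolong`, applied matrix-free."""
    even = xf[0::2]
    return xf[1::2] + 0.5 * (even[:-1] + even[1:])

class GalerkinShell:
    """Matrix-free coarse operator: y = R (A_f (P x))."""

    def __init__(self, A_fine, prolong, restrict):
        self.A_fine = A_fine
        self.prolong = prolong
        self.restrict = restrict

    def mult(self, x):
        return self.restrict(self.A_fine @ self.prolong(x))

n_c = 7
n_f = 2 * n_c + 1
A_f = 2.0 * np.eye(n_f) - np.eye(n_f, k=1) - np.eye(n_f, k=-1)
shell = GalerkinShell(A_f, prolong, restrict)

# Check against the explicitly assembled triple product P^T A_f P.
P = np.column_stack([prolong(e) for e in np.eye(n_c)])
x = np.random.default_rng(0).standard_normal(n_c)
shell_matches_explicit = np.allclose(shell.mult(x), P.T @ A_f @ P @ x)
print("shell matches P^T A P:", shell_matches_explicit)
```

Because only the action is available, a coarse solver for such an operator has to be matrix-free as well, which is consistent with the unpreconditioned CG coarse solve in the options below.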

I then run with:

-pc_type mg -ksp_rtol 1e-10 -ksp_max_it 6 -pc_mg_levels 2 
-ksp_monitor_true_residual -mg_levels_ksp_type chebyshev 
-mg_levels_pc_type sor -mg_levels_pc_sor_omega 1 -mg_levels_pc_sor_its 1 
-pc_mg_galerkin -mg_coarse_ksp_type cg -mg_coarse_pc_type none 
-mg_coarse_ksp_max_it 20

On one process I then have:

  0 KSP preconditioned resid norm 5.658166234044e+02 true resid norm 
2.802543487620e+02 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 4.380224616829e+00 true resid norm 
5.942695957168e+00 ||r(i)||/||b|| 2.120465207202e-02
  2 KSP preconditioned resid norm 1.076966810659e-01 true resid norm 
2.620990648687e-01 ||r(i)||/||b|| 9.352185471038e-04
  3 KSP preconditioned resid norm 1.067766492637e-03 true resid norm 
6.007393822966e-03 ||r(i)||/||b|| 2.143550617324e-05
  4 KSP preconditioned resid norm 3.721153649133e-05 true resid norm 
4.831899242645e-03 ||r(i)||/||b|| 1.724112137417e-05
  5 KSP preconditioned resid norm 9.011620703273e-07 true resid norm 
4.821878295079e-03 ||r(i)||/||b|| 1.720536475662e-05
  6 KSP preconditioned resid norm 2.069209381617e-08 true resid norm 
4.821642641207e-03 ||r(i)||/||b|| 1.720452389947e-05


On two:

  0 KSP preconditioned resid norm 5.662903120832e+02 true resid norm 
2.802543487620e+02 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 7.643743888525e+00 true resid norm 
1.378172146897e+01 ||r(i)||/||b|| 4.917576312322e-02
  2 KSP preconditioned resid norm 6.976625171734e+00 true resid norm 
1.228350334724e+01 ||r(i)||/||b|| 4.382984029151e-02
  3 KSP preconditioned resid norm 6.830161637428e+00 true resid norm 
1.239530026533e+01 ||r(i)||/||b|| 4.422875263161e-02
  4 KSP preconditioned resid norm 1.043365955762e+00 true resid norm 
7.550284766088e+00 ||r(i)||/||b|| 2.694082999761e-02
  5 KSP preconditioned resid norm 7.461488225071e-01 true resid norm 
5.937639295074e+00 ||r(i)||/||b|| 2.118660895470e-02
  6 KSP preconditioned resid norm 7.370072731475e-01 true resid norm 
5.668303344690e+00 ||r(i)||/||b|| 2.022556784482e-02


while on three I go back to decent convergence again:

  0 KSP preconditioned resid norm 5.692627905911e+02 true resid norm 
2.802543487620e+02 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 5.768424422846e+00 true resid norm 
8.666039768542e+00 ||r(i)||/||b|| 3.092205279534e-02
  2 KSP preconditioned resid norm 1.308480914038e-01 true resid norm 
1.498332207355e-01 ||r(i)||/||b|| 5.346329910577e-04
  3 KSP preconditioned resid norm 1.029363320021e-03 true resid norm 
3.954753776706e-03 ||r(i)||/||b|| 1.411130208747e-05
  4 KSP preconditioned resid norm 2.495903055408e-05 true resid norm 
6.469448501163e-04 ||r(i)||/||b|| 2.308420379466e-06
  5 KSP preconditioned resid norm 8.019626928214e-07 true resid norm 
6.361612256862e-04 ||r(i)||/||b|| 2.269942388036e-06
  6 KSP preconditioned resid norm 1.669414544567e-08 true resid norm 
6.360133846808e-04 ||r(i)||/||b|| 2.269414863642e-06

So I'm sort of none the wiser.  I'm a little bit at a loss as to why this 
occurs, but either switching to Richardson+SOR or using Cheby/SOR with more 
than one SOR sweep appears to fix the problems, so I might just punt for now.

Cheers,

Lawrence
