Re: [petsc-users] An issue about pipelined CG and Gropp's CG

Barry Smith Wed, 17 Feb 2021 21:10:00 -0800

  Here are the important operations from the -log_view output (use a fixed-width font 
for easy reading).


No pipelining (-ksp_type cg)

------------------------------------------------------------------------------------------------------------------------
Event              Count        Time (sec)      Flop                                  --- Global ---       --- Stage ----       Total
                   Max Ratio        Max Ratio      Max Ratio    Mess  AvgLen  Reduct  %T  %F  %M  %L  %R   %T  %F  %M  %L  %R  Mflop/s
------------------------------------------------------------------------------------------------------------------------

MatMult           5398   1.0 9.4707e+00  12.6 1.05e+09   1.1 3.6e+07 6.9e+02 0.0e+00   3  52 100 100   0   10  52 100 100   0   124335
VecTDot          10796   1.0 1.4993e+01   8.3 3.23e+08   1.1 0.0e+00 0.0e+00 1.1e+04  16  16   0   0  67   55  16   0   0  67    24172
VecNorm           5399   1.0 6.2343e+00   4.4 1.61e+08   1.1 0.0e+00 0.0e+00 5.4e+03  10   8   0   0  33   33   8   0   0  33    29073
VecAXPY          10796   1.0 1.1721e-01   1.4 3.23e+08   1.1 0.0e+00 0.0e+00 0.0e+00   0  16   0   0   0    1  16   0   0   0  3092074
VecAYPX           5397   1.0 5.4340e-02   1.4 1.61e+08   1.1 0.0e+00 0.0e+00 0.0e+00   0   8   0   0   0    0   8   0   0   0  3334231
VecScatterBegin   5398   1.0 5.4152e-02   3.3 0.00e+00   0.0 3.6e+07 6.9e+02 0.0e+00   0   0 100 100   0    0   0 100 100   0        0
VecScatterEnd     5398   1.0 8.6881e+00 489.5 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   2   0   0   0   0    6   0   0   0   0        0
KSPSolve             1   1.0 1.7389e+01   1.0 2.02e+09   1.1 3.6e+07 6.9e+02 1.6e+04  29 100 100 100 100  100 100 100 100 100   130242

Gropp's pipelined CG (-ksp_type groppcg)

MatMult           5399   1.0 9.5593e+00  11.7 1.05e+09   1.1 3.6e+07 6.9e+02 0.0e+00   3  45 100 100   0    7  45 100 100   0   123207
VecNorm              1   1.0 8.8549e-04  17.4 2.99e+04   1.1 0.0e+00 0.0e+00 1.0e+00   0   0   0   0   4    0   0   0   0  20    37912
VecAXPY          16194   1.0 1.6522e-01   1.4 4.84e+08   1.1 0.0e+00 0.0e+00 0.0e+00   0  21   0   0   0    0  21   0   0   0  3290407
VecAYPX          10794   1.0 1.9903e-01   1.5 3.23e+08   1.1 0.0e+00 0.0e+00 0.0e+00   0  14   0   0   0    1  14   0   0   0  1820606
VecScatterBegin   5399   1.0 6.2281e-02   3.6 0.00e+00   0.0 3.6e+07 6.9e+02 0.0e+00   0   0 100 100   0    0   0 100 100   0        0
VecScatterEnd     5399   1.0 8.7194e+00 380.9 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   2   0   0   0   0    4   0   0   0   0        0
VecReduceArith   16195   1.0 2.2674e-01   3.7 4.84e+08   1.1 0.0e+00 0.0e+00 0.0e+00   0  21   0   0   0    0  21   0   0   0  2397678
VecReduceBegin   10797   1.0 3.4089e-02   2.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0        0
VecReduceEnd     10797   1.0 2.6197e+01   1.5 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00  37   0   0   0   0   91   0   0   0   0        0
SFBcastOpBegin    5399   1.0 6.0051e-02   4.1 0.00e+00   0.0 3.6e+07 6.9e+02 0.0e+00   0   0 100 100   0    0   0 100 100   0        0
SFBcastOpEnd      5399   1.0 8.7167e+00 440.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   2   0   0   0   0    4   0   0   0   0        0
KSPSolve             1   1.0 2.7477e+01   1.0 2.34e+09   1.1 3.6e+07 6.9e+02 1.0e+00  41 100 100 100   4  100 100 100 100  20    95623

Pipelined CG (-ksp_type pipecg)

MatMult           5400   1.0 1.5915e+00   1.8 1.05e+09   1.1 3.6e+07 6.9e+02 0.0e+00   2  37 100 100   0    6  37 100 100   0   740161
VecAXPY          21592   1.0 2.3194e-01   1.4 6.45e+08   1.1 0.0e+00 0.0e+00 0.0e+00   0  23   0   0   0    1  23   0   0   0  3125164
VecAYPX          21588   1.0 5.5059e-01   1.7 6.45e+08   1.1 0.0e+00 0.0e+00 0.0e+00   1  23   0   0   0    2  23   0   0   0  1316272
VecScatterBegin   5400   1.0 7.0132e-02   3.7 0.00e+00   0.0 3.6e+07 6.9e+02 0.0e+00   0   0 100 100   0    0   0 100 100   0        0
VecScatterEnd     5400   1.0 6.5329e-01  22.2 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    1   0   0   0   0        0
VecReduceArith   16197   1.0 3.1135e-01   4.7 4.84e+08   1.1 0.0e+00 0.0e+00 0.0e+00   0  17   0   0   0    1  17   0   0   0  1746339
VecReduceBegin    5400   1.0 3.1471e-02   2.2 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0        0
VecReduceEnd      5400   1.0 1.7226e+01   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00  28   0   0   0   0   90   0   0   0   0        0
SFBcastOpBegin    5400   1.0 6.6228e-02   4.1 0.00e+00   0.0 3.6e+07 6.9e+02 0.0e+00   0   0 100 100   0    0   0 100 100   0        0
SFBcastOpEnd      5400   1.0 6.5000e-01  24.6 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    1   0   0   0   0        0
KSPSolve             1   1.0 1.8893e+01   1.0 2.82e+09   1.1 3.6e+07 6.9e+02 0.0e+00  32 100 100 100   0  100 100 100 100   0   167860

With the pipelined methods, VecTDot and VecNorm are replaced by VecReduceArith, 
VecReduceBegin, and VecReduceEnd. The important numbers are the %T (percent of 
time) columns for the stage. 
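To make the difference in reductions concrete, here is a minimal serial C sketch (not PETSc code) of standard CG next to the Ghysels-Vanroose pipelined CG recurrences, which, as far as I know, are what -ksp_type pipecg is based on. Assumptions: a 1D Laplacian test matrix, no preconditioner, and a hypothetical counter `reductions` standing in for global MPI reductions; the names `cg`, `pipecg`, `matvec`, and `dot` are illustrative only. Standard CG issues three separate blocking reductions per iteration (compare 2 VecTDot + 1 VecNorm per iteration in the first log), while the pipelined version fuses its inner products into one reduction that a parallel code would hide behind the matrix-vector product (compare one VecReduceEnd per iteration in the pipecg log).

```c
/* Serial sketch, not PETSc code. Assumptions: 1D Laplacian test matrix, no
 * preconditioner; each dot product stands for one global MPI reduction. */
#include <assert.h>
#include <string.h>

#define N 64

static int reductions; /* hypothetical counter of global reductions */

/* y = A x, A = tridiag(-1, 2, -1) with homogeneous Dirichlet BCs */
static void matvec(const double *x, double *y) {
  for (int i = 0; i < N; i++) {
    double left = (i > 0) ? x[i - 1] : 0.0;
    double right = (i < N - 1) ? x[i + 1] : 0.0;
    y[i] = 2.0 * x[i] - left - right;
  }
}

static double dot(const double *a, const double *b) {
  double s = 0.0;
  for (int i = 0; i < N; i++) s += a[i] * b[i];
  return s;
}

/* Standard CG: three blocking reductions per iteration. */
int cg(const double *b, double *x, double rtol, int maxit) {
  double r[N], p[N], Ap[N];
  memset(x, 0, N * sizeof(double));
  memcpy(r, b, sizeof r); /* r = b - A*0 */
  memcpy(p, r, sizeof p);
  double rr = dot(r, r); reductions++;
  double tol2 = rtol * rtol * rr; /* stop when ||r||^2 <= rtol^2 ||b||^2 */
  for (int k = 0; k < maxit; k++) {
    matvec(p, Ap);
    double pAp = dot(p, Ap); reductions++;   /* every rank waits here */
    assert(pAp > 0.0);                       /* A is SPD */
    double alpha = rr / pAp;
    for (int i = 0; i < N; i++) {
      x[i] += alpha * p[i];
      r[i] -= alpha * Ap[i];
    }
    double rr_new = dot(r, r); reductions++; /* ...and here */
    /* Separate norm for the convergence test, as in the log; reusing
       rr_new would save this reduction. */
    double rnorm2 = dot(r, r); reductions++;
    if (rnorm2 <= tol2) return k + 1;
    double beta = rr_new / rr;
    rr = rr_new;
    for (int i = 0; i < N; i++) p[i] = r[i] + beta * p[i];
  }
  return maxit;
}

/* Pipelined CG (Ghysels-Vanroose recurrences, no preconditioner): one fused
 * reduction per iteration, which a parallel code would start non-blocking
 * and overlap with the matvec q = A w. */
int pipecg(const double *b, double *x, double rtol, int maxit) {
  double r[N], w[N], q[N], z[N] = {0}, s[N] = {0}, p[N] = {0};
  double gamma_old = 1.0, alpha_old = 1.0; /* unused while beta == 0 */
  memset(x, 0, N * sizeof(double));
  memcpy(r, b, sizeof r); /* r = b - A*0 */
  matvec(r, w);           /* w = A r */
  double tol2 = rtol * rtol * dot(b, b); reductions++;
  for (int k = 0; k < maxit; k++) {
    /* Begin the fused reduction: gamma = (r,r) and delta = (w,r) together */
    double gamma = dot(r, r), delta = dot(w, r); reductions++;
    matvec(w, q); /* independent work that can hide the reduction latency */
    /* End the reduction: gamma and delta are needed from here on */
    if (gamma <= tol2) return k;
    double beta = (k > 0) ? gamma / gamma_old : 0.0;
    double alpha = gamma / (delta - beta * gamma / alpha_old);
    for (int i = 0; i < N; i++) {
      z[i] = q[i] + beta * z[i]; /* z = A s */
      s[i] = w[i] + beta * s[i]; /* s = A p */
      p[i] = r[i] + beta * p[i];
      x[i] += alpha * p[i];
      r[i] -= alpha * s[i];
      w[i] -= alpha * z[i];      /* keeps w = A r by recurrence */
    }
    gamma_old = gamma;
    alpha_old = alpha;
  }
  return maxit;
}
```

Both functions return the iteration count. In a distributed code the fused reduction would be started with a non-blocking collective such as MPI_Iallreduce before the matvec and completed after it; that only pays off if the MPI library actually progresses the reduction while the matvec runs.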

In particular, look at VecTDot and VecNorm and compare them with VecReduceEnd in 
the pipelined methods. In the stage, CG spends 55% + 33% = 88% of its time in 
VecTDot and VecNorm, while Gropp's CG spends 91% and pipelined CG 90% in 
VecReduceEnd. Both pipelined methods, especially Gropp's, spend an enormous 
amount of time in VecReduceEnd and hence end up taking more time than the 
non-pipelined method (KSPSolve: 17.4 s for CG, 27.5 s for Gropp's CG, 18.9 s 
for pipelined CG). So basically any advantage the pipelined methods may have is 
lost waiting for the previous reduction operation to complete. I do not know 
why; it may be the MPI implementation or something else. 
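A similar hedged sketch for Gropp's CG (-ksp_type groppcg), written from the usual description of the method (two non-blocking reductions per iteration, one overlapped with the preconditioner application and one with the matrix-vector product); it is not the PETSc source. It matches the roughly two VecReduceEnd calls per iteration in the Gropp log. With no preconditioner, as in these runs, the first reduction has no real work to overlap with, which may be one contributing factor here; that is a conjecture, not something the logs establish. The function names and the counters `reductions` and `overlapped_matvecs` are hypothetical.

```c
/* Serial sketch of Gropp's CG with no preconditioner (B = I), not PETSc code.
 * Assumptions: 1D Laplacian test matrix; counters stand in for MPI events. */
#include <assert.h>
#include <string.h>

#define N 64

static int reductions;         /* global reductions issued (hypothetical) */
static int overlapped_matvecs; /* matvecs available to hide a reduction */

/* y = A x, A = tridiag(-1, 2, -1) with homogeneous Dirichlet BCs */
static void matvec(const double *x, double *y) {
  for (int i = 0; i < N; i++) {
    double left = (i > 0) ? x[i - 1] : 0.0;
    double right = (i < N - 1) ? x[i + 1] : 0.0;
    y[i] = 2.0 * x[i] - left - right;
  }
}

static double dot(const double *a, const double *b) {
  double s = 0.0;
  for (int i = 0; i < N; i++) s += a[i] * b[i];
  return s;
}

int groppcg(const double *b, double *x, double rtol, int maxit) {
  double r[N], p[N], s[N], w[N];
  memset(x, 0, N * sizeof(double));
  memcpy(r, b, sizeof r); /* r = b - A*0; with B = I, z = B r = r */
  memcpy(p, r, sizeof p);
  matvec(p, s);           /* s = A p */
  double gamma = dot(r, r); reductions++;
  double tol2 = rtol * rtol * gamma; /* ||r||^2 <= rtol^2 ||b||^2 */
  for (int k = 0; k < maxit; k++) {
    /* Reduction 1: delta = (p, s). Its overlap partner is the
       preconditioner apply q = B s; with B = I there is no work to hide
       this latency behind. */
    double delta = dot(p, s); reductions++;
    assert(delta > 0.0); /* A is SPD */
    double alpha = gamma / delta;
    for (int i = 0; i < N; i++) {
      x[i] += alpha * p[i];
      r[i] -= alpha * s[i];
    }
    /* Reduction 2: gamma_new = (r, r), overlapped with the matvec w = A r. */
    double gamma_new = dot(r, r); reductions++;
    matvec(r, w); overlapped_matvecs++;
    if (gamma_new <= tol2) return k + 1;
    double beta = gamma_new / gamma;
    gamma = gamma_new;
    for (int i = 0; i < N; i++) {
      p[i] = r[i] + beta * p[i];
      s[i] = w[i] + beta * s[i]; /* keeps s = A p without another matvec */
    }
  }
  return maxit;
}
```

Per iteration this issues two reductions but only one matvec to hide them behind, whereas the pipelined CG recurrences issue one fused reduction per matvec.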

If you are serious about understanding pipelined Krylov methods, you will need 
to dig deep into the details of the machine hardware and MPI software. It is not 
a trivial subject with easy answers. I would say that the PETSc community are 
not experts on the topic; you will need to read the publications on pipelined 
methods in detail and consult the authors on technical, machine-specific 
details. There is a difference between the academic "pipelining as a 
theoretical construct" and actually obtaining dramatic improvements on real 
machines with pipelining. One small implementation detail can dramatically 
change performance, so theoretical papers alone are not the complete story.


  Barry

------------------------------------------------------------------------------------------------------------------------

> On Feb 17, 2021, at 10:31 PM, 赵刚 <[email protected]> wrote:
> 
> Dear Barry,
> 
> 
> 
> Thank you. My cluster uses MVAPICH-2.3.5 as the default MPI. I added 
> PetscLogStagePush("Calling KSPSolve()...") and PetscLogStagePop(). I am using 
> other numerical software and call PETSc only to solve the linear system, 
> through the PETSc interface that the software provides, so I'm not sure I 
> have added the stage correctly. I have attached the results and info; please 
> check them.
> 
> 
> 
> 
> 
> Thanks,
> 
> Gang
> 
> 
> 
> -----Original Message-----
> From: "Barry Smith" <[email protected]>
> Sent: 2021-02-18 10:52:11 (Thursday)
> To: "赵刚" <[email protected]>
> Cc: PETSc <[email protected]>
> Subject: Re: [petsc-users] An issue about pipelined CG and Gropp's CG
> 
> 
>   First please see 
> https://www.mcs.anl.gov/petsc/documentation/faq.html#pipelined and verify 
> that the MPI you are using satisfies the requirements and that you have the 
> appropriate MPI environment variables set (if needed). 
> 
> 
>   Then please add a stage around the actual computation to get a more useful 
> summary. 
> 
>   Organize your code like so
> 
>   ...
>   KSPSetUp(ksp);
>   PetscLogStagePush(stage); /* stage: created earlier with PetscLogStageRegister() */
>   KSPSolve(ksp, b, x);
>   PetscLogStagePop();
>   ...
> 
>   It is unclear where much of your code's time is being spent; adding the 
> stage will give a clear picture of the time spent in the solver itself. There 
> are examples of using PetscLogStagePush() in the source.
> 
>   With the new -log_view files you generate with these two changes we can get 
> a handle on where the time is being spent and why the pipelining is or is not 
> helping.
> 
>   Barry
> 
>> On Feb 17, 2021, at 8:31 PM, 赵刚 <[email protected]> wrote:
>> 
>> Dear Barry,
>> 
>> Thank you for your prompt reply. I ran ~16M DOFs on 32 nodes (36 cores per 
>> node), but CG seems to be faster than both pipelined CG and Gropp's CG. I'm 
>> puzzled and haven't figured out why. I have attached the performance output; 
>> please check it.
>> 
>> 
>> 
>> Thanks,
>> Gang
>> 
>> 
>> > -----Original Message-----
>> > From: "Barry Smith" <[email protected]>
>> > Sent: 2021-02-18 09:17:17 (Thursday)
>> > To: "赵刚" <[email protected]>
>> > Cc: PETSc <[email protected]>
>> > Subject: Re: [petsc-users] An issue about pipelined CG and Gropp's CG
>> > 
>> > > On Feb 17, 2021, at 6:47 PM, 赵刚 <[email protected]> wrote:
>> > > 
>> > > Dear PETSc team,
>> > > 
>> > > I am interested in pipelined CG (-ksp_type pipecg) and Gropp's CG 
>> > > (-ksp_type groppcg). I expect these pipelined methods to have an 
>> > > advantage over traditional CG when many processes are used. So I'd like 
>> > > to ask: for the Poisson problem, how many compute nodes do I need to 
>> > > show the advantage of pipelined CG or Gropp's CG over CG (no 
>> > > preconditioner is used)?
>> > > 
>> > > Currently I can use at most 32 nodes (36 cores per node) on my 
>> > > cluster, but neither "pipecg" nor "groppcg" seems to have any advantage 
>> > > over "cg" when I solve the Poisson equation with homogeneous Dirichlet 
>> > > BCs on [0, 1]^2 (20K-60K DOFs per process). I guess the reason is that 
>> > > there are too few compute nodes.
>> > 
>> >   900 cores (assuming they are not memory-bandwidth bound) might be 
>> > enough to see some difference, but the difference is likely so small 
>> > compared with other parallel effects on performance that you see no 
>> > consistently measurable difference.
>> > 
>> >    Run the three cases (no pipelining and the two pipelined methods) with 
>> > -log_view and send the output. By studying where the time is spent in the 
>> > different regions of the code, one may be able to say something about the 
>> > effect of pipelining.
>> > 
>> >   Barry
>> > 
>> > 
>> > > 
>> > > Because I call PETSc through other numerical software, if needed I can 
>> > > send you performance information gathered with the command-line options 
>> > > that PETSc suggests. Thank you.
>> > > 
>> > > 
>> > > Thanks,
>> > > Gang
>> <cg.out><groppcg.out><pipecg.out>
> 
> <cg.out><groppcg.out><pipecg.out>
