First please see 
https://www.mcs.anl.gov/petsc/documentation/faq.html#pipelined and verify 
that the MPI you are using satisfies the requirements and that you have the 
appropriate MPI environment variables set (if needed). 
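
  As a quick sanity check, the short sketch below (not part of PETSc, just a 
standalone MPI program) reports which MPI standard your library implements; 
the pipelined methods need MPI >= 3.0 for the non-blocking MPI_Iallreduce(), 
and asynchronous progress (e.g. MPICH_ASYNC_PROGRESS=1 for MPICH) still has 
to be enabled as described in the FAQ.

  #include <mpi.h>
  #include <stdio.h>

  /* Minimal sketch: print the MPI standard version this library implements.
     Pipelined Krylov methods need MPI >= 3.0 for non-blocking reductions. */
  int main(int argc, char **argv)
  {
    int version, subversion, rank;

    MPI_Init(&argc, &argv);
    MPI_Get_version(&version, &subversion);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
      if (version >= 3) printf("MPI %d.%d: MPI_Iallreduce() is available\n", version, subversion);
      else              printf("MPI %d.%d: too old for the pipelined solvers\n", version, subversion);
    }
    MPI_Finalize();
    return 0;
  }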


  Then please add a stage around the actual computation to get a more useful 
summary. 

  Organize your code like so:

  ...
  PetscLogStage stage;
  PetscLogStageRegister("KSP Solve", &stage);  /* create the stage once */
  KSPSetUp(ksp);                               /* keep setup outside the stage */
  PetscLogStagePush(stage);
  KSPSolve(ksp, b, x);
  PetscLogStagePop();
  ...

  It is unclear where much of the time in your code is being spent; by adding 
the stage we will get a clear picture of the time spent in the actual solver. 
There are examples of using PetscLogStagePush() in the PETSc source.
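
  For reference, here is a minimal self-contained sketch of that structure (it 
assembles a toy 1D Laplacian just so it runs on its own, and uses the 
PetscCall() error checking of recent PETSc; substitute your own Poisson 
operator, and with older PETSc versions use the ierr = ...; CHKERRQ(ierr); 
idiom instead):

  #include <petscksp.h>

  int main(int argc, char **argv)
  {
    Mat           A;
    Vec           x, b;
    KSP           ksp;
    PetscLogStage solve_stage;
    PetscInt      i, n = 100, Istart, Iend;

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

    /* Toy 1D Laplacian so the example is self-contained */
    PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
    PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
    PetscCall(MatSetFromOptions(A));
    PetscCall(MatSetUp(A));
    PetscCall(MatGetOwnershipRange(A, &Istart, &Iend));
    for (i = Istart; i < Iend; i++) {
      if (i > 0)     PetscCall(MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES));
      if (i < n - 1) PetscCall(MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES));
      PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
    }
    PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatCreateVecs(A, &x, &b));
    PetscCall(VecSet(b, 1.0));

    PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
    PetscCall(KSPSetOperators(ksp, A, A));
    PetscCall(KSPSetFromOptions(ksp));         /* honors -ksp_type cg/pipecg/groppcg */
    PetscCall(KSPSetUp(ksp));                  /* setup cost stays outside the stage */

    PetscCall(PetscLogStageRegister("KSPSolve", &solve_stage));
    PetscCall(PetscLogStagePush(solve_stage)); /* only the solve is timed in this stage */
    PetscCall(KSPSolve(ksp, b, x));
    PetscCall(PetscLogStagePop());

    PetscCall(KSPDestroy(&ksp));
    PetscCall(VecDestroy(&x));
    PetscCall(VecDestroy(&b));
    PetscCall(MatDestroy(&A));
    PetscCall(PetscFinalize());
    return 0;
  }

  Running the three cases with -ksp_type cg, -ksp_type pipecg, and -ksp_type 
groppcg, each time adding -log_view, then gives a per-stage breakdown for 
each method.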

  With the new -log_view files generated after these two changes, we can get a 
handle on where the time is being spent and why the pipelining is or is not 
helping.

  Barry

> On Feb 17, 2021, at 8:31 PM, 赵刚 <[email protected]> wrote:
> 
> Dear Barry,
> 
> Thank you for your prompt reply. I ran ~16M DOFs on 32 nodes (36 cores per 
> node), but CG seems to be faster than pipelined CG and Gropp's CG; I'm 
> puzzled and haven't figured out why. I have put the performance output in 
> the attachments; please check them.
> 
> 
> 
> Thanks,
> Gang
> 
> 
> > -----Original Message-----
> > From: "Barry Smith" <[email protected]>
> > Sent: 2021-02-18 09:17:17 (Thursday)
> > To: "赵刚" <[email protected]>
> > Cc: PETSc <[email protected]>
> > Subject: Re: [petsc-users] An issue about pipelined CG and Gropp's CG
> > 
> > 
> > 
> > > On Feb 17, 2021, at 6:47 PM, 赵刚 <[email protected]> wrote:
> > > 
> > > Dear PETSc team,
> > > 
> > > I am interested in pipelined CG (-ksp_type pipecg) and Gropp's CG 
> > > (-ksp_type groppcg); these pipelined methods are expected to have an 
> > > advantage over traditional CG when many processes are used. So I'd like 
> > > to ask: for a Poisson problem, how many computing nodes do I need to 
> > > show the advantage of pipelined CG or Gropp's CG over CG (no 
> > > preconditioner is used)?
> > > 
> > > Currently, I can use at most 32 nodes (36 cores per node) on my cluster, 
> > > but both "pipecg" and "groppcg" seem to have no advantage over "cg" when 
> > > I solve Poisson equations with homogeneous Dirichlet BC on [0, 1]^2 
> > > (keeping 20K~60K DOFs per process). I guess the reason is too few 
> > > computing nodes.
> > 
> >   900 cores (assuming they are not memory-bandwidth bound) might be 
> > enough to see some differences, but the differences are likely so small 
> > compared to other parallel issues that affect performance that you see no 
> > consistently measurable difference.
> > 
> >    Run three cases with -log_view, no pipelining and the two pipelined 
> > variants, and send the output. By studying where the time is spent in the 
> > different regions of the code with this output one may be able to say 
> > something about the pipelining effect.
> > 
> >   Barry
> > 
> > 
> > > 
> > > Because I am calling PETSc via other numerical software, if needed I 
> > > will mail the related performance information to you, using the 
> > > command-line options suggested by PETSc. Thank you.
> > > 
> > > 
> > > Thanks,
> > > Gang
> [Attachments: cg.out, groppcg.out, pipecg.out]
