[petsc-users] PETSc parallel scalability

Jinlei Shen Wed, 07 Sep 2016 18:38:02 -0700

Hi,

I am trying to test the parallel scalability of an iterative solver (CG with
a BJacobi preconditioner) in PETSc.


Since the iteration count increases with more processors, I calculated the
time per iteration by dividing the total KSPSolve time by the number of
iterations in this test.
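
A minimal sketch of this normalization, together with a parallel efficiency
derived from it. The function names are hypothetical and the numbers are
placeholders for illustration only, not measured results from this test.

```python
def per_iteration_time(total_ksp_solve_time, iterations):
    """Average cost of one iteration: total KSPSolve time / iteration count."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return total_ksp_solve_time / iterations


def parallel_efficiency(t_ref, p_ref, t_p, p):
    """Efficiency of a run on p processes relative to a reference run on p_ref.

    A value of 1.0 means ideal scaling from the reference run.
    """
    return (t_ref * p_ref) / (t_p * p)


# Placeholder values (assumed, not measured).
t1 = per_iteration_time(total_ksp_solve_time=100.0, iterations=200)  # 1 process
t8 = per_iteration_time(total_ksp_solve_time=20.0, iterations=250)   # 8 processes
print("time per iteration:", t1, t8)
print("efficiency on 8 processes:", parallel_efficiency(t1, 1, t8, 8))
```

Normalizing by the iteration count separates the cost of one iteration from
the change in convergence caused by the preconditioner having more blocks.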

The linear system I'm solving has 315342 unknowns. Only the KSPSolve cost is
analyzed.
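
For a sense of scale, the local problem size per process can be estimated
as below. This is a rough sketch that assumes the rows are split evenly
across processes; the actual partitioning is not stated here, and the
helper name is hypothetical.

```python
N_UNKNOWNS = 315342  # system size stated above


def rows_per_process(n_unknowns, n_procs):
    """Average number of unknowns per process under an even split (assumed)."""
    if n_procs <= 0:
        raise ValueError("n_procs must be positive")
    return n_unknowns / n_procs


# Process counts taken from the ranges discussed in this post.
for p in (1, 10, 32, 100, 1000):
    local = rows_per_process(N_UNKNOWNS, p)
    print(f"{p:5d} processes: about {local:.0f} unknowns each")
```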

The results show that the parallelization works well with a small number of
processes (fewer than 32 in my case), and scaling is almost perfect within
the first 10 processors.

However, the benefit of parallelization degrades if I use more processors.
The weird thing is that with more than 100 processors, the cost of a single
iteration slightly increases.

To investigate this issue, I then looked into the composition of the KSPSolve
time.
It seems KSPSolve consists of MatMult, VecTDot(min), VecNorm(min),
VecAXPY(max), VecAYPX(max), and ApplyPC. Please correct me if I'm wrong.

I found that for a small number of processors, all these components scale
well.
However, with more processors (roughly more than 40), MatMult,
VecTDot(min), and VecNorm(min) scale worse, and their times even increase
beyond 100 processors, while the other three parts parallelize well even on
1000 processors.
Since MatMult makes up the major cost of a single iteration, the total cost
per iteration increases as well (see the figure below).

My questions:
1. Is this behavior reasonable? Could anyone explain why MatMult scales
poorly beyond a certain number of processors? I have heard a little about
different algorithms for matrix multiplication. Is that the bottleneck?

2. Does the parallel scalability depend on the matrix size? If I use a larger
system, e.g. a million unknowns, can the solver scale better?

3. Do you have any ideas for improving the parallel performance?

Thank you very much.

Jinlei

[image: Inline image 1]
