Hi,

The pipelined CG (or Gropp's CG) recently implemented in PETSc is very 
attractive since it can hide the collective communication of the vector dot 
products by overlapping it with the application of the preconditioner and/or 
the SpMV. 

However, there is an issue that may seriously degrade the performance. In the 
pipelined CG, the asynchronous MPI_Iallreduce is called before the application 
of the preconditioner and/or the SpMV, and completed afterwards by MPI_Wait. 
The preconditioner and/or SpMV may themselves require communication (such as 
halo updates), and I find that this communication is often slowed down by the 
unfinished MPI_Iallreduce running in the background. 
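For concreteness, here is a minimal sketch of the pattern I mean (the 
placeholder apply_pc_and_spmv() is illustrative, not the actual PETSc 
implementation):

    #include <mpi.h>

    /* illustrative placeholder for the overlapped work, whose halo
       exchange also issues point-to-point MPI messages */
    void apply_pc_and_spmv(void);

    void pipelined_overlap(double local_dot)
    {
        double      global_dot;
        MPI_Request req;

        /* start the non-blocking reduction for the dot product */
        MPI_Iallreduce(&local_dot, &global_dot, 1, MPI_DOUBLE, MPI_SUM,
                       MPI_COMM_WORLD, &req);

        /* the preconditioner/SpMV, including its halo exchange,
           runs while the reduction is still pending ... */
        apply_pc_and_spmv();

        /* ... and the reduction is only completed here */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

It is the point-to-point messages inside apply_pc_and_spmv() that I observe 
being slowed down while the reduction is outstanding.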

As far as I know, the current MPI standard does not provide prioritized 
communication. It is therefore quite possible that the pipelined CG performs 
even worse than the classic one, because the preconditioner and SpMV are 
slowed down. Is there a way to avoid this?

Any suggestion would be highly appreciated. Thanks in advance!

Best wishes,
Chao
