Chao Yang <[email protected]> writes:

> The pipelined CG (or Gropp's CG) recently implemented in PETSc is very
> attractive since it has the ability of hiding the collective
> communication in vector dot product by overlapping it with the
> application of preconditioner and/or SpMV.
>
> However, there is an issue that may seriously degrade the
> performance. In the pipelined CG, the asynchronous MPI_Iallreduce is
> called before the application of preconditioner and/or SpMV, and then
> ended by MPI_Wait. In the application of preconditioner and/or SpMV,
> communication may also be required (such as halo updating), which I
> find is often slowed down by the unfinished MPI_Iallreduce in the
> background.
>
> As far as I know, the current MPI doesn't provide prioritized
> communication.
No, and there is not much interest in adding it because it adds
complication and tends to create starvation situations in which raising
the priority actually makes it slower.

> Therefore, it's highly possible that the performance of the pipelined
> CG may be even worse than a classic one due to the slowdown of
> preconditioner and SpMV. Is there a way to avoid this?

This is an MPI quality-of-implementation issue and there isn't much we
can do about it. There may be MPI tuning parameters that can help, but
the nature of these methods is that in exchange for creating
latency-tolerance in the reduction, it now overlaps the neighbor
communication in MatMult/PCApply.
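For readers following the thread, here is a minimal C/MPI sketch of the
pattern being discussed -- not PETSc's actual implementation. The routine
local_spmv_and_pc and the buffer/neighbor names are placeholders; the point
is only the ordering: the nonblocking reduction is started first, the halo
exchange and local SpMV/preconditioner work run while it is outstanding, and
MPI_Wait completes it afterwards.

    #include <mpi.h>
    #include <stddef.h>

    /* Placeholder for the local part of MatMult/PCApply. */
    static void local_spmv_and_pc(const double *x, double *y, size_t n)
    {
      for (size_t i = 0; i < n; i++) y[i] = x[i]; /* stand-in for real work */
    }

    static void pipelined_cg_step(MPI_Comm comm, double local_dot,
                                  const double *x, double *y, size_t n,
                                  int left, int right, double *ghost_send,
                                  double *ghost_recv, int nghost)
    {
      double      global_dot;
      MPI_Request red_req, halo_req[2];

      /* 1. Start the nonblocking reduction for the dot product. */
      MPI_Iallreduce(&local_dot, &global_dot, 1, MPI_DOUBLE, MPI_SUM, comm,
                     &red_req);

      /* 2. Halo exchange needed by the SpMV; this neighbor communication now
            shares the network and MPI progress engine with the outstanding
            MPI_Iallreduce, which is the interference described above. */
      MPI_Irecv(ghost_recv, nghost, MPI_DOUBLE, left, 0, comm, &halo_req[0]);
      MPI_Isend(ghost_send, nghost, MPI_DOUBLE, right, 0, comm, &halo_req[1]);
      MPI_Waitall(2, halo_req, MPI_STATUSES_IGNORE);

      /* 3. Local work overlapped with the reduction. */
      local_spmv_and_pc(x, y, n);

      /* 4. Finish the reduction; the result feeds the next CG update. */
      MPI_Wait(&red_req, MPI_STATUS_IGNORE);
      (void)global_dot; /* consumed by the vector updates in a real solver */
    }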
