Jed, Thanks ... don't you ever sleep at night? ;-)
Chao

> Chao Yang <[email protected]> writes:
>
>> The pipelined CG (or Gropp's CG) recently implemented in PETSc is very
>> attractive since it has the ability of hiding the collective
>> communication in vector dot product by overlapping it with the
>> application of preconditioner and/or SpMV.
>>
>> However, there is an issue that may seriously degrade the
>> performance. In the pipelined CG, the asynchronous MPI_Iallreduce is
>> called before the application of preconditioner and/or SpMV, and then
>> ended by MPI_Wait. In the application of preconditioner and/or SpMV,
>> communication may also be required (such as halo updating), which I
>> find is often slowed down by the unfinished MPI_Iallreduce in the
>> background.
>>
>> As far as I know, the current MPI doesn't provide prioritized
>> communication.
>
> No, and there is not much interest in adding it because it adds
> complication and tends to create starvation situations in which raising
> the priority actually makes it slower.
>
>> Therefore, it's highly possible that the performance of the pipelined
>> CG may be even worse than a classic one due to the slowdown of
>> preconditioner and SpMV. Is there a way to avoid this?
>
> This is an MPI quality-of-implementation issue and there isn't much we
> can do about it. There may be MPI tuning parameters that can help, but
> the nature of these methods is that in exchange for creating
> latency-tolerance in the reduction, it now overlaps the neighbor
> communication in MatMult/PCApply.
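
To make the pattern concrete, here is a minimal sketch of the overlap
structure being discussed (not the actual PETSc implementation; the
function name pipelined_step and the elided MatMult/PCApply body are
placeholders for illustration only):

  #include <mpi.h>

  /* Start the dot-product reduction, overlap it with the matrix and
     preconditioner work, then complete it.  The halo exchange inside
     that work competes with the outstanding Iallreduce, which is the
     slowdown reported above. */
  void pipelined_step(double local_dot, double *global_dot, MPI_Comm comm)
  {
    MPI_Request req;

    /* Nonblocking reduction started before the local work. */
    MPI_Iallreduce(&local_dot, global_dot, 1, MPI_DOUBLE, MPI_SUM,
                   comm, &req);

    /* ... MatMult / PCApply here, including neighbor (halo)
       point-to-point communication ... */

    /* Complete the reduction; ideally it finished during the work above. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
  }

Whether the Iallreduce actually makes progress while the halo messages
are being moved is up to the MPI library, which is the
quality-of-implementation point above.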
