Hi,
>> When using good preconditioners, spMV is essentially never the
>> bottleneck and hence I don't think a separate communication thread
>> should be implemented in PETSc. Instead, such a fallback should be part
>> of a good MPI implementation.
> SpMV is an important part of most of those scalable preconditioners. In
> multigrid, those are grid transfer operators, residuals, and Chebyshev
> or Krylov-accelerated smoothers.
From the context, I was referring to the SpMV with the 'full system
matrix' as part of the outer Krylov solver, since that was the topic of
the paper (cf. third paragraph). Within the preconditioner you may use
different storage formats, avoid communication across nodes, etc.
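
For what it's worth, the kind of overlap discussed above looks roughly
like the sketch below for a row-distributed CSR matrix: post the halo
exchange for the off-process entries of x, multiply with the purely
local block while the messages are in flight, then add the off-diagonal
contribution once the ghost values have arrived. This is only an
illustration under my own assumptions about the data layout (the
DistCSR struct and its field names are made up and are not PETSc's
MatMPIAIJ internals).

/* Sketch: overlap halo exchange with local work in a distributed
 * SpMV, y = A*x, A row-distributed in CSR form. Illustrative only;
 * the struct below is an assumption, not PETSc's internal layout. */
#include <mpi.h>

typedef struct {
  int     nlocal;        /* number of locally owned rows                */
  int    *diag_ptr;      /* CSR of the "diagonal" block (local cols)    */
  int    *diag_col;
  double *diag_val;
  int    *offd_ptr;      /* CSR of the "off-diagonal" block (ghost cols)*/
  int    *offd_col;
  double *offd_val;
  int     nsend, nrecv;  /* neighbours for the halo exchange            */
  int    *send_rank, *send_start, *send_idx; /* x entries to send       */
  int    *recv_rank, *recv_start;            /* where ghosts land       */
  double *send_buf, *ghost;                  /* communication buffers   */
} DistCSR;  /* send_start/recv_start are CSR-style offsets, size n+1    */

void spmv_overlapped(const DistCSR *A, const double *x, double *y,
                     MPI_Comm comm)
{
  MPI_Request req[64];   /* assume nsend + nrecv <= 64 for the sketch   */
  int r = 0;

  /* 1. Post nonblocking receives for the ghost values of x. */
  for (int p = 0; p < A->nrecv; ++p)
    MPI_Irecv(A->ghost + A->recv_start[p],
              A->recv_start[p+1] - A->recv_start[p], MPI_DOUBLE,
              A->recv_rank[p], 0, comm, &req[r++]);

  /* 2. Pack and post nonblocking sends of the entries neighbours need. */
  for (int p = 0; p < A->nsend; ++p) {
    for (int i = A->send_start[p]; i < A->send_start[p+1]; ++i)
      A->send_buf[i] = x[A->send_idx[i]];
    MPI_Isend(A->send_buf + A->send_start[p],
              A->send_start[p+1] - A->send_start[p], MPI_DOUBLE,
              A->send_rank[p], 0, comm, &req[r++]);
  }

  /* 3. Overlap: multiply with the local block while messages are in flight. */
  for (int i = 0; i < A->nlocal; ++i) {
    double sum = 0.0;
    for (int j = A->diag_ptr[i]; j < A->diag_ptr[i+1]; ++j)
      sum += A->diag_val[j] * x[A->diag_col[j]];
    y[i] = sum;
  }

  /* 4. Wait for the halo, then add the off-diagonal (ghost) part. */
  MPI_Waitall(r, req, MPI_STATUSES_IGNORE);
  for (int i = 0; i < A->nlocal; ++i) {
    double sum = 0.0;
    for (int j = A->offd_ptr[i]; j < A->offd_ptr[i+1]; ++j)
      sum += A->offd_val[j] * A->ghost[A->offd_col[j]];
    y[i] += sum;
  }
}

Whether the exchange is driven by nonblocking point-to-point calls as
above, by persistent requests, or by a progress thread inside the MPI
library is exactly the question raised above.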
Best regards,
Karli