Hi all,

(...)Since it looks like MPI endpoints are going to be a long time in coming (possibly forever), I think we need a stopgap plan (or plans) to support this crappy MPI + OpenMP model in the meantime. One possible approach is to do what Mark is trying to do with MKL: use a third-party library that provides optimized OpenMP implementations of computationally expensive kernels. It might make sense to also consider using Karl's ViennaCL library in this manner; we already use it to support GPUs, but I believe (Karl, please let me know if I am off-base here) we could also use it to provide OpenMP-ized linear algebra operations on CPUs. Such approaches won't use threads for much of what a PETSc code does, but they might provide decent resource utilization for the most expensive parts of some codes.
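
To make that concrete, here is a minimal sketch of the ViennaCL route (plain ViennaCL, not PETSc integration code; it assumes the OpenMP host backend is selected via the VIENNACL_WITH_OPENMP define and the code is compiled with -fopenmp):

  // Minimal sketch: ViennaCL's OpenMP host backend doing a sparse
  // matrix-vector product on CPU threads.
  #define VIENNACL_WITH_OPENMP  // select the OpenMP backend instead of CUDA/OpenCL

  #include <map>
  #include <vector>

  #include "viennacl/compressed_matrix.hpp"
  #include "viennacl/vector.hpp"
  #include "viennacl/linalg/prod.hpp"

  int main()
  {
    std::size_t n = 100000;

    // Assemble a 1D Laplace test matrix on the host
    std::vector<std::map<unsigned int, double> > host_A(n);
    std::vector<double> host_x(n, 1.0);
    for (std::size_t i = 0; i < n; ++i) {
      host_A[i][i] = 2.0;
      if (i > 0)     host_A[i][i - 1] = -1.0;
      if (i + 1 < n) host_A[i][i + 1] = -1.0;
    }

    // With the OpenMP backend these objects live in host memory
    viennacl::compressed_matrix<double> A(n, n);
    viennacl::vector<double> x(n), y(n);
    viennacl::copy(host_A, A);
    viennacl::copy(host_x, x);

    // The product dispatches to an OpenMP-parallelized kernel
    y = viennacl::linalg::prod(A, x);

    return 0;
  }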

A lot of the tweaks for making GPUs run well translate directly to making OpenMP code run well, at least in theory. In practice I've observed the same issues we've seen in the past: if we run with MPI+OpenMP instead of plain MPI, performance is less reproducible, lower on average, etc.

Still, I think that injecting OpenMP kernels via a third-party library is probably the "best" way of offering OpenMP:
 - it keeps the PETSc code base clean
 - tuning the OpenMP kernels becomes somebody else's problem
 - it helps with GPU support too, because the same plugin interfaces get exercised and improved (see the sketch below)
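
To illustrate the last point: the seam this requires is essentially a kernel table that the host code dispatches through, so a backend (serial, MKL, ViennaCL, GPU, ...) can be slotted in without touching the callers. A hypothetical sketch - the names are illustrative, not actual PETSc API:

  #include <cstddef>

  // Signature a backend must provide for a CSR matrix-vector product
  typedef void (*SpMVKernel)(std::size_t nrows, const int *row_ptr,
                             const int *cols, const double *vals,
                             const double *x, double *y);

  // OpenMP kernel, as a third-party library might supply it
  static void spmv_openmp(std::size_t nrows, const int *row_ptr,
                          const int *cols, const double *vals,
                          const double *x, double *y)
  {
    #pragma omp parallel for
    for (long i = 0; i < (long)nrows; ++i) {
      double sum = 0.0;
      for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        sum += vals[k] * x[cols[k]];
      y[i] = sum;
    }
  }

  // Callers always go through the pointer; swapping backends is a single
  // assignment, and a GPU backend plugs into exactly the same slot.
  static SpMVKernel active_spmv = spmv_openmp;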

Yes, "OpenMP to help GPU support" and vice versa feels like "running even faster in the wrong direction". At the same time, however, we have to acknowledge that nobody will listen to our opinions/experiences/facts if we don't offer something that works OK (not necessarily great) with whatever they start with - too often MPI+OpenMP.

Best regards,
Karli