Title: RFC: sm Latency
WHAT: Introducing optimizations to reduce ping-pong
latencies over the sm BTL.
WHY: Ping-pong latency is a visible benchmark of MPI performance.
We can improve shared-memory latencies by anywhere from 30% (if hardware
latency is the limiting factor) to 2× or more (if MPI
s
First, the performance improvements look really nice.
A few questions:
- How much of an abstraction violation does this introduce? This looks
like the BTL needs to start “knowing” about MPI-level semantics. Currently,
the BTL is purposefully ULP-agnostic. I ask for two reasons:
- you ment