Hi, I have a performance question.
I have a distributed stencil loop that sends several tens of
slightly larger messages every iteration. I post double-buffered
receives at initialization and re-post each one immediately after its
receive request completes.
I can therefore prove that the matching receive is already posted on
the remote side before each send is issued, and I would like to use
ready send mode (MPI_Rsend) to shave off the overhead of rendezvous.
Some other (setup) parts of the program use synchronous sends, for
which I can't prove this.
My question is: is ready send mode "supported", that is to say, does it
take advantage of the fact that I've proved it can be used and perform
an eager send every time? Or does this depend on the underlying component?
Follow-up question: if ready send mode can't force an eager protocol,
how could I do that? And can I verify which protocol is being used somehow?
We're using Open MPI 4.0.3 and UCX for GPUDirect RDMA communication on
Mellanox InfiniBand.
Best,
Oskar
oskar.la...@abo.fi