Hi, a performance question,

I have a distributed stencil loop that sends several tens of slightly larger messages every iteration. I post double-buffered receives at initialization and again immediately after each receive request completes. I can therefore prove that the matching receive is already posted on the receiving side before each send is issued, and I would like to use ready send mode to shave off the overhead of rendezvous. Some other (setup) parts of the program use synchronous sends, for which I can't prove this.

My question is: is ready send mode "supported", that is to say, does it take advantage of the fact that I've proved it can be used and perform an eager send every time? Or does this depend on the underlying component?

Follow-up question: if ready send mode can't force an eager protocol, how could I do that? And can I verify which protocol is being used somehow?

We're using Open MPI 4.0.3 and UCX for GPUDirect RDMA communication on Mellanox InfiniBand.

Best,
Oskar

oskar.la...@abo.fi
