Hi, guys. As I understand it, to send short MPI messages, OpenMPI copies each message into a preallocated buffer and then uses RDMA.
I was wondering if we can avoid the overhead of that memory copy. If the user buffers for short messages are reused a lot, we could register the user buffer itself and do RDMA directly from it instead of from the preallocated buffer. But if the user buffers are not reused, we would pay the memory registration cost on every send. Besides the overhead of memory registration, is there any other reason that prevents you from doing RDMA directly from the user buffer for short messages? Thank you.
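To make the trade-off I have in mind concrete, here is a rough back-of-the-envelope sketch. It assumes registration is paid once per buffer and every later send from that buffer is free of both copy and registration cost, while the copy path pays a fixed copy cost on every send. The function name and the numbers are hypothetical, not measurements from OpenMPI, and the model ignores everything else (deregistration, lookup of an already registered buffer, and so on).

```python
def break_even_reuses(reg_cost_ns: int, copy_cost_ns: int) -> int:
    """Smallest number of sends from one buffer at which registering it once
    costs no more than copying the message on every send.

    Both costs are hypothetical inputs in nanoseconds, not measured values.
    """
    if reg_cost_ns < 0 or copy_cost_ns <= 0:
        raise ValueError("reg_cost_ns must be >= 0 and copy_cost_ns must be > 0")
    # Register-once wins when reg_cost <= n * copy_cost, i.e. n >= reg_cost / copy_cost.
    # Integer ceiling division avoids floating-point rounding.
    return -(-reg_cost_ns // copy_cost_ns)


if __name__ == "__main__":
    # Assumed example values: 50 us to register, 100 ns to copy a short message.
    print(break_even_reuses(50_000, 100))
```

Under this simple model, a buffer reused fewer times than the returned count is cheaper to copy, and a buffer reused more often is cheaper to register.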