Georges Markomanolis wrote:

I have some questions about the duration of communication with MPI_Send and MPI_Recv. I am using either SkaMPI or my own implementation to measure the ping-pong (MPI_Send and MPI_Recv) time between two nodes for messages of 1 byte and more. The ping-pong time is 106.8 microseconds. However, if I measure only the ping (only the MPI_Send), the time is ~20 microseconds. Could anyone explain why it is not half? I would like to understand how Open MPI handles MPI_Send and MPI_Recv internally.

The time for the MPI_Send is the time to move the data out of the user's send buffer. It is quite possible that the data has not yet gotten to the destination. If the message is short, it could be buffered somewhere by the MPI implementation.

The time for MPI_Recv probably includes some amount of waiting time.


In more detail, here are the timings for a ping-pong between two nodes with a simple ping-pong application, for rank 0 only (rank 1 is almost the same):
1 byte, time for MPI_Send, 9 microsec, time for MPI_Recv, 86.4 microsec
1600 bytes, time for MPI_Send, 14.7 microsec, time for MPI_Recv, 197.07 microsec
3200 bytes, time for MPI_Send, 19.73 microsec, time for MPI_Recv, 227.6 microsec
518400 bytes, time for MPI_Send, 3536.5 microsec, time for MPI_Recv, 5739.6 microsec
1049760 bytes, time for MPI_Send, 8020.33 microsec, time for MPI_Recv, 10287 microsec

So MPI_Send lasts only until the buffer reaches the queue of the destination, without the message necessarily having been stored in the destination's memory, or something like that, right?

Yes. It is possible that the data has only reached some intermediate buffer and has not made it all the way to the receive buffer by the time MPI_Send returns.

So if I want to know the real time of sending one message to another node (taking half of the ping-pong time does not seem right),

It is not clear to me what "the real time" is. I don't think there is any well-defined answer. It depends on what you're really looking for, and that is unclear to me. You could issue many sends to many receivers and see how fast a process can emit sends. You can use a profiler to see how the MPI implementation spends its time; I've had some success with using Oracle Studio Performance Analyzer on OMPI. You could use the PERUSE instrumentation inside of OMPI to get timestamps on particular internal events. You could try designing other experiments. But which one is "right" could be debated.

Why does it matter?  What are you really looking for?

should I use a program with other calls such as MPI_Win_fence, MPI_Put, etc.?

Those are a different set of calls (one-sided operations) that could be more or less efficient than Send/Recv. It varies.

Or is there a flag I can pass when I run the application so that MPI_Send behaves as I would expect? According to the MPI standard, what does timing MPI_Send measure? If there is an article that explains all this, please let me know.

MPI_Send completes when the data has left the send buffer and that buffer can be reused by the application. There are many implementation choices. Specifically, it is possible that the MPI_Send will complete even before the MPI_Recv has started. But it is also possible that the MPI_Send will not complete until after the MPI_Recv has completed. It depends on the implementation, which may choose a strategy based on the message size, the interconnect, and other factors.
