Georges Markomanolis wrote:
I have some questions about the duration of communication with
MPI_Send and MPI_Recv. I am using either SkaMPI or my own
implementation to measure the pingpong (MPI_Send and MPI_Recv) time
between two nodes for messages of 1 byte and more. The pingpong time is
106.8 microseconds. However, if I measure only the ping (only the
MPI_Send), the time is ~20 microseconds. Could anyone explain why it
is not half? I would like to understand how Open MPI handles MPI_Send
and MPI_Recv internally.
The time for the MPI_Send is the time to move the data out of the user's
send buffer. It is quite possible that the data has not yet gotten to
the destination. If the message is short, it could be buffered
somewhere by the MPI implementation.
The time for MPI_Recv probably includes some amount of waiting time.
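
The asymmetry can be reproduced outside MPI with a toy model. The sketch below is only an analogy, not MPI and not how Open MPI is implemented: two Python threads exchange a message through queues, where the queue stands in for an intermediate buffer. The names `timed_pingpong` and `transit_s` are made up for illustration, and the 0.05 s delay is an arbitrary stand-in for transit and remote processing.

```python
import queue
import threading
import time


def timed_pingpong(payload, transit_s=0.05):
    """Time a buffered 'send' and a blocking 'recv' separately (toy model, not MPI)."""
    outbound = queue.Queue()  # stands in for an intermediate buffer
    inbound = queue.Queue()

    def peer():
        message = outbound.get()   # peer receives the ping
        time.sleep(transit_s)      # hypothetical transit + processing delay
        inbound.put(message)       # peer sends the pong

    worker = threading.Thread(target=peer)
    worker.start()

    t0 = time.perf_counter()
    outbound.put(payload)          # "send": returns once the payload is buffered
    t1 = time.perf_counter()
    echoed = inbound.get()         # "recv": blocks until the pong arrives
    t2 = time.perf_counter()

    worker.join()
    return t1 - t0, t2 - t1, echoed


if __name__ == "__main__":
    send_s, recv_s, _ = timed_pingpong(b"x")
    print(f"send: {send_s * 1e6:.1f} microsec, recv: {recv_s * 1e6:.1f} microsec")
```

In this model the "send" returns as soon as the payload is handed to the queue, while the "recv" absorbs the whole wait for the reply, so the two timings are far from equal even though each message travels the same path.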
In more detail, here are the timings for pingpong between two nodes
with a simple pingpong application, for rank 0 only (almost the same
for rank 1):
Message size (bytes)   MPI_Send (microsec)   MPI_Recv (microsec)
1                      9                     86.4
1600                   14.7                  197.07
3200                   19.73                 227.6
518400                 3536.5                5739.6
1049760                8020.33               10287
So MPI_Send lasts only until the buffer reaches the queue of the
destination, without the message having been stored in memory there,
or something like this, right?
The data may have gone only to some intermediate buffer rather than to
the destination, but yes, it is possible that the message has not made
it all the way to the receive buffer by the time the MPI_Send has
finished.
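
To see how the quoted rank 0 figures relate to a round trip, here is a small arithmetic sketch. It assumes rank 0 does nothing between its MPI_Send and its MPI_Recv, so that the round trip is roughly the sum of the two timings; that assumption and the names `TIMINGS` and `summarize` are mine, not part of any benchmark.

```python
# (message size in bytes, MPI_Send microsec, MPI_Recv microsec) as quoted for rank 0
TIMINGS = [
    (1, 9.0, 86.4),
    (1600, 14.7, 197.07),
    (3200, 19.73, 227.6),
    (518400, 3536.5, 5739.6),
    (1049760, 8020.33, 10287.0),
]


def summarize(timings):
    """Derive round-trip figures from per-call timings (assumes send + recv = round trip)."""
    rows = []
    for size, send_us, recv_us in timings:
        round_trip_us = send_us + recv_us
        rows.append({
            "bytes": size,
            "round_trip_us": round_trip_us,
            "half_round_trip_us": round_trip_us / 2,
            # share of the round trip that rank 0 spends inside MPI_Send
            "send_fraction": send_us / round_trip_us,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(TIMINGS):
        print(f"{row['bytes']:>8} bytes: round trip {row['round_trip_us']:.2f} microsec, "
              f"half {row['half_round_trip_us']:.2f} microsec, "
              f"MPI_Send share {row['send_fraction']:.1%}")
```

For the 1-byte case this gives a round trip of about 95.4 microsec and a half round trip of about 47.7 microsec, against 9 microsec spent in MPI_Send. In every row the MPI_Send share is below half, which is consistent with the send returning before the message has been delivered.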
So if I want to know the real time of sending one message to another
node (taking half of the pingpong time does not seem right)
It is not clear to me what "the real time" is. I don't think there is
any well-defined answer. It depends on what you're really looking for,
and that is unclear to me. You could send many sends to many receivers
and see how fast a process can emit sends. You can use a profiler to
see how the MPI implementation spends its time; I've had some success
with using Oracle Studio Performance Analyzer on OMPI. You could use
the PERUSE instrumentation inside of OMPI to get timestamps on
particular internal events. You could try designing other experiments.
But which one is "right" could be debated.
Why does it matter? What are you really looking for?
Should I use a program with other calls like MPI_Win_fence, MPI_Put, etc.?
Those are a different set of calls (one-sided operations) that could be
more or less efficient than Send/Recv. It varies.
Or is there any flag I can pass when I run the application so that
MPI_Send behaves as I would expect? According to the MPI standard, what
does timing MPI_Send measure? If there is any article that explains all
this, please let me know.
MPI_Send completes when the data has left the send buffer and that
buffer can be reused by the application. There are many implementation
choices. Specifically, it is possible that the MPI_Send will complete
even before the MPI_Recv has started. But it is also possible that the
MPI_Send will not complete until after the MPI_Recv has completed. It
depends on the implementation, which may choose a strategy based on the
message size, the interconnect, and other factors.