Have a look at the FAQ; we discuss quite a few of these kinds of issues:
- http://www.open-mpi.org/faq/?category=tuning
- http://www.open-mpi.org/faq/?category=openfabrics
More specifically, what Eugene is saying is correct -- OMPI has made
tradeoffs for various, complicated reasons. One of the things we
sacrificed in the common case was communication/computation overlap on
OpenFabrics networks.
If you want good overlap, set the MCA parameter mpi_leave_pinned to 1
(on OpenFabrics networks). This will effectively move the bulk of the
message passing progress (but not all of it) down to the hardware.
Hence, when you sleep/do real computations while looping over
MPI_Test, the message is probably actually being progressed in the
background. You won't see this kind of overlap with other transports
such as shared memory or TCP because we don't have hardware assist in
these cases.
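To make that concrete, here is a minimal sketch (mine, not from the
original post) of the pattern described above: rank 0 posts a large
MPI_Isend and then interleaves slices of computation with MPI_Test
calls before a final MPI_Wait. Launched across two processes with
something like "mpirun --mca mpi_leave_pinned 1 -np 2 ./a.out" on an
OpenFabrics network, much of the transfer can proceed while the
computation runs:

/* Illustrative sketch only: overlap a large nonblocking send with
 * computation by giving MPI periodic chances to make progress. */
#include <mpi.h>
#include <stdlib.h>

#define N (4 * 1024 * 1024)   /* a large message, well past any eager limit */

static double fake_compute(double *w, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += w[i] * 1.0000001;           /* stand-in for real work */
    return s;
}

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *buf  = malloc(N * sizeof(double));
    double *work = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) { buf[i] = i; work[i] = i; }

    if (rank == 0) {
        MPI_Request req;
        int done = 0;
        MPI_Isend(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
        for (int slice = 0; slice < 10; slice++) {
            fake_compute(work, N);                 /* do a chunk of work   */
            if (!done)                             /* ...then poke MPI so  */
                MPI_Test(&req, &done, MPI_STATUS_IGNORE);  /* it can progress */
        }
        MPI_Wait(&req, MPI_STATUS_IGNORE);         /* clean up at the end  */
    } else if (rank == 1) {
        MPI_Recv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    free(buf);
    free(work);
    MPI_Finalize();
    return 0;
}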
On Nov 6, 2008, at 12:52 PM, Eugene Loh wrote:
vladimir marjanovic wrote:
In order to overlap communication and computation I don't want to
use MPI_Wait.
Right. One thing to keep in mind is that there are two ways of
overlapping communication and computation. One is you start a send
(MPI_Isend), you do a bunch of computation while the message is
being sent, and then after the message has been sent you call
MPI_Wait just to clean up. This assumes that the MPI implementation
can send a message while control of the program has been returned to
you. The experts can give you the fine print, but my simple
assertion is, "This doesn't usually happen."
Rather, the MPI implementation typically will send data only when
your code is in some MPI call. That's why you have to call MPI_Test
periodically... or some other MPI function.
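In code, the two approaches Eugene describes differ only in whether the
computation phase ever re-enters the MPI library (my illustrative
sketch, not his code; the helper names are placeholders):

#include <mpi.h>

/* Stand-ins for the application's real work (hypothetical names). */
static void long_computation(void)          { /* ... lots of work, no MPI ... */ }
static void one_piece_of_computation(int i) { (void)i; /* ... one slice ... */ }

/* Approach 1: post the send, compute, then MPI_Wait. If progress only
 * happens inside MPI calls, most of the transfer waits for the MPI_Wait. */
void send_then_wait(double *buf, int count, int peer) {
    MPI_Request req;
    MPI_Isend(buf, count, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req);
    long_computation();                      /* no MPI calls in here        */
    MPI_Wait(&req, MPI_STATUS_IGNORE);       /* ...so the data moves here   */
}

/* Approach 2: split the computation and call MPI_Test between the pieces,
 * giving the library (or the hardware) regular chances to push the data. */
void send_with_tests(double *buf, int count, int peer, int npieces) {
    MPI_Request req;
    int done = 0;
    MPI_Isend(buf, count, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req);
    for (int i = 0; i < npieces; i++) {
        one_piece_of_computation(i);
        if (!done)
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);
    }
    MPI_Wait(&req, MPI_STATUS_IGNORE);       /* clean up either way         */
}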
For sure the message is being decomposed into chunks, and the chunk
size is probably defined by an environment variable.
Do you know how I can control the chunk size?
I don't. Try running "ompi_info -a" and looking through the
parameters. For the shared-memory BTL, it's
mca_btl_sm_max_frag_size. I also see something like
coll_sm_fragment_size. Maybe look at the parameters that have
"btl_openib_max" in their names.
--
Jeff Squyres
Cisco Systems