Eugene Loh wrote:
> Okay. Attached is a "little" note I wrote up illustrating memory
> profiling with Sun tools. (It's "big" because I ended up including a
> few screenshots.) The program has a bunch of one-way message traffic
> and some user-code memory allocation. I then rerun with the receiver
> sleeping before jumping into action. The messages back up and OMPI ends
> up allocating a bunch of memory. The tools show you who (user or OMPI)
> is allocating how much memory and how big of a message backlog develops
> and how the sender starts stalling out (which is a good thing!).
> Anyhow, a useful exercise for me and hopefully helpful for you.

Wow. Thanks, Eugene. I definitely have to look into the Sun HPC
ClusterTools. It looks as though it could be very informative.

What's the purpose of the 400 MB that MPI_Init has allocated?

The figure of in-flight messages vs time when the receiver sleeps is
particularly interesting. The sender appears to stop sending and block
once there are 30'000 in-flight messages. Has Open MPI detected the
congestion and begun waiting for the receiver to catch
up? Or is it something simpler, such as the underlying write(2) call
to the TCP socket blocking? If it's the first case, perhaps I could
tune this threshold to behave better for my application.
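
To make the "something simpler" case concrete, here is a minimal sketch in
Python (not MPI code, and not a claim about what Open MPI actually does).
It assumes a local socket pair is a fair stand-in for the TCP connection:
the sender keeps writing while the peer never reads, and we count how many
bytes the kernel accepts before a write would block. The function name,
chunk size and safety limit are hypothetical choices for illustration.

```python
import socket


def bytes_accepted_before_stall(chunk_size=4096, limit=1 << 28):
    """Write to a socket whose peer never reads.

    Returns the number of bytes the kernel accepted before a further
    write would have blocked (or `limit`, a safety cap, if that never
    happened).
    """
    # Connected pair of sockets; assumed here to behave like a TCP
    # connection with respect to finite kernel buffering.
    sender, receiver = socket.socketpair()
    # Non-blocking mode turns "would block" into an exception, so the
    # demo can observe the stall instead of hanging in send().
    sender.setblocking(False)
    chunk = b"x" * chunk_size
    total = 0
    try:
        while total < limit:
            try:
                total += sender.send(chunk)
            except BlockingIOError:
                # Kernel buffers are full. A blocking socket would
                # sleep at this point until the receiver reads.
                break
    finally:
        sender.close()
        receiver.close()
    return total


if __name__ == "__main__":
    print("bytes accepted before the write would block:",
          bytes_accepted_before_stall())
```

The byte count depends on the operating system's socket buffer settings, so
a stall produced this way would be governed by buffer sizes rather than by
a message count; whether that matches the 30'000-message plateau is
exactly the open question.
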
Cheers,
Shaun