Hi Eugene,

Eugene Loh wrote:
At 2500 bytes, all messages will presumably be sent "eagerly" -- without waiting for the receiver to indicate that it's ready to receive that particular message. This would suggest congestion, if any, is on the receiver side. Some kind of congestion could, I suppose, still occur and back up on the sender side.

Can anyone chime in as to what the message size limit is for an `eager' transmission?

On the other hand, I assume the memory imbalance we're talking about is rather severe. Much more than 2500 bytes to be noticeable, I would think. Is that really the situation you're imagining?

The memory imbalance is drastic. I'm expecting 2 GB of memory use per process. The well-behaved processes (13/16) use the expected amount of memory; the remaining misbehaving processes (3/16) use more than twice as much. The specifics vary from run to run, of course. So, yes, there are gigabytes of unexpected memory use to track down.

There are tracing tools to look at this sort of thing. The only one I have much familiarity with is Sun Studio / Sun HPC ClusterTools. Free download, available on Solaris or Linux, SPARC or x64, plays with OMPI. You can see a timeline with message lines on it to give you an idea if messages are being received/completed long after they were sent. Another interesting view is constructing a plot vs time of how many messages are in-flight at any moment (including as a function of receiver). Lots of similar tools out there... VampirTrace (tracing side only, need to analyze the data), Jumpshot, etc. Again, though, there's a question in my mind if you're really backing up 1000s or more of messages. (I'm assuming the memory imbalances are at least Mbytes.)

I'll check out Sun HPC ClusterTools. Thanks for the tip.

Assuming the problem is congestion and that messages are backing up, is there an accepted method of dealing with this situation? It seems to me the general approach would be something like

if (number of outstanding messages > high water mark)
    wait until (number of outstanding messages < low water mark)

where I suppose the `number of outstanding messages' is defined as the number of messages that have been sent and not yet received by the other side. Is there a way to get this number from MPI without having to code it at the application level?
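For the sake of discussion, here is a rough sketch of what I have in mind at the application level, assuming nonblocking synchronous-mode sends (HIGH_WATER, LOW_WATER and throttled_send are just placeholder names, and the thresholds are made up). As far as I can tell, MPI itself doesn't report how many sends are still unmatched, so the count has to be tracked by hand; using MPI_Issend rather than MPI_Isend means a request only completes once the receiver has matched the message, which is roughly the definition of `outstanding' above.

    #include <mpi.h>

    /* Hypothetical thresholds -- would need tuning for the application
     * and message size. */
    #define HIGH_WATER 64
    #define LOW_WATER  16

    static MPI_Request reqs[HIGH_WATER];
    static int n_outstanding = 0;

    /* Drop completed requests from the tracking array.  MPI_Testsome /
     * MPI_Waitsome set finished requests to MPI_REQUEST_NULL, so we can
     * simply compact over them afterwards. */
    static void reap(int block)
    {
        int indices[HIGH_WATER], n_done, i, j = 0;

        if (n_outstanding == 0)
            return;
        if (block)
            MPI_Waitsome(n_outstanding, reqs, &n_done, indices,
                         MPI_STATUSES_IGNORE);
        else
            MPI_Testsome(n_outstanding, reqs, &n_done, indices,
                         MPI_STATUSES_IGNORE);
        for (i = 0; i < n_outstanding; i++)
            if (reqs[i] != MPI_REQUEST_NULL)
                reqs[j++] = reqs[i];
        n_outstanding = j;
    }

    /* Post a send, but once HIGH_WATER sends are in flight, block until
     * the count drains back below LOW_WATER.  Synchronous mode (Issend)
     * means a request only completes after the receiver has matched the
     * message.  The caller must keep buf valid until the corresponding
     * send completes. */
    void throttled_send(void *buf, int count, MPI_Datatype type,
                        int dest, int tag, MPI_Comm comm)
    {
        reap(0);                          /* opportunistic cleanup */
        if (n_outstanding >= HIGH_WATER)  /* high-water mark reached */
            while (n_outstanding > LOW_WATER)
                reap(1);                  /* wait until below low water */
        MPI_Issend(buf, count, type, dest, tag, comm,
                   &reqs[n_outstanding++]);
    }

A real version would also need to tie each buffer to its request so it isn't reused before the send completes, but hopefully that makes the question concrete: does MPI (or OMPI) already expose this outstanding-message count, or provide some equivalent throttling, so it doesn't have to live in the application?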

Thanks,
Shaun
