On Mar 18, 2008, at 10:32 AM, George Bosilca wrote:
> Jeff hinted at the real problem in his email. Even if the program uses the correct MPI functions, it is not 100% correct.
I think we disagree here -- the sample program is correct according to the MPI spec. It's an implementation artifact that makes it deadlock.
The upcoming v1.3 series doesn't suffer from this issue; we revamped our transport system to distinguish between early and normal completions. In the meantime, the pml_ob1_use_eager_completion MCA param was added in v1.2.6 to let correct MPI apps avoid the early-completion optimization.
> It might pass in some situations, but can lead to fake "deadlocks" in others. The problem comes from the flow control. If the messages are small (which is the case in the test example), Open MPI will send them eagerly. Without flow control, these messages will be buffered by the receiver, which will exhaust the receiver's memory. Once this happens, some of the messages may get dropped, but the most visible result is that progress will happen very (VERY) slowly.
Your text implies that we can actually *drop* (and retransmit) messages in the sm btl. That doesn't sound right to me -- is that what you meant?
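To make the buffering concern concrete, here is a toy Python model of a fast sender and a slow receiver. It is only a sketch and not Open MPI code: the `simulate` function, its parameters, and the credit-based limit are all hypothetical, chosen for illustration. It models queue growth only and says nothing about whether any BTL drops messages.

```python
from collections import deque


def simulate(num_messages, send_rate, recv_rate, credits=None):
    """Return the peak number of messages buffered at the receiver.

    Hypothetical toy model, not Open MPI's implementation.
    Each tick the sender tries to push `send_rate` small messages and the
    receiver then drains up to `recv_rate` of them (recv_rate must be >= 1).
    credits=None models eager sends with no flow control; an integer caps
    how many unreceived messages may sit in the receiver's buffer.
    """
    buffered = deque()
    sent = 0
    peak = 0
    while sent < num_messages or buffered:
        # Sender side: push eagerly until the rate limit or credits run out.
        for _ in range(send_rate):
            if sent == num_messages:
                break
            if credits is not None and len(buffered) >= credits:
                break  # out of credits: the sender stalls this tick
            buffered.append(sent)
            sent += 1
        peak = max(peak, len(buffered))
        # Receiver side: consume what it can this tick.
        for _ in range(min(recv_rate, len(buffered))):
            buffered.popleft()
    return peak


if __name__ == "__main__":
    print("eager, no flow control:", simulate(1000, 10, 1))
    print("credit limit of 8:     ", simulate(1000, 10, 1, credits=8))
```

With no limit, the receiver's backlog grows with the number of messages sent; with a credit limit, it stays bounded by the limit, at the cost of stalling the sender.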
--
Jeff Squyres
Cisco Systems