Greetings everyone,

I have a scientific code using Open MPI (v3.1.3) that seems to work fine when 
MPI_Bcast() and MPI_Reduce() calls are well spaced out in time.  Yet if the 
time between these calls is short, eventually one of the nodes hangs at some 
random point, never returning from the broadcast or reduce call.  Is there some 
minimum time between calls that needs to be obeyed in order for Open MPI to 
process these reliably?

This has come up because I am trying to run some established acceptance tests 
in a multi-node environment, in order to verify that the Open MPI version of 
the code yields the same baseline results as the original single-node version.  
These acceptance tests must pass for the code to be considered validated and 
deliverable to the customer.  One of the acceptance tests that hangs involves 
90 broadcasts and 90 reduces in a short period of time (less than 0.01 CPU 
sec), as in:

 Broadcast #89 in
   Broadcast #89 out 8 bytes
   Calculate angle #89
   Reduce #89 in
   Reduce #89 out 208 bytes
 Write result #89 to file on service node
 Broadcast #90 in
   Broadcast #90 out 8 bytes
   Calculate angle #90
   Reduce #90 in
   Reduce #90 out 208 bytes
 Write result #90 to file on service node

If I slow down the above acceptance test, for example by running it under 
valgrind, then it runs to completion and yields the correct result.  This 
suggests that something internal to Open MPI is getting swamped.  I understand 
that these acceptance tests might be pushing the limit, given that they 
involve so many short calculations combined with frequent, yet tiny, transfers 
of data among nodes.

Would it be worthwhile for me to enforce some minimum wait time between the 
MPI calls, say 0.01 or 0.001 sec via nanosleep()?  The only time it would 
matter is when the acceptance tests are run, as the situation doesn't arise 
during beefier runs.
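
To make the question concrete, the pause I have in mind would look roughly 
like the sketch below.  The helper name pause_between_calls() is only a 
placeholder (it is not in my code or in Open MPI), and I am assuming it would 
be called right after each MPI_Reduce() returns.  It just wraps the POSIX 
nanosleep() call and resumes the sleep if a signal interrupts it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose nanosleep() under strict C modes */
#endif
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper: block the calling process for `seconds`
 * (e.g. 0.001 or 0.01).  Returns 0 on success, -1 on a negative
 * argument or on a nanosleep() failure other than EINTR.
 */
int pause_between_calls(double seconds)
{
    struct timespec req, rem;

    if (seconds < 0.0)
        return -1;

    /* Split the interval into whole seconds and nanoseconds. */
    req.tv_sec  = (time_t)seconds;
    req.tv_nsec = (long)((seconds - (double)req.tv_sec) * 1e9);
    if (req.tv_nsec > 999999999L)   /* guard against rounding overflow */
        req.tv_nsec = 999999999L;

    /* If a signal cuts the sleep short, continue with the remainder. */
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }
    return 0;
}
```

Usage would be something like pause_between_calls(0.001) once per iteration, 
enabled only for the acceptance tests.  I realize this would only hide 
whatever the underlying problem is, which is why I am asking first.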

