Gijsbert Wiesenekker wrote:
My MPI program consists of a number of processes that send 0 or more messages
(using MPI_Isend) to 0 or more other processes. The processes check
periodically if messages are available to be processed. It was running fine
until I increased the message size, at which point I started getting deadlocks.
Some googling taught me that I was running into a classic deadlock problem (see
for example http://www.cs.ucsb.edu/~hnielsen/cs140/mpi-deadlocks.html). The
suggested workarounds, such as changing the order of MPI_Send and MPI_Recv, do
not work in my case, because a process may not send any message at all to the
other processes, so MPI_Recv would wait indefinitely.
Any suggestions on how to avoid deadlock in this case?
The problems you describe would seem to arise with blocking functions
like MPI_Send and MPI_Recv. With the non-blocking variants
MPI_Isend/MPI_Irecv, there shouldn't be this problem. There is no
requirement to order the calls in the way that web page
describes; that workaround applies to the blocking calls. It
feels to me that something is missing from your description.
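
For example (a minimal sketch, assuming exactly two ranks; the count, tag, and
buffers are made up for illustration), both ranks can post the receive and the
send in the same order and then wait on both. Since neither call blocks, there
is no circular wait even when the messages are too large to be buffered eagerly:

/* Minimal two-rank exchange with non-blocking calls: the ordering of the
 * Irecv/Isend posts does not matter, unlike with MPI_Recv/MPI_Send. */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    double sendbuf[100000], recvbuf[100000];
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int other = 1 - rank;                       /* partner rank: 0 <-> 1 */

    for (int i = 0; i < 100000; i++) sendbuf[i] = rank;

    /* Neither call blocks, so both ranks can post in the same order. */
    MPI_Irecv(recvbuf, 100000, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, 100000, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &reqs[1]);

    /* Both ranks reach the wait with a matching receive already posted. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    MPI_Finalize();
    return 0;
}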
If you know the maximum size any message will be, you can post an
MPI_Irecv with a wildcard tag and source rank (MPI_ANY_TAG, MPI_ANY_SOURCE).
You can post MPI_Isend calls for whatever messages you want to send. You can
use MPI_Test to check whether a message has arrived; if so, process it and
re-post the MPI_Irecv. You can likewise use MPI_Test to check whether the
posted sends have completed; if so, you can reuse those send buffers.
You need some signal to indicate to processes that no further messages
will be arriving.
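
Here is a rough skeleton of that pattern, under a few assumptions that are not
in your description: MAX_MSG is a known upper bound on the message size in
ints, a TAG_DONE message from every other rank serves as the "no further
messages" signal, and process_message() is a placeholder for your processing
step. It is a sketch, not a drop-in implementation:

/* Skeleton of the wildcard-receive + MPI_Test polling pattern.
 * Assumptions (not from the original post): MAX_MSG bounds the message size,
 * TAG_DONE from every other rank is the termination signal, and
 * process_message() stands in for the real work. */
#include <mpi.h>
#include <stdlib.h>

#define MAX_MSG   1024   /* maximum message size, in ints (assumed known) */
#define TAG_WORK  1      /* tag for real work messages */
#define TAG_DONE  2      /* tag for the termination signal */

static void process_message(const int *buf, int count) { (void)buf; (void)count; }

int main(int argc, char **argv)
{
    int rank, nprocs, done_count = 0;
    int recvbuf[MAX_MSG];
    MPI_Request recv_req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    if (nprocs < 2) { MPI_Finalize(); return 0; }

    /* One receive posted with wildcard source and tag, big enough for any message. */
    MPI_Irecv(recvbuf, MAX_MSG, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
              MPI_COMM_WORLD, &recv_req);

    /* Each rank may send 0 or more work messages (with TAG_WORK); here every
     * rank just sends the TAG_DONE signal to every other rank. */
    MPI_Request *send_reqs = malloc((nprocs - 1) * sizeof(MPI_Request));
    int *donebuf = calloc(nprocs, sizeof(int));
    int nsends = 0;
    for (int dest = 0; dest < nprocs; dest++) {
        if (dest == rank) continue;
        MPI_Isend(&donebuf[dest], 1, MPI_INT, dest, TAG_DONE, MPI_COMM_WORLD,
                  &send_reqs[nsends++]);
    }

    /* Poll for incoming messages; in a real code this loop would be
     * interleaved with useful work between MPI_Test calls. */
    while (done_count < nprocs - 1) {
        int flag;
        MPI_Status status;
        MPI_Test(&recv_req, &flag, &status);
        if (!flag) continue;
        if (status.MPI_TAG == TAG_DONE) {
            done_count++;                    /* one more rank has nothing left to send */
        } else {
            int count;
            MPI_Get_count(&status, MPI_INT, &count);
            process_message(recvbuf, count);
        }
        if (done_count < nprocs - 1)         /* re-post the wildcard receive */
            MPI_Irecv(recvbuf, MAX_MSG, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                      MPI_COMM_WORLD, &recv_req);
    }

    /* MPI_Test on the send requests tells you when a send buffer can be
     * reused; this sketch simply waits for all of them at the end. */
    MPI_Waitall(nsends, send_reqs, MPI_STATUSES_IGNORE);
    free(send_reqs);
    free(donebuf);
    MPI_Finalize();
    return 0;
}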