On 23 Mar 2009, at 23:36, Shaun Jackman wrote:
loop {
  MPI_Ibsend (for every edge of every leaf node)
  MPI_Barrier
  MPI_Iprobe/MPI_Recv (until no messages pending)
  MPI_Allreduce (number of nodes removed)
} until (no nodes removed by any node)

Previously, I attempted to use a single MPI_Allreduce without the MPI_Barrier:

You need both the MPI_Barrier and the synchronisation semantics of the MPI_Allreduce in this example. It's important that each send matches a recv from the same iteration, so you need to ensure all sends have been posted before you call probe, and a Barrier is one way of doing this. You also need the synchronisation semantics of the Allreduce to ensure the MPI_Iprobe doesn't match a send from the next iteration of the loop.
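
For reference, the pattern above comes out something like this in C; the neighbour list, the single-int payload and the removal bookkeeping are placeholders I've made up, and I'm assuming a large enough buffer has already been attached with MPI_Buffer_attach for the buffered sends:

#include <mpi.h>

/* Sketch of the loop quoted above: buffered sends, a barrier so all
 * sends have been posted, drain pending messages with Iprobe/Recv,
 * then Allreduce the number of nodes removed this pass. */
static void prune_loop(MPI_Comm comm, const int *neighbours, int nneighbours)
{
    int removed_globally;
    do {
        int removed_here = 0;      /* nodes this rank removed this pass */

        /* One buffered send per edge of every leaf node (just one
         * dummy message per neighbour here for brevity). */
        for (int i = 0; i < nneighbours; ++i) {
            int edge = 0;          /* placeholder payload */
            MPI_Request req;
            MPI_Ibsend(&edge, 1, MPI_INT, neighbours[i], 0, comm, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);  /* completes locally once buffered */
        }

        MPI_Barrier(comm);         /* every rank has now posted its sends */

        /* Receive whatever is pending for this iteration. */
        int pending;
        MPI_Status status;
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, comm, &pending, &status);
        while (pending) {
            int edge;
            MPI_Recv(&edge, 1, MPI_INT, status.MPI_SOURCE, status.MPI_TAG,
                     comm, MPI_STATUS_IGNORE);
            /* ... process the edge, possibly ++removed_here ... */
            MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, comm, &pending, &status);
        }

        /* Global count of removed nodes; also the second sync point. */
        MPI_Allreduce(&removed_here, &removed_globally, 1, MPI_INT,
                      MPI_SUM, comm);
    } while (removed_globally > 0);
}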

Perhaps there is a better way of accomplishing the same thing, however. MPI_Barrier synchronises all processes, so it is potentially a lot more heavyweight than it needs to be; in this example you only need to synchronise with your neighbours, so it might be quicker to use a send/receive for each of your neighbours containing a true/false value rather than to rely on the existence or not of a message. In other words, the barrier is needed because you don't know how many messages there are; it may well be quicker to have a fixed number of point-to-point messages than an extra global synchronisation. The added advantage of doing it this way is that you could remove the Probe as well.
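
Something along these lines, say, with the number of edges carried in the first element of a fixed-size message so the receiver knows exactly one message is coming from each neighbour; the buffer layout and bounds are invented for illustration:

#include <mpi.h>

enum { MAX_NEIGHBOURS = 64, MAX_EDGES = 64 };   /* assumed bounds */

/* One message to and one message from each neighbour per iteration,
 * even when there are no edges to send, so no Barrier and no Iprobe
 * are needed. */
static void exchange_edges(MPI_Comm comm, const int *neighbours, int nneighbours)
{
    int sendbuf[MAX_NEIGHBOURS][1 + MAX_EDGES];
    int recvbuf[1 + MAX_EDGES];
    MPI_Request reqs[MAX_NEIGHBOURS];

    for (int i = 0; i < nneighbours; ++i) {
        sendbuf[i][0] = 0;         /* how many edges follow; zero is fine */
        /* ... fill sendbuf[i][1..] with edges destined for neighbours[i] ... */
        MPI_Isend(sendbuf[i], 1 + MAX_EDGES, MPI_INT, neighbours[i], 0,
                  comm, &reqs[i]);
    }

    /* Exactly one receive per neighbour. */
    for (int i = 0; i < nneighbours; ++i) {
        MPI_Recv(recvbuf, 1 + MAX_EDGES, MPI_INT, neighbours[i], 0,
                 comm, MPI_STATUS_IGNORE);
        for (int e = 0; e < recvbuf[0]; ++e) {
            /* ... process recvbuf[1 + e] ... */
        }
    }

    MPI_Waitall(nneighbours, reqs, MPI_STATUSES_IGNORE);
}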

Potentially it would be possible to remove the Allreduce as well and use the tag to identify the iteration count, assuming of course you don't need to know the global number of branches at any iteration. One problem with this approach can be that one process gets very slow and swamped with unexpected messages, but assuming your neighbour count is small this shouldn't be a problem. I'd expect not only a net gain from changing to this approach but for the application to scale better as well.
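
A rough sketch of the tag idea, with a made-up single-flag payload; in a real code you'd also want to keep the tag below MPI_TAG_UB:

#include <mpi.h>

enum { MAX_NEIGHBOURS = 64 };      /* assumed bound */

/* Carry the iteration count in the tag so a receive posted for
 * iteration 'it' can never match a send from iteration 'it + 1'. */
static void exchange_flags(MPI_Comm comm, int it,
                           const int *neighbours, int nneighbours)
{
    int sendflag[MAX_NEIGHBOURS], recvflag[MAX_NEIGHBOURS];
    MPI_Request reqs[MAX_NEIGHBOURS];

    for (int i = 0; i < nneighbours; ++i) {
        sendflag[i] = 0;           /* e.g. "I removed a node" flag */
        MPI_Isend(&sendflag[i], 1, MPI_INT, neighbours[i],
                  /* tag = */ it, comm, &reqs[i]);
    }
    for (int i = 0; i < nneighbours; ++i)
        MPI_Recv(&recvflag[i], 1, MPI_INT, neighbours[i],
                 /* tag = */ it, comm, MPI_STATUS_IGNORE);

    MPI_Waitall(nneighbours, reqs, MPI_STATUSES_IGNORE);
}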

Finally, I've always favoured Irecv/Send over Ibsend/Recv as in the majority of cases this tends to be faster; you'd have to benchmark your specific setup, however.
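
For what it's worth, the Irecv/Send version of the neighbour exchange looks roughly like this; pre-posting every receive before any send means incoming data can land straight in the user buffer rather than going through the attached buffer an Ibsend needs (buffers and bounds are again just placeholders):

#include <mpi.h>

enum { MAX_NEIGHBOURS = 64, LEN = 16 };   /* assumed bounds */

static void exchange_irecv_send(MPI_Comm comm,
                                const int *neighbours, int nneighbours)
{
    int sendbuf[MAX_NEIGHBOURS][LEN], recvbuf[MAX_NEIGHBOURS][LEN];
    MPI_Request reqs[MAX_NEIGHBOURS];

    /* Post every receive first ... */
    for (int i = 0; i < nneighbours; ++i)
        MPI_Irecv(recvbuf[i], LEN, MPI_INT, neighbours[i], 0, comm, &reqs[i]);

    /* ... then the plain sends; each rank's receives are posted
     * unconditionally before it sends, so there is no deadlock. */
    for (int i = 0; i < nneighbours; ++i) {
        /* ... fill sendbuf[i] ... */
        MPI_Send(sendbuf[i], LEN, MPI_INT, neighbours[i], 0, comm);
    }

    MPI_Waitall(nneighbours, reqs, MPI_STATUSES_IGNORE);
}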

Ashley,
