Can you isolate a bit more where the time is being spent? The
performance effect you're describing appears to be drastic. Have you
profiled the code? Some choices of tools are listed in the FAQ:
http://www.open-mpi.org/faq/?category=perftools
The results may be "uninteresting" (all time spent in your MPI_Waitall
calls, for example),
but it'd be good to rule out other possibilities (e.g., I've seen cases
where it's the non-MPI time that's the culprit).

If all the time is spent in MPI_Waitall, then I wonder if it would be
possible for you to reproduce the problem with just some
MPI_Isend|Irecv|Waitall calls that mimic your program. E.g., "lots of
short messages", or "lots of long messages", etc. It sounds like there
is some repeated set of MPI exchanges, so maybe that set can be
extracted and run without the complexities of the application.

Anyhow, some profiling might help guide one to the problem.

Gilbert Grosdidier wrote:
There is indeed a high rate of communication. But the buffer
size is always the same for a given pair of processes, and I thought
that mpi_leave_pinned should avoid freeing the memory in this case.
Am I wrong?