> Re: MPI_Ssend(). This indeed fixes bug3: the process at rank 0 has
> reasonable memory usage and the execution proceeds normally.
>
> Re scalable: One second. I know well bug3 is not scalable, and when to
> use MPI_Isend. The point is that programmers want to count on the MPI spec as
> written, as Richard pointed out. We want to send small messages quickly
> and efficiently, without the danger of overloading the receiver's
> resources. We can use MPI_Ssend(), but it is slow compared to MPI_Send().
Your last statement is not necessarily true. By synchronizing processes
using MPI_Ssend(), you can potentially avoid large numbers of unexpected
messages that need to be buffered and copied, and that also need to be
searched every time a receive is posted. There is no guarantee that the
per-message protocol overhead incurred with MPI_Ssend() slows down an
application more than the buffering, copying, and searching overhead of a
large number of unexpected messages. It is true that MPI_Ssend() is slower
than MPI_Send() in ping-pong micro-benchmarks, but the unexpected-message
queue doesn't have to get very long before the two are about the same.

> Since identifying this behavior we have implemented the desired flow
> control in our application.

It would be interesting to see performance results comparing doing flow
control in the application versus having MPI do it for you....

-Ron
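
P.S. For concreteness, here is a rough, untested sketch of the kind of
MPI_Ssend()-based synchronization I mean above: one rank floods rank 0 with
small messages, but every WINDOW-th send is a synchronous send, so the
sender can never run more than WINDOW messages ahead of the receiver's
posted receives. NMSGS and WINDOW are made-up values for illustration only.

    /* Untested sketch: bound the unexpected-message queue at rank 0 by
     * making every WINDOW-th send synchronous. */
    #include <mpi.h>

    #define NMSGS  1000000
    #define WINDOW 1000

    int main(int argc, char **argv)
    {
        int rank, i, payload = 42;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 1) {
            for (i = 0; i < NMSGS; i++) {
                if ((i + 1) % WINDOW == 0)
                    /* blocks until rank 0 has matched this message,
                     * so at most WINDOW messages are ever outstanding */
                    MPI_Ssend(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
                else
                    MPI_Send(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
            }
        } else if (rank == 0) {
            for (i = 0; i < NMSGS; i++)
                MPI_Recv(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }

Whether this beats explicit application-level flow control (credits, acks,
etc.) will depend on the MPI implementation and the message pattern, which
is exactly why the comparison would be interesting.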