I have another send-side optimization in the pipeline. As Galen will
take care of the receive side, I can focus on the send side. As far
as I can see, this will shave off a few tens of microseconds, bringing our
latency on shared memory to basically the same level as MPICH2. This
might require s
FYI, about six months ago several of us spent some time coming up with a
plan to deal with the latency problems in Open MPI. George went ahead and
has been implementing the send-side changes of this optimization over the
last several months, but has not had time to get to the receive side. Galen
i