On Jan 12, 2007, at 9:03 AM, Pete Wyckoff wrote:

[EMAIL PROTECTED] wrote on Fri, 12 Jan 2007 09:55 -0500:
In TCP, the OS will receive and buffer the data. There is always a
copy regardless of whether you pre-post the receive. Are you asking
which is faster between memory copy and network transfer? If so, I
would think that the memory copy is always faster. Given that, the
current strategy (ack, then post the flow) makes the most sense.

In IB, I believe that A cannot write/put a large message to B until B
has allocated memory and sent the memory address to A. This is why
bmi_ib needs the RTS and CTS messages.
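A toy model of that constraint, not bmi_ib code: the sender can only RDMA-write into memory the receiver has advertised, so a large send costs an RTS (tag and length), a CTS carrying the buffer address and key, and then the write. All names here (`Receiver`, `Cts`, `rkey`, `large_send`) are hypothetical, and the "registered memory" is just a bytearray.

```python
from dataclasses import dataclass


@dataclass
class Cts:
    """Clear-to-send: tells the sender where it may write (hypothetical layout)."""
    tag: int
    addr: int      # offset into the receiver's registered region
    length: int
    rkey: int      # stand-in for an IB remote key


class Receiver:
    def __init__(self, region_size: int):
        self.region = bytearray(region_size)   # pretend this is registered memory
        self.next_free = 0
        self.rkey = 0x1234

    def handle_rts(self, tag: int, length: int) -> Cts:
        # B must allocate space first; only then can it reveal an address to A.
        if self.next_free + length > len(self.region):
            raise MemoryError("no room for incoming message")
        addr = self.next_free
        self.next_free += length
        return Cts(tag, addr, length, self.rkey)


def rdma_write(receiver: Receiver, cts: Cts, rkey: int, payload: bytes) -> None:
    # A may only target the buffer B advertised in the CTS.
    if rkey != receiver.rkey or len(payload) > cts.length:
        raise PermissionError("write outside advertised buffer")
    receiver.region[cts.addr:cts.addr + len(payload)] = payload


def large_send(receiver: Receiver, tag: int, payload: bytes) -> list:
    log = ["RTS"]                                   # A -> B: tag and length only
    cts = receiver.handle_rts(tag, len(payload))
    log.append("CTS")                               # B -> A: address and key
    rdma_write(receiver, cts, cts.rkey, payload)
    log.append("RDMA write")                        # data lands directly in B's buffer
    return log
```

For example, `large_send(Receiver(1 << 20), 7, b"x" * 4096)` returns `["RTS", "CTS", "RDMA write"]`: one round trip is spent purely on learning where to put the data.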

MX does this internally. When A posts a large send, MX sends a
"scout" message, which is equivalent to the RTS, to B that includes
the matching info and length. If B has posted a receive, then B
replies with an ack and A can start sending data. If B has not posted
a receive, then the scout message goes into the unexpected queue.
When B does post a matching receive, it then has to scan the
unexpected queue to see if it has already arrived. If so, it matches
and sends an ack to start the data transfer.

By pre-posting the receives, we eliminate scanning a potentially
very long unexpected queue (I am thinking of the case of a storage
server handling 10s or 100s of clients).
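A simplified model of the matching described above, assuming exact-match tags (real MX matches on masked match info) and using invented names; it is not the MX API. It only shows where the unexpected-queue scan comes from and why preposting avoids it.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Scout:
    """Stand-in for an MX scout (RTS-like) message: matching info plus length."""
    match: int
    length: int


class Endpoint:
    def __init__(self):
        self.posted = {}           # match info -> length of a preposted receive
        self.unexpected = deque()  # scouts that arrived before a matching receive
        self.comparisons = 0       # unexpected-queue entries inspected so far
        self.acks = []             # match values acked, i.e. data may now flow

    def on_scout(self, scout: Scout) -> None:
        if scout.match in self.posted:
            # Receive already posted: ack at once, the queue is never touched.
            del self.posted[scout.match]
            self.acks.append(scout.match)
        else:
            self.unexpected.append(scout)

    def post_recv(self, match: int, length: int) -> None:
        # Late receive: scan the unexpected queue linearly for a match.
        for i, scout in enumerate(self.unexpected):
            self.comparisons += 1
            if scout.match == match:
                del self.unexpected[i]   # safe: we return right after mutating
                self.acks.append(match)
                return
        self.posted[match] = length
```

With 100 clients whose scouts all arrive before the server posts receives, posting the receive for the last client inspects all 100 queued scouts; if the server had preposted all 100 receives, every scout matches immediately and the scan count stays at zero.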

If you pre-post the receives, then in the IB case you could send all
of that data in the ack to the initial sendunexpected and potentially
eliminate the RTS and CTS messages as well.
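A rough count of the messages involved in one large write, under the simplifying assumption that the client's flow data goes over an RTS/CTS transport after the server's ack; the function name and message labels are hypothetical, not PVFS2 or bmi_ib identifiers.

```python
def io_messages(preposted: bool, piggyback_cts: bool) -> list:
    """Hypothetical message sequence for one large write (simplified)."""
    msgs = ["client: sendunexpected (IO request)"]
    if preposted and piggyback_cts:
        # The flow receive is already posted, so the ack can carry the
        # buffer address and the client can write the data right away.
        msgs.append("server: ack + buffer address")
    else:
        # Separate ack, then the transport's own rendezvous handshake.
        msgs += ["server: ack", "client: RTS", "server: CTS"]
    msgs.append("client: data")
    return msgs
```

Under these assumptions, coalescing drops the exchange from five messages to three; preposting alone does not remove the RTS/CTS unless the ack is also made to carry the address.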

Pete, I could possibly be smoking something and this is not possible
in IB at all. Any thoughts?

Sam, it may be that I am trying to optimize something that will not
provide much benefit at all. Can you send a patch that simply posts
the flow before the ack? I can test it on MX-10G and see if it
impacts performance at all. If not, leave things as they are.

I think that all makes sense.  Agree that the need for preposting
receives is to avoid big queues of waiting unexpected messages.

Hi Scott,

The attached patch posts the flow (and receives) before posting the send of the response ack. I originally posted the response ack before the flow because the first call to BMI_memalloc (with a request for 1MB in your case) happens in the flow post call, so posting the flow first delays the response ack. I'm curious whether this will actually give you better performance.

I also fixed that assert failure you were getting. Let me know if this works for you.

Thanks,

-sam

Attachment: io-flow-post.patch



(Doubt anyone will bother to coalesce the sendunexpected ack and CTS,
as that's some complexity to save one little message.)

A long time ago I suggested that we mandate that BMI users must
prepost all receives, but this was rejected (reasonably) in that it
makes app programming more difficult.  Instead I had to go and
implement RTS/CTS, and MX has to use its scout messages.  These
things are fine to do, but we can avoid some performance overheads
by still trying to use preposted receives where possible, especially
in hot paths like IO flows.

                -- Pete
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

