Scott showed in his MX tests that a number of the expected messages were being received before BMI operations had actually been posted for them. He sees this on both the client and the server, but I'm going to focus on the server (and writes) for now. I think he's seeing this, at least in part, due to a race condition in the IO state machine on the server. We send back the positive ack, wait for it to complete, and then post the flow, which in turn posts the receive for the first expected data message. Posting the flow allocates the first flow buffer, which in some cases may take as long as it takes the client to send the first expected message (especially with larger flow buffers). The race occurs if allocating the buffer takes longer than sending the first message.

The attached patch tries to address this issue by posting the ack and the flow together in a single state, and then waiting for both to complete before sending the write ack. It's a bit more complicated than the current IO state machine, because we have different states for reads and writes now, but I tried to move most of the common code into separate functions.

I'm not sure this will actually make a difference for expected messages, though. With tcp, I'm seeing the BMI call that posts the response complete immediately (returning 1), so it's essentially the same ordering as before. Do the other BMI methods complete immediately for small messages as well? I could post the flow first, before posting the ack, which would remove the race condition entirely, but it might also be slower (which takes longer: buffering the expected message and copying it into the flow buffers, or allocating the flow buffers first?).

In any case, Scott, it might be worth trying this patch to see whether we're able to post operations for expected messages before they're received.

-sam


Attachment: io-flow-post.patch
Description: Binary data

_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers