Re: [Pvfs2-developers] BMI questions

Sam Lang Fri, 01 Dec 2006 09:54:48 -0800


On Dec 1, 2006, at 7:10 AM, Scott Atchley wrote:

On Dec 1, 2006, at 4:33 AM, Sam Lang wrote:
Your example above is currently how writes work. The client sends an unexpected message to the server (a control message for the IO, file info, size of the IO, etc.), which posts an expected receive, and then sends an expected back to the client. The client posts a receive for the expected before sending the unexpected. After the receive of the expected message at the client completes (this is a 'ready for IO' message from the server), it posts a send of the actual IO (this will be up to FlowBufferSize). Once that send completes, it posts another one, and assumes that the server has already posted another receive (based on the size of the entire IO). Once all the IO has completed at the server (including pushing the data to disk), the server sends a response ack message, which the client posted a receive for before doing any of the actual IO.
Ok.
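
The write sequence described above can be sketched as an ordered list of steps. This is only an illustrative model, not PVFS2 or BMI code: the function name, the step labels, and the exact position of the client's receive for the final ack (the text only says it is posted before any actual IO) are assumptions.

```python
def write_message_sequence(total_bytes, flow_buffer_size):
    """Return the ordered (actor, action, payload_bytes) steps of one write.

    Hypothetical model of the exchange described in the thread; the names
    here do not correspond to real BMI functions.
    """
    if total_bytes <= 0 or flow_buffer_size <= 0:
        raise ValueError("sizes must be positive")

    steps = [
        # The client posts its receive before sending the unexpected message.
        ("client", "post recv expected: ready-for-IO", 0),
        ("client", "send unexpected: IO control message", 0),
        ("server", "post recv expected: IO data", 0),
        ("server", "send expected: ready-for-IO", 0),
        # Assumed placement: somewhere before the first IO send.
        ("client", "post recv expected: response ack", 0),
    ]

    # The client sends the IO in pieces of at most flow_buffer_size,
    # posting the next send only after the previous one has completed.
    remaining = total_bytes
    while remaining > 0:
        chunk = min(remaining, flow_buffer_size)
        steps.append(("client", "send expected: IO data", chunk))
        remaining -= chunk

    # Sent once all IO has completed at the server, including the disk write.
    steps.append(("server", "send expected: response ack", 0))
    return steps
```

Calling write_message_sequence(100, 40) yields three IO sends, since the last piece carries only the remainder.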
It looks like the flow code on the server doesn't actually post the next recv of IO (IO2), until the first recv has completed (IO1), so it's possible that the client posts (and starts) the next send before the server posts the next receive, although it's probably unlikely.
If IO operations are always > 32 KB, I would agree. But if any are <= 32 KB, MX will buffer them on the send side and complete immediately. The client could then post another even if MX is in the middle of delivering the first one. I can override this behavior (use mx_issend()) or use credits for control flow.

Hm...these particular IOs are going to post BMI_send calls > 32KB. If the IO is less than that, we probably want to pack the IO in the first request. We call that eager mode, and you would need to have the BMI_get_info(BMI_GET_UNEXP_SIZE) return 32K.
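
The eager-mode decision above amounts to a size check against the unexpected-message limit. A minimal sketch, assuming the limit is whatever BMI_get_info(BMI_GET_UNEXP_SIZE) reports; the function name, the "flow" label for the multi-message path, and the header_bytes parameter are all hypothetical.

```python
def choose_io_mode(io_bytes, unexpected_limit, header_bytes=0):
    """Pick 'eager' when the IO fits in the first (unexpected) request.

    unexpected_limit stands in for the value the BMI method reports as its
    maximum unexpected message size (32K in the thread). header_bytes is an
    assumed allowance for the control information sent with the request.
    """
    if io_bytes < 0 or header_bytes < 0 or unexpected_limit <= 0:
        raise ValueError("sizes must be non-negative and the limit positive")
    if header_bytes + io_bytes <= unexpected_limit:
        return "eager"   # pack the data into the first request
    return "flow"        # separate expected messages, as described earlier
```

Whether the boundary case counts as eager depends on how much of the unexpected message the control information occupies, which is why the header size is left as a parameter.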

In either case it sounds like it's possible for a bunch of client sends to get posted, and a bunch of server receives to get posted, without any of them actually completing. Is it possible to sort all that out if the same tag is specified for all of them?

Each BMI receive uses a separate buffer (up to a max of 8 buffers).
Does this mean that at most, the client will post 8 IO sends per operation?

The 8 buffer limit is specified by the FlowBuffersPerFlow config option, and it just limits the number of buffers that can be allocated on the server (and hence the number of outstanding BMI operations for a particular IO). In the diagram I sent in the previous email, each IOn would have had an associated buffer. When it gets to 8, no more BMI_post_recv calls are made until one of the TROVE_post_write calls has completed first (freeing up one of the buffers). None of that changes the behavior on the client, since the client uses the user buffer. He keeps posting another send once a previous send has completed.

Every time a bmi recv completes, two things happen: the associated trove write is posted, and a new bmi recv is posted. So over time, bmi receives will get posted at the server before bmi sends get posted at the client, but the second and maybe third bmi receives posted may be posted after the bmi sends at the client.
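
The server-side buffer accounting described above can be modeled as a small state machine. This is a sketch under stated assumptions, not the actual flow code: the class and method names are hypothetical, and it assumes at most one receive is posted at a time, with each buffer held from the moment its receive is posted until its trove write completes.

```python
class ServerFlowModel:
    """Toy model of per-flow buffer use on the server (names hypothetical)."""

    def __init__(self, chunks_total, buffers_per_flow=8):
        if chunks_total < 0 or buffers_per_flow <= 0:
            raise ValueError("invalid chunk count or buffer limit")
        self.limit = buffers_per_flow        # FlowBuffersPerFlow in the text
        self.chunks_unposted = chunks_total  # chunks with no receive posted yet
        self.recv_posted = 0                 # 0 or 1 in this model
        self.writes_pending = 0              # trove writes still in progress
        self._maybe_post_recv()

    def buffers_in_use(self):
        return self.recv_posted + self.writes_pending

    def _maybe_post_recv(self):
        # Post a receive only if data remains and a buffer is free.
        if (self.recv_posted == 0 and self.chunks_unposted > 0
                and self.buffers_in_use() < self.limit):
            self.recv_posted = 1
            self.chunks_unposted -= 1

    def recv_completed(self):
        # Two things happen: the trove write is posted, and a new
        # receive is posted if the buffer limit allows it.
        if self.recv_posted != 1:
            raise RuntimeError("no receive is posted")
        self.recv_posted = 0
        self.writes_pending += 1
        self._maybe_post_recv()

    def write_completed(self):
        # A finished trove write frees its buffer, which may unblock a receive.
        if self.writes_pending == 0:
            raise RuntimeError("no write is pending")
        self.writes_pending -= 1
        self._maybe_post_recv()
```

In this model, once eight writes are pending no receive is posted until write_completed() is called, while the client side is unaffected because it sends from the user buffer.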
To answer your specific questions:
The same bmi tag is passed to each of the post_send and post_recv calls for the entire IO operation.
I can live with this as long as only one receive is posted at a time using a specific tag.

Hm..we actually do post multiple receives using the same tag. All BMI messages for a given IO operation get the same tag.
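
One way a BMI method could cope with several receives posted under the same tag is to match them in posted order and to hold messages that arrive before any receive is posted. The sketch below assumes first-in, first-out matching per tag, which the thread does not confirm; the class and method names are hypothetical.

```python
from collections import defaultdict, deque


class TagMatcher:
    """Toy per-tag matcher: FIFO receives and FIFO early arrivals."""

    def __init__(self):
        self._posted = defaultdict(deque)  # tag -> buffer ids awaiting data
        self._early = defaultdict(deque)   # tag -> payloads awaiting a receive

    def post_recv(self, tag, buffer_id):
        """Post a receive; return (buffer_id, payload) if data already arrived."""
        if self._early[tag]:
            return (buffer_id, self._early[tag].popleft())
        self._posted[tag].append(buffer_id)
        return None

    def deliver(self, tag, payload):
        """Handle an incoming message; return the match or queue it."""
        if self._posted[tag]:
            return (self._posted[tag].popleft(), payload)
        # No receive posted yet: the send raced ahead of the server.
        self._early[tag].append(payload)
        return None
```

The early-arrival queue corresponds to the case raised in the thread where the client posts its next send before the server has posted the next receive.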

As to hitting resource limits, the client doesn't post the next send until the previous send has completed. I think with enough IO operations from different clients happening concurrently, it may be possible to run into the resource issues you speak of, but I need to verify that.
Definitely.
Yes, it always posts a receive for an expected message. For most expected messages the receive is guaranteed to be posted before the peer posts the send. That doesn't appear to be guaranteed in the IO case though, as I mentioned above.
Hope this helps.

-sam
Tremendously. In one of the diagrams above, you seem to indicate that the server will post receives for unexpected messages. Is this the case? If so, does it simply use BMI_method_post_recv()? With what tag, etc.?

From the IB code, it looks like the server does not post an unexpected, but relies on the BMI method to receive the message and put it in a queue, and then return it when BMI_method_test_unexpected() is called. Am I reading this wrong?

No, that's partly my own confusion. We post unexpected jobs in the server, but this doesn't translate to a posted receive for unexpected messages in BMI. We just set up a queue for completed unexpected BMI messages, and populate that once BMI_testunexpected returns something.
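
The arrangement described above, where unexpected jobs are matched against a queue filled by polling rather than against posted receives, can be sketched as follows. This is an illustrative model only: the class name, method names, and the test_unexpected callable (a stand-in for polling BMI_testunexpected) are assumptions.

```python
from collections import deque


class UnexpectedJobQueue:
    """Toy model: unexpected jobs consume messages found by polling."""

    def __init__(self, test_unexpected):
        # test_unexpected is a hypothetical callable returning a list of
        # unexpected messages that have arrived since the last poll.
        self._test_unexpected = test_unexpected
        self._completed = deque()     # completed unexpected messages
        self._waiting_jobs = deque()  # posted unexpected jobs, oldest first

    def post_unexpected_job(self, job_id):
        # Posting a job does not post any receive in the messaging layer.
        self._waiting_jobs.append(job_id)

    def progress(self):
        """Poll once, then pair waiting jobs with queued messages."""
        self._completed.extend(self._test_unexpected())
        matched = []
        while self._waiting_jobs and self._completed:
            matched.append((self._waiting_jobs.popleft(),
                            self._completed.popleft()))
        return matched
```

Messages that arrive while no job is waiting simply stay in the completed queue until a later job is posted and progress() runs again.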


-sam


Scott


_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
