I have some questions related to the design semantics of BMI.

* timeouts. It looks like the timeout for bmi test calls is the max amount of time spent _idling_ in the test call (as opposed to the max time spent in the test call).

This is correct. The name of the argument is max_idle_time_ms. The main reason it was put there is to give an opportunity to prevent BMI from busy spinning when it is polling for completion. The more traditional timeout semantics (where you wait up to N seconds for something specific to finish before giving up, whether busy or not) are implemented at the job level. When the job level doesn't want BMI to block, it sets max_idle_time_ms to 0, but when it doesn't really have much else to do, it will set it to a few milliseconds. This is enough to prevent high CPU usage, but still low enough for us to pop out and do other occasional bookkeeping at the job level.
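
To make the split between the two kinds of timeout concrete, here is a minimal C sketch. It is not the actual job-layer code: sketch_op, sketch_bmi_test(), sketch_job_wait() and the 10 ms idle value are hypothetical names and numbers chosen for illustration, and time is simulated with a counter rather than a real clock. It only shows the idea that the lower call bounds idle time while the caller enforces the overall deadline.

```c
#include <assert.h>

/* Hypothetical idle budget used when the job layer has nothing else to do. */
enum { SKETCH_IDLE_MS = 10 };

/* Simulated operation: it completes once the simulated clock reaches ready_at_ms. */
struct sketch_op {
    int now_ms;      /* simulated clock */
    int ready_at_ms; /* simulated completion time */
};

/* Stand-in for a BMI test call (not the real API): returns 1 if the
 * operation is complete, otherwise idles for at most max_idle_time_ms
 * and reports whether it completed in the meantime. */
static int sketch_bmi_test(struct sketch_op *op, int max_idle_time_ms)
{
    if (op->now_ms >= op->ready_at_ms)
        return 1;
    int remaining = op->ready_at_ms - op->now_ms;
    op->now_ms += remaining < max_idle_time_ms ? remaining : max_idle_time_ms;
    return op->now_ms >= op->ready_at_ms;
}

/* Job-level wait: gives up once timeout_ms has passed, whether the time
 * was spent idle or busy. Returns 1 on completion, 0 on timeout. */
static int sketch_job_wait(struct sketch_op *op, int timeout_ms, int have_other_work)
{
    assert(timeout_ms >= 0);
    int deadline = op->now_ms + timeout_ms;
    /* 0 means "do not block"; a few milliseconds avoids busy spinning. */
    int idle_ms = have_other_work ? 0 : SKETCH_IDLE_MS;

    while (op->now_ms < deadline) {
        if (sketch_bmi_test(op, idle_ms))
            return 1;
        if (idle_ms == 0)
            op->now_ms += 1; /* stands in for time spent on the other work */
        /* occasional job-level bookkeeping would go here */
    }
    return sketch_bmi_test(op, 0); /* final non-blocking check */
}
```

In this sketch the wait can overshoot the deadline by at most one idle interval, which is one reason to keep the idle value small.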

In other words, if operations are being completed continuously, then the timeout is never triggered, and the call can block for much longer than the actual timeout specified.

I don't think this is true in practice, because we never loop (within bmi) over a function that can idle. The bmi_tcp and bmi_gm methods take this approach to implementing the max idle time:

- check completion queue: if find something, return immediately
- call a generic progress function that may idle for as long as max_idle_time_ms but will exit as soon as it gets any work done (the work may or may not be related to what the caller tested for)
- check completion queue: if find something, return immediately

So the only way that this function can block much longer than max_idle_time_ms is if checking the completion queue takes a long time. Completion checking is typically very fast though; testsome() and test() map ids directly to operations so there is no data structure searching, while testcontext() just takes the first N available items from the completion queue.
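
The three steps above can be sketched in C roughly as follows. This is an illustration under assumptions, not the bmi_tcp or bmi_gm source: sketch_method, sketch_check_completion(), sketch_progress() and sketch_test() are hypothetical names, the completion queue is a small fixed ring buffer, and idling is simulated by adding to a counter where a real method would sleep or poll.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_QUEUE_CAP 16

/* Hypothetical stand-in for a method's internal state. */
struct sketch_method {
    int completed[SKETCH_QUEUE_CAP]; /* ids of finished operations, FIFO */
    size_t head, count;
    int next_id;       /* id given to the next operation that completes */
    int pending_work;  /* units of work the progress function could do */
    int idle_ms_spent; /* total simulated idle time */
};

/* Take up to incount ids from the front of the completion queue. */
static size_t sketch_check_completion(struct sketch_method *m, int *out, size_t incount)
{
    size_t n = 0;
    while (n < incount && m->count > 0) {
        out[n++] = m->completed[m->head];
        m->head = (m->head + 1) % SKETCH_QUEUE_CAP;
        m->count--;
    }
    return n;
}

/* One progress pass: if there is work, do one unit and return at once;
 * otherwise idle for up to max_idle_time_ms (simulated here). */
static int sketch_progress(struct sketch_method *m, int max_idle_time_ms)
{
    if (m->pending_work > 0) {
        assert(m->count < SKETCH_QUEUE_CAP);
        size_t tail = (m->head + m->count) % SKETCH_QUEUE_CAP;
        m->completed[tail] = m->next_id++;
        m->count++;
        m->pending_work--;
        return 1;
    }
    m->idle_ms_spent += max_idle_time_ms;
    return 0;
}

/* Test call: check, make progress once, check again. The progress
 * function is never called in a loop, so idle time stays bounded. */
static size_t sketch_test(struct sketch_method *m, int *out, size_t incount,
                          int max_idle_time_ms)
{
    size_t n = sketch_check_completion(m, out, incount);
    if (n > 0)
        return n; /* step 1: found something, return immediately */
    sketch_progress(m, max_idle_time_ms);             /* step 2 */
    return sketch_check_completion(m, out, incount);  /* step 3 */
}
```

Because sketch_progress() runs at most once per call, the only unbounded cost left in sketch_test() is the completion check itself, which matches the reasoning above.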

Is this the desired behavior? The concern would be that the bmi operations would be completed at a constant rate, causing a bursty behavior of completed bmi jobs.

I don't think it is particularly bursty, but the test functions will always return as much as they can from the completion queue when they check, on the theory that the caller can do a better job of figuring out what to do with them. There isn't much reason for the BMI layer to throttle completed operations.

> The incount constrains this,  but for
> both bmi api users and bmi method implementors we should  probably
> document all those semantics.

This stuff could definitely stand to have much better documentation.

-Phil