This is a continuation of the discussion from today's ofiwg and GitHub issue
1645.
An attempt to describe the desired application behavior is:
1. Wait for one or more events to occur
2. Get a list of queues that are ready for action
3. Process each queue until empty
Assuming this behavior, usage can roughly be broken into 3 cases:
a.) Single libfabric call - for one queue
fi_cq_sread/fi_eq_sread/fi_cntr_wait each encapsulate the above steps in a
single call. FWIW, some providers implement these calls by allocating a wait
set internally.
b.) libfabric only calls - for multiple queues
The 'natural' match for this would be to use:
1. fi_wait
2. fi_poll
3. fi_cq_read/fi_eq_read/fi_cntr_read
Step 2 is optional. Also, EQs cannot be assigned to poll sets, so all EQs
(likely 0 or 1) would need to be checked at step 3.
c.) OS + libfabric calls - for one or multiple queues
This modifies the above sequence to:
1. poll/select
2. fi_poll
3. fi_cq_read/fi_eq_read/fi_cntr_read
In case b) we can require that providers implement fi_wait such that it avoids
infinite waiting and application spinning, assuming correct application
behavior. It would be up to the provider to guarantee this. E.g. fi_wait
could clear any signals and check for ready queues before sleeping.
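To make the "clear signals, then check, then sleep" ordering concrete, here is
a toy model in plain POSIX C. Nothing in it is libfabric API: toy_waitset,
toy_post and toy_wait are made-up names, a pipe stands in for the wait object,
and a counter stands in for each queue. It is only a sketch of the ordering a
provider might use, not a claim about how any provider implements fi_wait.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

#define TOY_NQUEUES 2

/* Hypothetical wait set: one pipe as the wait object, one counter per queue. */
struct toy_waitset {
    int rfd, wfd;
    int pending[TOY_NQUEUES];
};

static int toy_init(struct toy_waitset *ws)
{
    int fds[2], i;

    if (pipe(fds) != 0)
        return -1;
    fcntl(fds[0], F_SETFL, O_NONBLOCK);
    fcntl(fds[1], F_SETFL, O_NONBLOCK);
    ws->rfd = fds[0];
    ws->wfd = fds[1];
    for (i = 0; i < TOY_NQUEUES; i++)
        ws->pending[i] = 0;
    return 0;
}

/* "Provider" side: queue the entry first, then signal the wait object. */
static void toy_post(struct toy_waitset *ws, int q)
{
    ssize_t n;

    ws->pending[q]++;
    n = write(ws->wfd, "x", 1);
    (void)n; /* a full pipe already counts as signaled */
}

/* Returns 0 if some queue is ready, -1 on timeout or error. */
static int toy_wait(struct toy_waitset *ws, int timeout_ms)
{
    struct pollfd pfd;
    char buf[64];
    int i;

    for (;;) {
        /* 1. Clear stale signals so they cannot cause spinning. */
        while (read(ws->rfd, buf, sizeof(buf)) > 0)
            ;
        /* 2. Check the queues; anything posted before step 1 is seen here. */
        for (i = 0; i < TOY_NQUEUES; i++)
            if (ws->pending[i] > 0)
                return 0;
        /* 3. Sleep; a post after step 2 writes to the pipe and wakes us. */
        pfd.fd = ws->rfd;
        pfd.events = POLLIN;
        pfd.revents = 0;
        if (poll(&pfd, 1, timeout_ms) <= 0)
            return -1;
    }
}
```

With this ordering, a leftover signal for an already-drained queue does not
wake the caller for nothing, and an entry that is already queued is never
slept through. The toy restarts the timeout on each pass, which a real
implementation would presumably not do.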
If this works, then there are issues only in case c). The poll/select fd's (or
wait objects) could come from a wait set or directly from the CQs/EQs/counters.
Even in the case of a wait set, the returned fd could be from epoll, which does
not guarantee that a single underlying wait object is in use. For example, verbs
devices cannot share an fd between an EQ and CQ. I'm going to claim that this
means the app must act on each CQ etc. to guarantee the wait object is reset.
I believe this is true regardless of what fi_poll returns.
I'm not quite sure what all this means yet. :) In case c) the use of a
pollset does not seem to help, and could lead to lost events. E.g. an entry is
added to an empty CQ after fi_poll returns, while the fd is still readable. If
the app doesn't check the CQ, which isn't in the fi_poll output, it could miss
seeing the completion.
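The lost-event sequence can be replayed with a small self-contained model.
This is not libfabric code: the struct and function names are invented, a pipe
plays the role of the fd, the 'ready' array plays the role of the fi_poll
output, and the model assumes that acting on the queues resets the wait object
by clearing every pending signal. Real providers may behave differently.

```c
#include <fcntl.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

enum { Q_A, Q_B, NQ };

/* Hypothetical model: one shared fd, one pending counter per CQ. */
struct model {
    int rfd, wfd;
    int pending[NQ];
};

static void model_post(struct model *m, int q)
{
    ssize_t n;

    m->pending[q]++;
    n = write(m->wfd, "x", 1);
    (void)n;
}

static bool model_fd_readable(const struct model *m)
{
    struct pollfd pfd = { .fd = m->rfd, .events = POLLIN };

    return poll(&pfd, 1, 0) > 0 && (pfd.revents & POLLIN);
}

static void model_fd_reset(struct model *m)
{
    char buf[64];

    while (read(m->rfd, buf, sizeof(buf)) > 0)
        ;
}

/* Returns true if Q_B is left holding an entry with nothing to wake the app. */
static bool model_strands_entry(bool check_every_queue)
{
    struct model m = { .pending = { 0 } };
    bool ready[NQ], stranded;
    int fds[2], q;

    if (pipe(fds) != 0)
        return false;
    fcntl(fds[0], F_SETFL, O_NONBLOCK);
    fcntl(fds[1], F_SETFL, O_NONBLOCK);
    m.rfd = fds[0];
    m.wfd = fds[1];

    model_post(&m, Q_A);
    (void)model_fd_readable(&m);          /* step 1: poll() says readable */
    for (q = 0; q < NQ; q++)              /* step 2: fi_poll-like snapshot */
        ready[q] = m.pending[q] > 0;
    model_post(&m, Q_B);                  /* arrives late; fd already readable */
    for (q = 0; q < NQ; q++)              /* step 3: drain selected queues */
        if (check_every_queue || ready[q])
            m.pending[q] = 0;
    model_fd_reset(&m);                   /* assumed reset of the wait object */

    stranded = m.pending[Q_B] > 0 && !model_fd_readable(&m);
    close(fds[0]);
    close(fds[1]);
    return stranded;
}
```

In this model, model_strands_entry(false) reports a stranded entry on Q_B,
while model_strands_entry(true), which checks every queue regardless of the
snapshot, does not.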
As for the API, it's unfortunate, but I believe that fi_cq_sread/fi_eq_sread
should be used to both read events from a queue and reset the wait object. The
alternative is a separate 'reset/rearm' call, which I would rather avoid, but
others can chime in. The sread calls are only needed in case c). Even more
unfortunate is that there is no fi_cntr_sread, only fi_cntr_wait.
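As a rough illustration of what "read and reset in one call" could look like
from the app's side in case c), here is another POSIX-only sketch. The names
toy_cq, toy_cq_read_and_rearm and toy_service are hypothetical, each queue has
its own pipe as its wait object, and the rule that the reset happens when the
read finds the queue empty is my assumption, not settled API behavior.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

#define TOY_MAX_CQS 4

/* Hypothetical queue with its own fd, as when fds come directly from CQs. */
struct toy_cq {
    int rfd, wfd, pending;
};

static int toy_cq_open(struct toy_cq *cq)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;
    fcntl(fds[0], F_SETFL, O_NONBLOCK);
    fcntl(fds[1], F_SETFL, O_NONBLOCK);
    cq->rfd = fds[0];
    cq->wfd = fds[1];
    cq->pending = 0;
    return 0;
}

static void toy_cq_post(struct toy_cq *cq)
{
    ssize_t n;

    cq->pending++;
    n = write(cq->wfd, "x", 1);
    (void)n;
}

/* Returns 1 if an entry was consumed, -EAGAIN if empty (and fd was reset). */
static int toy_cq_read_and_rearm(struct toy_cq *cq)
{
    char buf[64];

    if (cq->pending == 0) {
        /* Empty: reset the wait object, then re-check for a late arrival. */
        while (read(cq->rfd, buf, sizeof(buf)) > 0)
            ;
        if (cq->pending == 0)
            return -EAGAIN;
    }
    cq->pending--;
    return 1;
}

/* One pass of the case c) loop; returns the number of entries processed. */
static int toy_service(struct toy_cq *cqs, int n, int timeout_ms)
{
    struct pollfd pfds[TOY_MAX_CQS];
    int i, done = 0;

    if (n > TOY_MAX_CQS)
        return -EINVAL;
    for (i = 0; i < n; i++) {
        pfds[i].fd = cqs[i].rfd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    if (poll(pfds, n, timeout_ms) <= 0)
        return 0;
    for (i = 0; i < n; i++) {
        if (!(pfds[i].revents & POLLIN))
            continue;
        /* Act on every signaled queue until it reports empty. */
        while (toy_cq_read_and_rearm(&cqs[i]) > 0)
            done++;
    }
    return done;
}
```

The point of the sketch is that the app never touches the fd itself beyond
poll(): draining a queue to empty is what rearms it, so no separate
reset/rearm call is needed.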
<insert brilliant idea here>
- Sean