This is a continuation of the discussion from today's ofiwg and github issue 

An attempt to describe the desired application behavior is:

    1. Wait for one or more events to occur
    2. Get a list of queues that are ready for action
    3. Process each queue until empty

Assuming this behavior, this could roughly be broken into 3 cases:

a.)  Single libfabric call - for one queue
fi_cq_sread/fi_eq_sread/fi_cntr_wait basically encapsulates the above steps 
into one call.  FWIW, some providers implement these calls by allocating a wait 
set internally.

b.)  libfabric only calls - for multiple queues
The 'natural' match for this would be to use:

    1. fi_wait
    2. fi_poll
    3. fi_cq_read/fi_eq_read/fi_cntr_read

Step 2 is optional.  Also, EQs cannot be assigned to poll sets, so all EQs 
(likely 0 or 1) would need to be checked at step 3.

c.)  OS + libfabric calls - for one or multiple queues
This modifies the above sequence to:

    1. poll/select
    2. fi_poll
    3. fi_cq_read/fi_eq_read/fi_cntr_read

In case b) we can require that providers implement fi_wait such that it avoids 
infinite waiting and application spinning, assuming correct application 
behavior.  It would be up to the provider to guarantee this.  E.g. fi_wait 
could clear any signals and check for ready queues before sleeping. 
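
To make the "clear signals, check, then sleep" ordering concrete, here is a 
rough sketch.  It is not libfabric or provider code.  The mock_* names, the 
pipe used as the wait object, and the per-queue counters are all assumptions 
made for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

#define NUM_QUEUES 2

/* Hypothetical stand-in for a provider wait set; not a libfabric type. */
struct mock_waitset {
    int rfd, wfd;               /* pipe ends acting as the wait object */
    int pending[NUM_QUEUES];    /* unread entries per queue */
};

static int mock_waitset_init(struct mock_waitset *ws)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;
    /* Non-blocking read end, so clearing signals never stalls. */
    if (fcntl(fds[0], F_SETFL, O_NONBLOCK) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    ws->rfd = fds[0];
    ws->wfd = fds[1];
    for (int i = 0; i < NUM_QUEUES; i++)
        ws->pending[i] = 0;
    return 0;
}

/* Provider side: queue an entry, then signal the wait object. */
static int mock_push(struct mock_waitset *ws, int queue)
{
    char b = 1;

    assert(queue >= 0 && queue < NUM_QUEUES);
    ws->pending[queue]++;
    return write(ws->wfd, &b, 1) == 1 ? 0 : -1;
}

/* Returns 0 if some queue is ready, -1 on timeout. */
static int mock_wait(struct mock_waitset *ws, int timeout_ms)
{
    struct pollfd pfd = { .fd = ws->rfd, .events = POLLIN };
    char buf[16];

    /* 1. Clear any signals left over from earlier events. */
    while (read(ws->rfd, buf, sizeof(buf)) > 0)
        ;
    /* 2. Check for ready queues before sleeping. */
    for (int i = 0; i < NUM_QUEUES; i++)
        if (ws->pending[i] > 0)
            return 0;
    /* 3. Nothing is ready, so sleep until signaled or timed out. */
    return poll(&pfd, 1, timeout_ms) > 0 ? 0 : -1;
}

/*
 * Push `entries` entries, let one wait consume the signal, then wait again
 * without reading the queue.  Returns the result of the second wait
 * (-2 on setup failure).
 */
int mock_wait_scenario(int entries)
{
    struct mock_waitset ws;
    int ret = -2;

    if (mock_waitset_init(&ws) != 0)
        return -2;
    for (int i = 0; i < entries; i++)
        if (mock_push(&ws, 0) != 0)
            goto out;
    (void)mock_wait(&ws, 10);   /* this call clears the signal */
    ret = mock_wait(&ws, 10);   /* signal is gone; entries may remain */
out:
    close(ws.rfd);
    close(ws.wfd);
    return ret;
}
```

In this toy model the second wait still returns right away when an entry is 
left unread, because the queue check happens before the sleep.  Once the 
queues are empty, the cleared signal lets the caller sleep instead of spin.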

If this works, then there are issues only in case c).  The poll/select fd's (or 
wait objects) could come from a wait set or directly from the CQs/EQs/counters.  
Even in the case of a wait set, the returned fd could be from epoll, which does 
not guarantee that a single underlying wait object is in use.  For example, 
verbs devices cannot share an fd between an EQ and CQ.  I'm going to claim that this 
means the app must act on each CQ etc. to guarantee the wait object is reset.  
I believe this is true regardless of what fi_poll returns.

I'm not quite sure what all this means yet.  :)  In case c) the use of a 
pollset does not seem to help, and could lead to lost events.  E.g. an entry is 
added to an empty CQ after fi_poll returns, while the fd is still readable.  If 
the app doesn't check the CQ, which isn't in the fi_poll output, it could miss 
seeing the completion.
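
Here is a toy model of that sequence, only to pin down the ordering I have in 
mind.  It assumes a shared wait object that is signaled once and not signaled 
again while it is still readable.  That is my assumption for the sketch, not a 
statement about how any provider behaves.  The mock_* functions are 
hypothetical stand-ins for poll/select, fi_poll, and reading a CQ.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

#define NUM_CQS 2

/* Hypothetical stand-ins; none of these are libfabric types. */
struct mock_waitset {
    int rfd, wfd;           /* pipe ends acting as the shared wait object */
    bool signaled;          /* true while an unread byte sits in the pipe */
    int pending[NUM_CQS];   /* unread completions per CQ */
};

/* Provider side: add a completion; signal only if not already signaled. */
static void mock_cq_push(struct mock_waitset *ws, int idx)
{
    char b = 1;

    assert(idx >= 0 && idx < NUM_CQS);
    ws->pending[idx]++;
    if (!ws->signaled && write(ws->wfd, &b, 1) == 1)
        ws->signaled = true;
}

/* Stand-in for poll()/select() on the wait object. */
static bool mock_fd_readable(struct mock_waitset *ws)
{
    struct pollfd pfd = { .fd = ws->rfd, .events = POLLIN };

    return poll(&pfd, 1, 0) > 0;
}

/* Stand-in for fi_poll: snapshot of which CQs have entries right now. */
static void mock_poll(struct mock_waitset *ws, bool ready[NUM_CQS])
{
    for (int i = 0; i < NUM_CQS; i++)
        ready[i] = ws->pending[i] > 0;
}

/* Reset the wait object by consuming the signal byte. */
static void mock_reset(struct mock_waitset *ws)
{
    char b;

    if (ws->signaled && read(ws->rfd, &b, 1) == 1)
        ws->signaled = false;
}

/*
 * Returns how many completions are left unread at the point where the app
 * would go back to sleep, or -1 on setup failure.  With check_all false the
 * app reads only the CQs in the snapshot; with check_all true it reads all.
 */
int stranded_completions(bool check_all)
{
    struct mock_waitset ws = { .signaled = false, .pending = { 0 } };
    bool ready[NUM_CQS];
    int fds[2], stranded = 0;

    if (pipe(fds) != 0)
        return -1;
    ws.rfd = fds[0];
    ws.wfd = fds[1];

    mock_cq_push(&ws, 0);           /* first completion signals the fd */
    if (mock_fd_readable(&ws)) {    /* app wakes up */
        mock_poll(&ws, ready);      /* snapshot holds CQ 0 only */
        mock_cq_push(&ws, 1);       /* late entry, no new signal */
        mock_reset(&ws);
        for (int i = 0; i < NUM_CQS; i++)
            if (check_all || ready[i])
                ws.pending[i] = 0;  /* read the CQ until empty */
    }
    if (!mock_fd_readable(&ws))     /* app would block here */
        for (int i = 0; i < NUM_CQS; i++)
            stranded += ws.pending[i];
    close(ws.rfd);
    close(ws.wfd);
    return stranded;
}
```

Under these assumptions, stranded_completions(false) leaves the late entry on 
the second CQ unread while the fd is no longer readable, and 
stranded_completions(true) leaves nothing behind.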

As for the API, it's unfortunate, but I believe that fi_cq_sread/fi_eq_sread 
should be used to both read events from a queue and reset the wait object.  The 
alternative is a separate 'reset/rearm' call, which I would rather avoid, but 
others can chime in.  The sread calls are only needed in case c).  Even more 
unfortunate is that there is no fi_cntr_sread, only fi_cntr_wait.

<insert brilliant idea here>

- Sean
ofiwg mailing list