On Wednesday 22 February 2006 18:36, Jack Morgenstein wrote:
> The issue is complex, and two-fold:
>
> A
> --
> 1. We should PREVENT sending a new duplicate identical request MADs
> while the previous MAD has not yet timed out (but allow RMPP ACK/NACK
> packets, which have the identical TID/GID/class as the original request
> packet).
>
> 2. Similarly, we should PREVENT sending a new duplicate RMPP mad from
> sender side (usually an RMPP response) while the previous RMPP session
> is still in progress.
>
> B
> --
> We should ALLOW sending duplicate response MADs (or duplicate RMPP
> response sessions) having the same transaction ID, but going to
> different destinations.
>
> ----
> Regarding A.2 and B: Normal (non-RMPP) responses do not have timeouts,
> whereas RMPP responses do have timeouts per segment (via the RMPP
> protocol).
> However, these timeouts are visible only after the call to
> ib_post_send_mad() (which is the natural place to put duplication
> detection).
>
> In the current OpenSM implementation, all response MADs are passed from
> user-space to kernel space with a timeout set to zero -- and this
> 0-timeout is passed to ib_post_send_request() by ib_umad_write.
>
> If an RMPP response is indicated, the timeout is changed in mad_rmpp.c,
> send_next_seg() just before calling ib_send_mad(). Thus, when the
> segment is sent and the send_completion is received, the mad transaction
> is transferred to the send wait-queue to await a response packet (since
> the timeout is non-zero at that point).

The reason for this discussion on timeouts is issue A.1. If we only do
the duplication check for MADs with timeouts (i.e., MADs expecting a
response), we will miss checking RMPP responses (which are sent with
0-timeout, as they should be -- all the RMPP complexity is, and should
be, hidden from the sender).

If, however, we add the duplication check for MADs with timeout=0, we'll
check duplicates (inappropriately) for ALL MADs. This, specifically, will
cause problems for the RMPP ACK/NACK messages.

The correct condition for checking when sending a MAD is therefore:

If EITHER
   the timeout specified in the ib_mad_send_buf struct is > 0;
OR
   the packet has RMPP active, but is only a data packet (not a control
   packet), so that RMPP responses are checked as well.

-- Jack
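
For illustration only, the condition above could be sketched in C roughly
as follows. This is not the actual mad.c code: struct send_desc and its
field names are hypothetical stand-ins for the relevant parts of
ib_mad_send_buf and the RMPP header, and the constants are local
definitions chosen to resemble the kernel's RMPP type and flag values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins (assumptions for this sketch, not kernel headers). */
enum rmpp_type {
    RMPP_TYPE_DATA  = 1,   /* data segment of an RMPP transfer */
    RMPP_TYPE_ACK   = 2,   /* control packets from here on */
    RMPP_TYPE_STOP  = 3,
    RMPP_TYPE_ABORT = 4,
};

#define RMPP_FLAG_ACTIVE (1u << 1)

/* Hypothetical summary of what the send path knows about a MAD. */
struct send_desc {
    int     timeout_ms;   /* timeout as given by the sender */
    uint8_t rmpp_flags;   /* flags from the RMPP header */
    uint8_t rmpp_type;    /* type from the RMPP header */
};

/*
 * Return true if the MAD should go through duplicate detection:
 * either it expects a response (timeout > 0), or it is an RMPP data
 * packet. RMPP control packets and plain responses are skipped.
 */
static inline bool needs_duplicate_check(const struct send_desc *s)
{
    if (s->timeout_ms > 0)
        return true;

    return (s->rmpp_flags & RMPP_FLAG_ACTIVE) &&
           s->rmpp_type == RMPP_TYPE_DATA;
}
```

With this sketch, a request with a non-zero timeout and an RMPP response
sent with timeout 0 are both checked, while an RMPP ACK and a plain
(non-RMPP) response with timeout 0 are not.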
