>Ahh, ok. I don't think we get the communication established event in this case
>since that should only happen while we're in RTR, not RTS, and the QP is
>transitioned to RTS right away, isn't it? Or do you delay the RTS transition
>until the RTU is received in WinVerbs?

WinVerbs transitions to RTS before sending the REP. This way the app can immediately respond to a received message.

>I think we need to have a better understanding of what's going on. We're
>getting closer, but not quite there yet (at least I don't fully understand
>yet.)

The basic problem is that __cep_mad_send_cb() assumes that the mad being processed is associated with the *current* state of the CEP.

What's observed is this: __cep_mad_send_cb() was invoked for a mad with attr_id = 0x1300 (CM_REP_ATTR_ID) with status 0xf (IB_WCS_CANCELED). The current state of the cep is CEP_STATE_DREQ_SENT.

You'll need to trace through the call for this, but the code sees that the request was canceled, changes mad->status to timeout_retry, then drops to processing cep state CEP_STATE_DREQ_SENT. The assumption being made is that the mad being processed is a timed out DREQ, so the cep is transitioned into CEP_STATE_TIMEWAIT.

In reality, the mad was a successfully processed REP, which was canceled when the RTU was received. Meanwhile, the real DREQ is still outstanding. Even if a DREP is received, it'll be dropped because the cep is now in the wrong state, or could have exited timewait completely.

To fix this, before processing a completed send mad, the current state of the cep should be checked against the state that the cep was in when the mad was sent. If those states differ, then the send completion should simply be discarded, as some other action is now driving the state machine.

- Sean
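
For illustration only, here is a minimal sketch of the check proposed above: record the cep state when a mad is posted, and discard the send completion if the cep has since moved to a different state. None of these names come from the actual CM code. The types, the sent_in_state field, post_send() and on_send_complete() are all hypothetical, simplified stand-ins, and the real __cep_mad_send_cb() handles many more states.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the real cep and mad structures. */
typedef enum {
    CEP_STATE_REP_SENT,
    CEP_STATE_ESTABLISHED,
    CEP_STATE_DREQ_SENT,
    CEP_STATE_TIMEWAIT
} cep_state_t;

typedef enum { WCS_SUCCESS, WCS_TIMEOUT_RETRY_ERR, WCS_CANCELED } wc_status_t;

typedef struct {
    cep_state_t state;
} cep_t;

typedef struct {
    wc_status_t status;
    cep_state_t sent_in_state; /* assumption: filled in when the mad is posted */
} send_mad_t;

/* Remember which cep state this mad belongs to at the time it is sent. */
static void post_send(const cep_t *cep, send_mad_t *mad)
{
    mad->sent_in_state = cep->state;
}

/*
 * Send completion handler.
 * Returns true if the completion changed the cep state, false if it was
 * discarded or required no transition.
 */
static bool on_send_complete(cep_t *cep, const send_mad_t *mad)
{
    /* Stale completion: something else is now driving the state machine. */
    if (mad->sent_in_state != cep->state)
        return false;

    /* Only a timed out DREQ that was sent in this state moves to timewait. */
    if (cep->state == CEP_STATE_DREQ_SENT &&
        mad->status == WCS_TIMEOUT_RETRY_ERR) {
        cep->state = CEP_STATE_TIMEWAIT;
        return true;
    }
    return false;
}
```

With this check, the canceled REP from the trace (posted in a REP_SENT-like state, completed while the cep is in CEP_STATE_DREQ_SENT) is dropped, and the cep stays in CEP_STATE_DREQ_SENT so that a later DREP can still be matched.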
