Hefty, Sean wrote on Thu, 17 Dec 2009 at 13:53:16

>> +        default:
>> +            CL_ASSERT( p_mad->p_mad_buf->attr_id == CM_DREQ_ATTR_ID );
>> +            /* Treat as a timeout so we don't stall the state machine. */
>> +            p_mad->status => IB_WCS_TIMEOUT_RETRY_ERR;
> 
> I'm wondering about this and if it's even needed.  Changing the status
> isn't intuitive to someone reading the code, plus it hindered trying to
> debug the problem because the MAD was reported to the user with a
> timeout error, rather than canceled.  A canceled MAD indicates normal
> operation for the CM.

The idea is that a canceled DREQ should be handled identically to one that 
timed out.  The code delays moving the CEP into timewait until the DREQ is 
complete one way or another.  When you destroy the CEP, if you didn't send the 
DREQ, you send it.  If the DREQ was sent, you cancel it.  Once the DREQ 
completes, the CEP enters timewait.  You might be able to just move to timewait 
in __cleanup_cep and eliminate the DREQ_DESTROY state altogether.
 
> The mad status isn't used further down in the function to drive the
> state machine.  The states are driven solely based on the current state
> and the fact that a mad has failed.  I'm going to try removing this, so
> if you're aware of something that will immediately break, let me know.

Yes, removing the status assignment should be fine.  But I think entering 
timewait as soon as the CEP is destroyed might be cleaner, as it allows all 
canceled MADs to be treated the same way.

-Fab
_______________________________________________
ofw mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw

Reply via email to