RE: [ofw] CM ref counting issues...

Hefty, Sean Thu, 17 Dec 2009 07:55:31 -0800

>> For example, in my testing, a REP mad was completed as canceled;
>
>A REP?  If a REP times out, why aren't you ending up sending a REJ and aborting
>the connection?


The RTU for a connection can be lost even though the connection still forms.  
An app will see data arrive on the QP.  If the app then issues a DREQ, the 
state transitions to DREQ_SENT.  That is the state the connection is in when 
the send callback is invoked for the REP.  In this case the connections are 
much shorter lived than the CM message timeouts.

I would need to double check, but I thought the REP completed as canceled, not 
timed out.

>Not sure I quite follow here... The DREQ_SENT state should have invoked the
>callback.

This is a MAD completion callback, not a CEP state callback.  The MAD was a 
REP, but the state was DREQ_SENT.  This is the case I observed, but I'm 
pretty sure that other, similar problems exist.
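
To make the sequence concrete, here is a minimal sketch of how a REP send 
completion can arrive after the CEP has already moved to DREQ_SENT.  All of 
the type and function names below are hypothetical and simplified for 
illustration; they are not the actual CM definitions.

```c
/* Hypothetical, simplified model; not the real CM types. */
typedef enum { CEP_REP_SENT, CEP_ESTABLISHED, CEP_DREQ_SENT } cep_state_t;
typedef enum { MAD_REP, MAD_DREQ } mad_type_t;

typedef struct { cep_state_t state; } cep_t;

/*
 * Returns 1 if a send completion for the given MAD type still matches
 * the CEP's current state, 0 if the completion is stale.
 */
static int send_completion_is_current(const cep_t *cep, mad_type_t type)
{
    switch (type) {
    case MAD_REP:  return cep->state == CEP_REP_SENT;
    case MAD_DREQ: return cep->state == CEP_DREQ_SENT;
    }
    return 0;
}

/* Replays the lost-RTU case described above. */
static int replay_lost_rtu_scenario(void)
{
    cep_t cep = { CEP_REP_SENT };  /* REP sent, waiting for the RTU      */

    cep.state = CEP_ESTABLISHED;   /* RTU lost, but data arrives on QP   */
    cep.state = CEP_DREQ_SENT;     /* app issues a DREQ right away       */

    /* Only now does the send callback for the REP run. */
    return send_completion_is_current(&cep, MAD_REP);
}
```

In this model replay_lost_rtu_scenario() returns 0, meaning the REP 
completion no longer matches the CEP state, so a handler cannot assume 
the state is still the one in which the MAD was sent.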

>> +    else
>> +    {
>> +            KeReleaseInStackQueuedSpinLockFromDpcLevel( &hdl );
>> +            ib_put_mad( p_mad );
>> +    }
>
>Are you going to skip the switch statement on the MAD status then?  If so,
>don't forget to release the reference on the CEP held by the MAD.  Seems like
>you're missing a 'goto done;' here.

Yes - this needs to jump to the end, so we don't try to release the lock twice 
and we do release the reference on the cep.
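
A rough sketch of the control flow being discussed, with the jump to the 
end in place.  The lock, CEP, and MAD types here are mock stand-ins I made 
up for illustration (the real code uses the in-stack queued spinlock and 
ib_put_mad), and the surrounding handler is assumed rather than copied from 
the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the spinlock, CEP, and MAD. */
typedef struct { int held; } mock_lock_t;
typedef struct { int ref_cnt; int state; mock_lock_t lock; } mock_cep_t;
typedef struct { mock_cep_t *cep; int freed; } mock_mad_t;

enum { STATE_REP_SENT, STATE_DREQ_SENT };

/* The asserts catch a double acquire or a double release. */
static void lock_acquire(mock_lock_t *l) { assert(!l->held); l->held = 1; }
static void lock_release(mock_lock_t *l) { assert(l->held);  l->held = 0; }

/* Returns 1 if the MAD status was processed, 0 if the completion was stale. */
static int send_complete(mock_mad_t *mad, int expected_state)
{
    mock_cep_t *cep = mad->cep;
    int processed = 0;

    lock_acquire(&cep->lock);

    if (cep->state != expected_state) {
        /* Stale completion: drop the lock, return the MAD, skip the switch. */
        lock_release(&cep->lock);
        mad->freed = 1;            /* stands in for ib_put_mad( p_mad ) */
        goto done;
    }

    /* Normal path: the switch on the MAD status would go here. */
    processed = 1;
    lock_release(&cep->lock);
    mad->freed = 1;

done:
    /* Both paths release the reference on the CEP held by the MAD. */
    cep->ref_cnt--;
    return processed;
}
```

Without the goto, the stale path would fall through to the normal path, 
release the lock a second time, and trip the assert in lock_release().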
