> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Friday, June 30, 2006 2:28 PM
>
> Rimmer, Todd wrote:
> > Shouldn't the cm_dup_req_handler in this case also resend the REP per
> > the IBTA passive side state machine "REP Sent" state?
>
> The REP will already be retried based on a timeout. It could be resent
> immediately in response to a duplicate REQ as well, but that shouldn't be
> necessary, and actually makes things more complex, since coordination must
> be done between sending based on a timeout, versus receiving a duplicate REQ.

I would recommend implementing the state machine as defined in the spec,
for the following reasons:

1. It will be necessary in order to pass any future IBTA CIWG compliance
   tests for the CM.

2. I would need to think about it, but the lost REP case may not be the
   only situation where a duplicate REQ can be received.

3. Depending on the RTU timeout on the passive side as the only means of
   resending the REP reduces the retries attempted in a "lossy" fabric for
   REP and RTU loss. (E.g., if you have 8 RTU timeout retries on the passive
   side, and many REPs are lost followed by many RTUs, you get a total of
   8 lost REPs+RTUs before you give up. Managing the counters separately
   will tend to allow for more retries.)

In our proprietary stack we implemented the defined state machine and have
stressed it with 1000s of concurrent connections (including various Chariot
SDP connect/disconnect stress tests and Oracle uDAPL stress tests, plus our
use of the CM to establish connections when running MPI on 1000s of nodes)
in various real-world and contrived situations of packet loss and slow
responsiveness. The defined state machine has worked very well in all of
these situations.

Todd Rimmer
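
To make the retry arithmetic in point 3 concrete, here is a rough sketch in Python. It is a toy model, not the OpenIB CM code and not the IBTA state machine itself: the function name `survives`, its parameters, and the separate duplicate-REQ budget of 8 are hypothetical, chosen only for illustration. It assumes that every lost REP or lost RTU costs exactly one retry, and that when the passive side resends the REP in response to a duplicate REQ, that resend is charged to its own counter rather than to the RTU timeout counter.

```python
def survives(lost_reps, lost_rtus, resend_on_dup_req,
             timeout_budget=8, dup_req_budget=8):
    """Return True if the connection is still alive after the given losses.

    Toy model (all names and budgets are illustrative assumptions):
    - resend_on_dup_req=False: the REP is only resent on RTU timeout, so
      lost REPs and lost RTUs both consume the same timeout budget.
    - resend_on_dup_req=True: a lost REP is recovered when the duplicate
      REQ arrives, and is counted against a separate budget.
    """
    timeouts_used = 0
    dup_req_resends_used = 0

    # Phase 1: a run of lost REPs.
    for _ in range(lost_reps):
        if resend_on_dup_req:
            dup_req_resends_used += 1
            if dup_req_resends_used > dup_req_budget:
                return False
        else:
            timeouts_used += 1
            if timeouts_used > timeout_budget:
                return False

    # Phase 2: a run of lost RTUs, always recovered by the RTU timeout.
    for _ in range(lost_rtus):
        timeouts_used += 1
        if timeouts_used > timeout_budget:
            return False

    return True


# 6 lost REPs followed by 6 lost RTUs:
shared = survives(6, 6, resend_on_dup_req=False)    # 12 losses against one budget of 8
separate = survives(6, 6, resend_on_dup_req=True)   # 6 and 6 against two budgets of 8
print(shared, separate)
```

Under these assumptions, the timer-only approach gives up on the ninth loss of either kind, while separate counters tolerate up to 8 of each. Real behaviour also depends on the relative timing of the active side's REQ retries and the passive side's timer, which this sketch ignores.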
