Re: [openib-general] RFC on CM error handling

Sean Hefty Fri, 21 Jan 2005 11:55:33 -0800

Libor Michalek wrote:

An example issue that I'm thinking of is a user gets a reply callback. A reject is then received by the CM, and a second callback to the user is initiated. If the user tries to send an RTU, the call will fail since the cm_id is in an invalid state. If the user then returns -1 from the callback, the CM will destroy the cm_id. The destruction will block while the reject callback completes. Since the user returned -1 from the reply callback, they may not be ready to handle another callback.

The fix that I'm working on should still allow multithreaded operation inside the CM, but callbacks to the user will be serialized. If a user returns a non-zero value from a callback, no additional callbacks will be generated.
  OK, that's the behaviour I would expect. However, in the example, even
if the user returns 0 from the REP callback, I wouldn't expect to see
the REJ after the REP has been processed (or after the RTU has been sent).
The CM state updates for a connection and the resulting callbacks would be
serialized, so the REJ after the REP would be discarded since it was
received in a CM state which does not allow rejects. Or is this incorrect?
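
As a rough illustration of the serialization discussed above, here is a minimal single-threaded sketch in which events for a cm_id are queued and delivered one at a time, and a non-zero return from the handler suppresses all further callbacks. Every name here (cm_id_sketch, cm_deliver, example_handler, EV_REP, EV_REJ) is hypothetical and not taken from the CM code; a real implementation would also need locking around the queue.

```c
#include <stddef.h>

#define CM_MAX_PENDING 8

enum { EV_REP = 1, EV_REJ = 2 };   /* hypothetical event codes */

struct cm_id_sketch;
typedef int (*cm_handler)(struct cm_id_sketch *id, int event);

/* Hypothetical stand-in for a cm_id; not the real structure. */
struct cm_id_sketch {
    cm_handler handler;
    int pending[CM_MAX_PENDING];   /* ring buffer of queued events */
    size_t head, count;
    int in_callback;               /* a callback is currently running */
    int destroyed;                 /* handler returned non-zero */
    int delivered;                 /* number of callbacks made */
};

/* Queue an event and, unless a callback is already running, drain the
 * queue one event at a time. Returns -1 if the event was dropped. */
int cm_deliver(struct cm_id_sketch *id, int event)
{
    if (id->destroyed || id->count == CM_MAX_PENDING)
        return -1;

    id->pending[(id->head + id->count) % CM_MAX_PENDING] = event;
    id->count++;

    if (id->in_callback)
        return 0;   /* the drain loop already running will pick it up */

    id->in_callback = 1;
    while (id->count > 0 && !id->destroyed) {
        int ev = id->pending[id->head];

        id->head = (id->head + 1) % CM_MAX_PENDING;
        id->count--;
        id->delivered++;
        if (id->handler(id, ev) != 0) {
            /* Non-zero return: no additional callbacks are generated. */
            id->destroyed = 1;
            id->count = 0;
        }
    }
    id->in_callback = 0;
    return 0;
}

/* Example handler: a REJ arrives while the REP callback is still
 * running, and the user then returns -1 from the REP callback. */
int example_handler(struct cm_id_sketch *id, int event)
{
    if (event == EV_REP) {
        cm_deliver(id, EV_REJ);   /* simulates the REJ arriving mid-callback */
        return -1;
    }
    return 0;
}
```

With example_handler installed, delivering EV_REP results in exactly one callback: the REJ that arrives during the REP callback is queued and then dropped once the handler returns -1, so the user never sees a second callback.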

From the REP callback, even if the call to send an RTU is successful, a REJ could still be received. (The remote side timed out waiting for the RTU.) Locally, the cm_id state would have gone from REP_RCVD to ESTABLISHED to TIMEWAIT. Given this, it seems that the spec is missing state transitions for handling a REJ in the REP_RCVD or MRA_REP_SENT states, which would drive the state back to IDLE.
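
To make the transitions in question concrete, here is a small sketch of a transition function covering only the states mentioned above. The enum and function names are hypothetical, and the REJ handling in REP_RCVD and MRA_REP_SENT reflects the transitions proposed here, not something taken from the spec. The ESTABLISHED case follows the REP_RCVD to ESTABLISHED to TIMEWAIT sequence described above.

```c
/* Hypothetical state and event names, for illustration only. */
enum cm_state {
    CM_IDLE,
    CM_REQ_SENT,
    CM_REP_RCVD,
    CM_MRA_REP_SENT,
    CM_ESTABLISHED,
    CM_TIMEWAIT
};

enum cm_event {
    CM_EV_REP_RCVD,
    CM_EV_MRA_SENT,
    CM_EV_RTU_SENT,
    CM_EV_REJ_RCVD
};

/* Return the next state; an event that is not valid in the current
 * state is discarded and the state is left unchanged. */
enum cm_state cm_next_state(enum cm_state state, enum cm_event event)
{
    switch (state) {
    case CM_REQ_SENT:
        if (event == CM_EV_REP_RCVD)
            return CM_REP_RCVD;
        break;
    case CM_REP_RCVD:
        if (event == CM_EV_MRA_SENT)
            return CM_MRA_REP_SENT;
        if (event == CM_EV_RTU_SENT)
            return CM_ESTABLISHED;
        if (event == CM_EV_REJ_RCVD)
            return CM_IDLE;        /* proposed: REJ drives back to IDLE */
        break;
    case CM_MRA_REP_SENT:
        if (event == CM_EV_RTU_SENT)
            return CM_ESTABLISHED;
        if (event == CM_EV_REJ_RCVD)
            return CM_IDLE;        /* proposed: REJ drives back to IDLE */
        break;
    case CM_ESTABLISHED:
        if (event == CM_EV_REJ_RCVD)
            return CM_TIMEWAIT;    /* REJ arrived after the RTU was sent */
        break;
    default:
        break;
    }
    return state;
}
```

In this sketch, a REJ received after the RTU has been sent is not discarded: it moves the connection from ESTABLISHED to TIMEWAIT, which is why the user can still see a callback after a successful RTU.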

- Sean

