>I thought about this some more. I think there's value in making it generic.
>Can we maybe emulate TCP by changing the TID, or is this better done in the
>ULP, in your opinion?

Thinking out loud here...

I don't think that it makes sense to change the IB CM to support retrying a REQ more times than the spec allows. Max CM retries is also used by other CM messages. There is also the problem that what the active side sends as a retry is really a new request to the passive side, and both requests carry the same active QPN.

The problem that we were seeing running Intel MPI, and I'm guessing at least a couple of the other MPI implementations are hitting as well, wasn't that the number of retries was too small, but that the remote_cm_response_timeout was. Connections were taking minutes to form. Setting max CM retries to the largest value only helped to a point.

My solution was to allow the user to override the IB CM REQ parameters used by the RDMA CM. This included the local and remote CM response timeouts, plus max CM retries. It sounds like the only value that you want to make generic is max CM retries.

Could the CMA retry a connection request after the IB CM times it out? I think so, but that gets back to the issue of the passive IB CM seeing different connection requests for the same QP. For the actual problem I was trying to solve, the original REQ had been received, so a second REQ would have been rejected due to a duplicate QPN.

I think that a generic solution would have to scale down to the lowest value.

- Sean
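
For a sense of scale on the timeout point above, here is a back-of-the-envelope sketch in Python. It assumes the IB CM encoding of the response timeout fields as 4.096 us * 2^n with a 5-bit exponent, and a 4-bit max CM retries field. The function names are made up for illustration, the exponent of 20 is just an example value, and the model ignores packet lifetime and anything else the real CM adds, so treat it as an approximation rather than what the kernel computes.

```python
# Rough model of how long the active side may wait on a REQ before giving up.
# Assumptions (not taken from the kernel source):
#   - timeout fields encode 4.096 us * 2^exp, exp is a 5-bit value (0..31)
#   - max CM retries is a 4-bit value (0..15)
#   - each attempt waits one full timeout; packet lifetime is ignored

IB_TIME_UNIT_US = 4.096
MAX_TIMEOUT_EXP = 31
MAX_CM_RETRIES = 15


def cm_timeout_seconds(exp):
    """Convert an encoded CM timeout exponent to seconds."""
    if not 0 <= exp <= MAX_TIMEOUT_EXP:
        raise ValueError("timeout exponent must be in 0..31")
    return IB_TIME_UNIT_US * (1 << exp) / 1e6


def worst_case_wait_seconds(exp, retries):
    """Initial send plus `retries` resends, each waiting one timeout."""
    if not 0 <= retries <= MAX_CM_RETRIES:
        raise ValueError("retries must be in 0..15")
    return (retries + 1) * cm_timeout_seconds(exp)


if __name__ == "__main__":
    for exp in (20, 22, 24):
        print(exp, cm_timeout_seconds(exp), worst_case_wait_seconds(exp, 15))
```

With an exponent of 20 each attempt waits about 4.3 seconds, so even the maximum of 15 retries caps the total at roughly 69 seconds. Each step up in the exponent doubles that ceiling, which is why raising the timeout reaches further than raising the retry count once the retries are already at their limit.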
