Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> Subject: RE: [openib-general] ucma into kernel.org
>
> >I thought about this some more. I think there's value in making it generic.
> >Can we maybe emulate TCP by changing the TID, or is this better done in the
> >ULP, in your opinion?
>
> Thinking out loud here...
>
> I don't think that it makes sense to change the IB CM to support retrying a
> REQ more than that specified by the spec. Max CM retries is also used by
> other CM messages, plus there's the problem that what the active side is
> sending as a retry is really a new request to the passive side, and both
> requests carry the same active QPN.
>
> The problem that we were seeing running Intel MPI, and I'm guessing at least
> a couple of the other MPI implementations are hitting as well, wasn't that
> the number of retries was too small, but that the remote_cm_response_timeout
> was. Connections were taking minutes to form. Setting max CM retries to the
> largest value only helped to a point.
>
> My solution was to allow the user to override the IB CM REQ parameters used
> by the RDMA CM. This included local and remote CM response timeouts, plus
> max CM retries. It sounds like the only value that you want to make generic
> is max CM retries.
>
> Could the CMA retry a connection request after it times out by the IB CM? I
> think so, but that gets back to the issue of the passive IB CM seeing
> different connection requests for the same QP. For the actual problem I was
> trying to solve, the original REQ had been received, so a second REQ would
> have been rejected due to a duplicate QPN.
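
On the timeout point quoted above, a rough sketch of the arithmetic may help show why raising max CM retries alone only helps to a point. It relies on two things from the IB spec: timeouts are encoded as 4.096 us * 2^n in a 5-bit field, and max CM retries is a 4-bit field. The model itself is my assumption, not what the IB CM code does exactly: the initial REQ and every retry each wait one full response timeout, and packet lifetime is ignored. The function names are made up for illustration.

```python
# Sketch only: rough upper bound on how long the active side keeps trying
# before a CM REQ is abandoned. Assumed model, not the real IB CM logic.

IB_TIMEOUT_BASE_US = 4.096  # IB encodes timeouts as 4.096 us * 2**n


def ib_timeout_ms(exponent: int) -> float:
    """Decode a 5-bit IB timeout exponent into milliseconds."""
    if not 0 <= exponent <= 31:
        raise ValueError("IB timeout exponent is a 5-bit field (0..31)")
    return IB_TIMEOUT_BASE_US * (2 ** exponent) / 1000.0


def worst_case_wait_ms(response_timeout_exp: int, max_cm_retries: int) -> float:
    """Assumption: the first REQ plus each retry waits one full timeout."""
    if not 0 <= max_cm_retries <= 15:
        raise ValueError("max CM retries is a 4-bit field (0..15)")
    return (max_cm_retries + 1) * ib_timeout_ms(response_timeout_exp)


if __name__ == "__main__":
    # Compare a few timeout exponents with retries pinned at the maximum.
    for exp in (14, 20, 24):
        total_s = worst_case_wait_ms(exp, 15) / 1000.0
        print(f"timeout exponent {exp}: about {total_s:.1f} s in total")
```

With retries already at 15, the only remaining knob in this model is the timeout exponent, and each step up doubles the total wait.
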

What do you mean by a duplicate QPN? You can't track remote QPNs, can you?

> I think that a generic solution would have to scale down to the lowest value.
>
> - Sean
>

-- 
MST
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
