> Talpey, Thomas
> Sent: Tuesday, June 06, 2006 10:49 AM
>
> At 10:40 AM 6/6/2006, Roland Dreier wrote:
> >    Thomas> This is the difference between "may" and "must". The value
> >    Thomas> is provided, but I don't see anything in the spec that
> >    Thomas> makes a requirement on its enforcement. Table 107 says the
> >    Thomas> consumer can query it, that's about as close as it
> >    Thomas> comes. There's some discussion about CM exchange too.
> >
> >This seems like a very strained interpretation of the spec. For
>
> I don't see how strained has anything to do with it. It's not saying anything
> either way. So, a legal implementation can make either choice. We're
> talking about the spec!
>
> But, it really doesn't matter. The point is, an upper layer should be paying
> attention to the number of RDMA Reads it posts, or else suffer either the
> queue-stalling or connection-failing consequences. Bad stuff either way.
>
> Tom.

Somewhere beneath this discussion is a bug in the application or the IB stack.

I'm not sure which "may" in the spec you are referring to, but the "may"s I have found are all for cases where the responder might support only 1 outstanding request. In all cases the negotiation protocol must be followed, and the requestor is not allowed to exceed the negotiated limit.

The mechanism should be:

1. The client queries its local HCA and determines its responder resources (i.e. the number of concurrent outstanding RDMA Reads on the wire from the remote end, for which this end will respond with the read data) and its initiator depth (i.e. the number of concurrent outstanding RDMA Reads this end can initiate as the requestor).
2. The client puts the above information in the CM REQ.
3. The server similarly gets its information from its local CA and negotiates the values down to the MIN of each side:
     REP.InitiatorDepth     = MIN(REQ.ResponderResources, server's local CA's initiator depth)
     REP.ResponderResources = MIN(REQ.InitiatorDepth, server's local CA's responder resources)
4. If the server does not support RDMA Reads, it can REJ. If the client decides the negotiated values are insufficient to meet its goals, it can disconnect.
5. Each side sets its QP parameters appropriately via modify QP. Note that these too will be mirror images of each other:
     client: QP.Max RDMA Reads as initiator = REP.ResponderResources
             QP.Max RDMA Reads as responder = REP.InitiatorDepth
     server: QP.Max RDMA Reads as responder = REP.ResponderResources
             QP.Max RDMA Reads as initiator = REP.InitiatorDepth

We have done a lot of high-stress RDMA Read traffic with Mellanox HCAs, and provided the above negotiation is followed, we have seen no issues. Note, however, that by default a Mellanox HCA typically reports a large InitiatorDepth (128) and a modest ResponderResources (4-8). Hence when I hear that Responder Resources must be grown to 128 for some application to work reliably, it implies the negotiation outlined above is not being followed.

Note that the ordering rules in Table 76 of IBTA 1.2 show how reads and writes on a send queue are ordered. There are many cases where an operation can pass an outstanding RDMA Read, hence it is not always bad to queue extra RDMA Reads. If needed, the Fence indicator can be set to force ordering. For many apps, it's going to be better to get the items onto the queue and let the QP handle the outstanding-reads cases, rather than have the app add its own level of queuing for this purpose. Letting the HCA do the queuing allows subsequent reads to be initiated more rapidly.

Todd Rimmer

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
