> From: Michael S. Tsirkin
> Sent: Wednesday, October 04, 2006 4:37 PM
> To: Sean Hefty
> Cc: Ishai Rabinovitz; [email protected]
> Subject: Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout
>
> Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> > Subject: Re: [PATCH] IB_CM: Limit the MRA timeout
> >
> > Michael S. Tsirkin wrote:
> > >>There's several timeout values transfered and used by the cm, most
> > >>notably the remote cm response timeout and packet life time.  Does
> > >>it make more sense to have a single, generic timeout maximum instead?
> > >
> > > Hmm. I'm not sure - we are working around an actual broken
> > > implementation here - what do you think?
> >
> > I wasn't sure either.  The MRA timeout is a combination of the packet
> > life time + service timeout, which made me bring this up.  The patch
> > only handles the service timeout portion, so we end up in the same
> > situation if a large packet life time is ever used.
>
> But that comes from the SA, does it not?
>
> > >>Would it make more sense to enable the maximum(s) by default, since
> > >>we're dependent upon values received over the network?
> > >
> > > I think it would.
> >
> > So do I.
> >
> > The CM has checks to bring out of range values into range, but at the
> > maximum, we get a timeout of about 2.5 hours.  Multiple that by 15
> > retries, and the cm can literally spend all day retrying a request.
> >
> > I was considering dropping the default maximum down to around 4-8
> > seconds, which with retries still gives us about a minute to timeout
> > a request.  The default maximum would apply to local and remote cm
> > timeouts, packet life time, and service timeout, but could be
> > overridden by the user.  (Basically, with Ishai's patch: rename
> > mra_timeout_limit to timeout_limit, set to a default of 20, and
> > replace occurrences of '31' in the code with timeout_limit.)
>
> For remote cm timeout and service timeout this makes sense - they seem
> currently mostly taken out of the blue on implementations I've seen.
>
> But since the packet lifetime comes from the SM, it actually has a
> chance to reflect some knowledge about the network topology.
> And since we haven't see any practical issues with packet life time
> yet - maybe a different paremeter for that, with a higher limit?
>
> --
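
For reference, the numbers in the quoted discussion can be checked with a short sketch. It assumes the usual IB encoding in which a timeout field of value n means 4.096 us * 2^n. The name timeout_limit and its default of 20 come from the quoted proposal; the function names and the simple "timeout times 15 retries" model are illustrative assumptions, not the actual cm code.

```python
# Sketch only: decodes IB-style timeout exponents and applies a cap.
# Assumption: timeout fields encode 4.096 us * 2**exponent (5-bit exponent).

IB_TIMEOUT_BASE_US = 4.096


def ib_timeout_seconds(exponent: int) -> float:
    """Decode a timeout exponent (0..31) into seconds."""
    if not 0 <= exponent <= 31:
        raise ValueError("IB timeout exponent must be in 0..31")
    return IB_TIMEOUT_BASE_US * (2 ** exponent) / 1e6


def clamp_timeout(exponent: int, timeout_limit: int = 20) -> int:
    """Cap a received exponent at timeout_limit (hypothetical helper)."""
    return min(exponent, timeout_limit)


def worst_case_seconds(exponent: int, retries: int = 15) -> float:
    """Total time spent if every one of `retries` attempts times out."""
    return ib_timeout_seconds(exponent) * retries


if __name__ == "__main__":
    for exp in (31, 20):
        print(exp,
              round(ib_timeout_seconds(exp), 3), "s per attempt,",
              round(worst_case_seconds(exp) / 3600, 3), "h with 15 retries")
```

With these assumptions an exponent of 31 decodes to roughly 2.4 hours per attempt, and an exponent of 20 to roughly 4.3 seconds, or about a minute across 15 retries, which matches the figures quoted above.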
I recommend sticking with the IB spec for the various timeouts.  In our
products we carefully implemented the timeouts and computations as
defined by the spec.

The SM controls the pkt lifetime and should base it on a knowledge of
the fabric topology and configuration.  Many of the CA specific base
timers are specific to the HCA/TCA itself (hence we provided this
information as part of queries to the CA verbs driver).  We permitted
configuration in the individual verbs drivers to override the
"reasonable estimates" which we provided as defaults for each HCA model
we support.

It's a little tricky to work out the details defined in the spec (a
summary section on timers would have made it easier); however, I made
that effort a few years ago, and a summary of all the HCA/TCA related
IB timers follows.  Notice that many of these must be "uncomputed" from
information in the CM REQ and REP to get the base level values (such as
pkt lifetime, which is not directly specified in the CM REQ):

3.1 Base Timers

CA Ack Delay - time from receipt of IB transport packet to sending of
ACK.  Hardware and VlArb dependent.

CA inbound processing time - time from receipt of IB transport packet
to delivery and processing in CA's transport state machine.  Hardware
dependent.

CA outbound processing time - time from entry of packet to QP until
transmit of packet on wire.  Hardware and VlArb dependent.

Class turnaround time(class) - processing time from delivery of request
on QP to posting of response on QP.

3.2 Derived Timers

Ack Timeout - timeout for QP ACK/NAK before QP resends, up to RetryCount
times.
    = 2*(PktLifeTime) + remote CA Ack Delay + local CA inbound
      processing time

RNR NAK Delay - Appl protocol must be prepared to replenish Recv Q of QP
within RNR NAK Delay + 2*(PktLifeTime).  Can set this to low bound, and
RNRNakDelay*RNRRetryLimit must be > upper bound.

PortInfo:SubnetTimeout
    = max(PktLifeTime for all PathRecords within subnet)

PortInfo:RespTimeout - SMA max time between receipt and response within
Node, includes CA delays in receive and send.
    = ClassTurnaroundTime(SMA) + CA inbound (QP0) + CA outbound (QP0)

ClassPortInfo:RespTimeout - GSA class max time between receipt and
response within Node, includes CA delays in receive and send.
    = ClassTurnaroundTime(class) + CA inbound (QP1) + CA outbound (QP1)

PathRecord:PacketLifeTime - reasonable estimate of worst case time
for a packet to traverse the fabric along the path in 1 direction.
0 if loopback path from port to itself (CA inbound/outbound and/or ACK
delay values should cover).

LocalAckTimeout - QP/CM
    = 2*PathRecord:PktLifeTime + local CA Ack Delay

QP:AckTimeout
    = 2*PathRecord:PktLifeTime + remote CA Ack Delay

Remote CM Resp Timeout - CM - CM server REQ response time (should be
based on Get(ClassPortInfo) for CM against remote CM).

Local CM Resp Timeout
    = 2*PathRecord:PktLifeTime + client REP response time

CM MRA Service Timeout - anticipated maximum time before sender of the
MRA will send the actual CM response message (REP, RTU, APR or REJ).
Recipient of MRA should wait Service Timeout + packet lifetime before
timing out.  Note this value is subjective in nature and may depend on
load on the server, performance of the application, etc.  In our stack
we heuristically computed a pseudo average (weighted toward longer
timeouts) with configurable min/max.
We also permitted the application to adjust the min/max for a given CEP.
It was important that the MRA be sent at a low level, since if the
application is too busy to respond to the REQ, REP, etc., it's probably
also too busy to compose an MRA.  We found that proper implementation of
MRA was critical for high stress CM situations, such as startup of a
large MPI run or Oracle's uDAPL based stress tests, which made thousands
of simultaneous connections.

Subnet Timeout = max(Path Record Packet Lifetime)

Todd Rimmer
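
As a rough illustration of the derived timers listed under 3.2, the sketch below works in microseconds and converts to and from the 4.096 us * 2^n encoding. All function names are hypothetical, and rounding up to the next exponent is an assumption chosen so that an encoded timeout is never shorter than the computed one; this is not code from any particular stack.

```python
# Sketch only: derived-timer arithmetic from the summary, in microseconds.
# Assumption: timeout fields encode 4.096 us * 2**exponent, exponent 0..31.

IB_TIMEOUT_BASE_US = 4.096


def decode_us(exponent: int) -> float:
    """Decode a timeout exponent into microseconds."""
    return IB_TIMEOUT_BASE_US * (2 ** exponent)


def encode_exponent(duration_us: float) -> int:
    """Smallest exponent whose decoded value covers duration_us (max 31)."""
    exponent = 0
    while exponent < 31 and decode_us(exponent) < duration_us:
        exponent += 1
    return exponent


def qp_ack_timeout_us(pkt_lifetime_us: float, remote_ack_delay_us: float) -> float:
    """QP:AckTimeout = 2*PktLifeTime + remote CA Ack Delay."""
    return 2 * pkt_lifetime_us + remote_ack_delay_us


def local_ack_timeout_us(pkt_lifetime_us: float, local_ack_delay_us: float) -> float:
    """LocalAckTimeout = 2*PktLifeTime + local CA Ack Delay."""
    return 2 * pkt_lifetime_us + local_ack_delay_us


def pkt_lifetime_from_local_ack_timeout(local_ack_timeout: float,
                                        ack_delay_us: float) -> float:
    """'Uncompute' PktLifeTime from a LocalAckTimeout and the sender's Ack Delay."""
    return max(0.0, (local_ack_timeout - ack_delay_us) / 2)


def mra_wait_us(service_timeout_us: float, pkt_lifetime_us: float) -> float:
    """Time the MRA recipient waits: Service Timeout + packet lifetime."""
    return service_timeout_us + pkt_lifetime_us


if __name__ == "__main__":
    lifetime = decode_us(14)   # example packet lifetime
    ack_delay = decode_us(12)  # example CA Ack Delay
    print(encode_exponent(qp_ack_timeout_us(lifetime, ack_delay)))
    print(mra_wait_us(decode_us(20), lifetime) / 1e6, "seconds")
```

The "uncompute" helper is only exact when the Ack Delay that the sender used is known; once a value has been rounded up to an exponent, the recovered packet lifetime is an upper estimate.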
