> From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 04, 2006 5:18 PM
> To: Rimmer, Todd
> Cc: Sean Hefty; Ishai Rabinovitz; [email protected]
> Subject: Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout
>
> Quoting r. Rimmer, Todd <[EMAIL PROTECTED]>:
> > I recommend sticking with the IB spec for the various timeouts.
>
> So what do you suggest, wait a day or so to timeout the MRA?
>
> --
> MST

Fix the broken endpoint and document the potential issue. As a potential workaround, permit a configuration option in OFED that sets an upper bound on CM-related timeouts, so that broken endpoints can be worked around. However, by default this limit should be very high (many seconds, maybe even a minute).
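A minimal sketch in C of such a bounded-timeout option, assuming the IBA encoding of CM timeout fields as 5-bit exponents (time = 4.096 us * 2^n). The variable `max_cm_timeout_us`, its 60 s default, and all helper names are hypothetical, not an existing OFED interface. The formulas are also assumptions for illustration: REQ timeout = 2 * PacketLifeTime + Remote CM Response Timeout, MRA wait = Service Timeout + PacketLifeTime, and Local ACK Timeout = 2 * PacketLifeTime + local CA ACK delay. The spec computation is left intact and only the final result is clamped.

```c
#include <stdint.h>

/* IBA timeout fields are 5-bit exponents: time = 4.096 us * 2^value.
 * Compute in ns (4.096 us = 4096 ns) and truncate to whole us. */
static uint64_t iba_time_to_us(uint8_t iba_time)
{
    return (4096ULL << (iba_time & 0x1f)) / 1000;
}

/* Hypothetical tunable: upper bound on CM timeouts. The default is
 * deliberately large so that spec-compliant peers are never affected. */
static uint64_t max_cm_timeout_us = 60ULL * 1000 * 1000; /* 60 s */

static uint64_t clamp_cm_timeout(uint64_t timeout_us)
{
    return timeout_us > max_cm_timeout_us ? max_cm_timeout_us : timeout_us;
}

/* Assumed REQ timeout: one packet lifetime for the REQ to arrive, one for
 * the REP to return, plus the remote CM response timeout from the REQ. */
static uint64_t req_timeout_us(uint8_t packet_life_time,
                               uint8_t remote_cm_response_timeout)
{
    return clamp_cm_timeout(2 * iba_time_to_us(packet_life_time) +
                            iba_time_to_us(remote_cm_response_timeout));
}

/* Assumed wait after receiving an MRA: the MRA's service timeout plus
 * one packet lifetime for the eventual reply to arrive. */
static uint64_t mra_timeout_us(uint8_t service_timeout, uint8_t packet_life_time)
{
    return clamp_cm_timeout(iba_time_to_us(service_timeout) +
                            iba_time_to_us(packet_life_time));
}

/* Assumed Local ACK Timeout: 2 * PacketLifeTime plus the local CA ACK
 * delay (the term many stacks drop). Not a CM timeout, so not clamped. */
static uint64_t local_ack_timeout_us(uint8_t packet_life_time, uint8_t ca_ack_delay)
{
    return 2 * iba_time_to_us(packet_life_time) + iba_time_to_us(ca_ack_delay);
}
```

With the maximum exponent of 31 the encoded time is roughly 8,796 seconds, so a clamp of this kind only changes behavior when a peer advertises an extreme value.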
Otherwise, keep all the computations as in the spec. Parts of this thread were starting to propose alternate algorithms for the computations, which is what I was very concerned about: that gets into the realm of rewriting the spec and will cause all kinds of subtle issues, including interop issues with existing devices (such as native IB storage, existing virtual IO controllers, etc.).

I also posted the summary because some of the computations are subtle. It took a while to uncover these details from the spec and implement them properly in our stack, so I thought the group could benefit from the research. For example, most of the stacks I have reviewed ignore the CA local ACK delay components in the equations, many do not properly implement timewait, and many assume the CM REQ contains the Packet Lifetime (it does not, but it can be computed from CM REQ information). All the equations can be made to work, but it does require some attention to detail.

Another key point is that, to be effective, MRA really needs to be issued by the stack itself, not the ULP. MRA needs to cover two cases. One is a ULP that knows it has a lot of work to do before it can respond (for example, a ULP that knows it must do significant IO to a device before it can respond to a REQ). However, the more typical case is a ULP that is simply bombarded with 20,000 REQs at once the first time the fabric boots or when a large job is started. In this case the ULP backlog will cause many of the REQs to time out. An MRA generated by the stack before the REQ is delivered to the ULP can help this situation.

Todd Rimmer

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
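A minimal sketch in C of the stack-generated MRA idea: before a REQ is handed to the ULP, the CM estimates how long the ULP backlog will take to drain and, if that would blow the requester's Remote CM Response Timeout, sends an MRA whose service timeout covers the estimate. The `struct req_backlog` bookkeeping, the `cm_should_send_mra` helper, and the "more than half the allowed time" trigger are hypothetical policy choices, not part of the spec or any existing stack. Timeouts use the same assumed 4.096 us * 2^n encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encode a duration as the smallest IBA exponent n with 4.096 us * 2^n >= t. */
static uint8_t us_to_iba_time(uint64_t t_us)
{
    uint64_t t_ns = t_us * 1000;
    for (uint8_t n = 0; n < 31; n++)
        if ((4096ULL << n) >= t_ns)
            return n;
    return 31; /* saturate at the largest encodable value */
}

/* Hypothetical per-listener state kept by the CM itself, not the ULP. */
struct req_backlog {
    unsigned int queued;     /* REQs waiting for the ULP to answer */
    uint64_t avg_service_us; /* running estimate of ULP time per REQ */
};

/* Decide, before delivering a new REQ to the ULP, whether the CM should
 * send an MRA. remote_cm_response_timeout is the exponent from the REQ.
 * On true, *mra_service_timeout holds the exponent to put in the MRA. */
static bool cm_should_send_mra(const struct req_backlog *b,
                               uint8_t remote_cm_response_timeout,
                               uint8_t *mra_service_timeout)
{
    /* Time until the ULP reaches this REQ and answers it. */
    uint64_t expected_us = (uint64_t)(b->queued + 1) * b->avg_service_us;
    uint64_t allowed_us = (4096ULL << (remote_cm_response_timeout & 0x1f)) / 1000;

    /* Hypothetical policy: only send an MRA once the expected wait
     * exceeds half of what the requester is prepared to wait. */
    if (expected_us <= allowed_us / 2)
        return false;

    *mra_service_timeout = us_to_iba_time(expected_us);
    return true;
}
```

For the 20,000-REQ burst described above, with an assumed 1 ms of ULP work per REQ and a requester advertising exponent 16 (about 268 ms), the last REQ in the queue would wait roughly 20 s, so the CM would send an MRA with service timeout exponent 23 (about 34 s) instead of letting the requester time out and retry.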
