> From: Michael S. Tsirkin
> Sent: Thursday, November 02, 2006 6:15 PM
> To: Hal Rosenstock
> Cc: Or Gerlitz; openib-general; Arlin R Davis
> Subject: Re: [openib-general] scaling issues, was: uDAPL cma: add support for address and route retries, call disconnect when recving dreq
>
> Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > Subject: Re: scaling issues, was: uDAPL cma: add support for address and route retries, call disconnect when recving dreq
> >
> > On Thu, 2006-11-02 at 17:54, Michael S. Tsirkin wrote:
> > > Quoting r. Arlin Davis <[EMAIL PROTECTED]>:
> > > > Subject: Re: [openib-general] scaling issues, was: uDAPL cma: add support for address and route retries, call disconnect when recving dreq
> > > >
> > > > Sean Hefty wrote:
> > > >
> > > > >One option is having the SA (or ib_umad?) return a busy status in response to a
> > > > >MAD, but we'd still have to be able to send this response as quickly as requests
> > > > >are being received. We could then limit the number of requests that would be
> > > > >queued in the kernel for a user.
> > > >
> > > > Another great option would be to have path record caching. Unfortunately
> > > > OFED 1.1 did not include ib_local_sa in the release.
> > > >
> > >
> > > This won't help you much.
> > > With 256 nodes all to all already gives you 65000 requests
> > > which is the same order of magnitude as the reported 130000.
> >
> > The requests might occur at a different time so they could be spread out
> > rather than synchronized.
>
> I don't see how caching does this.

If all the queries are made at app startup, there will be one huge batch of queries to the SA, especially for a many process MPI job.
In contrast, if the SA cache builds its own replica of the relevant subset of the SA, the pace can be more controlled. The SA cache code can even randomize it on purpose (e.g., refresh not exactly every 10 minutes, but every 10 minutes +/- a random offset). That way, if all nodes were powered on at about the same time, you won't get a pattern of everyone asking the SM at the same time.

Todd Rimmer
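
The randomized refresh interval suggested above can be sketched roughly as follows. This is only an illustration of the idea, not code from ib_local_sa or any other SA cache implementation: the function name, the 600-second base, the 60-second jitter, and the per-node seeding are all assumptions made up for the example.

```python
import random


def next_refresh_delay(base_s=600.0, jitter_s=60.0, rng=random):
    """Return the seconds to wait before the next cache refresh.

    The result is base_s plus a uniformly random offset in
    [-jitter_s, +jitter_s]. All names and defaults are hypothetical.
    """
    return base_s + rng.uniform(-jitter_s, jitter_s)


# Hypothetical demo: four nodes that powered on together. Each has its
# own generator (seeded by node id here purely for illustration), so
# their refresh times drift apart instead of hitting the SM in lockstep.
node_rngs = [random.Random(node_id) for node_id in range(4)]
delays = [next_refresh_delay(rng=r) for r in node_rngs]
```

Drawing a fresh delay before every refresh, rather than once at boot, keeps the nodes from settling back into a shared schedule.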
