> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Monday, October 16, 2006 5:33 PM
> To: Rimmer, Todd; Matt Leininger
> Cc: openib
> Subject: Re: [openib-general] [RFC] Notice/InformInfo event reporting
>
> Rimmer, Todd wrote:
> > My recommendation is option 2.
>
> Thanks for the response.
>
> > In large fabrics the SA can be a bottleneck. It is best for an end node
> > to register with the SA only for the events which are of actual interest
> > to the end node.
>
> Which part of the SA is the bottleneck? Is it the sending of MADs, or the
> processing of events to determine which end nodes are interested in the
> event?

Both can be a bottleneck in a big fabric. Since the SA must always determine which end nodes are registered for a given event, the fewer that are registered, the better. Even if all the hosts are registered, there will be other nodes (switches, TCAs, etc.) which are not, so the SA will always need to check its list of whom to send to. Since the notice is not a broadcast, the SA will need to send a separate packet to each end node.
Each notice will then get a response from each end node, which must be correlated with the outstanding notices so the SA can determine which notices need to be resent versus which were acknowledged. If you consider a large fabric (say 2000+ nodes) and all the events the SA can generate (at least four: GID in/out of service, multicast group in/out of service), that can be a big, bursty load on the SA. Factor in the nodes responding to those notices (for example, GID in service may trigger path record queries), and even more work lands on the SA. Most HCAs don't optimize the GSI datapath, so packet rates for SA MADs are lower than what might be observed on UD or RC QPs.

> My thinking was that if events are rare, then having the SA simply forward the
> events to the end nodes saves processing time on the SA. So, we can trade off
> SA processing by sending more MADs. I'm not sure which is worse.

In a functioning fabric, events will be rare. However, it's when you first boot the fabric, reboot the SM, or perform other similar "start up" actions that things get really busy.

> > With regards to "duplicating dispatching code on every node", rather
> > than duplication, think of this as "distributing event dispatching code
> > among the interested nodes". Thinking of it in these terms makes option
> > 2 stand out as more scalable.
>
> To provide the highest level of filtering at the SA, we need an interface based
> on InformInfo. Trying to reference count at that level would be difficult.
> (E.g. client 1 wants events for LIDs 2-25, client 2 LIDs 3-4, client 3 LIDs
> 2-25, client 4 LIDs 15-30, etc.) I'm not sure we need an interface this
> complex. It increases the processing requirements needed of the SA, and may
> increase the number of MADs that it needs to send to a given node. (Unless we
> start trying to be really clever with the registration.)
> I was thinking of letting clients register for a particular "class" of event,
> then dispatching the events among the registered clients. But I'm still
> uncertain about how to define event classes.
>
> Some expected usage models would be helpful.

In my experience, few clients will filter by LID. For example, a client interested in GID in service would want to know about all LIDs. A client such as IPoIB would be interested in all multicast groups. So perhaps the registration with the SA should be for "all LIDs", letting the client filter by LID as needed.

So my interpretation of option 2 is that the end node registers once with the SA for "all LIDs" for the events its clients are interested in. The end node can then filter appropriately (filtering at the client may be best).

In general I have found that only a few clients will use such events: IPoIB, to manage multicast subscriptions (join as send-only for new groups), and SA caches/replicas, to keep their cache/replica synchronized.

In the SilverStorm stack we created an API for a client to subscribe to a notice. It allowed the client to specify: the trap number, the local HCA port the subscription applied to (in case of multi-port HCAs on different fabrics), and information for a callback to the client (a client context void* and a function). The callback provided the client context void*, the actual NOTICE from the SA, and the HCA port it arrived on. The API in the stack dealt with all the issues of remaining subscribed (SA reregistration, port disconnected/reconnected, etc.), so the client merely subscribed, got notice callbacks, and later unsubscribed. In this style of API, any LID-based filtering would be done in the client itself.

Todd Rimmer

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
