Hi Todd,
So you agree we will need to design "replica" buildup scalability features into
the solution (to avoid the bring-up load on the SA)?
Why would a caching system not work here, instead of replicating the data?
The caching concept keeps the SA in the loop: it can invalidate the cache, or
entries can expire through a lifetime policy.
The reason I think a total replica (a distribution of the SA) would eventually
be problematic is that as we approach QoS solutions, some need for path record
use and retirement is going to show up. What if the SM decides to change SL2VL
maps due to a new QoS requirement? We will need a more complicated
"synchronization" or invalidation technique to push that kind of data into the
"replica" SAs.
Eitan
Rimmer, Todd wrote:
From: Eitan Zahavi [mailto:[EMAIL PROTECTED]
Hi Sean, Todd,
Although I like the "replica" idea for its "query" performance boost, I
suspect it will actually not scale for very large networks: each node querying
the entire database would cause O(N^2) load on the SA.
After any change (which happens with higher probability on large networks),
the SA will need to send a Report to each of N targets.
We already have some bad experience with SA query issues on large clusters,
like the one reported by Roland:
"searching for SRP targets using PortInfo capability mask".
Our experience has been the exact opposite.
There is an initial load on the SA to populate the replica, which we have
reduced with various techniques, such as backing off when the SA reports Busy
and randomizing the start time of queries. The boost occurs when a new
application starts, such as an MPI using the SA/CM to establish connections as
per the IBTA spec. A 1000-process MPI job would have each process make 999
queries to the SA at job startup time. This causes a burst of 999,000 sets of
SA queries (most will involve both Node Record and Path Record queries, so it
will really be 2x this amount) BEFORE the MPI job can actually start.
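The burst size is simple arithmetic; a quick sketch of the all-to-all startup cost (the function name and the default of 2 records per peer, for the Node Record plus Path Record pair mentioned above, are illustrative):

```python
def startup_queries(n_processes, records_per_peer=2):
    # Each process queries the SA once per peer at job startup;
    # most lookups involve both a Node Record and a Path Record query.
    return n_processes * (n_processes - 1) * records_per_peer

# 1000 processes x 999 peers = 999,000 query sets; with the 2x factor,
# roughly two million SA queries before the MPI job can start.
print(startup_queries(1000))
```

The quadratic growth is why paying this cost once per fabric (into a replica) rather than once per job startup matters at scale.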
As OpenIB moves forward to implement QoS and other features, MPI will have to
use the SA to get its path records. If you study MVAPICH at present, it merely
exchanges LIDs between nodes and hardcodes all the other QoS parameters (or
uses the same value for all processes via environment variables). In a true
QoS and congestion management environment it will instead have to use the
CM/SA.
We have been using this replica technique quite successfully for 2-3 years now.
Our MPI has used the SA/CM for connection establishment for just as long.
As was pointed out, most fabrics will be quite stable. Hence having a replica
and paying the cost of the SA queries once will be much more efficient than
paying that cost on every application startup.
Todd Rimmer
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general