> From: Eitan Zahavi [mailto:[EMAIL PROTECTED]
> Hi Sean, Todd,
> 
> Although I like the "replica" idea for its "query"
> performance boost, I suspect it will not actually scale for
> very large networks: having each node query the entire
> database would cause N^2 load on the SA.
> After any change (and changes do happen with higher
> probability on large networks), the SA will need to send
> each Report to N targets.
> 
> We have already had bad experiences with SA query issues on
> large clusters, such as the one reported by Roland:
> "searching for SRP targets using PortInfo capability mask".
> 
Our experience has been the exact opposite.
There is an initial load on the SA to populate the replica, which we have 
reduced using various techniques, such as backing off when the SA reports Busy 
and randomizing the start time of each node's queries.  The boost comes when a 
new application starts, such as an MPI job using the SA/CM to establish 
connections as per the IBTA spec.  A 1000-process MPI job would have each 
process make 999 queries to the SA at job startup time.  This causes a burst of 
999,000 SA queries (most will involve both Node Record and Path Record queries, 
so it will really be 2x this amount), BEFORE the MPI job can actually start.
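The two population techniques mentioned above (a random start offset and backing off when the SA reports Busy) can be sketched as follows. This is a minimal, hypothetical Python sketch rather than any shipping implementation: `query` stands for whatever actually issues the SA query, `SABusyError` is an assumed exception representing a Busy response, and the delay values are placeholders.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class SABusyError(Exception):
    """Hypothetical: raised by the query callable when the SA replies Busy."""


def populate_with_backoff(
    query: Callable[[], T],
    max_start_offset: float = 5.0,  # seconds; assumed value
    base_delay: float = 0.1,        # seconds; assumed value
    max_delay: float = 10.0,        # seconds; assumed cap
    max_attempts: int = 8,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Run one replica-population query while spreading load on the SA."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    # Random start offset: nodes that boot together do not all query at once.
    sleep(rng.uniform(0.0, max_start_offset))
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return query()
        except SABusyError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            # Exponential backoff with full jitter before retrying.
            sleep(rng.uniform(0.0, delay))
            delay = min(delay * 2, max_delay)
    raise RuntimeError("unreachable")
```

Injecting `sleep` and `rng` keeps the function testable without real delays. Full jitter is one common backoff choice, not necessarily what any particular SA client uses.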

As Open IB moves forward to implement QOS and other features, MPI will have to 
use the SA to get its path records.  If you look at MVAPICH today, it merely 
exchanges LIDs between nodes and hardcodes all the other QOS parameters (or uses 
environment variables to set the same value for all processes).  In a true QOS 
and congestion management environment, it will instead have to use the CM/SA.
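To illustrate the difference, an MPI could take the SL and other path parameters from SA path records instead of hardcoding them, consulting a local replica first and querying the SA only on a miss. The sketch below is hypothetical: `PathRecord` carries only a simplified subset of the IBTA PathRecord fields, `query_sa` stands for whatever actually issues the SA query, and none of these names are MVAPICH or OpenIB APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class PathRecord:
    """Simplified, hypothetical subset of the IBTA PathRecord attribute."""
    slid: int
    dlid: int
    sl: int   # service level chosen by the SA, e.g. per QoS policy
    mtu: int  # path MTU in bytes (simplified; the real field is an encoded value)


PathKey = Tuple[int, int]  # (source LID, destination LID)


class PathReplica:
    """Local copy of SA path records, consulted before querying the SA."""

    def __init__(self, query_sa: Callable[[int, int], PathRecord]) -> None:
        self._query_sa = query_sa  # hypothetical callable issuing a real SA query
        self._records: Dict[PathKey, PathRecord] = {}
        self.sa_queries = 0  # counts how often we fell back to the SA

    def load(self, records: Iterable[PathRecord]) -> None:
        """Bulk-populate the replica, e.g. from the initial SA download."""
        for rec in records:
            self._records[(rec.slid, rec.dlid)] = rec

    def invalidate(self, dlid: int) -> None:
        """Drop entries toward a destination, e.g. after an SA Report."""
        for key in [k for k in self._records if k[1] == dlid]:
            del self._records[key]

    def lookup(self, slid: int, dlid: int) -> PathRecord:
        """Return the path record, querying the SA only on a replica miss."""
        key = (slid, dlid)
        rec = self._records.get(key)
        if rec is None:
            self.sa_queries += 1
            rec = self._query_sa(slid, dlid)
            self._records[key] = rec
        return rec
```

Each process then uses `lookup(...).sl` and `.mtu` when building its connections, so QoS settings come from the SA's policy rather than from a hardcoded value.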

We have been using this replica technique quite successfully for 2-3 years now. 
Our MPI has used the SA/CM for connection establishment for just as long.

As was pointed out, most fabrics will be quite stable.  Hence, having a 
replica and paying the cost of the SA queries once will be much more efficient 
than paying that cost on every application startup.
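The trade-off between Eitan's N^2 concern and the per-startup burst can be put into rough numbers. This is a back-of-the-envelope model under stated assumptions: one MPI process per node, the replica costs N * N records to populate once (Eitan's estimate) plus N Reports per fabric change, and each job startup costs P * (P - 1) lookups times 2 for the Node Record plus Path Record pair. Records and queries are treated as equal units, which is a simplification, and the change count is an arbitrary placeholder.

```python
from typing import Tuple


def replica_population_cost(n_nodes: int) -> int:
    """Records the SA serves so every node holds a full replica (N * N)."""
    return n_nodes * n_nodes


def report_fanout_cost(n_nodes: int, n_changes: int) -> int:
    """Reports sent after fabric changes: one per subscribed node per change."""
    return n_nodes * n_changes


def job_startup_cost(n_procs: int, queries_per_peer: int = 2) -> int:
    """SA queries for one job: each process looks up every other process."""
    return n_procs * (n_procs - 1) * queries_per_peer


def compare(n_nodes: int, n_jobs: int, n_changes: int) -> Tuple[int, int]:
    """Return (replica cost, direct-query cost), assuming one process per node."""
    replica = replica_population_cost(n_nodes) + report_fanout_cost(n_nodes, n_changes)
    direct = n_jobs * job_startup_cost(n_nodes)
    return replica, direct


if __name__ == "__main__":
    for jobs in (1, 10, 100):
        replica, direct = compare(n_nodes=1000, n_jobs=jobs, n_changes=50)
        print(f"{jobs:>3} jobs: replica={replica:,}  direct={direct:,}")
```

Under this model, with 1000 nodes the replica's one-time cost (1,000,000 records plus 1,000 Reports per change) is already below a single 1000-process job's startup burst of 1,998,000 queries, and every further job widens the gap.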

Todd Rimmer
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
