RE: [openib-general] SA cache design

Rimmer, Todd Thu, 05 Jan 2006 13:41:11 -0800

> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> 
> I've been given the task of trying to come up with an implementation for an
> SA cache.  The intent is to increase the scalability and performance of the
> openib stack.  My current thoughts on the implementation are below.  Any
> feedback is welcome.


Sean, this is great.  This feature is near and dear to me and is very
important to large fabric scalability.  If you look in contrib in the infinicon
area, you will see a version of an SA replica which we implemented in the
linux_discovery tree.  The version in SVN is a little dated, but has the major
features and capabilities.  If you find it useful, I could provide a more
up-to-date version of that component for your reference.

Some features of it (which you should consider or possibly use as reference 
code):
- It maintains a full replica of:
        - All Node Records
        - Path Records relevant to this Node (where this node is Source)
        - Device Management Agent records for IOUs, IOCs and Service Records
        - even for a large cluster, the footprint of the above will be < 1MB

- It is implemented in kernel mode
        - while user mode may help during initial debug, it will be important
          for kernel mode ULPs such as SRP, IPoIB and SDP to also make use of
          these records

- It is in fact a replica, not a cache.  It maintains an up-to-date replica using
        the following techniques
        - registers for SA GID in/out of service notices
                - such notices, when received, trigger a query of information
                  about that node only
        - schedules a periodic full SA query
                - if notices are successfully registered for, the query is at a
                  slow pace (once every 10 minutes by default, but it's
                  configurable)
                - if notices are not successfully registered for, the query is
                  at a faster pace (once a minute, but it's configurable)
                - since notices are unreliable, the periodic sweep is needed to
                  cover for lost notices; however, the SA should resend notices
                  which are not responded to
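
To illustrate the notice-plus-sweep policy above, here is a minimal sketch in
Python.  It is not the infinicon code: the names ReplicaSync, sweep_interval,
sweep_due and on_notice are made up for this example, and only the two default
intervals (10 minutes and 1 minute) come from the description above.

```python
SLOW_SWEEP_SECS = 10 * 60  # default pace when notice registration succeeded
FAST_SWEEP_SECS = 60       # default pace when notice registration failed


class ReplicaSync:
    """Hypothetical sync policy: notices drive per-node updates, and a
    periodic full sweep covers for lost notices."""

    def __init__(self, notices_registered,
                 slow_secs=SLOW_SWEEP_SECS, fast_secs=FAST_SWEEP_SECS):
        self.notices_registered = notices_registered
        self.slow_secs = slow_secs  # both paces are configurable
        self.fast_secs = fast_secs
        self.pending_node_queries = []  # GIDs waiting for a single-node query

    def sweep_interval(self):
        # Sweep slowly when notices are available, faster when they are not.
        return self.slow_secs if self.notices_registered else self.fast_secs

    def sweep_due(self, last_sweep, now):
        # True once a full SA query should be scheduled again.
        return now - last_sweep >= self.sweep_interval()

    def on_notice(self, gid):
        # A GID in/out of service notice triggers a query about that node
        # only, not a full sweep.
        if gid not in self.pending_node_queries:
            self.pending_node_queries.append(gid)
```

A caller would check sweep_due() from a timer and drain pending_node_queries
as notices arrive; how the queries themselves are issued is left out here.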

- In addition, for CAs it performs IOU, IOC and Service record queries and
  replicates them
        - this allows for very fast access to IOU/IOC/Service record info by
          drivers like SRP
        - hence allowing for faster reconnection and failure recovery handling

- It can handle SA outages and still respond to queries while the SA is down,
  the SA is slow, or while the synchronization process is being performed
  (e.g. it does all its queries to a temporary replica, then updates the main
  replica; hence if the queries fail or take a long time, the main replica is
  still available and reasonably accurate).
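
The temporary-replica approach described above could look roughly like the
following Python sketch.  The Replica class, the SAQueryError exception and
the query_sa callable are all hypothetical names used only for illustration;
the point is that readers keep seeing the old records until a full sweep has
succeeded.

```python
import threading


class SAQueryError(Exception):
    """Hypothetical error raised when a query to the SA fails or times out."""


class Replica:
    def __init__(self):
        self._lock = threading.Lock()
        self._records = {}  # main replica, always safe to read

    def lookup(self, key):
        with self._lock:
            return self._records.get(key)

    def resync(self, query_sa):
        """Run a full sweep into a temporary replica, then swap it in.

        query_sa is an assumed callable that yields (key, record) pairs and
        raises SAQueryError on failure.
        """
        try:
            temp = dict(query_sa())  # may be slow; the main replica is untouched
        except SAQueryError:
            return False  # keep serving the old, reasonably accurate data
        with self._lock:
            self._records = temp  # replace the main replica in one step
        return True
```

The lock is held only for the swap and for lookups, never while the SA is
being queried, so a slow or dead SA does not block readers.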

- I like the idea of using the same API for SA queries and allowing an SA mux
  to choose to query the replica or the actual SA.  Hence if later versions
  choose to extend what is maintained in the replica, it would be transparent
  to applications
        - The API could allow for a flag to force a query against the replica
          or against the actual SA, with the default being to allow the "SA
          mux" to select which to use
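
As a rough sketch of what such a flag might look like (Python, with
hypothetical names Source and sa_query; neither exists in the openib stack):
the AUTO policy shown here, replica first with a fall back to the SA on a
miss, is just one assumed choice the mux could make.

```python
from enum import Enum


class Source(Enum):
    AUTO = "auto"        # default: let the SA mux decide
    REPLICA = "replica"  # force the query against the replica
    SA = "sa"            # force the query against the actual SA


def sa_query(key, replica, query_sa, source=Source.AUTO):
    """Single query entry point; callers never see which backend answered.

    replica is a dict-like store and query_sa a callable taking a key; both
    are stand-ins for illustration.
    """
    if source is Source.REPLICA:
        return replica.get(key)
    if source is Source.SA:
        return query_sa(key)
    # AUTO (assumed policy): prefer the replica, go to the SA on a miss.
    record = replica.get(key)
    if record is None:
        record = query_sa(key)
    return record
```

If a later version replicates more record types, only the mux policy changes;
callers of sa_query() stay the same.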


> 
> To keep the design as flexible as possible, my plan is to implement the
> cache in userspace.  The interface to the cache would be via MADs.  Clients
> would send their queries to the sa_cache instead of the SA itself.  The
> format of the MADs would be essentially identical to those used to query the
> SA itself.  Response MADs would contain any requested information.  If the
> cache could not satisfy a request, the sa_cache would query the SA, update
> its cache, then return a reply.

- In our stack we had a separate, more advanced SA query API (referred to as
  the Subnet Driver API).  This has evolved significantly since the old Intel
  IbAccess days, but still has similarities.  It handled all the details of
  the query, including retries (as specified by the caller), timeouts and even
  multi-level queries (get path records based on Node GUIDs, etc).  It also
  handled the RMPP aspects and hid the intermediate RMPP headers and control
  protocol.  You may want to consider defining and using such an API instead
  of MADs, lest the user of the SA replica need to also implement RMPP itself.
  Given such an API, the implementation could choose to query the actual SA or
  the replica and hide the RMPP details in the SA query case.
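
To give a feel for the shape of such an API, here is a small Python sketch of
the caller-specified retry and timeout handling.  It is not the Subnet Driver
API: query_with_retries, SAQueryTimeout and the send_query transport callable
are invented for this example, and send_query is assumed to return a fully
reassembled response (so any RMPP handling stays below this layer) or raise
TimeoutError.

```python
class SAQueryTimeout(Exception):
    """Hypothetical error: every attempt timed out."""


def query_with_retries(send_query, request, retries=3, timeout_secs=1.0):
    """Issue one logical SA query, retrying on timeout.

    retries is the number of extra attempts after the first one, as chosen
    by the caller.
    """
    last_error = None
    for _attempt in range(retries + 1):
        try:
            # The caller gets back records only, never MAD or RMPP headers.
            return send_query(request, timeout_secs)
        except TimeoutError as exc:
            last_error = exc
    raise SAQueryTimeout(
        f"no response after {retries + 1} attempts") from last_error
```

Because the caller passes a request and gets records back, the same call could
be served from the replica or from the actual SA without the caller knowing.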

Todd Rimmer
