> From: Sean Hefty [mailto:[EMAIL PROTECTED]
>
> I've been given the task of trying to come up with an implementation for an
> SA cache. The intent is to increase the scalability and performance of the
> openib stack. My current thoughts on the implementation are below. Any
> feedback is welcome.
Sean,

This is great. This feature is near and dear to me, and it is very important
to large fabric scalability. If you look in contrib in the infinicon area, you
will see a version of an SA replica which we implemented in the
linux_discovery tree. The version in SVN is a little dated, but it has the
major features and capabilities. If you find it useful, I could provide a
more up-to-date version of that component for your reference.
Some features of it (which you should consider or possibly use as reference
code):
- It maintains a full replica of:
  - All Node Records
  - Path Records relevant to this node (where this node is the source)
  - Device Management Agent records for IOUs, IOCs and Service Records
  - even for a large cluster, the footprint of the above will be < 1MB
- It is implemented in kernel mode
  - while user mode may help during initial debug, it will be important for
    kernel mode ULPs such as SRP, IPoIB and SDP to also make use of these
    records
- It is in fact a replica, not a cache. It keeps itself up to date using the
  following techniques:
  - it registers for SA GID in/out of service notices
    - when such a notice is received, it triggers a query for information
      about that node only
  - it schedules a periodic full SA query
    - if notice registration succeeded, the query runs at a slow pace (once
      every 10 minutes by default, but it's configurable)
    - if notice registration failed, the query runs at a faster pace (once a
      minute by default, but it's configurable)
    - since notices are unreliable, the periodic sweep is needed to cover for
      lost notices; however, the SA should also resend notices that are not
      responded to
- In addition, for CAs it performs IOU, IOC and Service Record queries and
  replicates the results
  - this allows very fast access to IOU/IOC/Service Record info by drivers
    like SRP
  - hence allowing for faster reconnection and failure recovery handling
- It can handle SA outages and still respond to queries while the SA is down,
  while the SA is slow, or while the synchronization process is being
  performed. For example, it does all its queries into a temporary replica and
  then updates the main replica, so if the queries fail or take a long time,
  the main replica is still available and reasonably accurate.
- I like the idea of using the same API for SA queries and allowing an "SA
  mux" to choose whether to query the replica or the actual SA. Then, if later
  versions choose to extend what is maintained in the replica, it would be
  transparent to applications.
  - The API could allow a flag to force a query against the replica or
    against the actual SA, with the default being to let the "SA mux" select
    which to use.
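To make the synchronization and mux behavior above concrete, here is a minimal C sketch. It is not the InfiniCon linux_discovery code: every name (`struct replica_set`, `replica_sync()`, `sa_mux_query()`, `enum sa_query_src`) is hypothetical, records are reduced to a GUID and a LID, and the locking a kernel implementation would need around the replica swap is omitted. It shows three points from the list: choosing the sweep interval based on whether notice registration succeeded, sweeping into a temporary replica and swapping it in only on success, and a mux flag that forces the replica or the SA, with an assumed default policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types; a real replica would hold full SA records. */
#define MAX_NODES 64

struct node_rec {
    uint64_t node_guid;
    uint16_t lid;
};

struct replica {
    struct node_rec nodes[MAX_NODES];
    size_t count;
};

struct replica_set {
    struct replica bufs[2];
    struct replica *live;  /* the main replica, served to clients */
    struct replica *temp;  /* target of the sweep in progress */
    bool valid;            /* true after the first successful sweep */
};

/* Fills *out from the SA; returns 0 on success. */
typedef int (*sa_fetch_fn)(struct replica *out);
/* Single-record query against the actual SA; returns 0 on success. */
typedef int (*sa_lookup_fn)(uint64_t guid, struct node_rec *out);

enum sa_query_src { SA_SRC_AUTO, SA_SRC_REPLICA, SA_SRC_SA };

/* Slow sweep when notices are registered, fast sweep otherwise. */
static unsigned sweep_interval_sec(bool notices_registered,
                                   unsigned slow_sec, unsigned fast_sec)
{
    return notices_registered ? slow_sec : fast_sec;
}

static void replica_set_init(struct replica_set *rs)
{
    memset(rs, 0, sizeof(*rs));
    rs->live = &rs->bufs[0];
    rs->temp = &rs->bufs[1];
}

/* Sweep into the temporary replica and swap it in only on success, so the
 * main replica stays usable if the SA is down or slow.
 * Locking around the swap is omitted in this sketch. */
static int replica_sync(struct replica_set *rs, sa_fetch_fn fetch)
{
    assert(rs->live != rs->temp);
    memset(rs->temp, 0, sizeof(*rs->temp));
    if (fetch(rs->temp) != 0)
        return -1;
    struct replica *old = rs->live;
    rs->live = rs->temp;
    rs->temp = old;
    rs->valid = true;
    return 0;
}

static int replica_lookup(const struct replica *r, uint64_t guid,
                          struct node_rec *out)
{
    for (size_t i = 0; i < r->count; i++) {
        if (r->nodes[i].node_guid == guid) {
            *out = r->nodes[i];
            return 0;
        }
    }
    return -1;
}

/* AUTO (assumed policy): use the replica once it has been synced,
 * otherwise fall through to the actual SA. */
static int sa_mux_query(struct replica_set *rs, enum sa_query_src src,
                        uint64_t guid, sa_lookup_fn query_sa,
                        struct node_rec *out)
{
    switch (src) {
    case SA_SRC_REPLICA:
        return replica_lookup(rs->live, guid, out);
    case SA_SRC_SA:
        return query_sa(guid, out);
    case SA_SRC_AUTO:
    default:
        return rs->valid ? replica_lookup(rs->live, guid, out)
                         : query_sa(guid, out);
    }
}
```

With the assumed `SA_SRC_AUTO` policy, a lookup that misses in a synchronized replica is reported as not found rather than forwarded to the SA, on the reasoning that a full replica treats a miss as authoritative; a real mux could reasonably choose differently.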
>
> To keep the design as flexible as possible, my plan is to implement the cache
> in userspace. The interface to the cache would be via MADs. Clients would
> send their queries to the sa_cache instead of the SA itself. The format of
> the MADs would be essentially identical to those used to query the SA
> itself. Response MADs would contain any requested information. If the cache
> could not satisfy a request, the sa_cache would query the SA, update its
> cache, then return a reply.
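For comparison, the read-through behavior Sean describes (answer from the cache, otherwise query the SA, update the cache, then reply) can be sketched in C as below. This is an illustration under assumptions: MAD encoding and RMPP are left out, the SA query is a caller-supplied callback, the cache is a small fixed array keyed on destination GID with round-robin eviction, and all names (`struct sa_cache`, `sa_cache_get_path()`, `sa_path_query_fn`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 32

struct path_rec {
    uint8_t  dgid[16];  /* destination GID: the lookup key */
    uint16_t dlid;
    uint8_t  sl;
};

struct sa_cache {
    struct path_rec slots[CACHE_SLOTS];
    size_t count;
    size_t next_evict;  /* round-robin replacement once full */
};

/* Hypothetical query against the actual SA; returns 0 on success. */
typedef int (*sa_path_query_fn)(const uint8_t dgid[16], struct path_rec *out);

static struct path_rec *cache_find(struct sa_cache *c, const uint8_t dgid[16])
{
    for (size_t i = 0; i < c->count; i++)
        if (memcmp(c->slots[i].dgid, dgid, 16) == 0)
            return &c->slots[i];
    return NULL;
}

static void cache_insert(struct sa_cache *c, const struct path_rec *rec)
{
    size_t i;
    if (c->count < CACHE_SLOTS) {
        i = c->count++;
    } else {
        i = c->next_evict;
        c->next_evict = (c->next_evict + 1) % CACHE_SLOTS;
    }
    c->slots[i] = *rec;
}

/* Serve from the cache; on a miss, query the SA, cache the answer, reply. */
static int sa_cache_get_path(struct sa_cache *c, const uint8_t dgid[16],
                             sa_path_query_fn query_sa, struct path_rec *reply)
{
    assert(c != NULL && query_sa != NULL);
    struct path_rec *hit = cache_find(c, dgid);
    if (hit) {
        *reply = *hit;
        return 0;
    }
    struct path_rec fresh;
    int err = query_sa(dgid, &fresh);
    if (err)
        return err;
    cache_insert(c, &fresh);
    *reply = fresh;
    return 0;
}
```

Unlike the replica above, a miss here always costs an SA round trip, and nothing in this sketch invalidates stale entries; that is the gap the notice registration and periodic sweep are meant to close.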
- In our stack we had a separate, more advanced SA query API (referred to as
the Subnet Driver API). This has evolved significantly since the old Intel
IbAccess days, but still has similarities. It handled all the details of the
query, including retries (as specified by the caller), timeouts and even
multi-level queries (get path records based on Node GUIDs, etc.). It also
handled the RMPP aspects and hid the intermediate RMPP headers and control
protocol. You may want to consider defining and using such an API instead of
MADs, lest the user of the SA replica also need to implement RMPP itself.
Given such an API, the implementation could choose to query the actual SA or
the replica and hide the RMPP details in the SA query case.
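A minimal C sketch of the retry and timeout handling such an API could hide from callers follows. The transport callback stands in for a single request/response exchange with RMPP reassembly already done; `struct sa_query_opts`, `sa_send_fn` and `sa_query()` are hypothetical names, not the Subnet Driver API itself, and treating only timeouts as retryable is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sa_query_opts {
    unsigned retries;     /* extra attempts after the first, set by the caller */
    unsigned timeout_ms;  /* per-attempt timeout handed to the transport */
};

/* Hypothetical transport: one request/response exchange with RMPP already
 * reassembled. Returns 0, -ETIMEDOUT, or another negative errno value. */
typedef int (*sa_send_fn)(void *ctx, const void *req, void *resp,
                          unsigned timeout_ms);

/* Retry on timeout up to opts->retries times; any other result
 * (success or a non-retryable error) is returned immediately. */
static int sa_query(void *ctx, sa_send_fn send, const void *req, void *resp,
                    const struct sa_query_opts *opts)
{
    assert(send != NULL && opts != NULL);
    int err = -ETIMEDOUT;
    for (unsigned attempt = 0; attempt <= opts->retries; attempt++) {
        err = send(ctx, req, resp, opts->timeout_ms);
        if (err != -ETIMEDOUT)
            break;
    }
    return err;
}
```

Because the caller only sees `sa_query()`, the same entry point could later be routed to the replica instead of `send`, which is the transparency argued for above.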
Todd Rimmer
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general