Sean Hefty wrote:
The cache is updated using an SA GET_TABLE request, which is more efficient than sending separate SA GET requests for each path record. Your assumption is correct. The implementation will contain copies of all path records whose SGID is a local node GID. (Currently it contains only a single path record per SGID/DGID, but that will be expanded.)

Taking into account the 15-minute invalidation window you mentioned in one of the emails, and doing some math, I arrive at the following:

For a 1K-node/port fabric the SM/SA needs to transmit a table of 1K paths to each local SA cache. With 3 path records embedded per MAD, each table takes at least ~350 MADs (about 330 RMPP segments plus ~20 ACKs). Since we have 1K nodes, that is 350K MADs to transmit, and if we assume transmission is spread uniformly over the 1,000-second invalidation window (1,000 seconds = 16 minutes 40 seconds), the SM must transmit at a constant rate of 350K/1K = 350 MADs/sec, indefinitely. And this is RMPP, so depending on the RMPP implementation it may run into retransmission of segments or of the whole payload. Each such table also takes ~90KB (350 * 256 bytes) of RAM, so the SM needs to allow for up to 90MB of RAM to hold all those tables.
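To make the arithmetic above easy to check, here is a small sketch of the same estimate. The packing of 3 path records per 256-byte MAD and the 1,000-second window come from the text; the ACK-per-16-segments ratio is my assumption for a typical RMPP window, so the exact totals differ slightly from the rounded figures in the post.

```python
import math

NODES = 1000            # ports/nodes in the fabric
PATHS_PER_NODE = 1000   # one path record per remote port
RECORDS_PER_MAD = 3     # path records packed into each 256-byte RMPP segment
MAD_BYTES = 256
WINDOW_SEC = 1000       # ~15-minute cache invalidation window

segments = math.ceil(PATHS_PER_NODE / RECORDS_PER_MAD)  # RMPP data segments
acks = segments // 16                                   # assumed: one ACK per 16 segments
mads_per_table = segments + acks                        # ~350 in the post's rounding

total_mads = NODES * mads_per_table       # every node fetches a full table
rate = total_mads / WINDOW_SEC            # sustained SM transmit rate, MADs/sec
ram_per_table = mads_per_table * MAD_BYTES
total_ram_mb = NODES * ram_per_table // (1024 * 1024)

print(mads_per_table, round(rate), total_ram_mb)
```

With these assumptions the script prints roughly 354 MADs per table, a sustained rate of ~354 MADs/sec, and ~86MB of SM-side RAM, in line with the ~350/sec and ~90MB figures above.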

Aren't we creating a monster here? If this is an SA replica that should work at scale from day one, let's call it that and see how to get there.

> I view MPI as one of the primary reasons for having a cache. Waiting
> for a failed lookup to create the initial cache would delay the startup
> time for apps wanting all-to-all connection establishment. In this
> case, we also get the side effect that the SA receives GET_TABLE
> requests from every node at roughly the same time.

Talking MPI, here are a few points that seem to me somewhat unaddressed in the all-to-all cache design:

+ neither MVAPICH nor OpenMPI uses path queries

+ OpenMPI opens its connections on demand, that is, only if rank I attempts to send a message to rank J does I connect to J

+ even MPIs that connect all-to-all in an N-rank job would do only n(n-1)/2 path queries, so the aggregated load on the SA is half of what the all-to-all caching scheme generates
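The last point is a pair-counting argument, which can be sketched as follows. The assumption (mine, consistent with the text) is that in the on-demand case each pair of ranks needs only one path query, since the resulting path record serves both directions, while the caching scheme transfers every directed SGID/DGID pair because each endpoint fetches the full table:

```python
N = 1000  # MPI ranks, one per node (illustrative fabric size from the post)

# On-demand all-to-all: one path query per unordered pair of ranks.
on_demand_queries = N * (N - 1) // 2

# Caching scheme: every node pulls a full table, so each pair of
# endpoints is transferred twice, once from each side.
cached_records = N * (N - 1)

print(on_demand_queries, cached_records, cached_records // on_demand_queries)
```

For N = 1000 this gives 499,500 on-demand queries versus 999,000 cached record transfers, i.e. the caching scheme moves twice the data for the same job.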

Or.



_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
