Sean Hefty wrote:
> The cache
> is updated using an SA GET_TABLE request, which is more efficient than
> sending separate SA GET requests for each path record.
> Your assumption is correct. The implementation will contain copies of
> all path records whose SGID is a local node GID. (Currently it contains
> only a single path record per SGID/DGID, but that will be expanded.)
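A quick back-of-envelope sketch of why GET_TABLE beats per-record GETs, assuming roughly 3 path records per 256-byte MAD (the figure used later in this mail) and one request plus one response MAD per individual GET; the helper names here are mine, not anything from the SA code:

```python
import math

def mads_for_separate_gets(n_paths):
    # One SA GET request plus one GET response per path record.
    return 2 * n_paths

def mads_for_get_table(n_paths, records_per_mad=3):
    # One GET_TABLE request plus an RMPP-segmented response
    # (ACK overhead ignored in this rough count).
    segments = math.ceil(n_paths / records_per_mad)
    return 1 + segments

print(mads_for_separate_gets(1000))  # 2000 MADs on the wire
print(mads_for_get_table(1000))      # 335 MADs on the wire
```

So for a 1000-path table the single GET_TABLE exchange needs roughly a sixth of the MADs that per-record GETs would.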
Taking into account the invalidation window of 15 minutes that you
mentioned in one of the emails, and doing some math, I arrive at the
following:
For a 1k node/port fabric, the SM/SA needs to transmit a table of 1k
paths to each local SA. Since you can embed 3 paths in a MAD, each table
takes at least 350 MADs (330 RMPP segments + 20 ACKs). Since we have 1k
nodes, there are 350K MADs to transmit, and if we assume transmission is
uniform over the 1k-second invalidation window (1000 seconds = 16
minutes 40 seconds), we require the SM to transmit at a constant rate of
350k/1k = 350 MADs/sec, forever. And this is RMPP, so depending on the
RMPP implementation it would run into retransmission of segments or of
the whole payload. Each such table also takes 90 KB (350 * 256 bytes) of
RAM, so the SM needs to allow for up to 90 MB of RAM to hold all those
tables.
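The arithmetic above can be redone explicitly; this sketch uses the same assumptions as the mail (3 path records per 256-byte MAD, ~20 ACK MADs of RMPP overhead per table, a 1000-second window) and lands on essentially the same figures:

```python
import math

nodes = 1000            # ports in the fabric, one local SA each
paths_per_table = 1000  # one path record per remote port
records_per_mad = 3     # assumption from the mail
mad_size = 256          # bytes per MAD
window_sec = 1000       # ~16 min 40 s invalidation window

segments = math.ceil(paths_per_table / records_per_mad)  # 334 RMPP segments
acks = 20                                                # rough ACK overhead
mads_per_table = segments + acks                         # ~354 MADs per table

total_mads = nodes * mads_per_table          # ~354K MADs per window
rate = total_mads / window_sec               # sustained SM transmit rate
ram_per_table = mads_per_table * mad_size    # ~90 KB per table
total_ram = nodes * ram_per_table            # ~90 MB across all tables

print(rate)       # ~354 MADs/sec, forever
print(total_ram)  # ~90,000,000 bytes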
Aren't we creating a monster here??? If this is an SA replica that
should scale from day one, let's call it that and figure out how to get
there.
> I view MPI as one of the primary reasons for having a cache. Waiting
> for a
> failed lookup to create the initial cache would delay the startup time
> for apps wanting all-to-all connection establishment. In this case, we
> also get the side effect that the SA receives GET_TABLE requests from
> every node at roughly the same time.
Talking MPI, here are a few points that seem to me somehow unaddressed
in the all-to-all cache design:
+ neither MVAPICH nor OpenMPI uses path queries
+ OpenMPI opens its connections on demand, that is, only if rank I
attempts to send a message to rank J does I connect to J
+ even MPIs that connect all-to-all in an N-rank job would issue only
n(n-1)/2 path queries, so the aggregate load on the SA is half of what
the all-to-all caching scheme generates
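The factor-of-two claim in the last point can be checked directly: on-demand queries are shared per rank pair, while the all-to-all cache stores a path record in each direction. A minimal sketch, with function names of my own choosing:

```python
def on_demand_queries(n):
    # One path lookup per rank pair; the i->j and j->i
    # connections share the same query.
    return n * (n - 1) // 2

def cached_path_records(n):
    # Every node caches a path record to every other node,
    # so both directions of each pair are counted.
    return n * (n - 1)

n = 1000
print(on_demand_queries(n))    # 499500 queries
print(cached_path_records(n))  # 999000 records, exactly twice as many
```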
Or.
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general