So, if Lustre creates only one endpoint (QP) to another node and fires a high rate of concurrent messages (high thread count) over that endpoint, will libfabrics/kFabrics make intelligent use of CPU cores, IRQ balancing, NUMA placement, etc.? Or will it be the responsibility of the application writers to find a way to manipulate the use of endpoints to get the best performance?

Doug

> On Feb 12, 2016, at 8:52 AM, Smith, Stan <[email protected]> wrote:
>
> Hi Doug,
> I may have misled you in believing that clients of libfabric and/or Kfabric
> are responsible for transport locking issues; they are 'not'.
>
> Libfabric/kFabric providers 'are' responsible for access serialization to
> hardware.
>
> s.
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Oucharek, Doug S
> Sent: Wednesday, February 10, 2016 3:37 PM
> To: Paul Grun <[email protected]>
> Cc: [email protected]
> Subject: [ofiwg] DS/DA Runtime Model Discussion
>
> This email is a follow-up to my comment in a previous DS/DA call about the
> runtime model being an important part of the DS/DA definition.
>
> MPI seems to be the dominant user of fabrics in HPC. As such, they have a
> huge impact on the design of the runtime model being followed by fabric
> developers and corresponding middleware (what I consider OFED/verbs,
> libfabrics, and DS/DA). Currently, they seem to be pushing for bare metal
> access from the providers, leaving the work of serialization/locking to the
> middleware or the applications themselves.
>
> If DS/DA follows libfabrics in its development, I am concerned that the bare
> metal mindset will dominate here as well and that will leave “application
> anarchy” with regards to how serialization/locking is being done. Mitigating
> the strategy of fabric users is something I would expect from the providers
> (the one common access point regardless of middleware). The MPI push was to
> get this common point to back off and leave serialization/locking to the
> upper layers but we now do not have a common point to coordinate competing
> access to the fabric.
>
> Should it not be a part of the middleware (libfabrics and DS/DA) to at the
> very least, put demands upon the providers so a common strategy for
> serialization/locking can be enforced for a specific fabric so the apps, like
> Lustre, don’t have to make significant code changes to get reasonable
> performance out of the fabric? If we have to make significant changes for
> each new fabric released, the value of the middleware (be it OFED,
> libfabrics, or DS/DA) is severely diminished and we might as well just access
> the fabric drivers directly.
>
> Discussion?
>
> Doug
> _______________________________________________
> ofiwg mailing list
> [email protected]
> http://lists.openfabrics.org/mailman/listinfo/ofiwg
