This email is a follow-up to my comment in a previous DS/DA call about the
runtime model being an important part of the DS/DA definition.

MPI seems to be the dominant user of fabrics in HPC.  As such, it has a huge
impact on the design of the runtime model being followed by fabric developers
and the corresponding middleware (what I consider OFED/verbs, libfabric, and
DS/DA).  Currently, the MPI community seems to be pushing for bare-metal access
from the providers, leaving the work of serialization/locking to the middleware
or the applications themselves.

If DS/DA follows libfabric in its development, I am concerned that the
bare-metal mindset will dominate here as well, leaving “application anarchy”
with regard to how serialization/locking is done.  Mediating among the access
strategies of different fabric users is something I would expect from the
providers (the one common access point regardless of middleware).  The MPI push
was to get this common point to back off and leave serialization/locking to the
upper layers, but as a result we no longer have a common point to coordinate
competing access to the fabric.

Should it not be the job of the middleware (libfabric and DS/DA) to, at the
very least, put demands on the providers so that a common strategy for
serialization/locking can be enforced for a specific fabric, and apps like
Lustre don’t have to make significant code changes to get reasonable
performance out of the fabric?  If we have to make significant changes for each
new fabric released, the value of the middleware (be it OFED, libfabric, or
DS/DA) is severely diminished, and we might as well just access the fabric
drivers directly.

Discussion?  

Doug
_______________________________________________
ofiwg mailing list
[email protected]
http://lists.openfabrics.org/mailman/listinfo/ofiwg
