Hi Andy,

Let me see if I can clarify this for you and the others on this mailing list. The Ohio State group has been focused on improving the wireup algorithm in PMI-2, which focuses on the allgather operation. Hence their “ring” implementation.
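For anyone unfamiliar with the term, a ring allgather can be sketched roughly as follows. This is only a generic, single-process simulation of the textbook pattern, not the OSU code and not any PMI-2 or PMIx API; the function name `ring_allgather` and the list-of-lists layout are assumptions made purely for illustration.

```python
def ring_allgather(local_values):
    """Simulate a ring allgather among len(local_values) ranks.

    Hypothetical helper for illustration only: rank r starts with
    local_values[r] and ends up with every rank's value.
    """
    n = len(local_values)
    # gathered[r][i] is rank i's value as known by rank r (None until received).
    gathered = [[None] * n for _ in range(n)]
    for r in range(n):
        gathered[r][r] = local_values[r]

    # n - 1 steps: in each step every rank passes one block to its right neighbor.
    for step in range(n - 1):
        # Build all messages before delivering any, so the step is simultaneous.
        sends = []
        for r in range(n):
            origin = (r - step) % n  # block that rank r received in the previous step
            sends.append(((r + 1) % n, origin, gathered[r][origin]))
        for dest, origin, value in sends:
            gathered[dest][origin] = value
    return gathered


if __name__ == "__main__":
    for rank, view in enumerate(ring_allgather(["a", "b", "c", "d"])):
        print(rank, view)
```

The point of the sketch is that the exchange takes n - 1 neighbor-to-neighbor steps, so its cost grows with the number of participants even when each step is cheap.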

PMIx has been focused on two goals:

1. high-speed “instant on” application startup that basically allows parallel programming models (MPI, OSHMEM, etc.) to start/wireup as fast as the RM can start them

2. extending PMI to provide an enhanced application-RM partnership to support emerging programming paradigms where the application works in concert with the RM to steer execution. This takes many forms, including malleable workflow execution, file prepositioning, flexible allocations, and fault notifications.

We are accomplishing #1 by completely eliminating the wireup data exchange operation, and therefore the allgather disappears. Hence my comment that this question is moot. With the adoption of PMIx into essentially all RMs in use out there, we provide the infrastructure by which this is accomplished in a standard enough way that programming model implementations can reasonably count on it being there. We still have to maintain “backward compatibility” for those instances where we don’t have access to it, but we can treat those as lower-performance exception code paths.

OpenMPI will soon release the first use of that integration to achieve “instant on” behaviors. I can’t speak for the other MPI implementations, but history indicates that the community picks things up from each other rather quickly. So I expect that a year or so from now, the question of having a really good allgather will largely be moot.

I’m not saying it will never be used, because we do have times when a “barrier” operation can be useful, and so PMIx will be adopting appropriate algorithms for such occasions (and OSU’s ring will surely be one of them). I’m just saying it won’t be in the critical startup path.

You can see more about the direction of PMIx, and how it is expected to be used in a broader vision of RM in general, in the following two presentations:

http://www.slideshare.net/rcastain/exascale-process-management-interface
http://www.slideshare.net/rcastain/hpc-controls-future

PMIx will be hosting a BoF at SC’15 in Austin, TX on Thurs, Nov 19th, at 12:15-1:15pm to discuss the roadmap for this effort. Anyone interested in participating in the discussion, and contributing to the project(!), is welcome.

HTH
Ralph

> On Sep 25, 2015, at 6:22 AM, Andy Riebs <[email protected]> wrote:
>
> Synthesizing what I think I've learned over the past 24 hours,
>
> The PMIx implementation described at
> <http://slurm.schedmd.com/SLUG15/PMIx.pdf> describes a complete,
> upward-compatible [I hope!] highly-scalable (exascale) replacement for the
> PMI-1 and PMI-2 job launch facilities
>
> The "PMIX" paper at <http://slurm.schedmd.com/SLUG15/chakrabs-slug15.pdf>
> describes creative ways to use and extend the existing PMI-2 interface to
> exascale levels which will [I hope!] be fully compatible with the PMIx
> implementation
>
> Clarifications and corrections gratefully accepted!
> Andy
>
> On 09/24/2015 06:49 PM, Andy Riebs wrote:
>>
>> Ralph, Artem, and Sourav, thanks for the explanation!
>>
>> Andy
>>
>> On 09/24/2015 04:52 PM, Ralph Castain wrote:
>>> Hi Andy.
>>>
>>> I honestly have no idea why those guys did that :-). We’ve known about
>>> their efforts for awhile, but this is the first I’ve seen them labeled as
>>> PMIX. Guess we do differ on the capitalization of the last letter.
>>>
>>> Anyway, the PMIx effort is already being integrated in several popular RMs,
>>> including Slurm, so I imagine it’s a moot point except for possibly
>>> confusing people searching publications.
>>>
>>> Ralph
>>>
>>>> On Sep 24, 2015, at 1:42 PM, Andy Riebs <[email protected]> wrote:
>>>>
>>>> From the presentations at the Slurm Users' Group meeting,
>>>> <http://slurm.schedmd.com/SLUG15/chakrabs-slug15.pdf> and
>>>> <http://slurm.schedmd.com/SLUG15/PMIx.pdf>, it appears that two
>>>> implementations of an improved PMI interface, both tagged "PMIX", are
>>>> underway. Are these cooperating, competing, or "only just realized that
>>>> they could be cooperating" activities?
>>>>
>>>> Andy
>>>>
>>>> --
>>>> Andy Riebs
>>>> New email address! [email protected]
>>>> Hewlett-Packard Company
>>>> High Performance Computing Software Engineering
>>>> +1 404 648 9024
>>>> My opinions are not necessarily those of HP
