Hi All,

To clarify things, we have had similar goals and have been working on
improving the PMI-2 plugin for some time. We evaluated several designs and
strategies:

1. Designs and detailed performance evaluations (up to 16K cores) for
on-demand PMI gets (similar to instant startup, if I understand it
correctly), sparse/dense hints for PMI operations (somewhat similar to
scoping of key-value pairs), and the ring-based approach were presented in
[1].

2. Non-blocking PMI collectives such as Ifence and Iallgather, evaluated on
up to 16K cores, have shown good scalability for different programming
models such as MPI and OpenSHMEM. The designs and evaluations were
presented in [2] and [3] respectively.
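
To illustrate what "on-demand" means in item 1, here is a minimal sketch,
not the actual plugin code. It assumes the idea is that a process fetches a
peer's key-value pair only when it first needs it and caches the result.
The class name OnDemandKVS, the key format, and the dict standing in for
the process manager's key-value store are all hypothetical.

```python
class OnDemandKVS:
    """Hypothetical wrapper: fetch a key from the remote store on first use only."""

    def __init__(self, remote_store):
        self._remote = remote_store  # stand-in for the process manager's KVS
        self._cache = {}
        self.remote_gets = 0         # counts how many remote lookups were issued

    def get(self, key):
        if key not in self._cache:
            # First access to this key: do the (simulated) remote get.
            self.remote_gets += 1
            self._cache[key] = self._remote[key]
        return self._cache[key]


# Four ranks have published endpoint information (made-up values).
store = {f"endpoint-{rank}": f"addr-{rank}" for rank in range(4)}
kvs = OnDemandKVS(store)

# This process only ever talks to rank 2, so only one remote get is issued,
# even though the key is looked up twice.
kvs.get("endpoint-2")
kvs.get("endpoint-2")
```

The point of the sketch is only that the number of remote lookups tracks
the number of peers actually contacted rather than the job size.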

Some of these designs are already available through the "mpirun_rsh" job
launcher distributed as part of the MVAPICH2 software stack since MVAPICH2
2.1rc2 released on 03/12/2015.

We plan to make these designs available for Slurm users as well through
future MVAPICH2 releases.

One important difference is that our changes build on the existing
PMI2 plugin rather than introducing a completely new PMIx plugin. The APIs
also differ in several ways (for example, request objects vs. callbacks for
non-blocking operations). I don't think having both of them in Slurm would
cause a conflict.
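
For readers unfamiliar with the two non-blocking styles mentioned above,
the following is a rough sketch of the difference. It is not either
project's real API: the function names iallgather_request and
iallgather_callback, the Request class, and the simulated exchange are all
made up for illustration, and a thread stands in for the real
communication.

```python
import threading


def _simulated_allgather(local_value, peer_values):
    # Stand-in for the real exchange: returns all values, local one first.
    return [local_value] + list(peer_values)


class Request:
    """Request-object style: the caller keeps a handle and waits on it later."""

    def __init__(self, fn, *args):
        self._result = None
        self._thread = threading.Thread(target=self._run, args=(fn, args))
        self._thread.start()

    def _run(self, fn, args):
        self._result = fn(*args)

    def wait(self):
        # Block until the operation completes, then hand back its result.
        self._thread.join()
        return self._result


def iallgather_request(local_value, peer_values):
    """Start the operation and return a Request the caller can wait on."""
    return Request(_simulated_allgather, local_value, peer_values)


def iallgather_callback(local_value, peer_values, on_done):
    """Callback style: on_done(result) is invoked when the operation completes."""
    def run():
        on_done(_simulated_allgather(local_value, peer_values))
    worker = threading.Thread(target=run)
    worker.start()
    return worker  # returned only so this demo can join the thread


# Request-object style: start, overlap other work, then wait.
req = iallgather_request("a", ["b", "c"])
gathered = req.wait()

# Callback style: the result is delivered to the callback instead.
delivered = []
worker = iallgather_callback("a", ["b"], delivered.append)
worker.join()
```

In the first style the caller decides when to block; in the second the
library decides when to run the caller's code.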

[1] "PMI Extensions for Scalable MPI Startup", S. Chakraborty, H.
Subramoni, J. Perkins, A. Moody, M. Arnold, D. K. Panda, EuroMPI/ASIA '14
(http://dl.acm.org/citation.cfm?id=2642780)

[2] "Non-Blocking PMI Extensions for Fast MPI Startup", S. Chakraborty, H.
Subramoni, A. Moody, A. Venkatesh, J. Perkins, D. K. Panda, CCGrid '15
(http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7152479&tag=1)

[3] "On-demand Connection Management for OpenSHMEM and OpenSHMEM+MPI", S.
Chakraborty, H. Subramoni, J. Perkins, A. Awan, D. K. Panda, HIPS '15
(http://hpc.pnl.gov/conf/hips-lspp15/talks/chakraborty.pdf)

Thanks,
Sourav


On Fri, Sep 25, 2015 at 10:59 AM, Ralph Castain <[email protected]> wrote:

> Hi Andy
>
> Let me see if I can clarify this for you and the others on this mailing
> list. The Ohio State group has been focused on improving the wireup
> algorithm in PMI-2, which focuses on the allgather operation. Hence their
> “ring” implementation.
>
> PMIx has been focused on two goals:
>
> 1. high-speed “instant on” application startup that basically allows
> parallel programming models (MPI, OSHMEM, etc) to start/wireup as fast as
> the RM can start them
>
> 2. extending PMI to provide an enhanced application-RM partnership to
> support emerging programming paradigms where the application works in
> concert with the RM to steer execution. This takes many forms, including
> malleable workflow execution, file prepositioning, flexible allocations,
> and fault notifications.
>
> We are accomplishing #1 by completely eliminating the wireup data exchange
> operation, and therefore the allgather disappears. Hence my comment that
> this question is moot. With the adoption of PMIx into essentially all RMs
> in use out there, we provide the infrastructure by which this is
> accomplished in a standard enough way that programming model
> implementations can reasonably count on it being there. We still have to
> maintain “backward compatibility” for those instances where we don’t have
> access to it, but we can treat those as lower-performance exception code
> paths.
>
> OpenMPI will soon release the first use of that integration to achieve
> “instant on” behaviors. I can’t speak for the other MPI implementations,
> but history indicates that the community picks things up from each other
> rather quickly. So I expect that a year or so from now, the question of
> having a really good allgather will largely be moot. I’m not saying it will
> never be used, because we do have times when a “barrier” operation can be
> useful, and so PMIx will be adopting appropriate algorithms for such
> occasions (and OSU’s ring will surely be one of them). I’m just saying it
> won’t be in the critical startup path.
>
> You can see more about the direction of PMIx, and how it is expected to be
> used in a broader vision of RM in general, in the following two
> presentations:
>
> http://www.slideshare.net/rcastain/exascale-process-management-interface
>
> http://www.slideshare.net/rcastain/hpc-controls-future
>
> PMIx will be hosting a BoF at SC’15 in Austin, TX on Thurs, Nov 19th, at
> 12:15-1:15pm to discuss the roadmap for this effort. Anyone interested in
> participating in the discussion, and contributing to the project(!), is
> welcome.
>
> HTH
> Ralph
>
>
> On Sep 25, 2015, at 6:22 AM, Andy Riebs <[email protected]> wrote:
>
> Synthesizing what I think I've learned over the past 24 hours,
>
>    - The PMIx implementation described at
>    <http://slurm.schedmd.com/SLUG15/PMIx.pdf> describes a complete,
>    upward-compatible [I hope!] highly-scalable (exascale) replacement for the
>    PMI-1 and PMI-2 job launch facilities
>    - The "PMIX" paper at
>    <http://slurm.schedmd.com/SLUG15/chakrabs-slug15.pdf> describes
>    creative ways to use and extend the existing PMI-2 interface to exascale
>    levels which will [I hope!] be fully compatible with the PMIx
>    implementation
>
> Clarifications and corrections gratefully accepted!
>
> Andy
>
> On 09/24/2015 06:49 PM, Andy Riebs wrote:
>
>
> Ralph, Artem, and Sourav, thanks for the explanation!
>
> Andy
>
> On 09/24/2015 04:52 PM, Ralph Castain wrote:
>
> Hi Andy.
>
> I honestly have no idea why those guys did that :-). We’ve known about
> their efforts for a while, but this is the first I’ve seen them labeled as
> PMIX. Guess we do differ on the capitalization of the last letter.
>
> Anyway, the PMIx effort is already being integrated in several popular
> RMs, including Slurm, so I imagine it’s a moot point except for possibly
> confusing people searching publications.
>
> Ralph
>
> On Sep 24, 2015, at 1:42 PM, Andy Riebs <[email protected]> wrote:
>
>
> From the presentations at the Slurm Users' Group meeting,
> <http://slurm.schedmd.com/SLUG15/chakrabs-slug15.pdf> and
> <http://slurm.schedmd.com/SLUG15/PMIx.pdf>, it appears that two
> implementations of an improved PMI interface, both tagged "PMIX", are
> underway. Are these cooperating, competing, or "only just realized that
> they could be cooperating" activities?
>
> Andy
>
> --
> Andy Riebs
> New email address! [email protected]
> Hewlett-Packard Company
> High Performance Computing Software Engineering
> +1 404 648 9024
> My opinions are not necessarily those of HP
>
>
>
>
