[slurm-dev] Re: slurm-dev summary, was Re: What follows PMI-2?

Ralph Castain Fri, 25 Sep 2015 13:36:51 -0700

I don’t think they will conflict, though I’ll leave the naming issue to Moe 
et al - we really can’t change ours, as it is already too far into being 
integrated elsewhere.


We also are using PMIx internal to OpenMPI, as I said, so that will be 
available with the new release. Our intent with the RMs is to eliminate the 
need to launch via mpirun in order to get the benefit, and to standardize the 
extended interfaces (which I realize your project isn’t pursuing).

PMIx uses non-blocking collectives as well, but again, we are focused on 
completely eliminating those operations. I suspect that is where we diverge a 
bit, plus in the fact that PMIx is a standalone library so the implementations 
will be common across the RMs.
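
To illustrate the API difference mentioned in the quoted message below (request 
objects vs callbacks for non-blocking operations), here is a minimal Python 
sketch. It does not use the real PMI2 or PMIx APIs: `start_exchange_request`, 
`start_exchange_callback`, and `_exchange` are hypothetical names, and the 
"exchange" simply returns a one-entry dictionary.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

# Worker pool standing in for the library's background progress engine.
_pool = ThreadPoolExecutor(max_workers=2)


def _exchange(local_value):
    # Hypothetical stand-in for a key-value exchange; no real PMI call is made.
    return {"rank0": local_value}


def start_exchange_request(local_value) -> Future:
    """Request-object style: return a handle that the caller waits on later."""
    return _pool.submit(_exchange, local_value)


def start_exchange_callback(local_value, on_done) -> None:
    """Callback style: on_done(result) is invoked once the operation completes."""
    fut = _pool.submit(_exchange, local_value)
    fut.add_done_callback(lambda f: on_done(f.result()))


# Request-object usage: start, overlap other startup work, then wait.
req = start_exchange_request("addr-A")
request_result = req.result()

# Callback usage: the result is delivered to a function supplied by the caller.
done = threading.Event()
callback_result = {}


def on_done(result):
    callback_result.update(result)
    done.set()


start_exchange_callback("addr-B", on_done)
done.wait(timeout=5)
_pool.shutdown(wait=True)
```

With a request object the caller decides when to block; with a callback the 
library decides when to hand the result over. Either way, the operation can 
overlap with other startup work.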

So roughly similar goals when it comes to scalability, different approaches - 
nothing new there. Nor are our two groups the only ones looking at these issues.

Thanks
Ralph


> On Sep 25, 2015, at 1:09 PM, Sourav Chakraborty 
> <[email protected]> wrote:
> 
> Hi All,
> 
> To clarify things, we have had similar goals and have been working on 
> improving the PMI-2 plugin for some time. We evaluated several designs and 
> strategies:
> 
> 1. Designs and detailed performance evaluations (up to 16K cores) were 
> presented in [1] for on-demand PMI gets (similar to instant startup, if I 
> understand it correctly), sparse/dense hints for PMI operations (somewhat 
> similar to scoping of key-value pairs), and the ring-based approach.
> 
> 2. Designs and performance evaluations (up to 16K cores) for non-blocking PMI 
> collectives like Ifence and Iallgather have shown good scalability for 
> different programming models such as MPI and OpenSHMEM. These were presented 
> in [2] and [3] respectively.
> 
> Some of these designs are already available through the "mpirun_rsh" job 
> launcher distributed as part of the MVAPICH2 software stack since MVAPICH2 
> 2.1rc2 released on 03/12/2015.
> 
> We plan to make these designs available for Slurm users as well through 
> future MVAPICH2 releases.
> 
> One important difference is that our changes are on top of the existing PMI2 
> plugin instead of a completely new PMIx plugin. There are several differences 
> in the API as well (use of request objects vs callbacks for non-blocking 
> operations). I don't think it would be conflicting to have both of them in 
> Slurm.
> 
> [1] "PMI Extensions for Scalable MPI Startup", S. Chakraborty, H. Subramoni, 
> J. Perkins, A. Moody, M. Arnold, D. K. Panda, EuroMPI/ASIA '14 
> (http://dl.acm.org/citation.cfm?id=2642780)
> 
> [2] "Non-Blocking PMI Extensions for Fast MPI Startup", S. Chakraborty, H. 
> Subramoni, A. Moody, A. Venkatesh, J. Perkins, D. K. Panda, CCGrid '15 
> (http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7152479&tag=1)
> 
> [3] "On-demand Connection Management for OpenSHMEM and OpenSHMEM+MPI", S. 
> Chakraborty, H. Subramoni, J. Perkins, A. Awan, D. K. Panda, HIPS '15 
> (http://hpc.pnl.gov/conf/hips-lspp15/talks/chakraborty.pdf)
> 
> Thanks,
> Sourav
> 
> 
> On Fri, Sep 25, 2015 at 10:59 AM, Ralph Castain <[email protected]> wrote:
> Hi Andy
> 
> Let me see if I can clarify this for you and the others on this mailing list. 
> The Ohio State group has been focused on improving the wireup algorithm in 
> PMI-2, which focuses on the allgather operation. Hence their “ring” 
> implementation.
> 
> PMIx has been focused on two goals:
> 
> 1. high-speed “instant on” application startup that basically allows parallel 
> programming models (MPI, OSHMEM, etc) to start/wireup as fast as the RM can 
> start them
> 
> 2. extending PMI to provide an enhanced application-RM partnership to support 
> emerging programming paradigms where the application works in concert with 
> the RM to steer execution. This takes many forms, including malleable 
> workflow execution, file prepositioning, flexible allocations, and fault 
> notifications.
> 
> We are accomplishing #1 by completely eliminating the wireup data exchange 
> operation, and therefore the allgather disappears. Hence my comment that this 
> question is moot. With the adoption of PMIx into essentially all RMs in use 
> out there, we provide the infrastructure by which this is accomplished in a 
> standard enough way that programming model implementations can reasonably 
> count on it being there. We still have to maintain “backward compatibility” 
> for those instances where we don’t have access to it, but we can treat those 
> as lower-performance exception code paths.
> 
> OpenMPI will soon release the first use of that integration to achieve 
> “instant on” behaviors. I can’t speak for the other MPI implementations, but 
> history indicates that the community picks things up from each other rather 
> quickly. So I expect that a year or so from now, the question of having a 
> really good allgather will largely be moot. I’m not saying it will never be 
> used, because we do have times when a “barrier” operation can be useful, and 
> so PMIx will be adopting appropriate algorithms for such occasions (and OSU’s 
> ring will surely be one of them). I’m just saying it won’t be in the critical 
> startup path.
> 
> You can see more about the direction of PMIx, and how it is expected to be 
> used in a broader vision of RM in general, in the following two presentations:
> 
> http://www.slideshare.net/rcastain/exascale-process-management-interface
> 
> http://www.slideshare.net/rcastain/hpc-controls-future
> 
> PMIx will be hosting a BoF at SC’15 in Austin, TX on Thursday, Nov 19th, 
> from 12:15 to 1:15pm to discuss the roadmap for this effort. Anyone 
> interested in participating in the discussion, and contributing to the 
> project(!), is welcome.
> 
> HTH
> Ralph
> 
> 
>> On Sep 25, 2015, at 6:22 AM, Andy Riebs <[email protected]> wrote:
>> 
>> Synthesizing what I think I've learned over the past 24 hours:
>> 
>> 1. The PMIx implementation described at 
>> <http://slurm.schedmd.com/SLUG15/PMIx.pdf> is a complete, 
>> upward-compatible [I hope!], highly scalable (exascale) replacement for the 
>> PMI-1 and PMI-2 job launch facilities.
>> 
>> 2. The "PMIX" paper at <http://slurm.schedmd.com/SLUG15/chakrabs-slug15.pdf> 
>> describes creative ways to use and extend the existing PMI-2 interface to 
>> exascale levels, which will [I hope!] be fully compatible with the PMIx 
>> implementation.
>> 
>> Clarifications and corrections gratefully accepted!
>> Andy
>> 
>> On 09/24/2015 06:49 PM, Andy Riebs wrote:
>>> 
>>> Ralph, Artem, and Sourav, thanks for the explanation! 
>>> 
>>> Andy 
>>> 
>>> On 09/24/2015 04:52 PM, Ralph Castain wrote: 
>>>> Hi Andy. 
>>>> 
>>>> I honestly have no idea why those guys did that :-). We’ve known about 
>>>> their efforts for a while, but this is the first I’ve seen them labeled as 
>>>> PMIX. Guess we do differ on the capitalization of the last letter. 
>>>> 
>>>> Anyway, the PMIx effort is already being integrated in several popular 
>>>> RMs, including Slurm, so I imagine it’s a moot point except for possibly 
>>>> confusing people searching publications. 
>>>> 
>>>> Ralph 
>>>> 
>>>>> On Sep 24, 2015, at 1:42 PM, Andy Riebs <[email protected]> wrote: 
>>>>> 
>>>>> 
>>>>> From the presentations at the Slurm Users' Group meeting, 
>>>>> <http://slurm.schedmd.com/SLUG15/chakrabs-slug15.pdf> and 
>>>>> <http://slurm.schedmd.com/SLUG15/PMIx.pdf>, it appears that two 
>>>>> implementations of an improved PMI interface, both tagged "PMIX", are 
>>>>> underway. Are these cooperating, competing, or "only just realized that 
>>>>> they could be cooperating" activities? 
>>>>> 
>>>>> Andy 
>>>>> 
>>>>> -- 
>>>>> Andy Riebs 
>>>>> New email address! [email protected] 
>>>>> Hewlett-Packard Company 
>>>>> High Performance Computing Software Engineering 
>>>>> +1 404 648 9024 
>>>>> My opinions are not necessarily those of HP 
>> 
> 
>