> On Jan 6, 2016, at 12:27 PM, Bruce Roberts <[email protected]> wrote:
> 
> 
> 
> On 01/06/16 11:53, Ralph Castain wrote:
>> 
>>> On Jan 6, 2016, at 9:53 AM, Bruce Roberts <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> PMIx sounds really nice.
>>> 
>>> Forgive my naive question, but for mpirun would sstat and step accounting 
>>> continue to work as it does when using srun?
>> 
>> It does to an extent. You generally execute an mpirun for each job step. The 
>> mpirun launches its own daemons for each invocation, and the app procs are 
>> children of these daemons. So Slurm sees the daemons and will aggregate the 
>> accounting for its children into the daemon’s usage. However, the daemon 
>> mostly just sleeps once the app is running, and so the accounting should be 
>> okay (though you won’t get it for each individual app process).
>> 
>> Perhaps others out there who have used this can chime in with their 
>> experience?
> So it sounds like mpirun must use srun under the covers to launch the 
> daemons.  If using the cgroups proctrack plugin, I'm guessing accounting will 
> work for the entire step, just not down to the individual ranks as you state.  
> sstat probably isn't that useful at that point.  That is a large difference.

True on all counts

>> 
>>>   Does mpirun also support Slurm's task placement/layout/binding/signaling? 
>>>  Our users use most of the features quite heavily as I am guessing others 
>>> do as well.
>> 
>> What mpirun supports depends on the MPI implementation, so I can only 
>> address your question for OpenMPI. You’ll find that OMPI’s mpirun provides a 
>> superset of Slurm’s options (i.e., we implemented a broader level of 
>> support), but the names and syntax of those options are different, as they 
>> reflect that broader support. For example, we have the ability to allow 
>> more fine-grained layout/binding patterns and combinations.
> That is interesting.  I am able to lay out tasks in any order I would like 
> with srun on cores or threads of cores.  What finer-grained layout/binding 
> patterns are you referring to?

I believe srun only supports some specific patterns (e.g., cyclic), and those 
patterns combine placement and rank assignment. We separate out all three 
phases of mapping so you can control each independently:

* location (our “map-by” option) determines how the procs are laid out. 
Includes the ability to specify #cpus/proc. Note that you can also specify 
whether you want to treat cores as cpus, or use individual HTs as independent 
cpus (if HT is enabled)

* rank assignment (“rank-by”) allows you to define different algorithms for 
assigning the ranks to those procs. Depending on how the app has laid out its 
communication, this can sometimes be helpful

* binding (“bind-to”) lets you bind each proc at its resulting location to 
whatever level you want

We also have a “rank_file” mapper that lets you specify exact proc location on 
a rank-by-rank basis, and a “sequential” mapper that takes the list of hosts 
and places sequential ranks on each one in that specific order (i.e., you can 
get totally non-cyclic locations)
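
For concreteness, a few illustrative command lines (option spellings here are 
from recent Open MPI releases and vary somewhat between versions, so treat 
these as sketches rather than exact recipes):

```shell
# two procs per socket, 2 cpus each, treating hwthreads as independent cpus
mpirun --map-by ppr:2:socket:pe=2 --use-hwthread-cpus ./app

# lay procs out by node, but assign ranks round-robin by core, bound to cores
mpirun --map-by node --rank-by core --bind-to core ./app

# sequential mapper: rank n goes on the nth host listed in the hostfile
mpirun --map-by seq --hostfile myhosts ./app

# exact per-rank placement via a rankfile (lines like "rank 0=node1 slot=0:0")
mpirun --rankfile myranks ./app
```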

The resulting pattern map is rather large, and I fully confess that all the 
options aren’t regularly used. Most users just let the default algorithms run. 
However, researchers continue to explore the performance impact of these 
options, and every so often someone finds some measurable performance 
improvement by laying a particular application out in a new manner. So we 
maintain the flexibility.

HTH
Ralph


> 
> Thanks for your insights so far; they are helpful!
>> 
>> 
>>> 
>>> Thanks!
>>> 
>>> On 01/06/16 07:54, Ralph Castain wrote:
>>>> As with all such rumors, there is some truth and some inaccuracies to it. 
>>>> Note that the various MPIs have historically differed significantly in how 
>>>> they implement mpirun, though the differences in terms of behavior and 
>>>> performance have been closing. So it is hard to provide a clearcut answer 
>>>> that spans time, and I’ll just report where we are now and looking ahead a 
>>>> bit.
>>>> 
>>>> PMI-1 support doesn't scale as well as the native launch support in mpirun 
>>>> from some of the MPI libraries, and so your (A) is certainly true. Remember 
>>>> that Slurm provides PMI-1 out-of-the-box and that you have to do a second 
>>>> build step to add PMI-2 support. So for people who just do the std install 
>>>> and run, this will be the expected situation.
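
As a sketch of that second build step (the paths and prefixes below are 
assumptions; adjust them for your site and versions):

```shell
# build and install Slurm's PMI-2 client library from the Slurm source tree
cd contribs/pmi2 && make && make install

# build the MPI library against Slurm's PMI support, e.g. for Open MPI:
./configure --with-slurm --with-pmi=/usr/local && make && make install

# then launch directly with srun, selecting the PMI-2 plugin explicitly
srun --mpi=pmi2 -n 64 ./app
```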
>>>> 
>>>> For those that install PMI-2 (or the new extended PMI-2 for MVAPICH), 
>>>> you’ll see some improved performance. I suspect you’ll find that srun and 
>>>> mpirun are pretty close to each other at that point, and the choice really 
>>>> just comes down to your desired cmd line options.
>>>> 
>>>> The test results with PMIx indicate that the performance gap between 
>>>> direct (srun) launch and indirect (mpirun) launch is pretty much gone. You 
>>>> have to remember that the overhead of mapping the job isn’t very large 
>>>> (and the time is roughly equal anyway), and that both srun and mpirun 
>>>> distribute the launch cmd in the same way (via a tree-based algorithm). 
>>>> Likewise, both involve starting a user-level daemon and wiring those up.
>>>> 
>>>> So when you break down the steps, and given that mpirun and srun are using 
>>>> the same wireup support, you can see that the two should be equivalent. 
>>>> Really just a question of which cmd line options you prefer.
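
As a back-of-the-envelope illustration of why the tree-based distribution 
keeps launch cheap (toy arithmetic only, not actual Slurm or ORTE code): with 
a k-nomial relay, every daemon that already has the launch cmd forwards it to 
a fixed number of new daemons each round, so the number of relay rounds grows 
only logarithmically with node count:

```shell
nodes=1000 fanout=2 reached=1 rounds=0
while [ "$reached" -lt "$nodes" ]; do
  # every reached daemon recruits $fanout new daemons this round
  reached=$(( reached * (fanout + 1) ))
  rounds=$(( rounds + 1 ))
done
echo "relay rounds for $nodes nodes: $rounds"   # prints: relay rounds for 1000 nodes: 7
```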
>>>> 
>>>> HTH
>>>> Ralph
>>>> 
>>>> 
>>>>> On Jan 6, 2016, at 6:03 AM, Novosielski, Ryan <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>> 
>>>>> Since this is an audience that might know, and this is related (but 
>>>>> off-topic, sorry): is there any truth to the suggestions on the Internet 
>>>>> that using srun is /slower/ than mpirun/mpiexec? There were some old 
>>>>> mailing list messages someplace that seem to indicate A) yes, in the old 
>>>>> days of PMI1 only or B) likely it was a misconfigured system in the first 
>>>>> place. I haven't found anything definitive though and those threads sort 
>>>>> of petered out without an answer. 
>>>>> 
>>>>> ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
>>>>> || \\UTGERS      
>>>>> |---------------------*O*---------------------
>>>>> ||_// Biomedical | Ryan Novosielski - Senior Technologist
>>>>> || \\ and Health | [email protected] <mailto:[email protected]>- 
>>>>> 973/972.0922 (2x0922)
>>>>> ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
>>>>>     `'
>>>>> 
>>>>> On Jan 6, 2016, at 01:43, Ralph Castain <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>> 
>>>>>> 
>>>>>> Simple reason, Chris - the PMI support is GPL 2.0, and so anything built 
>>>>>> against it automatically becomes GPL. So OpenHPC cannot distribute Slurm 
>>>>>> with those libraries.
>>>>>> 
>>>>>> Instead, we are looking to use the new PMIx library to provide wireup 
>>>>>> support, which includes backward support for PMI 1 and 2. I’m supposed 
>>>>>> to complete that backport in my copious free time :-)
>>>>>> 
>>>>>> Until then, you can only launch via mpirun - which is just as fast, 
>>>>>> actually, but does indeed have different cmd line options.
>>>>>> 
>>>>>> 
>>>>>>> On Jan 5, 2016, at 9:22 PM, Christopher Samuel <[email protected] 
>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> On 06/01/16 01:46, David Carlet wrote:
>>>>>>> 
>>>>>>>> Depending on where you are in the design/development phase for your
>>>>>>>> project, you might also consider switching to using the OpenHPC build.
>>>>>>> 
>>>>>>> Caution: for reasons that are unclear OpenHPC disables Slurm PMI 
>>>>>>> support:
>>>>>>> 
>>>>>>> https://github.com/openhpc/ohpc/releases/download/v1.0.GA/Install_guide-CentOS7.1-1.0.pdf
>>>>>>> 
>>>>>>> # At present, OpenHPC is unable to include the PMI process
>>>>>>> # management server normally included within Slurm which
>>>>>>> # implies that srun cannot be used for MPI job launch. Instead,
>>>>>>> # native job launch mechanisms provided by the MPI stacks are
>>>>>>> # utilized and prun abstracts this process for the various
>>>>>>> # stacks to retain a single launch command.
>>>>>>> 
>>>>>>> Their spec file does:
>>>>>>> 
>>>>>>> # 6/16/15 [email protected] <mailto:[email protected]> - do 
>>>>>>> not package Slurm's version of libpmi with OpenHPC.
>>>>>>> %if 0%{?OHPC_BUILD}
>>>>>>>  rm -f $RPM_BUILD_ROOT/%{_libdir}/libpmi*
>>>>>>>  rm -f $RPM_BUILD_ROOT/%{_libdir}/mpi_pmi2*
>>>>>>> %endif
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> Christopher Samuel        Senior Systems Administrator
>>>>>>> VLSCI - Victorian Life Sciences Computation Initiative
>>>>>>> Email: [email protected]    Phone: +61 (0)3 903 55545
>>>>>>> http://www.vlsci.org.au/      http://twitter.com/vlsci
>>>> 
>>> 
>> 
> 
