PMIx sounds really nice.
Forgive my naive question, but with mpirun would sstat and step
accounting continue to work as they do when using srun? Does mpirun
also support Slurm's task placement/layout/binding/signaling? Our users
rely on most of these features quite heavily, as I am guessing others do as well.
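For example (job values here are hypothetical), a typical invocation
on our side looks something like:

   srun --cpu_bind=cores --distribution=block:cyclic -n 64 ./app
   sstat -j <jobid>.<stepid>    # live per-step statistics while running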
Thanks!
On 01/06/16 07:54, Ralph Castain wrote:
As with all such rumors, there is some truth and some inaccuracy to
it. Note that the various MPIs have historically differed
significantly in how they implement mpirun, though the differences in
terms of behavior and performance have been closing. So it is hard to
provide a clear-cut answer that spans time; I'll just report where
we are now and look ahead a bit.
PMI-1 support doesn't scale as well as what was done in mpirun by
some of the MPI libraries, so your (A) is certainly true. Remember
that Slurm provides PMI-1 out-of-the-box and that you have to do a
second build step to add PMI-2 support. So for people who just do the
standard install and run, this will be the expected situation.
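For reference, that second step is a separate build in Slurm's
contribs tree - roughly the following, though paths vary by version:

   # from an already-configured Slurm source tree
   cd contribs/pmi2
   make && make install    # installs libpmi2 for MPIs to link against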
For those that install PMI-2 (or the new extended PMI-2 for MVAPICH),
you'll see some improved performance. I suspect you'll find that srun
and mpirun are pretty close to each other at that point, and the
choice really just comes down to your desired command-line options.
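For example, the two launch modes look like this side by side (task
count is illustrative; mpirun option names vary by MPI):

   srun --mpi=pmi2 -n 128 ./app    # direct launch through Slurm
   mpirun -np 128 ./app            # indirect launch within the allocation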
The test results with PMIx indicate that the performance gap between
direct (srun) launch and indirect (mpirun) launch is pretty much gone.
Remember that the overhead of mapping the job isn't very large (and
the time is roughly equal in either case), and that both srun and
mpirun distribute the launch command in the same way (via a tree-based
algorithm). Likewise, both involve starting a user-level daemon on
each node and wiring those daemons up.
So when you break down the steps, and given that mpirun and srun are
using the same wireup support, you can see that the two should be
equivalent. It really is just a question of which command-line options
you prefer.
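If you want to check this on your own system, a quick (if rough)
sketch is to time a no-op launch at scale under one allocation -
numbers purely illustrative:

   salloc -N 32
   time srun -n 512 /bin/true         # direct launch
   time mpirun -np 512 /bin/true      # indirect launch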
HTH
Ralph
On Jan 6, 2016, at 6:03 AM, Novosielski, Ryan
<[email protected]> wrote:
Since this is an audience that might know, and this is related (but
off-topic, sorry): is there any truth to the suggestions on the
Internet that using srun is /slower/ than mpirun/mpiexec? There were
some old mailing list messages someplace that seemed to indicate
either A) yes, in the old days of PMI-1 only, or B) it was likely a
misconfigured system in the first place. I haven't found anything
definitive, though, and those threads sort of petered out without an
answer.
*Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
Ryan Novosielski - Senior Technologist
[email protected] - 973/972.0922 (2x0922)
OIRT/High Perf & Res Comp - MSB C630, Newark
On Jan 6, 2016, at 01:43, Ralph Castain <[email protected]> wrote:
Simple reason, Chris - the PMI support is GPL 2.0, so anything
built against it automatically becomes GPL, and OpenHPC therefore
cannot distribute Slurm with those libraries.
Instead, we are looking to use the new PMIx library to provide
wireup support, which includes backward support for PMI 1 and 2. I’m
supposed to complete that backport in my copious free time :-)
Until then, you can only launch via mpirun - which is just as fast,
actually, but does indeed have different command-line options.
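You can check which PMI plugins a given Slurm build actually
provides with:

   srun --mpi=list

and once the PMIx backport lands, I'd expect it to show up there as
another plugin.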
On Jan 5, 2016, at 9:22 PM, Christopher Samuel
<[email protected]> wrote:
On 06/01/16 01:46, David Carlet wrote:
Depending on where you are in the design/development phase for your
project, you might also consider switching to the OpenHPC build.
Caution: for reasons that are unclear, OpenHPC disables Slurm's PMI
support:
https://github.com/openhpc/ohpc/releases/download/v1.0.GA/Install_guide-CentOS7.1-1.0.pdf
# At present, OpenHPC is unable to include the PMI process
# management server normally included within Slurm, which
# implies that srun cannot be used for MPI job launch. Instead,
# native job launch mechanisms provided by the MPI stacks are
# utilized and prun abstracts this process for the various
# stacks to retain a single launch command.
Their spec file does:
# 6/16/15 [email protected] - do not package Slurm's version of libpmi with OpenHPC.
%if 0%{?OHPC_BUILD}
rm -f $RPM_BUILD_ROOT/%{_libdir}/libpmi*
rm -f $RPM_BUILD_ROOT/%{_libdir}/mpi_pmi2*
%endif
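So on an OpenHPC node you would expect the libraries to simply be
absent - a quick check (assuming the default libdir):

   ls /usr/lib64/libpmi*    # no matches on an OpenHPC build
   srun --mpi=list          # shows which PMI plugins remain usable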
--
Christopher Samuel        Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: [email protected]   Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/       http://twitter.com/vlsci