Thanks for all of that, Ralph. I was getting a lot of "help" from users while 
debugging a performance problem, and they were pointing at the use of srun. The 
more concrete info I have, the better (and for my own edification, as I'd 
otherwise prefer not to switch, since it's easier to use a single software 
package to launch everything).

____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS      |---------------------*O*---------------------
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | [email protected] - 973/972.0922 (2x0922)
||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
    `'

On Jan 7, 2016, at 18:32, Ralph Castain <[email protected]> wrote:

Just following up as promised with some data. The graphs below were generated 
using the SLURM master with the PMIx plugin based on PMIx v1.1.0, running 64 
procs/node, using a simple MPI_Init/MPI_Finalize app. The blue line used srun 
to start the job with PMI-2. The red line was also started by srun, but used 
PMIx. As you can see, there is some performance benefit from using PMIx.

The gray line used srun to start the job and the PMIx plugin, but also used the 
new optional features to reduce the startup time. There are two features:

(a) we only do a modex "recv" (i.e., a PMI get) upon first communication with a 
specific peer

(b) the modex (i.e., pmi_fence) operation itself simply drops through - we do 
not execute a barrier. Instead, the data is exchanged asynchronously, and we 
only block when a proc requests a specific piece of data that has not yet 
arrived
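The effect of those two features can be illustrated with a toy model (plain Python; this is not the PMIx API, and the class and method names here are invented for illustration): an eager modex pays to collect every peer's endpoint data at the fence, while the lazy variant lets the fence drop through and fetches a peer's data only on first communication, caching it afterwards.

```python
# Toy model of eager vs. lazy modex (illustration only; not the PMIx API).

class EagerModex:
    """At the fence, collect endpoint data from every peer up front."""
    def __init__(self, peers):
        self.peers = peers          # rank -> published endpoint data
        self.cache = {}
        self.fetches = 0            # count of data exchanges performed

    def fence(self):
        # Barrier-style modex: one fetch per peer, needed or not.
        for rank, data in self.peers.items():
            self.fetches += 1
            self.cache[rank] = data

    def get(self, rank):
        return self.cache[rank]


class LazyModex:
    """Fence drops through; data is fetched on first communication only."""
    def __init__(self, peers):
        self.peers = peers
        self.cache = {}
        self.fetches = 0

    def fence(self):
        pass                        # no barrier, no bulk exchange

    def get(self, rank):
        if rank not in self.cache:  # block only on first use of this peer
            self.fetches += 1
            self.cache[rank] = self.peers[rank]
        return self.cache[rank]


# 64 peers, but this app only ever talks to two of them.
peers = {r: f"endpoint-info-{r}" for r in range(64)}

eager = EagerModex(peers)
eager.fence()
eager.get(1); eager.get(63)

lazy = LazyModex(peers)
lazy.fence()
lazy.get(1); lazy.get(63); lazy.get(1)   # the repeat hits the cache

print(eager.fetches)   # 64 - pays for every peer at startup
print(lazy.fetches)    # 2  - pays only for peers actually contacted
```

In a real launch the savings come from skipping the wireup barrier and the bulk data exchange inside MPI_Init; applications with sparse communication patterns benefit the most.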


The final yellow line is mpirun (which uses PMIx) using the new optional 
features. As you can see, it’s a little faster than srun-based launch.

We are extending these tests to larger scale, and continuing to push the 
performance as discussed before.

HTH
Ralph


[Attachment: PastedGraphic-1.tiff - the launch-time graphs described above]


On Jan 6, 2016, at 11:58 PM, Ralph Castain <[email protected]> wrote:



On Jan 6, 2016, at 9:31 PM, Novosielski, Ryan <[email protected]> wrote:


On Jan 6, 2016, at 23:31, Christopher Samuel <[email protected]> wrote:

On 07/01/16 01:03, Novosielski, Ryan wrote:

Since this is an audience that might know, and this is related (but
off-topic, sorry): is there any truth to the suggestions on the Internet
that using srun is /slower/ than mpirun/mpiexec?

In our experience Open-MPI 1.6.x and earlier (PMI-1 support) is slower
with srun than with mpirun.  This was tested with NAMD.

Open-MPI 1.8.x and later with PMI-2 is about the same with srun as with
mpirun.

Thanks very much to both of you who responded with an answer to this question. 
Both of you said "about the same," if I'm not mistaken. So I guess there still 
is a very slight performance penalty to using PMI-2 instead of mpirun? Probably 
worth it anyway, but I'm just curious to know the real score. There's not a lot 
of info about this outside the mailing list.

FWIW: the reason the gap closed going from the (1.6 vs srun+PMI-1) scenario to 
the (1.8 vs srun+PMI-2) one is partly the PMI-1 vs PMI-2 difference, but also 
that OMPI's mpirun slowed down significantly between the 1.6 and 1.8 series. We 
didn't catch the performance loss in time, but are addressing it for the 
upcoming 2.0 series.

In 2.0, mpirun will natively use PMIx, and you can additionally enable two new 
optional features to dramatically improve launch time. I'll provide a graph 
tomorrow showing the performance difference vs PMI-2 even at small scale. Those 
features may become the default behavior at some point - that hasn't been fully 
decided yet, as they need time to mature.

However, the situation is fluid. Using the SLURM PMIx plugin (in master now and 
tentatively scheduled for release later this year) will effectively close the 
gap. Somewhere in that same timeframe, OMPI will be implementing further 
improvements to mpirun (using the fabric instead of the management Ethernet to 
perform barriers, distributing the launch-mapping procedure, etc.) and will 
likely pull ahead again - and then members of the PMIx community are already 
planning to propose some of those changes for SLURM. If accepted, you'll see 
the gap close again.

So I expect this surge-and-recover pattern to continue for the next couple of 
years, with mpirun ahead for a while and then even with SLURM when using the 
PMIx plugin.

HTH - and I’ll provide the graph in the morning.
Ralph




Thanks again.
