Thanks for all of that, Ralph. I was getting a lot of "help" from users debugging a performance problem, and they were pointing to the use of srun. The more concrete info I had, the better (and for my own edification, since I'd otherwise prefer not to switch - it's easier to use a single software package to launch everything).
____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS      |---------------------*O*---------------------
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | [email protected] - 973/972.0922 (2x0922)
||  \\ Sciences  | OIRT/High Perf & Res Comp - MSB C630, Newark
     `'

On Jan 7, 2016, at 18:32, Ralph Castain <[email protected]> wrote:

Just following up as promised with some data. The graphs below were generated using the SLURM master with the PMIx plugin based on PMIx v1.1.0, running 64 procs/node, using a simple MPI_Init/MPI_Finalize app.

The blue line used srun to start the job and used PMI-2. The red line was also started by srun, but used PMIx. As you can see, there is some performance benefit from the use of PMIx.

The gray line used srun to start the job with the PMIx plugin, but also used the new optional features to reduce the startup time. There are two features:

(a) we only do a modex "recv" (i.e., a PMI get) upon first communication with a specific peer
(b) the modex (i.e., pmi_fence) operation simply drops through - we do not execute a barrier. Instead, there is an async exchange of the data, and we only block when the proc requests a specific piece of data.

The final yellow line is mpirun (which uses PMIx) with the new optional features enabled. As you can see, it's a little faster than srun-based launch. We are extending these tests to larger scale, and continuing to push the performance as discussed before.
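[Editor's note: the two features above amount to replacing a blocking all-to-all exchange at startup with an asynchronous publish plus an on-demand, per-peer lookup. The sketch below illustrates that idea with threads and a shared key/value store; the class and function names are illustrative and are not the PMIx API.]

```python
# Conceptual sketch of the lazy/async modex (illustrative names, not the PMIx API).
import threading

class Modex:
    """Shared key/value store standing in for the PMIx data exchange."""
    def __init__(self):
        self._data = {}
        self._cv = threading.Condition()

    def put(self, rank, value):
        # Async "modex send": publish this rank's connection info and return
        # immediately - no barrier, the fence "drops through".
        with self._cv:
            self._data[rank] = value
            self._cv.notify_all()

    def get(self, rank):
        # Lazy "modex recv": block only if this specific peer's data
        # has not arrived yet (feature (a) above).
        with self._cv:
            self._cv.wait_for(lambda: rank in self._data)
            return self._data[rank]

def proc(rank, nprocs, modex, results):
    modex.put(rank, f"endpoint-of-{rank}")
    # First communication: look up only the one peer we actually talk to,
    # rather than waiting for everyone's data at startup.
    peer = (rank + 1) % nprocs
    results[rank] = modex.get(peer)

nprocs = 4
modex = Modex()
results = {}
threads = [threading.Thread(target=proc, args=(r, nprocs, modex, results))
           for r in range(nprocs)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

In the eager scheme every rank would pay for a full barrier plus an all-to-all exchange before MPI_Init returns; here each rank fetches only the endpoints it needs, when it needs them, which is where the startup-time win in the gray and yellow lines comes from.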
HTH
Ralph

<PastedGraphic-1.tiff>

On Jan 6, 2016, at 11:58 PM, Ralph Castain <[email protected]> wrote:

On Jan 6, 2016, at 9:31 PM, Novosielski, Ryan <[email protected]> wrote:

On Jan 6, 2016, at 23:31, Christopher Samuel <[email protected]> wrote:

On 07/01/16 01:03, Novosielski, Ryan wrote:

Since this is an audience that might know, and this is related (but off-topic, sorry): is there any truth to the suggestions on the Internet that using srun is /slower/ than mpirun/mpiexec?

In our experience, Open-MPI 1.6.x and earlier (PMI-1 support) is slower with srun than with mpirun. This was tested with NAMD. Open-MPI 1.8.x and later with PMI-2 is about the same with srun as with mpirun.

Thanks very much to both of you who have responded with an answer to this question. Both of you have said "about the same," if I'm not mistaken. So I guess there still is a very slight performance penalty to using PMI-2 instead of mpirun? Probably worth it anyway, but I'm just curious to know the real score. There's not a lot of info about this other than the mailing list.

FWIW: the reason the gap closed when going from the (1.6 vs. srun+PMI-1) scenario to the (1.8 vs. srun+PMI-2) scenario is partly the PMI-1 vs. PMI-2 difference, but also because OMPI's mpirun slowed down significantly between the 1.6 and 1.8 series. We didn't catch the loss of performance in time, but are addressing it for the upcoming 2.0 series.

In 2.0, mpirun will natively use PMIx, and you can additionally enable two new optional features to dramatically improve the launch time. I'll provide a graph tomorrow to show the performance difference vs. PMI-2, even at small scale. Those features may become the default behavior at some point - that hasn't been fully decided yet, as they need time to mature.

However, the situation is fluid. Using the SLURM PMIx plugin (in master now and tentatively scheduled for release later this year) will effectively close the gap.
Somewhere in that same timeframe, OMPI will be implementing further improvements to mpirun (using the fabric instead of the management Ethernet to perform barriers, distributing the launch mapping procedure, etc.) and will likely move ahead again - and members of the PMIx community are already planning to propose some of those changes for SLURM. If accepted, you'll see the gap close again.

So I expect this surge-and-recover pattern to continue for the next couple of years, with mpirun ahead for a while and then even with SLURM when using the PMIx plugin.

HTH - and I'll provide the graph in the morning.
Ralph

Thanks again.
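[Editor's note: the comparisons in this thread all reduce to timing how long a launcher takes to start and tear down a do-nothing MPI job. A minimal, hedged sketch of that measurement is below; the `srun`/`mpirun` command lines in the comment are illustrative, and the runnable part times a trivial stand-in command so the sketch works without a cluster.]

```python
# Minimal launch-time measurement sketch (stand-in command; srun/mpirun
# lines in the comment below are illustrative, not from the thread).
import statistics
import subprocess
import sys
import time

def time_launch(cmd, trials=3):
    """Average wall-clock launch + teardown time of a command over several trials."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)

# On a real cluster you would compare something like (flags illustrative):
#   time_launch(["srun", "-n", "64", "./init_finalize"])     # PMI-2 or PMIx
#   time_launch(["mpirun", "-np", "64", "./init_finalize"])
# where ./init_finalize is the "simple MPI_Init/MPI_Finalize app" from the thread.
elapsed = time_launch([sys.executable, "-c", "pass"])
print(f"launch + teardown: {elapsed:.3f} s")
```

Averaging several trials matters here because launch time at small scale is dominated by noisy per-node daemon and wire-up costs, which is also why the graphs in the thread only separate clearly as the process count grows.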
