I'd like to jump in with one fact. In OMPI, the s2 component has a problem that slows down applications when running under PMI2: s2 always pushes a couple of keys, which makes every fence non-zero-byte. This slows down the two empty Fences in OMPI. I believe that if this is fixed, we will have comparable performance between mpirun and srun --mpi=pmi2. Ralph, I'll point to the place in the code in a follow-up email.
On Friday, January 8, 2016, Novosielski, Ryan wrote:
> Thanks for all of that, Ralph. I was getting a lot of "help" from users debugging a performance problem, and they were pointing to the use of srun. The more concrete info I had, the better (and for my own edification, as I'd otherwise really prefer not to switch, since it makes it easier to be using one software package to launch this stuff).
>
> ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
> || \\UTGERS |---------------------*O*---------------------
> ||_// Biomedical | Ryan Novosielski - Senior Technologist
> || \\ and Health | [email protected] - 973/972.0922 (2x0922)
> || \\ Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
> `'
>
> On Jan 7, 2016, at 18:32, Ralph Castain <[email protected]> wrote:
>
> Just following up as promised with some data. The graphs below were generated using the SLURM master with the PMIx plugin based on PMIx v1.1.0, running 64 procs/node, using a simple MPI_Init/MPI_Finalize app. The blue line used srun to start the job and used PMI-2. The red line was also started by srun, but used PMIx. As you can see, there is some performance benefit from the use of PMIx.
>
> The gray line used srun to start the job and the PMIx plugin, but also used the new optional features to reduce the startup time. There are two features:
>
> (a) we only do a modex "recv" (i.e., a PMI get) upon first communication with a specific peer
>
> (b) the modex itself (i.e., the pmi_fence operation) simply drops through - we do not execute a barrier. Instead, there is an async exchange of the data. We only block when the proc requests a specific piece of data
>
> The final yellow line is mpirun (which uses PMIx) using the new optional features. As you can see, it's a little faster than srun-based launch.
> We are extending these tests to larger scale, and continuing to push the performance as discussed before.
>
> HTH
> Ralph
>
> <PastedGraphic-1.tiff>
>
> On Jan 6, 2016, at 11:58 PM, Ralph Castain <[email protected]> wrote:
>
> On Jan 6, 2016, at 9:31 PM, Novosielski, Ryan <[email protected]> wrote:
>
> On Jan 6, 2016, at 23:31, Christopher Samuel <[email protected]> wrote:
>
> On 07/01/16 01:03, Novosielski, Ryan wrote:
>
> Since this is an audience that might know, and this is related (but off-topic, sorry): is there any truth to the suggestions on the Internet that using srun is /slower/ than mpirun/mpiexec?
>
> In our experience, Open-MPI 1.6.x and earlier (PMI-1 support) is slower with srun than with mpirun. This was tested with NAMD.
>
> Open-MPI 1.8.x and later with PMI-2 is about the same with srun as with mpirun.
>
> Thanks very much to both of you who have responded with an answer to this question. Both of you have said "about the same," if I'm not mistaken. So I guess there still is a very slight performance penalty to using PMI2 instead of mpirun? Probably worth it anyway, but I'm just curious to know the real score. There's not a lot of info about this other than the mailing list.
>
> FWIW: the reason the gap closed when going from the (1.6 vs. srun+PMI1) to the (1.8 vs. srun+PMI2) scenario is partly the PMI-1 vs. PMI-2 difference, but also because OMPI's mpirun slowed down significantly between the 1.6 and 1.8 series. We didn't catch the loss of performance in time, but are addressing it for the upcoming 2.0 series.
>
> In 2.0, mpirun will natively use PMIx, and you can additionally use two new optional features to dramatically improve the launch time.
> I'll provide a graph tomorrow to show the performance difference vs. PMI-2 even at small scale. Those features may become the default behavior at some point - that hasn't been fully decided yet, as they need time to mature.
>
> However, the situation is fluid. Using the SLURM PMIx plugin (in master now and tentatively scheduled for release later this year) will effectively close the gap. Somewhere in that same timeframe, OMPI will be implementing further improvements to mpirun (using the fabric instead of the management Ethernet to perform barriers, distributing the launch mapping procedure, etc.) and will likely move ahead again - and members of the PMIx community are already planning to propose some of those changes for SLURM. If accepted, you'll see the gap close again.
>
> So I expect this surge-and-recover pattern to continue for the next couple of years, with mpirun ahead for a while and then even with SLURM when using the PMIx plugin.
>
> HTH - and I'll provide the graph in the morning.
> Ralph
>
> Thanks again.

--
-----
Best regards,
Artem Polyakov (Mobile mail)
