I'd like to jump in with one fact. In OMPI, the s2 component has a problem
that slows down applications when running under PMI2: s2 always pushes a
couple of keys, making every fence non-zero-byte. This slows down the two
empty Fences in OMPI.
I believe that once this is fixed, mpirun and srun --mpi=pmi2 will have
comparable performance.
Ralph, I'll point to the place in the code in the next email.
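
To illustrate the point above, here is a toy model (deliberately not the real PMI2 API, and not OMPI's actual code) of why a zero-byte fence is cheap: a fence with no pending keys only needs a barrier, while any keys put beforehand force the fence to carry data for all processes.

```python
# Toy model (NOT the real PMI2 API) of the zero-byte-fence distinction:
# a fence must move data only if some process put keys since the last fence.

class ToyKVS:
    def __init__(self, nprocs):
        self.nprocs = nprocs
        self.pending = []          # keys put since the last fence
        self.kvs = {}              # globally visible store
        self.bytes_exchanged = 0   # total data the fences had to move

    def put(self, key, value):
        self.pending.append((key, value))

    def fence(self):
        # Zero-byte fence: nothing pending, only a barrier is needed.
        if not self.pending:
            return 0
        # Non-zero-byte fence: every pending key/value must be
        # exchanged among all processes (modeled here as an allgather).
        moved = sum(len(k) + len(v) for k, v in self.pending) * self.nprocs
        self.kvs.update(dict(self.pending))
        self.pending.clear()
        self.bytes_exchanged += moved
        return moved

kvs = ToyKVS(nprocs=64)
print(kvs.fence())            # 0 -> cheap, pure barrier
kvs.put("s2-key", "value")    # analogous to what the s2 component does
print(kvs.fence())            # 704 -> the fence now carries data
```

The key names and cost formula are illustrative only; the actual keys s2 pushes are in the OMPI source referenced above.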

On Friday, January 8, 2016, Novosielski, Ryan wrote:

> Thanks for all of that, Ralph. I was getting a lot of "help" from users
> while debugging a performance problem, and they were pointing to the use of
> srun. The more concrete info I had, the better (and for my own edification,
> as I'd really prefer not to switch otherwise, since it is easier to use one
> software package to launch this stuff).
>
> ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
> || \\UTGERS      |---------------------*O*---------------------
> ||_// Biomedical | Ryan Novosielski - Senior Technologist
> || \\ and Health | [email protected] - 973/972.0922 (2x0922)
> ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
>     `'
>
> On Jan 7, 2016, at 18:32, Ralph Castain <[email protected]> wrote:
>
> Just following up as promised with some data. The graphs below were
> generated using the SLURM master with the PMIx plugin based on PMIx v1.1.0,
> running 64 procs/node, using a simple MPI_Init/MPI_Finalize app. The blue
> line used srun to start the job, and used PMI-2. The red line also was
> started by srun, but used PMIx. As you can see, there is some performance
> benefit from use of PMIx.
>
> The gray line used srun to start the job and the PMIx plugin, but also
> used the new optional features to reduce the startup time. There are two
> features:
>
> (a) we only do a modex “recv” (i.e., a PMI-get) upon first communication
> to a specific peer
>
> (b) the modex (i.e., pmi_fence) operation simply drops through - we do not
> execute a barrier. Instead, the data is exchanged asynchronously, and we
> only block when the proc requests a specific piece of data
>
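The two features above can be sketched as follows. This is a minimal, single-node stand-in (assumed semantics only, not the actual PMIx API or wire protocol): the "fence" returns immediately while data is exchanged in the background, and a process blocks only on its first request for a specific peer's data.

```python
import threading
import time

class AsyncModex:
    """Sketch of a non-blocking modex with lazy, blocking gets."""

    def __init__(self):
        self.data = {}
        self.ready = threading.Event()

    def fence(self, contributions):
        """Non-blocking 'fence': kick off the exchange in the
        background and return immediately - no barrier is executed."""
        def exchange():
            time.sleep(0.05)       # stand-in for the actual wire exchange
            self.data.update(contributions)
            self.ready.set()
        threading.Thread(target=exchange, daemon=True).start()

    def get(self, peer):
        """Lazy modex 'recv': block only when a specific peer's data
        is actually requested (i.e., upon first communication)."""
        self.ready.wait()
        return self.data[peer]

modex = AsyncModex()
modex.fence({"rank0": "endpoint-info-0", "rank1": "endpoint-info-1"})
# ... the application keeps running here; nothing blocked above ...
print(modex.get("rank1"))   # blocks only now, until the exchange lands
```

The names `AsyncModex`, `fence`, and `get` are hypothetical; the design point they model is that synchronization cost is paid per-peer on demand rather than collectively at startup.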
>
> The final yellow line is mpirun (which uses PMIx) using the new optional
> features. As you can see, it’s a little faster than srun-based launch.
>
> We are extending these tests to larger scale, and continuing to push the
> performance as discussed before.
>
> HTH
> Ralph
>
>
> <PastedGraphic-1.tiff>
>
>
> On Jan 6, 2016, at 11:58 PM, Ralph Castain <[email protected]> wrote:
>
>
>
> On Jan 6, 2016, at 9:31 PM, Novosielski, Ryan <[email protected]> wrote:
>
>
> On Jan 6, 2016, at 23:31, Christopher Samuel <[email protected]> wrote:
>
> On 07/01/16 01:03, Novosielski, Ryan wrote:
>
> Since this is an audience that might know, and this is related (but
> off-topic, sorry): is there any truth to the suggestions on the Internet
> that using srun is /slower/ than mpirun/mpiexec?
>
>
> In our experience Open-MPI 1.6.x and earlier (PMI-1 support) is slower
> with srun than with mpirun.  This was tested with NAMD.
>
> Open-MPI 1.8.x and later with PMI-2 is about the same with srun as with
> mpirun.
>
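For reference, the two launch paths being compared would look something like this (assuming a SLURM allocation and an MPI binary ./a.out; the flags are the standard ones, not taken from this thread):

```shell
# Launch via SLURM's srun, selecting the PMI-2 plugin:
srun -N 2 -n 128 --mpi=pmi2 ./a.out

# Launch via Open-MPI's own mpirun inside the same allocation:
mpirun -np 128 ./a.out
```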
>
> Thanks very much to both of you who have responded with an answer to this
> question. Both of you have said "about the same," if I'm not mistaken. So I
> guess there still is a very slight performance penalty to using PMI2
> instead of mpirun? Probably worth it anyway, but I'm just curious to know
> the real score. There's not a lot of info about this outside the mailing
> list.
>
>
> FWIW: the reason the gap closed when going from the (1.6 vs srun+PMI1) to
> the (1.8 vs srun+PMI2) scenario is partly because of the PMI-1 vs PMI-2
> difference, but also because OMPI’s mpirun slowed down significantly
> between the 1.6 and 1.8 series. We didn’t catch the loss of performance in
> time, but are addressing it for the upcoming 2.0 series.
>
> In 2.0, mpirun will natively use PMIx, and you can additionally use two
> new optional features to dramatically improve the launch time. I’ll provide
> a graph tomorrow to show the different performance vs PMI-2 even at small
> scale. Those features may become the default behavior at some point -
> hasn’t fully been decided yet as they need time to mature.
>
> However, the situation is fluid. Using the SLURM PMix plugin (in master
> now and tentatively scheduled for release later this year) will effectively
> close the gap. Somewhere in that same timeframe, OMPI will be implementing
> further improvements to mpirun (using fabric instead of mgmt Ethernet to
> perform barriers, distributing the launch mapping procedure, etc.) and will
> likely move ahead again - and then members of the PMIx community are
> already planning to propose some of those changes for SLURM. If accepted,
> you’ll see the gap close again.
>
> So I expect this surge-and-recover pattern to continue for the next couple
> of years, with mpirun ahead for a while and then even with SLURM when using
> the PMIx plugin.
>
> HTH - and I’ll provide the graph in the morning.
> Ralph
>
>
>
>
> Thanks again.
>
>
>

-- 
-----
Best regards, Artem Polyakov
(Mobile mail)
