I'm assuming the jobs are running across multiple nodes, using MPI for
communication?

My guess is that with srun the MPI traffic is going across the GigE
fabric rather than IB, whereas with mpirun it is using IB directly. A ~20%
performance penalty would make sense in that context.

- Tim

--
Tim Wickberg
[email protected]
Senior HPC Systems Administrator
The George Washington University


On Tue, Jul 23, 2013 at 3:06 AM, Christopher Samuel
<[email protected]> wrote:

>
> Hi there slurm-dev and OMPI devel lists,
>
> While bringing up a new IBM Sandy Bridge cluster I've been running a
> NAMD test case, and noticed that if I run it with srun rather than
> mpirun it goes over 20% slower. These are all launched from an sbatch
> script too.
>
> Slurm 2.6.0, RHEL 6.4 (latest kernel), FDR IB.
>
> Here are some timings, as reported in the WallClock line of NAMD's own
> output (so not including startup/teardown overhead from Slurm).
>
> srun:
>
> run1/slurm-93744.out:WallClock: 695.079773  CPUTime: 695.079773
> run4/slurm-94011.out:WallClock: 723.907959  CPUTime: 723.907959
> run5/slurm-94013.out:WallClock: 726.156799  CPUTime: 726.156799
> run6/slurm-94017.out:WallClock: 724.828918  CPUTime: 724.828918
>
> Average of 717 seconds.
>
> mpirun:
>
> run2/slurm-93746.out:WallClock: 559.311035  CPUTime: 559.311035
> run3/slurm-93910.out:WallClock: 544.116333  CPUTime: 544.116333
> run7/slurm-94019.out:WallClock: 586.072693  CPUTime: 586.072693
>
> Average of 563 seconds.
>
> So that's about 27% slower.
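
Recomputing the means straight from the WallClock figures quoted above is a quick way to check the percentage. A minimal Python sketch; the list names and the `slowdown_percent` helper are illustrative only, not part of NAMD or Slurm:

```python
from statistics import mean

# WallClock times in seconds, copied from the NAMD output lines quoted above.
SRUN = [695.079773, 723.907959, 726.156799, 724.828918]
MPIRUN = [559.311035, 544.116333, 586.072693]

def slowdown_percent(slow: list[float], fast: list[float]) -> float:
    """Percentage by which the mean of `slow` exceeds the mean of `fast`."""
    return (mean(slow) / mean(fast) - 1.0) * 100.0

if __name__ == "__main__":
    print(f"srun mean:     {mean(SRUN):.1f} s")
    print(f"mpirun mean:   {mean(MPIRUN):.1f} s")
    print(f"srun slowdown: {slowdown_percent(SRUN, MPIRUN):.1f}%")
```

Run as-is, it prints means of about 717.5 s (srun) and 563.2 s (mpirun), i.e. srun is roughly 27.4% slower across these seven runs.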
>
> Everything is identical (they're all symlinks to the same golden
> master) *except* for the launcher: the mpirun runs use a copy of the
> batch script with srun replaced by mpirun.
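
Generating the mpirun copy mechanically keeps the launcher as the only difference between the two scripts. A minimal sketch, assuming the launcher appears at the start of a command line in the batch script; the function name and the `-mpirun` output filename are hypothetical. It does not translate any srun-specific options on those lines, which would need handling separately.

```python
import re
import sys
from pathlib import Path

# Match "srun" only where it begins a line (after optional spaces/tabs),
# so mentions in comments or inside other words are left alone.
LAUNCHER = re.compile(r"^([ \t]*)srun\b", re.MULTILINE)

def make_mpirun_variant(script_text: str) -> str:
    """Return a copy of a batch script with a line-leading `srun` replaced by `mpirun`."""
    return LAUNCHER.sub(r"\1mpirun", script_text)

if __name__ == "__main__":
    # Usage: python make_variant.py job.sbatch   (writes job-mpirun.sbatch)
    src = Path(sys.argv[1])
    dst = src.with_name(f"{src.stem}-mpirun{src.suffix}")
    dst.write_text(make_mpirun_variant(src.read_text()))
    print(f"wrote {dst}")
```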
>
> When they are running I can see that for jobs launched with srun they
> are direct children of slurmstepd whereas when started with mpirun
> they are children of Open MPI's orted (or mpirun on the launch node)
> which itself is a child of slurmstepd.
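
To check that parentage on a compute node, one can walk the PPid links in Linux's /proc/<pid>/status upwards from a rank's PID. A minimal Python sketch, assuming Linux /proc; the helper names are hypothetical. For an srun-launched rank you would expect slurmstepd directly above it, and for an mpirun-launched one orted (or mpirun on the launch node). Where psmisc is installed, `pstree -s <pid>` shows the same chain.

```python
import os
import sys

def parent_and_name(pid: int) -> tuple[int, str]:
    """Return (parent PID, command name) for `pid`, read from /proc/<pid>/status."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return int(fields["PPid"]), fields["Name"]

def ancestry(pid: int) -> list[tuple[int, str]]:
    """List (pid, name) pairs from `pid` up to the top of its PID namespace."""
    chain = []
    # PPid is 0 for init, or when the parent lies outside this PID namespace.
    while pid > 0:
        ppid, name = parent_and_name(pid)
        chain.append((pid, name))
        pid = ppid
    return chain

if __name__ == "__main__":
    # Usage: python ancestry.py <pid-of-an-MPI-rank>   (defaults to this process)
    target = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print(" <- ".join(f"{name}[{pid}]" for pid, name in ancestry(target)))
```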
>
> Has anyone else seen anything like this, or got any ideas?
>
> cheers,
> Chris
> --
>  Christopher Samuel        Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: [email protected] Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/      http://twitter.com/vlsci
