Hi there slurm-dev and OMPI devel lists,

While bringing up a new IBM SandyBridge cluster I've been running a
NAMD test case, and I noticed that if I launch it with srun rather
than mpirun it runs over 20% slower.  All runs are launched from an
sbatch script.

Slurm 2.6.0, RHEL 6.4 (latest kernel), FDR IB.

Here are some timings, as reported by NAMD itself as its WallClock
time (so not including startup/teardown overhead from Slurm).

srun:

run1/slurm-93744.out:WallClock: 695.079773  CPUTime: 695.079773
run4/slurm-94011.out:WallClock: 723.907959  CPUTime: 723.907959
run5/slurm-94013.out:WallClock: 726.156799  CPUTime: 726.156799
run6/slurm-94017.out:WallClock: 724.828918  CPUTime: 724.828918

Average of about 717 seconds.

mpirun:

run2/slurm-93746.out:WallClock: 559.311035  CPUTime: 559.311035
run3/slurm-93910.out:WallClock: 544.116333  CPUTime: 544.116333
run7/slurm-94019.out:WallClock: 586.072693  CPUTime: 586.072693

Average of 563 seconds.

So the srun runs are about 27% slower.
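
For reference, those averages are just the mean of the WallClock
values in each set of output files; a quick sketch along these lines
(run directories adjusted per set, and assuming the WallClock lines
look exactly as quoted above) reproduces them:

  # average the NAMD WallClock value over the srun runs
  grep -h 'WallClock:' run1/*.out run4/*.out run5/*.out run6/*.out \
      | awk '{ sum += $2 } END { printf "average: %.0f s\n", sum / NR }'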

Everything is identical (they're all symlinks to the same golden
master) *except* for the launcher: the other version is made by
copying the batch script and substituting mpirun for srun.
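
Roughly speaking, the relevant part of the batch script looks like
this (node counts, file names and the namd2 binary name below are
illustrative placeholders rather than the real script; the only thing
that differs between the two versions is the launcher on the last
line):

  #!/bin/bash
  #SBATCH --nodes=4                # placeholder geometry
  #SBATCH --ntasks-per-node=16     # placeholder geometry

  # slow version:
  srun namd2 test_case.namd > namd_output.log

  # fast version (same script with mpirun substituted for srun):
  # mpirun namd2 test_case.namd > namd_output.log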

When they are running I can see that jobs launched with srun have
NAMD processes that are direct children of slurmstepd, whereas when
started with mpirun they are children of Open MPI's orted (or of
mpirun on the launch node), which is itself a child of slurmstepd.
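
Schematically (an illustration of the hierarchy described above, not
actual ps output):

  srun launch:

    slurmstepd
     \_ namd2
     \_ namd2
     ...

  mpirun launch:

    slurmstepd
     \_ orted   (or mpirun on the launch node)
         \_ namd2
         \_ namd2
         ...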

Has anyone else seen anything like this, or got any ideas?

cheers,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
