Hi there slurm-dev and OMPI devel lists,

Bringing up a new IBM SandyBridge cluster, I'm running a NAMD test case and
noticed that if I launch it with srun rather than mpirun it runs over 20%
slower. All runs are launched from an sbatch script. Slurm 2.6.0, RHEL 6.4
(latest kernel), FDR IB.

Here are some timings, reported as the WallClock time by NAMD itself (so not
including any startup/teardown overhead from Slurm).

srun:

  run1/slurm-93744.out:WallClock: 695.079773  CPUTime: 695.079773
  run4/slurm-94011.out:WallClock: 723.907959  CPUTime: 723.907959
  run5/slurm-94013.out:WallClock: 726.156799  CPUTime: 726.156799
  run6/slurm-94017.out:WallClock: 724.828918  CPUTime: 724.828918

Average of about 717 seconds.

mpirun:

  run2/slurm-93746.out:WallClock: 559.311035  CPUTime: 559.311035
  run3/slurm-93910.out:WallClock: 544.116333  CPUTime: 544.116333
  run7/slurm-94019.out:WallClock: 586.072693  CPUTime: 586.072693

Average of about 563 seconds.

So srun is roughly 27% slower. Everything is identical between runs (they're
all symlinks to the same golden master) *except* for the launcher: the batch
script is copied and mpirun is substituted for srun (a minimal sketch of what
that looks like is in the P.S. below).

While the jobs are running I can see that tasks launched with srun are direct
children of slurmstepd, whereas tasks started with mpirun are children of
Open MPI's orted (or of mpirun itself on the launch node), which in turn is a
child of slurmstepd (the command I used to check this is in the second P.S.).

Has anyone else seen anything like this, or got any ideas?

cheers,
Chris

-- 
 Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au    Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/        http://twitter.com/vlsci
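
P.S. For anyone who wants to see the shape of the job, here is a minimal
sketch of the batch script. The job name, node/task counts, and NAMD paths
below are placeholders, not our actual values; the only difference between
the two sets of runs is the launcher on the final line:

  #!/bin/bash
  #SBATCH --job-name=namd-test      # placeholder job name
  #SBATCH --nodes=4                 # placeholder node count
  #SBATCH --ntasks-per-node=16      # placeholder: one task per core

  # Launch NAMD across all allocated tasks.
  # srun variant (the slow one):
  srun /path/to/namd2 input.namd    # placeholder binary and input paths

  # mpirun variant (the fast one) is identical except for this line;
  # Open MPI picks up the Slurm allocation itself, so no -np is given:
  # mpirun /path/to/namd2 input.namd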
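
P.P.S. For the record, the process-tree observation above comes from running
something like the following on a compute node while a job is in flight
(standard pstree/pgrep from psmisc and procps; -n just picks the most
recently started slurmstepd if there are several):

  pstree -ap $(pgrep -n slurmstepd)

  # srun-launched jobs show namd2 directly under slurmstepd;
  # mpirun-launched jobs show slurmstepd -> orted -> namd2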
Bringing up a new IBM SandyBridge cluster I'm running a NAMD test case and noticed that if I run it with srun rather than mpirun it goes over 20% slower. These are all launched from an sbatch script too. Slurm 2.6.0, RHEL 6.4 (latest kernel), FDR IB. Here are some timings as reported as the WallClock time by NAMD itself (so not including startup/tear down overhead from Slurm). srun: run1/slurm-93744.out:WallClock: 695.079773 CPUTime: 695.079773 run4/slurm-94011.out:WallClock: 723.907959 CPUTime: 723.907959 run5/slurm-94013.out:WallClock: 726.156799 CPUTime: 726.156799 run6/slurm-94017.out:WallClock: 724.828918 CPUTime: 724.828918 Average of 692 seconds mpirun: run2/slurm-93746.out:WallClock: 559.311035 CPUTime: 559.311035 run3/slurm-93910.out:WallClock: 544.116333 CPUTime: 544.116333 run7/slurm-94019.out:WallClock: 586.072693 CPUTime: 586.072693 Average of 563 seconds. So that's about 23% slower. Everything is identical (they're all symlinks to the same golden master) *except* for the srun / mpirun which is modified by copying the batch script and substituting mpirun for srun. When they are running I can see that for jobs launched with srun they are direct children of slurmstepd whereas when started with mpirun they are children of Open-MPI's orted (or mpirun on the launch node) which itself is a child of slurmstepd. Has anyone else seen anything like this, or got any ideas? cheers, Chris - -- Christopher Samuel Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iEYEARECAAYFAlHuKxoACgkQO2KABBYQAh8cYQCfT/YIFkyeDaNb/ksT2xk4W416 kycAoJfdZInLwy+nTIL7CzWapZZU20qm =ZJ1B -----END PGP SIGNATURE-----