Are these sruns running inside an existing allocation or not? If not, you
might consider setting PrologFlags=Alloc in slurm.conf, which should perform
much of the remote job setup when the head node is configured (assuming
that setup is what is slowing you down, or that your configuration makes
it slow). Otherwise, checking the slurmd logs (ideally with debug
level logging) on the different nodes may give a clue.
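To make the first question concrete, here is a minimal sketch of how one might check whether a shell is already inside an allocation. It relies on SLURM_JOB_ID, which Slurm exports inside salloc/sbatch jobs; the function name in_allocation is just an illustrative helper, and the commented commands at the end are an assumed follow-up to try on the cluster, not output from this system.

```shell
# Hypothetical helper: succeeds if this shell is inside a Slurm allocation.
# Slurm exports SLURM_JOB_ID inside salloc/sbatch jobs; if it is unset,
# each srun has to obtain its own allocation before launching tasks.
in_allocation() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_allocation; then
    echo "inside allocation ${SLURM_JOB_ID}"
else
    echo "no allocation: srun will request one itself"
fi

# Assumed follow-up on a real cluster (not run here):
#   salloc -N 2 srun -n 8 date     # time the launch inside an allocation
#   scontrol show config | grep -i -E 'PrologFlags|SlurmdDebug'
```

If the delay disappears when srun runs under salloc, that would point at allocation-time setup rather than the task launch itself.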
-Doug
On 1/11/17 11:23 AM, Pritchard Jr., Howard wrote:
Hi SLURM folks,
I recently got SLURM (16.05.6) set up on a small cluster (48 nodes,
x86_64 + Intel OPA), and things appear to be nominal except for one odd
performance problem with srun launch times. I don’t observe this on
other clusters running SLURM at our site.
What I’m observing is that, regardless of whether the application being
launched is a command (e.g. /bin/hostname) or an MPI application, I get
reasonable job launch times when using one node, but as soon as I use
two or more nodes, there is about a 10 second overhead to get the
processes on the additional nodes started.
For example:
[hpp@hi-master ~]$ srun -n 8 -N 1 date
Wed Jan 11 12:11:29 MST 2017
Wed Jan 11 12:11:29 MST 2017
Wed Jan 11 12:11:29 MST 2017
Wed Jan 11 12:11:29 MST 2017
Wed Jan 11 12:11:29 MST 2017
Wed Jan 11 12:11:29 MST 2017
Wed Jan 11 12:11:29 MST 2017
Wed Jan 11 12:11:29 MST 2017
[hpp@hi-master ~]$ srun -n 8 -N 2 date
Wed Jan 11 12:10:35 MST 2017
Wed Jan 11 12:10:35 MST 2017
Wed Jan 11 12:10:35 MST 2017
Wed Jan 11 12:10:35 MST 2017
Wed Jan 11 12:10:44 MST 2017
Wed Jan 11 12:10:44 MST 2017
Wed Jan 11 12:10:44 MST 2017
Wed Jan 11 12:10:44 MST 2017
[hpp@hi-master ~]$ srun -n 8 -N 4 date
Wed Jan 11 12:10:57 MST 2017
Wed Jan 11 12:10:57 MST 2017
Wed Jan 11 12:11:07 MST 2017
Wed Jan 11 12:11:06 MST 2017
Wed Jan 11 12:11:07 MST 2017
Wed Jan 11 12:11:06 MST 2017
Wed Jan 11 12:11:07 MST 2017
Wed Jan 11 12:11:07 MST 2017
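The gap in the runs above (for instance 12:10:35 versus 12:10:44 in the two-node case) can be measured rather than read off by eye. The following is a rough sketch, assuming each task prints an epoch timestamp via date +%s; launch_skew is a hypothetical helper name, and the sample values fed to it are synthetic, not taken from this cluster.

```shell
# Hypothetical helper: reads one epoch timestamp per line on stdin and
# prints the spread (latest minus earliest) in seconds.
launch_skew() {
    awk 'NR == 1 { min = $1; max = $1 }
         { if ($1 < min) min = $1; if ($1 > max) max = $1 }
         END { if (NR > 0) print max - min }'
}

# Synthetic example: two tasks start at t=100, two at t=109.
printf '100\n100\n109\n109\n' | launch_skew   # prints 9

# Assumed usage on a cluster (not run here):
#   srun -n 8 -N 2 date +%s | launch_skew
```

This only measures the spread between the earliest and latest task start, so it also depends on the node clocks being in sync.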
Anyone observed this problem before?
Any suggestions on how to resolve this problem would be much
appreciated.
Thanks,
Howard
--
Howard Pritchard
HPC-DES
Los Alamos National Laboratory