Are these sruns already running inside an allocation, or not? If not, you might consider setting PrologFlags=alloc in slurm.conf, which should perform much of the remote job setup at the time the allocation is made rather than at first step launch (presuming that setup cost is your issue, or that you have a configuration which makes it one). Otherwise, checking the slurmd logs (ideally with debug-level logging) on the different nodes may give a clue.
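For reference, a minimal sketch of the slurm.conf changes being suggested; this is an illustrative fragment only (all other settings omitted), not a copy of Howard's actual configuration:

```
# slurm.conf fragment (illustrative)

# Run the prolog at job allocation time instead of at the first step
# launch, so per-node setup cost is paid up front rather than being
# charged to the first srun on each node.
PrologFlags=alloc

# Raise slurmd logging to debug level while diagnosing; remember to
# lower it again afterwards, as debug logging is verbose.
SlurmdDebug=debug
```

After editing slurm.conf, the change has to be propagated to the nodes and the daemons told to re-read it (e.g. with scontrol reconfigure).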

-Doug


On 1/11/17 11:23 AM, Pritchard Jr., Howard wrote:
Hi SLURM folks,

I recently got SLURM (16.05.6) set up on a small cluster (48 nodes x86_64 + Intel OPA)
and things appear to be nominal except for one odd performance problem
as far as srun launch times go. I don’t observe this on other clusters running
SLURM at our site.

What I’m observing is that, regardless of whether the application being launched is a simple command (e.g. /bin/hostname) or an MPI application, I get reasonable job launch times when using one node; but as soon as I use two or more nodes, there is about a 10-second delay before the processes on the additional nodes start:

For example:

[hpp@hi-master ~]$ srun -n 8 -N 1 date
Wed Jan 11 12:11:29 MST 2017
Wed Jan 11 12:11:29 MST 2017
Wed Jan 11 12:11:29 MST 2017
Wed Jan 11 12:11:29 MST 2017
Wed Jan 11 12:11:29 MST 2017
Wed Jan 11 12:11:29 MST 2017
Wed Jan 11 12:11:29 MST 2017
Wed Jan 11 12:11:29 MST 2017


[hpp@hi-master ~]$ srun -n 8 -N 2 date
Wed Jan 11 12:10:35 MST 2017
Wed Jan 11 12:10:35 MST 2017
Wed Jan 11 12:10:35 MST 2017
Wed Jan 11 12:10:35 MST 2017
Wed Jan 11 12:10:44 MST 2017
Wed Jan 11 12:10:44 MST 2017
Wed Jan 11 12:10:44 MST 2017
Wed Jan 11 12:10:44 MST 2017


[hpp@hi-master ~]$ srun -n 8 -N 4 date
Wed Jan 11 12:10:57 MST 2017
Wed Jan 11 12:10:57 MST 2017
Wed Jan 11 12:11:07 MST 2017
Wed Jan 11 12:11:06 MST 2017
Wed Jan 11 12:11:07 MST 2017
Wed Jan 11 12:11:06 MST 2017
Wed Jan 11 12:11:07 MST 2017
Wed Jan 11 12:11:07 MST 2017
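The per-node delay above can be quantified by parsing the timestamps and taking the spread between the earliest and latest task start. A small sketch, using the two-node output from the transcript (the timezone token is stripped before parsing, since strptime's %Z handling is platform-dependent):

```python
from datetime import datetime

# Output of `srun -n 8 -N 2 date` copied from the transcript above.
lines = [
    "Wed Jan 11 12:10:35 MST 2017",
    "Wed Jan 11 12:10:35 MST 2017",
    "Wed Jan 11 12:10:35 MST 2017",
    "Wed Jan 11 12:10:35 MST 2017",
    "Wed Jan 11 12:10:44 MST 2017",
    "Wed Jan 11 12:10:44 MST 2017",
    "Wed Jan 11 12:10:44 MST 2017",
    "Wed Jan 11 12:10:44 MST 2017",
]

def parse(line):
    # Drop the timezone field ("MST") and parse the rest.
    parts = line.split()
    return datetime.strptime(" ".join(parts[:4] + parts[5:]),
                             "%a %b %d %H:%M:%S %Y")

stamps = [parse(l) for l in lines]
skew = (max(stamps) - min(stamps)).total_seconds()
print(skew)  # 9.0 for the two-node example
```

This matches the roughly 10-second overhead described in the post: the first node's tasks start immediately, while the second node's tasks start about 9 seconds later.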


Has anyone observed this problem before?

Any suggestions on how to resolve it would be much appreciated.


Thanks,


Howard


--
Howard Pritchard
HPC-DES
Los Alamos National Laboratory

