On 04/01/17 04:20, Koziol, Lucas wrote:

> The hope was that all 16 tasks would run on Node 1, and 16 tasks would
> run on Node 2. Unfortunately what happens is that all 32 jobs get
> assigned to Node 1. I thought -m cyclic was supposed to avoid this.

You're only running a single task at a time, so it's a bit hard for srun
to distribute 1 task over multiple nodes. :-)

The confusion, I suspect, is that job steps (each one an srun instance)
are not the same as tasks (the individual processes launched within a
job step).
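
For example, in a batch script along these lines (a sketch, not
necessarily your actual script), each backgrounded srun is a separate
one-task job step within the same allocation:

# two job steps, each containing exactly one task
srun -n 1 ./task_a &
srun -n 1 ./task_b &
wait   # block until all steps have finished

-m cyclic controls how the tasks *within* a single step are laid out
across the allocated nodes, so with one task per step it has nothing
to distribute.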

The behaviour described in the manual page is for things like MPI jobs,
where you want to distribute many ranks (tasks) over nodes/sockets/cores
in a particular way - in that case a single srun might be launching tens
through to hundreds of thousands of tasks (or more) at once.
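
For comparison, here's the case the man page is describing (a sketch,
assuming a two-node allocation with 16 cores each and a hypothetical
MPI binary ./my_mpi_app):

# one job step launching 32 tasks; -m cyclic round-robins
# consecutive task ranks across the allocated nodes
srun -N 2 -n 32 -m cyclic ./my_mpi_app

That's the situation where -m cyclic actually changes placement.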

What might work better for you is to use a job array for your work,
rather than many srun steps inside a single Slurm job, and then have
this in your slurm.conf:

# cons_res treats CPUs (and optionally memory) as consumable resources
SelectType=select/cons_res
# LLN = schedule each job onto the Least Loaded Node
SelectTypeParameters=CR_LLN

This should get Slurm to distribute the job array elements across nodes,
picking the least loaded (i.e. least allocated) node in each case.

Job arrays are documented here:

https://slurm.schedmd.com/job_array.html
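
Something like this might be a starting point (a minimal sketch,
assuming your 32 pieces of work can be driven by an index, and a
hypothetical executable ./my_task):

#!/bin/bash
#SBATCH --array=0-31    # 32 array elements, one per piece of work
#SBATCH --ntasks=1      # each element is a single one-task job

# Each array element is scheduled as a separate job, so with CR_LLN
# Slurm can place each one on the least loaded node when it starts.
srun ./my_task ${SLURM_ARRAY_TASK_ID}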

Hope this helps!

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
