On 04/01/17 04:20, Koziol, Lucas wrote:

> The hope was that all 16 tasks would run on Node 1, and 16 tasks would
> run on Node 2. Unfortunately what happens is that all 32 jobs get
> assigned to Node 1. I thought -m cyclic was supposed to avoid this.
You're only running a single task at a time, so it's a bit hard for srun to distribute 1 task over multiple nodes. :-)

The confusion is, I suspect, that job steps (an srun instance) are not the same as tasks (the individual processes launched within a job step). The behaviour described in the manual page is for things like MPI jobs, where you want to distribute the many ranks (tasks) over nodes/sockets/cores in a particular way - in that case a single srun might be launching anywhere from tens to hundreds of thousands of tasks (or more) at once.

What might work better for you is to use a job array for your work instead of a single Slurm job, and then have this in your slurm.conf:

SelectType=select/cons_res
SelectTypeParameters=CR_LLN

This should get Slurm to distribute the job array elements across nodes, picking the least loaded (least allocated) node in each case. Job arrays are documented here:

https://slurm.schedmd.com/job_array.html

There's a minimal sketch of a job array script in the P.S. below.

Hope this helps!

All the best,
Chris
--
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
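
P.S. In case a concrete example helps, here's a minimal sketch of a job array
submission script for the 32-job case above. The job name, output pattern and
my_program / data file names are placeholders for your actual workload, not
anything from your setup:

#!/bin/bash
#SBATCH --job-name=myarray          # placeholder job name
#SBATCH --array=1-32                # 32 array elements, one per piece of work
#SBATCH --ntasks=1                  # each element is a single task
#SBATCH --cpus-per-task=1
#SBATCH --output=array_%A_%a.out    # %A = array job ID, %a = array index

# Each array element runs one instance of the work, selected by the
# SLURM_ARRAY_TASK_ID environment variable that Slurm sets for it.
# ./my_program and the data file naming are placeholders.
srun ./my_program data_${SLURM_ARRAY_TASK_ID}.dat

Submit it once with sbatch and Slurm will schedule the 32 elements
independently; with CR_LLN each element should land on the least loaded node
available at the time.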