I recommend the LLN option for partitions:

*LLN*
   Schedule resources to jobs on the least loaded nodes (based upon the
   number of idle CPUs). This is generally only recommended for an
   environment with serial jobs as idle resources will tend to be
   highly fragmented, resulting in parallel jobs being distributed
   across many nodes. Note that node *Weight* takes precedence over how
   many idle resources are on each node. Also see the
   *SelectParameters* configuration parameter *CR_LLN* to use the least
   loaded nodes in every partition.
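
As a rough sketch, assuming the partition and node names from the question
below (any other options on your existing partition line stay as they are),
the partition definition in slurm.conf would look something like:

   # slurm.conf fragment; "nav" and node[1-5] are taken from the question
   PartitionName=nav Nodes=node[1-5] LLN=YES

To get the same behaviour in every partition, CR_LLN is added to the select
plugin parameters instead. In slurm.conf that parameter is spelled
SelectTypeParameters. The CR_CPU_Memory value here is only an assumption
about what your site already uses for consumable CPU and memory:

   # slurm.conf fragment; CR_CPU_Memory is an assumed existing setting
   SelectTypeParameters=CR_CPU_Memory,CR_LLN

Keep in mind the note above: node Weight still takes precedence over the
idle CPU count.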
-Paul Edmon-

On 11/15/2018 4:25 AM, Aravindh Sampathkumar wrote:
Hi All.

I'm having some trouble finding the appropriate section of the documentation for changing Slurm's resource allocation policy.

We have configured CPU and memory as consumable resources, and our nodes can run multiple jobs as long as CPU and memory are available.

What I want is for Slurm to spread jobs across all available servers in a partition instead of loading up a few servers while the others sit idle.

For example, I have a partition, nav, which has 5 compute nodes (node[1-5]) dedicated to it. When users submit 3 jobs to the nav partition, each requesting 1 CPU core and 1 GB of memory, Slurm schedules all the jobs on node1 because it has enough CPU cores and memory to satisfy the job requirements. Nodes 2, 3, 4 and 5 stay idle.

What I want instead is for Slurm to schedule job1 on node1, job2 on node2, job3 on node3, and so on. Then, if there are more jobs than nodes, Slurm should use the remaining resources on node1.


Why?
A small group that uses this partition is concerned that all their jobs get scheduled on the same node, where they have to share network bandwidth and bandwidth to local disk. If the jobs were spread out instead, each would get more bandwidth.

I'd appreciate any advice on how I can make this happen.

Thanks,
  Aravindh Sampathkumar
  aravi...@fastmail.com

