Good evening,

We have configured our cluster so that the "Weight" field for each node is proportional to the number of cores in the blade. The thinking was that Slurm would allocate smaller nodes first unless people explicitly ask for larger resources. However, Slurm seems to be scheduling things in precisely the opposite way: nodes with larger weights are being scheduled before nodes with smaller weights. Even if I ask for 1 core on 1 node with limited memory, I get assigned to the biggest node with the most memory, even though its weight is larger.
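
To make my expectation concrete, here is a minimal Python sketch of how I read the documented behaviour for a single-node request. It is only an illustration of my understanding, not Slurm's actual code: the `Node` type, the `pick_node` function, and the node names and sizes are all hypothetical.

```python
from itertools import groupby
from typing import NamedTuple, Optional


class Node(NamedTuple):
    # Hypothetical node description; names and sizes are made up.
    name: str
    weight: int
    cpus: int
    mem_mb: int


def pick_node(nodes, cpus, mem_mb) -> Optional[str]:
    """Return the name of the lowest-weight node that fits, or None."""
    # Consider nodes in groups of equal weight, lowest weight first.
    ordered = sorted(nodes, key=lambda n: n.weight)
    for _weight, group in groupby(ordered, key=lambda n: n.weight):
        for node in group:
            if node.cpus >= cpus and node.mem_mb >= mem_mb:
                return node.name
        # Nothing in this weight group fits: move on to the next weight.
    return None


# Weight proportional to core count, as in our configuration.
nodes = [
    Node("big01", weight=64, cpus=64, mem_mb=512000),
    Node("small01", weight=8, cpus=8, mem_mb=32000),
    Node("mid01", weight=16, cpus=16, mem_mb=64000),
]

print(pick_node(nodes, cpus=1, mem_mb=1000))
print(pick_node(nodes, cpus=32, mem_mb=1000))
```

Under this reading, a 1-core request lands on the smallest node and only a request that the smaller nodes cannot satisfy reaches the largest one. The sketch also only compares weights, so only their order matters, not their magnitude.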

I guess it is possible that my understanding is the reverse of how this actually works; however, I've included the relevant quote from the slurm.conf documentation below.

I wonder: does the actual *value* matter? That is, should the weights differ from one another by orders of magnitude, or are they just ranked? (I've assumed that the order matters but the value is not important.) Also, does the topology plugin change things?

From the slurm.conf docs:

"Weight
The priority of the node for scheduling purposes. All things being equal, jobs will be allocated the nodes with the lowest weight which satisfies their requirements. For example, a heterogeneous collection of nodes might be placed into a single partition for greater system utilization, responsiveness and capability. It would be preferable to allocate smaller memory nodes rather than larger memory nodes if either will satisfy a job's requirements. The units of weight are arbitrary, but larger weights should be assigned to nodes with more processors, memory, disk space, higher processor speed, etc. Note that if a job allocation request can not be satisfied using the nodes with the lowest weight, the set of nodes with the next lowest weight is added to the set of nodes under consideration for use (repeat as needed for higher weight values). If you absolutely want to minimize the number of higher weight nodes allocated to a job (at a cost of higher scheduling overhead), give each node a distinct Weight value and they will be added to the pool of nodes being considered for scheduling individually. The default value is 1."

Thanks,
Paul
