Quoting Miguel Méndez <[email protected]>:

> Hi,
>
> Job Size Factor in Multifactor Priority Plugin gets its value considering
> relative job size, and this size is relative to "node_record_count". The
> problems I see with this are two:
>
> - "node_record_count" includes my login node, which is never going to be
> used to run jobs. I would solve this by just subtracting one from this value.

Only compute nodes are needed in your node list. The login node would
generally not be included.

> - "node_record_count" includes all existing nodes in the cluster, doesn't
> matter if they are down. I think Job Size priority should be relative to
> the maximum size of a job that could be run if there were no other jobs
> running in the cluster. So if I have a 70 node cluster, with 2 nodes down,
> and a 10 node job, priority for this job should be 10/68, not 10/70.
>
> What would be the easiest way of getting the number of allocated or idle
> nodes? I have been through slurmctld and sinfo code, but I understand they
> use loops for this, and I would prefer not having to do this every time I
> recalculate priorities.

bit_set_count(avail_node_bitmap) will give you the count of nodes up and
available very quickly in the slurmctld daemon.

> Thanks,
>
> Miguel
>
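
To make the 10/68 versus 10/70 arithmetic concrete, here is a minimal
standalone sketch. It does not use Slurm's headers or its bitstr_t type:
toy_bit_set_count() and toy_job_size_fraction() are hypothetical helpers
written only for illustration, and the bitmap is a plain array of 64-bit
words. Inside slurmctld you would call bit_set_count(avail_node_bitmap)
instead of the toy counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Slurm's bit_set_count(): counts the set bits
 * in a plain array of 64-bit words. This is NOT Slurm's implementation. */
static int toy_bit_set_count(const uint64_t *words, size_t nwords)
{
    int count = 0;

    for (size_t i = 0; i < nwords; i++) {
        uint64_t w = words[i];

        while (w) {
            w &= w - 1;     /* clear the lowest set bit */
            count++;
        }
    }
    return count;
}

/* Hypothetical helper: job size relative to the nodes that are up.
 * Returns 0.0 when no node is available, and caps the ratio at 1.0. */
static double toy_job_size_fraction(int job_nodes, int avail_nodes)
{
    assert(job_nodes >= 0);

    if (avail_nodes <= 0)
        return 0.0;
    if (job_nodes > avail_nodes)
        job_nodes = avail_nodes;
    return (double)job_nodes / (double)avail_nodes;
}
```

With a 70-bit map in which two bits are cleared, the counter returns 68,
and a 10 node job then gets a fraction of 10/68 rather than 10/70. How
that fraction feeds into the final job size factor is left to the
priority plugin and is not modelled here.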
