Hi all,
I'm seeing behavior I don't understand when using MaxNodesPerUser in
a QOS.
The sacctmgr(1) documentation states:
SPECIFICATIONS FOR QOS
[...]
MaxNodesPerUser
Maximum number of nodes each user is able to use.
I'm using the following QOS:
# sacctmgr list qos test format=name,maxcpusperuser,maxjobsperuser,maxnodesperuser
Name MaxCPUsPU MaxJobsPU MaxNodesPU
---------- --------- --------- ----------
test 12 3 2
I submit a bunch of 1-CPU jobs, and I find that only 2 jobs can run
at the same time:
$ for i in {1..4}; do sbatch --qos test -p test -n 1 --wrap="sleep 200"; done
Submitted batch job 1516
Submitted batch job 1517
Submitted batch job 1518
Submitted batch job 1519
$ squeue -o "%.5i %.9P %.8j %.8u %.8T %.10M %.9l %.6D %.6C %R" -u kilian
JOBID PARTITION NAME USER STATE TIME TIMELIMIT NODES CPUS NODELIST(REASON)
1518 test sbatch kilian PENDING 0:00 UNLIMITED 1 1 (QOSResourceLimit)
1519 test sbatch kilian PENDING 0:00 UNLIMITED 1 1 (QOSResourceLimit)
1517 test sbatch kilian RUNNING 1:20 UNLIMITED 1 1 sh-5-33
1516 test sbatch kilian RUNNING 1:22 UNLIMITED 1 1 sh-5-33
The 2 running jobs actually use the same 16-CPU node, so I would expect 1
more job to start before hitting the MaxJobsPU=3 limit.
It looks to me like MaxNodesPU acts on the number of CPUs a user is using,
rather than on the number of nodes.
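To make the arithmetic explicit, here is a toy shell model of the three per-user limits of the "test" QOS. It is only a sketch under stated assumptions: each running job is reduced to a "node:cpus" pair, and the function name would_start is hypothetical. This is not how slurmctld evaluates limits. It compares counting distinct nodes (what the documentation suggests) with counting CPUs against MaxNodesPU (the hypothesis above).

```shell
#!/usr/bin/env bash
# Toy model of the "test" QOS limits; NOT slurmctld's actual logic.
# Assumption: a running job is just "node:cpus".

max_cpus=12   # MaxCPUsPU
max_jobs=3    # MaxJobsPU
max_nodes=2   # MaxNodesPU

# Jobs 1516 and 1517: both on sh-5-33, 1 CPU each.
running=("sh-5-33:1" "sh-5-33:1")

# would_start MODE (hypothetical helper): prints "yes" if one more
# 1-CPU job on sh-5-33 fits within all three limits.
#   MODE=distinct -> MaxNodesPU is compared to the number of distinct nodes
#   MODE=cpus     -> MaxNodesPU is compared to the number of CPUs in use
would_start() {
  local mode=$1
  local jobs=$(( ${#running[@]} + 1 ))   # running jobs plus the new one
  local cpus=1                           # the new job's CPU
  local nodes_used
  local -A seen=()
  seen["sh-5-33"]=1                      # node the new job would land on
  for j in "${running[@]}"; do
    seen["${j%%:*}"]=1                   # record node name
    cpus=$(( cpus + ${j##*:} ))          # add the job's CPU count
  done
  if [[ $mode == distinct ]]; then
    nodes_used=${#seen[@]}
  else
    nodes_used=$cpus
  fi
  if (( jobs <= max_jobs && cpus <= max_cpus && nodes_used <= max_nodes )); then
    echo yes
  else
    echo no
  fi
}

echo "distinct-node counting: $(would_start distinct)"
echo "CPU-as-node counting:   $(would_start cpus)"
```

With distinct-node counting the third job fits (3 jobs, 3 CPUs, 1 node); counting CPUs against MaxNodesPU blocks it, which matches what squeue shows. The model only reproduces the observation; it does not show what Slurm actually counts.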
Is there anything I'm missing?
Thanks,
--
Kilian