Hi all,

I'm currently seeing a behavior I don't understand using MaxNodesPerUser in
a QoS setting.

The sacctmgr(1) documentation states:

SPECIFICATIONS FOR QOS
[...]
       MaxNodesPerUser
              Maximum number of nodes each user is able to use.


I'm using the following QOS:

# sacctmgr list qos test format=name,maxcpusperuser,maxjobsperuser,maxnodesperuser
      Name MaxCPUsPU MaxJobsPU MaxNodesPU
---------- --------- --------- ----------
      test        12         3          2


I submit a bunch of 1-CPU jobs, and I find that only 2 jobs can run
at the same time:

$ for i in {1..4}; do sbatch --qos test -p test -n 1 --wrap="sleep 200"; done
Submitted batch job 1516
Submitted batch job 1517
Submitted batch job 1518
Submitted batch job 1519
$ squeue -o "%.5i %.9P %.8j %.8u %.8T %.10M %.9l %.6D %.6C %R" -u kilian
JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES   CPUS NODELIST(REASON)
 1518      test   sbatch   kilian  PENDING       0:00 UNLIMITED      1      1 (QOSResourceLimit)
 1519      test   sbatch   kilian  PENDING       0:00 UNLIMITED      1      1 (QOSResourceLimit)
 1517      test   sbatch   kilian  RUNNING       1:20 UNLIMITED      1      1 sh-5-33
 1516      test   sbatch   kilian  RUNNING       1:22 UNLIMITED      1      1 sh-5-33

The 2 running jobs actually share the same 16-CPU node (sh-5-33), so I would
expect 1 more job to start before hitting the MaxJobsPU=3 limit.
It looks to me like MaxNodesPU acts on the number of CPUs a user is using,
rather than on the number of nodes.
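
To make the different ways of counting concrete, here is a small sketch that
tallies the two running jobs three ways. It only uses the NODES, CPUS and
NODELIST values copied from the squeue output above; the variable names are
mine, and it is not meant to describe how Slurm actually accounts for the
limit internally.

```shell
# NODES, CPUS and NODELIST of the two RUNNING jobs, copied from the squeue output.
running="1 1 sh-5-33
1 1 sh-5-33"

# Distinct nodes in use: what I would expect MaxNodesPU to count.
distinct_nodes=$(printf '%s\n' "$running" | awk '{print $3}' | sort -u | wc -l)
distinct_nodes=$((distinct_nodes))   # strip any padding wc may add

# Per-job node counts and CPU counts, summed over the running jobs.
summed_nodes=$(printf '%s\n' "$running" | awk '{n += $1} END {print n}')
total_cpus=$(printf '%s\n' "$running" | awk '{c += $2} END {print c}')

echo "distinct nodes: $distinct_nodes, summed per-job nodes: $summed_nodes, CPUs: $total_cpus"
```

Only the distinct-node count (1) is below MaxNodesPU=2. Both the summed
per-job node count and the CPU count are already at 2, and with 1-CPU,
1-node jobs I can't tell those two apart from this test alone.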

Is there anything I'm missing?

Thanks,
-- 
Kilian
