Hey, folks. Running Slurm 17.02.10 with Bright Cluster Manager 8.0.
I want to limit queue-stuffing on my GPU nodes, similar to what
AssocGrpCPURunMinutesLimit does. The current goal is to restrict each user to
8 active or queued jobs in the production GPU partition, and to hold (not
reject) any further jobs so that other users get fair access to the queue. A
time-based limit instead of a job-count limit would be fine, too.
I'd assumed a partition QOS was the way to go, as the sacctmgr man page reads
in part:
    Flags  Used by the slurmctld to override or enforce certain
           characteristics. Valid options are
           DenyOnLimit
               If set, jobs using this QOS will be rejected at submission
               time if they do not conform to the QOS 'Max' limits. Group
               limits will also be treated like 'Max' limits as well and
               will be denied if they go over. By default jobs that go over
               these limits will pend until they conform. This currently
               only applies to QOS and Association limits.
So avoid setting the DenyOnLimit flag, and extra jobs will pend until they
conform, right? My QOS settings for 8 active or pending GPU jobs per user are
as follows:
$ sacctmgr list qos normal,gpu \
    format=name,priority,gracetime,preemptmode,usagefactor,grptresrunmin,MaxSubmitJobsPerUser,flags
      Name   Priority  GraceTime PreemptMode UsageFactor GrpTRESRunMin MaxSubmitPU                Flags
---------- ---------- ---------- ----------- ----------- ------------- ----------- --------------------
    normal          0   00:00:00     cluster    1.000000
       gpu          0   00:00:00     cluster    1.000000                         8
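
Since the default sacctmgr table wraps badly in email, the same settings could be pulled in pipe-delimited form and summarized per QOS. The following is only a sketch: `qos_summary` is a hypothetical helper name, and it assumes input produced with sacctmgr's `-n` (no header) and `-P` (parsable2) options and exactly the three format fields shown in the comment.

```shell
# Hypothetical helper: summarize QOS submit limits and the DenyOnLimit flag.
# Expects lines of the form "name|maxsubmitjobsperuser|flags" on stdin, e.g. from:
#   sacctmgr -nP list qos normal,gpu format=name,MaxSubmitJobsPerUser,flags
qos_summary() {
    while IFS='|' read -r name maxsubmit flags; do
        # Skip blank lines
        [ -n "$name" ] || continue
        # Flags are comma-separated; look for an exact DenyOnLimit entry
        case ",$flags," in
            *,DenyOnLimit,*) deny=yes ;;
            *) deny=no ;;
        esac
        echo "$name: MaxSubmitJobsPerUser=${maxsubmit:-unset} DenyOnLimit=$deny"
    done
}
```

Piping the sacctmgr command from the comment into `qos_summary` should print one line per QOS, which makes it easy to confirm that DenyOnLimit really is unset on the gpu QOS.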
Partition settings, where the gpu QOS is applied to jobs in the gpu partition
(each entry is a single line in slurm.conf, wrapped here for readability):
$ egrep 'PartitionName=(batch|gpu) ' /etc/slurm/slurm.conf
PartitionName=batch Default=YES MinNodes=1 MaxNodes=40
    DefaultTime=1-00:00:00 MaxTime=30-00:00:00 AllowGroups=ALL
    PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO
    Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO
    DefMemPerCPU=4000 AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO
    OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=node[001-040]
PartitionName=gpu Default=NO MinNodes=1 DefaultTime=1-00:00:00
    MaxTime=30-00:00:00 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1
    DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0
    PreemptMode=OFF ReqResv=NO DefMemPerCPU=4000 AllowAccounts=ALL
    AllowQos=ALL LLN=NO MaxCPUsPerNode=16 QoS=gpu ExclusiveUser=NO
    OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=gpunode[001-004]
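
The file on disk and the running controller can disagree, so it may be worth checking the partition QOS that slurmctld actually has loaded. A minimal sketch, assuming `scontrol show partition` prints a `QoS=<name>` token among its space-separated fields; `partition_qos` is a hypothetical helper name, not part of Slurm.

```shell
# Hypothetical helper: extract the partition QOS from
# `scontrol show partition <name>` output supplied on stdin.
partition_qos() {
    # Put each KEY=VALUE token on its own line, then print the value of the
    # token that starts with "QoS=" (AllowQos=... does not match).
    tr ' ' '\n' | sed -n 's/^QoS=//p'
}

# Intended use on the cluster:
#   scontrol show partition gpu | partition_qos
```

If this prints something other than gpu for the gpu partition, the controller has probably not picked up the current slurm.conf.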
My original submission, specifying CPUs, time, GRES, QOS, and partition. Jobs
1-8 are accepted and job 9 is rejected, even though I haven't set the
DenyOnLimit flag:
$ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 \
    --gres=gpu --qos=gpu --partition=gpu omp_hw.sh; done
Submitted batch job 150548
Submitted batch job 150549
Submitted batch job 150550
Submitted batch job 150551
Submitted batch job 150552
Submitted batch job 150553
Submitted batch job 150554
Submitted batch job 150555
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
$ scancel -u $USER -p gpu
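
The same experiment is repeated below with different options, so a small wrapper that tallies accepted and rejected submissions may be easier to compare across runs. This is a sketch only: `submit_many` is a hypothetical helper name, and it relies on sbatch returning a non-zero exit status when a submission is refused. The `--parsable` option makes sbatch print just the job ID, which is discarded here.

```shell
# Hypothetical helper: submit the same job N times and count the outcomes.
# Usage: submit_many N [sbatch options...] script
submit_many() {
    n=$1
    shift
    accepted=0
    rejected=0
    i=1
    while [ "$i" -le "$n" ]; do
        # stdout (the job ID) is dropped; any sbatch error still goes to stderr
        if sbatch --parsable "$@" >/dev/null; then
            accepted=$((accepted + 1))
        else
            rejected=$((rejected + 1))
        fi
        i=$((i + 1))
    done
    echo "accepted=$accepted rejected=$rejected"
}

# Intended use on the cluster:
#   submit_many 9 --nodes=1 --cpus-per-task=1 --time=00:10:00 --partition=gpu omp_hw.sh
```

With the QOS settings above, the expectation from the transcript is 8 accepted and 1 rejected for any submission that ends up under the gpu QOS.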
Minimized to just CPUs, time, and partition, the results are the same, since
the gpu QOS is automatically applied to jobs in the gpu partition:
$ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 \
    --partition=gpu omp_hw.sh; done
Submitted batch job 150556
Submitted batch job 150557
Submitted batch job 150558
Submitted batch job 150559
Submitted batch job 150560
Submitted batch job 150561
Submitted batch job 150562
Submitted batch job 150563
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
$ scancel -u $USER -p gpu
Running in the batch partition with the normal QOS, all 9 jobs are accepted:
$ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 \
    --partition=batch omp_hw.sh; done
Submitted batch job 150564
Submitted batch job 150565
Submitted batch job 150566
Submitted batch job 150567
Submitted batch job 150568
Submitted batch job 150569
Submitted batch job 150570
Submitted batch job 150571
Submitted batch job 150572
$ scancel -u $USER -p batch
Running in the batch partition with the gpu QOS explicitly specified, jobs 1-8
are accepted and job 9 is rejected:
$ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 \
    --partition=batch --qos=gpu omp_hw.sh; done
Submitted batch job 150573
Submitted batch job 150574
Submitted batch job 150575
Submitted batch job 150576
Submitted batch job 150577
Submitted batch job 150578
Submitted batch job 150579
Submitted batch job 150580
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
$ scancel -u $USER -p batch
So the rejection appears to be triggered by the gpu QOS itself, regardless of
partition. What might I have missed?
--
Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
931 372-3601 / Tennessee Tech University