[slurm-users] DenyOnLimit flag ignored for QOS, always rejects?

Renfro, Michael Fri, 25 Jan 2019 08:37:29 -0800

Hey, folks. Running 17.02.10 with Bright Cluster Manager 8.0.

I wanted to limit queue-stuffing on my GPU nodes, similar to what 
AssocGrpCPURunMinutesLimit does. The current goal is to restrict a user to 
having 8 active or queued jobs in the production GPU partition, and block (not 
reject) other jobs to allow other users fair access to the queue. I'm good with 
a time limit instead of a job number limit, too.


I'd assumed a partition QOS was the way to go, as the sacctmgr man page reads 
in part:

    Flags  Used by the slurmctld to override or enforce certain characteristics.
           Valid options are

           DenyOnLimit
             If set, jobs using this QOS will be rejected at submission time
             if they do not conform to the QOS 'Max' limits. Group limits will
             also be treated like 'Max' limits as well and will be denied if
             they go over. By default jobs that go over these limits will pend
             until they conform. This currently only applies to QOS and
             Association limits.

So avoid setting the DenyOnLimit flag, and extra jobs will pend until they 
conform, right? My QOS settings for 8 active or pending GPU jobs per user are 
as follows:

    $ sacctmgr list qos normal,gpu \
          format=name,priority,gracetime,preemptmode,usagefactor,grptresrunmin,MaxSubmitJobsPerUser,flags
          Name   Priority  GraceTime PreemptMode UsageFactor GrpTRESRunMin MaxSubmitPU                Flags
    ---------- ---------- ---------- ----------- ----------- ------------- ----------- --------------------
        normal          0   00:00:00     cluster    1.000000
           gpu          0   00:00:00     cluster    1.000000                         8

Partition settings, where the gpu QOS is applied to jobs in the gpu partition:

    $ egrep 'PartitionName=(batch|gpu) ' /etc/slurm/slurm.conf
    PartitionName=batch Default=YES MinNodes=1 MaxNodes=40 DefaultTime=1-00:00:00 MaxTime=30-00:00:00 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=4000 AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=node[001-040]
    PartitionName=gpu Default=NO MinNodes=1 DefaultTime=1-00:00:00 MaxTime=30-00:00:00 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=4000 AllowAccounts=ALL AllowQos=ALL LLN=NO MaxCPUsPerNode=16 QoS=gpu ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=gpunode[001-004]

My original submission, specifying CPUs, time, GRES, QOS, and partition, 
accepts jobs 1-8 and rejects job 9, even though I haven't set the DenyOnLimit 
flag:

    $ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 \
          --gres=gpu --qos=gpu --partition=gpu omp_hw.sh; done
    Submitted batch job 150548
    Submitted batch job 150549
    Submitted batch job 150550
    Submitted batch job 150551
    Submitted batch job 150552
    Submitted batch job 150553
    Submitted batch job 150554
    Submitted batch job 150555
    sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
    $ scancel -u $USER -p gpu

Minimizing the submission down to just CPUs, time, and partition gives the 
same results, since the gpu QOS is automatically applied to jobs in the gpu 
partition:

    $ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 \
          --partition=gpu omp_hw.sh; done
    Submitted batch job 150556
    Submitted batch job 150557
    Submitted batch job 150558
    Submitted batch job 150559
    Submitted batch job 150560
    Submitted batch job 150561
    Submitted batch job 150562
    Submitted batch job 150563
    sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
    $ scancel -u $USER -p gpu

Running in the batch partition with the normal QOS, all 9 jobs are accepted:

    $ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 \
          --partition=batch omp_hw.sh; done
    Submitted batch job 150564
    Submitted batch job 150565
    Submitted batch job 150566
    Submitted batch job 150567
    Submitted batch job 150568
    Submitted batch job 150569
    Submitted batch job 150570
    Submitted batch job 150571
    Submitted batch job 150572
    $ scancel -u $USER -p batch

Running in the batch partition with the gpu QOS explicitly specified, jobs 1-8 
are accepted and job 9 is rejected:

    $ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 \
          --partition=batch --qos=gpu omp_hw.sh; done
    Submitted batch job 150573
    Submitted batch job 150574
    Submitted batch job 150575
    Submitted batch job 150576
    Submitted batch job 150577
    Submitted batch job 150578
    Submitted batch job 150579
    Submitted batch job 150580
    sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
    $ scancel -u $USER -p batch
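
The four tests above differ only in the options passed to sbatch, so they 
could be wrapped in one small helper that tallies accepted and rejected 
submissions. This is only a minimal sketch: the function name submit_n is 
hypothetical, and it assumes sbatch exits with a non-zero status whenever a 
submission is rejected.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): submit N copies of omp_hw.sh with
# the fixed options used in the tests above, plus any extra sbatch options,
# and report how many submissions were accepted or rejected.
# Assumption: sbatch returns a non-zero exit status on a rejected submission.
#
# Usage:
#   submit_n 9 --partition=gpu
#   submit_n 9 --partition=batch --qos=gpu
submit_n() {
    local n=$1; shift          # first argument: number of submissions
    local i accepted=0 rejected=0
    for i in $(seq "$n"); do
        # "$@" carries the per-test options (partition, QOS, GRES, ...)
        if sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 "$@" omp_hw.sh; then
            accepted=$((accepted + 1))
        else
            rejected=$((rejected + 1))
        fi
    done
    echo "accepted=$accepted rejected=$rejected"
}
```

As in the tests above, the submitted jobs would still need to be cleaned up 
afterwards with scancel -u $USER -p <partition>.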

So the behavior appears to be triggered by the gpu QOS. What might I have 
missed?

-- 
Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
931 372-3601     / Tennessee Tech University
