Either this is a bug or a complete misunderstanding on my part. I set MaxSubmitPU (MaxSubmitJobs) on a QOS and expected, from the man page, that any jobs submitted past the limit would be rejected. From the sacctmgr man page:

       MaxSubmitJobs
              Maximum number of jobs pending or running state at any time per user.

This seems very straightforward: no user should be allowed to run or queue more than MaxSubmitJobs jobs. We have a job_submit plugin that sets the QOS to match the partition, but I get the same results if I set the QOS explicitly, which I did below to remove that ambiguity. I have also set the DenyOnLimit flag to make sure over-limit jobs are rejected. But I am allowed to submit, and even execute, more than the limit whenever each job array is itself smaller than the limit, which suggests the check is applied per submission rather than against my running total.

Is this a misunderstanding on my part, a bug, or a job array quirk? See below for a relevant demonstration of the problem.

Thanks for any clarifications,
Carl

Here is my demonstration of the problem. I set the QOS for our debug partition with a MaxSubmitJobs of 5. When I submit a 10-task array it is rejected. When I submit a 5-task array it is accepted and runs. When I then try to submit 5 more, that is rejected. So far so good. But then I submit a 3-task array and it is accepted and starts running, and a further 3-task array is also accepted and started. At that point I have 11 jobs running when the max is set to 5. The output of this session is below, with relevant commands to show versions and what the run-something script produces as input to sbatch.
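
In short, the sequence below is (run-something just emits an sbatch script with "-a 1-N"; its full output is shown further down):

run-something -p debug -n 10 | sbatch --qos debug   # rejected: 10 > 5
run-something -p debug -n 5  | sbatch --qos debug   # accepted: 5 tasks run
run-something -p debug -n 5  | sbatch --qos debug   # rejected: 5 running + 5 more > 5
run-something -p debug -n 3  | sbatch --qos debug   # accepted: 8 tasks running
run-something -p debug -n 3  | sbatch --qos debug   # accepted: 11 tasks running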

From my system running RHEL 6.6 and Slurm 14.11.9

[cschmid7_local@bhdev]$ sinfo
PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
standard*      up 5-00:00:00      1   idle bhc0017
interactive    up    8:00:00      2   idle bhg0001,bhp0001
debug          up      30:00      2   idle bhg0001,bhp0001
gpu            up 5-00:00:00      1   idle bhg0001
phi            up 5-00:00:00      1   idle bhp0001
visual         up 5-00:00:00      1  drain bhx0101
preempt        up 1-00:00:00      2   idle bhc0017,bhg0001

12/07/2015 09:41:22 AM
[/var/home/cschmid7_local]
[cschmid7_local@bhdev]$ squeue -a
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

12/07/2015 09:41:31 AM
[/var/home/cschmid7_local]
[cschmid7_local@bhdev]$ sacctmgr show qos debug
      Name   Priority  GraceTime    Preempt PreemptMode       Flags UsageFactor  MaxCPUs     MaxWall MaxJobsPU MaxSubmitPU  MinCPUs 
---------- ---------- ---------- ---------- ----------- ----------- ----------- -------- ----------- --------- ----------- -------- 
     debug       1000   00:00:00    preempt     cluster DenyOnLimit    1.000000       12    00:30:00        50           5        1 
(empty columns trimmed for readability)

12/07/2015 09:41:48 AM
[/var/home/cschmid7_local]
[cschmid7_local@bhdev]$ run-something -p debug -n 10
#!/bin/bash
#SBATCH -p debug -a 1-10 -t 2 -o run-something.%a
. /software/modules/init/bash
for i in `seq 1 1`
do
  sleep 60
  echo slept $i min
done
echo done.
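
(run-something itself is essentially the following; this is a paraphrase of the wrapper rather than the real script, which I have not included:)

#!/bin/bash
# Paraphrased wrapper: emits the batch script shown above for an
# N-task array on the requested partition.
while getopts "p:n:" opt; do
  case $opt in
    p) part=$OPTARG ;;
    n) ntasks=$OPTARG ;;
  esac
done
cat <<EOF
#!/bin/bash
#SBATCH -p $part -a 1-$ntasks -t 2 -o run-something.%a
. /software/modules/init/bash
for i in \`seq 1 1\`
do
  sleep 60
  echo slept \$i min
done
echo done.
EOF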

12/07/2015 09:42:16 AM
[/var/home/cschmid7_local]
[cschmid7_local@bhdev]$ run-something -p debug -n 10 | sbatch --qos debug
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)

12/07/2015 09:43:18 AM
[/var/home/cschmid7_local]
[cschmid7_local@bhdev]$ run-something -p debug -n 5 | sbatch --qos debug
Submitted batch job 32208

12/07/2015 09:43:27 AM
[/var/home/cschmid7_local]
[cschmid7_local@bhdev]$ run-something -p debug -n 5 | sbatch --qos debug
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)

12/07/2015 09:43:30 AM
[/var/home/cschmid7_local]
[cschmid7_local@bhdev]$ run-something -p debug -n 3 | sbatch --qos debug
Submitted batch job 32213

12/07/2015 09:43:37 AM
[/var/home/cschmid7_local]
[cschmid7_local@bhdev]$ squeue -a
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           32213_1     debug   sbatch cschmid7  R       0:04      1 bhg0001
           32213_2     debug   sbatch cschmid7  R       0:04      1 bhg0001
           32213_3     debug   sbatch cschmid7  R       0:04      1 bhg0001
           32208_4     debug   sbatch cschmid7  R       0:14      1 bhg0001
           32208_5     debug   sbatch cschmid7  R       0:14      1 bhg0001
           32208_1     debug   sbatch cschmid7  R       0:15      1 bhg0001
           32208_2     debug   sbatch cschmid7  R       0:15      1 bhg0001
           32208_3     debug   sbatch cschmid7  R       0:15      1 bhg0001

12/07/2015 09:43:42 AM
[/var/home/cschmid7_local]
[cschmid7_local@bhdev]$ run-something -p debug -n 3 | sbatch --qos debug
Submitted batch job 32216

12/07/2015 09:44:13 AM
[/var/home/cschmid7_local]
[cschmid7_local@bhdev]$ squeue -a
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           32216_1     debug   sbatch cschmid7  R       0:03      1 bhg0001
           32216_2     debug   sbatch cschmid7  R       0:03      1 bhg0001
           32216_3     debug   sbatch cschmid7  R       0:03      1 bhg0001
           32213_1     debug   sbatch cschmid7  R       0:39      1 bhg0001
           32213_2     debug   sbatch cschmid7  R       0:39      1 bhg0001
           32213_3     debug   sbatch cschmid7  R       0:39      1 bhg0001
           32208_4     debug   sbatch cschmid7  R       0:49      1 bhg0001
           32208_5     debug   sbatch cschmid7  R       0:49      1 bhg0001
           32208_1     debug   sbatch cschmid7  R       0:50      1 bhg0001
           32208_2     debug   sbatch cschmid7  R       0:50      1 bhg0001
           32208_3     debug   sbatch cschmid7  R       0:50      1 bhg0001

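That is 11 of my array tasks running in debug against a MaxSubmitPU of 5. To count them directly instead of eyeballing the list:

squeue -h -u $USER -p debug | wc -l   # tallies the 11 running tasks shown above
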
12/07/2015 09:44:17 AM
[/var/home/cschmid7_local]
[cschmid7_local@bhdev]$ squeue -V
slurm 14.11.9

12/07/2015 09:48:43 AM
[/var/home/cschmid7_local]
[cschmid7_local@bhdev]$ cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.6 (Santiago)

12/07/2015 09:51:03 AM
[/var/home/cschmid7_local]
[cschmid7_local@bhdev]$ grep -v '^#' /software/slurm-dev/etc/slurm.conf | more

ControlMachine=bhdev
ControlAddr=192.168.223.24
AuthType=auth/munge
CacheGroups=0
CryptoType=crypto/munge
EnforcePartLimits=YES
Epilog=/software/slurm-dev/sbin/slurm-epilog.bash
GresTypes=gpu,mic
JobSubmitPlugins=qos_part
KillOnBadExit=1
MpiDefault=none
ProctrackType=proctrack/linuxproc
Prolog=/software/slurm-dev/sbin/slurm-prolog.bash
PropagateResourceLimits=NONE
RebootProgram="/sbin/shutdown -r +5 Slurm ordered reboot"
ReturnToService=1
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurm
SlurmUser=slurm
SrunEpilog=/software/slurm-dev/sbin/step-epilog.bash
SrunProlog=/software/slurm-dev/sbin/step-prolog.bash
StateSaveLocation=/software/slurm-dev/state
SwitchType=switch/none
TaskEpilog=/software/slurm-dev/bin/step-epilog.bash
TaskPlugin=task/cgroup
TaskProlog=/software/slurm-dev/bin/step-prolog.bash
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
DefMemPerCPU=2048
FastSchedule=1
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
PriorityType=priority/multifactor
PriorityDecayHalfLife=5-0
PriorityCalcPeriod=00:05:00
PriorityFavorSmall=no
PriorityMaxAge=7-0
PriorityWeightAge=200
PriorityWeightFairshare=500
PriorityWeightJobSize=1000
PriorityWeightPartition=1000
PriorityWeightQOS=200
PreemptMode=REQUEUE
PreemptType=preempt/qos
AccountingStorageEnforce=all
AccountingStorageHost=localhost
AccountingStorageLoc=slurm-acct
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoreJobComment=YES
ClusterName=bluehive-dev
JobCompHost=localhost
JobCompLoc=/software/slurm-dev/data/jobcomp.txt
JobCompType=jobcomp/filetxt
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmSchedLogFile=/var/log/slurm/slurmsched.log
SlurmSchedLogLevel=3
ResumeTimeout=900

NodeName=bhc0017 CPUs=24 RealMemory=64530 Sockets=2 CoresPerSocket=12 ThreadsPerCore=1 TmpDisk=2048 Weight=500 State=UNKNOWN
NodeName=bhg0001 CPUs=24 RealMemory=64530 Sockets=2 CoresPerSocket=12 ThreadsPerCore=1 Gres=gpu:2 TmpDisk=2048 Weight=600 State=UNKNOWN
NodeName=bhp0001 CPUs=24 RealMemory=64530 Sockets=2 CoresPerSocket=12 ThreadsPerCore=1 Gres=mic:2 TmpDisk=2048 Weight=1100 State=UNKNOWN
NodeName=bhx0101 CPUs=12 RealMemory=20480 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 TmpDisk=2048 Weight=1100 State=UNKNOWN

PartitionName=DEFAULT State=UP DefaultTime=00:00:05 Shared=No PreemptMode=REQUEUE
PartitionName=standard Nodes=bhc0017 Default=YES MaxTime=120:00:00 Priority=1000
PartitionName=interactive Nodes=bhg0001,bhp0001 MaxCPUsPerNode=22 MaxTime=8:00:00 Priority=1000
PartitionName=debug Nodes=bhp0001,bhg0001 MaxCPUsPerNode=22 MaxTime=0:30:00 Priority=1000
PartitionName=gpu Nodes=bhg0001 MaxCPUsPerNode=24 MaxTime=120:00:00 Priority=1000
PartitionName=phi Nodes=bhp0001 MaxTime=120:00:00 Priority=1000
PartitionName=visual Nodes=bhx0101 MaxTime=120:00:00 Priority=1000
PartitionName=preempt Nodes=bhc0017,bhg0001 MaxTime=24:00:00 Priority=100

Carl Schmidtmann
Center for Integrated Research Computing
University of Rochester
