Hi,

We've just upgraded our Slurm installation from 14.11 to 15.08.11 and have run into the problem that GrpCPUMins/GrpTRESMins no longer seem to be enforced. Has anybody else had this problem?
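For context, a rough sketch of the check I would expect with the "safe" enforcement flag (this is my understanding of the documented behaviour, not Slurm's actual code): a job should only be started if it can run to completion within the association's remaining GrpTRESMins budget. The numbers match the example below (96 CPUs, 24-hour limit, cpu=1440):

```python
# Hypothetical sketch of the AccountingStorageEnforce=safe check -- not
# Slurm source code, just the arithmetic I expect it to perform.

def cpu_minutes_requested(cpus, timelimit_minutes):
    """CPU-minutes a job consumes if it runs to its full time limit."""
    return cpus * timelimit_minutes

def should_start(cpus, timelimit_minutes, grp_cpu_mins, used_cpu_mins):
    """True only if the job fits in the remaining GrpTRESMins budget."""
    remaining = grp_cpu_mins - used_cpu_mins
    return cpu_minutes_requested(cpus, timelimit_minutes) <= remaining

# 4 nodes x 24 cores = 96 CPUs, 24-hour time limit:
print(cpu_minutes_requested(96, 24 * 60))      # 138240 CPU-minutes
# Against GrpTRESMins=cpu=1440 (assuming zero prior usage for simplicity):
print(should_start(96, 24 * 60, 1440, 0))      # False -- should not start
```

So the job requests roughly two orders of magnitude more CPU-minutes than the whole group limit, yet it is still scheduled.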
Reading the release notes for 15.08.11, nothing in particular springs to mind. Do we need to do something special to get GrpCPUMins/GrpTRESMins enforcement set up correctly? The documentation <http://slurm.schedmd.com/resource_limits.html> mentions both GrpCPUMins and GrpTRESMins, but I guess that GrpCPUMins is only there for historical reasons...

An example: I can submit a four-node = 96-core, 24-hour job, even when the account has less than 1440 CPU minutes left, i.e., less than one node-hour. Previously the job would have been rejected or left pending with Reason=AssocGrpCPUMinsLimit.

[root@slurm1 slurm]# scontrol show job 326248
   JobId=326248 JobName=run.sh
   UserId=svalle(6003) GroupId=sdu(3010)
   Priority=7411 Nice=0 Account=sdutest01_slim QOS=slim WCKey=*
   JobState=PENDING Reason=Priority Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2016-05-31T15:43:00 EligibleTime=2016-05-31T15:43:00
   StartTime=2016-05-31T20:44:58 EndTime=Unknown
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=slim AllocNode:Sid=fe2:439
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null) SchedNodeList=s51p[21,24-25,28]
   NumNodes=4-4 NumCPUs=96 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=96,node=4
   Socks/Node=* NtasksPerN:B:S:C=24:0:*:* CoreSpec=*
   MinCPUsNode=24 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=0 Contiguous=0 Licenses=(null) Network=(null)
   Command=/gpfs/gss1/work/sdutest01/svalle/run.sh
   WorkDir=/gpfs/gss1/work/sdutest01/svalle
   StdErr=/gpfs/gss1/work/sdutest01/svalle/slurm-326248.out
   StdIn=/dev/null
   StdOut=/gpfs/gss1/work/sdutest01/svalle/slurm-326248.out
   Power= SICP=0

[root@slurm1 slurm]# sshare -u svalle -l
  Account  User  RawShares  NormShares  RawUsage  NormUsage  EffectvUsage  FairShare  LevelFS  GrpTRESMins  TRESRunMins
  -------- ----- ---------- ----------- --------- ---------- ------------- ---------- -------- ------------ -----------
  root                      0.000000    14149970808           1.000000                                      cpu=14155926,mem=0,energy=0,n+
  ...
  sdutest01_slim        1   0.000000    384       0.000000    0.000000                12.275540  cpu=1440   cpu=0,mem=0,energy=0,node=0
   sdutest01_slim  svalle  100  0.142857  384     0.000000    1.000000    0.739785    0.142857              cpu=0,mem=0,energy=0,node=0
  ...

[root@slurm1 tmp]# scontrol show config | grep Account
AccountingStorageBackupHost = (null)
AccountingStorageEnforce    = associations,limits,qos,safe,wckeys
AccountingStorageHost       = slurm1
AccountingStorageLoc        = N/A
AccountingStoragePort       = 6819
AccountingStorageTRES       = cpu,mem,energy,node
AccountingStorageType       = accounting_storage/slurmdbd
AccountingStorageUser       = N/A
AccountingStoreJobComment   = Yes

[root@slurm1 slurm]# sacctmgr show qos
  Name  Priority  GraceTime  Preempt  PreemptMode  Flags  UsageThres  UsageFactor  GrpTRES  GrpTRESMins  GrpTRESRunMin  GrpJobs  GrpSubmit  GrpWall  MaxTRES  MaxTRESPerNode  MaxTRESMins  MaxWall  MaxTRESPU  MaxJobsPU  MaxSubmitPU  MinTRES
  ...
  slim  0         00:00:00            cluster                         1.000000

[root@slurm1 ~]# sacctmgr dump deic
...
Cluster - 'deic':Fairshare=1:QOS='normal'
Parent - 'root'
...
Account - 'sdutest01_slim':Description='sdutest01_slim':Organization='sdutest01':Fairshare=1:GrpTRESMins=cpu=1440:QOS='slim'
...
Parent - 'sdutest01_slim'
User - 'svalle':DefaultAccount='sysops_workq':Fairshare=100:QOS='+slim'
...

Thanks,

Jens Svalgaard Kohrt
DeIC Nationale HPC Center, SDU
Syddansk Universitet
http://sdu.dk/staff/jesk
