Hi

We’ve just updated our Slurm installation from 14.11 to 15.08.11 and have run
into a problem: GrpCPUMins/GrpTRESMins no longer seem to be enforced.
Has anybody else had this problem?

Reading the release notes for 15.08.11, nothing in particular springs to mind.
Do we need to do something special to get GrpCPUMins/GrpTRESMins
enforcement set up correctly?

The documentation <http://slurm.schedmd.com/resource_limits.html> mentions both 
GrpCPUMins and GrpTRESMins, but I guess that GrpCPUMins is only there for 
historical reasons...
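For completeness, here is a sketch of the non-destructive checks one can run to confirm that the limit is actually stored on the association and that limit enforcement is enabled (account name taken from the example below):

```shell
# Confirm the CPU-minute cap is stored on the association:
sacctmgr show assoc where account=sdutest01_slim \
    format=Cluster,Account,User,GrpTRESMins%30

# Limit enforcement requires "limits" (and ideally "safe") here:
scontrol show config | grep AccountingStorageEnforce
```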

An example:
I can submit a four-node (96-core), 24-hour job, even when the account has
less than 1440 CPU minutes left, i.e., less than one node-hour.
Previously the job would have been rejected or left pending with
Reason=AssocGrpCPUMinsLimit.
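To spell out the arithmetic (the 1440 figure is the GrpTRESMins=cpu=1440 cap visible in the sshare output below; each node has 24 cores):

```python
# CPU-minute arithmetic for the example job vs. the account cap.
nodes = 4
cores_per_node = 24            # 4 nodes = 96 CPUs, as scontrol shows
walltime_minutes = 24 * 60     # TimeLimit=1-00:00:00

requested = nodes * cores_per_node * walltime_minutes
cap = 1440                     # GrpTRESMins=cpu=1440, i.e. one node-hour

print(requested)               # 138240 CPU-minutes requested
print(requested > cap)         # True: enforcement should reject or pend the job
```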

[root@slurm1 slurm]# scontrol show job 326248
JobId=326248 JobName=run.sh
   UserId=svalle(6003) GroupId=sdu(3010)
   Priority=7411 Nice=0 Account=sdutest01_slim QOS=slim WCKey=*
   JobState=PENDING Reason=Priority Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2016-05-31T15:43:00 EligibleTime=2016-05-31T15:43:00
   StartTime=2016-05-31T20:44:58 EndTime=Unknown
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=slim AllocNode:Sid=fe2:439
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null) SchedNodeList=s51p[21,24-25,28]
   NumNodes=4-4 NumCPUs=96 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=96,node=4
   Socks/Node=* NtasksPerN:B:S:C=24:0:*:* CoreSpec=*
   MinCPUsNode=24 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=0 Contiguous=0 Licenses=(null) Network=(null)
   Command=/gpfs/gss1/work/sdutest01/svalle/run.sh
   WorkDir=/gpfs/gss1/work/sdutest01/svalle
   StdErr=/gpfs/gss1/work/sdutest01/svalle/slurm-326248.out
   StdIn=/dev/null
   StdOut=/gpfs/gss1/work/sdutest01/svalle/slurm-326248.out
   Power= SICP=0



[root@slurm1 slurm]# sshare -u svalle -l
             Account       User  RawShares  NormShares    RawUsage   NormUsage  EffectvUsage  FairShare    LevelFS                    GrpTRESMins                    TRESRunMins
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ---------- ------------------------------ ------------------------------
root                                          0.000000 14149970808                  1.000000                                                      cpu=14155926,mem=0,energy=0,n+
...
 sdutest01_slim                          1    0.000000         384    0.000000      0.000000             12.275540                       cpu=1440    cpu=0,mem=0,energy=0,node=0
  sdutest01_slim         svalle        100    0.142857         384    0.000000      1.000000   0.739785   0.142857                                   cpu=0,mem=0,energy=0,node=0
...


[root@slurm1 tmp]# scontrol show config | grep Account
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = associations,limits,qos,safe,wckeys
AccountingStorageHost   = slurm1
AccountingStorageLoc    = N/A
AccountingStoragePort   = 6819
AccountingStorageTRES   = cpu,mem,energy,node
AccountingStorageType   = accounting_storage/slurmdbd
AccountingStorageUser   = N/A
AccountingStoreJobComment = Yes

[root@slurm1 slurm]# sacctmgr show qos
      Name   Priority  GraceTime    Preempt PreemptMode                                    Flags UsageThres UsageFactor       GrpTRES   GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit     GrpWall       MaxTRES MaxTRESPerNode   MaxTRESMins     MaxWall     MaxTRESPU MaxJobsPU MaxSubmitPU       MinTRES
---------- ---------- ---------- ---------- ----------- ---------------------------------------- ---------- ----------- ------------- ------------- ------------- ------- --------- ----------- ------------- -------------- ------------- ----------- ------------- --------- ----------- -------------
...
      slim          0   00:00:00                cluster                                                         1.000000


[root@slurm1 ~]# sacctmgr dump deic
...
Cluster - 'deic':Fairshare=1:QOS='normal'
Parent - 'root'
...
Account - 'sdutest01_slim':Description='sdutest01_slim':Organization='sdutest01':Fairshare=1:GrpTRESMins=cpu=1440:QOS='slim'
...
Parent - 'sdutest01_slim'
User - 'svalle':DefaultAccount='sysops_workq':Fairshare=100:QOS='+slim'
...


Thanks,

Jens Svalgaard Kohrt
DeIC Nationale HPC Center, SDU
Syddansk Universitet
http://sdu.dk/staff/jesk
