Slurm 14.11.5 w/ slurmdbd & mysql

It seems, both from testing (see below) and from examination of the source, that the QOS UsageFactor (typically qos_ptr->usage_factor in the code) is NOT used to adjust actual usage but ONLY to adjust things like the decay in the priority/multifactor plugin and (possibly) the behavior of backfill operations.
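For concreteness, here is a tiny standalone C program (illustrative only, NOT actual Slurm source; it merely models the behavior described above, using the numbers from the example below) contrasting the two quantities: the raw accounting usage that sacct/sreport report, versus the usage_factor-scaled value the multifactor plugin appears to feed into fairshare decay:

    /* Illustrative sketch only -- not verbatim Slurm source. It models
     * what we observe: the priority/multifactor plugin scales its decayed
     * fairshare usage by qos_ptr->usage_factor, while the accounting
     * usage rolled up by slurmdbd stays at elapsed * alloc_cpus. */
    #include <stdio.h>

    int main(void)
    {
        double elapsed_secs = 596.0; /* 00:09:56, from the sacct output below */
        int    alloc_cpus   = 1;
        double usage_factor = 2.0;   /* UsageFactor on the gpu-shared QOS */

        double raw_usage      = elapsed_secs * alloc_cpus;
        double priority_usage = raw_usage * usage_factor;

        printf("accounting usage (sacct/sreport): %.0f cpu-secs\n", raw_usage);
        printf("fairshare usage (multifactor):    %.0f cpu-secs\n", priority_usage);
        return 0;
    }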
The documentation [1] suggests that usage in the accounting database will be adjusted:

    "UsageFactor Usage factor when running with this QOS (i.e. .5 would
    make it use only half the time as normal in accounting and 2 would
    make it use twice as much.)"

Is this a bug, a documentation error, or simply a misunderstanding of the functionality?

What we want is the ability to "charge" different amounts for different partitions and/or QOSs that provide access differing from 'normal' access (i.e. in priority, resources, etc.). We are after having the job "usage" adjusted automatically, the way the AllocCPUs value is for Shared=EXCLUSIVE partitions (ask for 1 CPU in an EXCLUSIVE partition and be charged for ALL CPUs automatically).

Thanks,

-- Trevor

[1] - http://slurm.schedmd.com/qos.html#qos_other

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Example
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

$ egrep -i "qos|gpu-shared" /etc/slurm/slurm.conf
PriorityWeightQOS=0
AccountingStorageEnforce=association,limits,qos
PartitionName=gpu-shared Nodes=tux-1,tux-2 Default=NO Shared=FORCE:1 MaxTime=48:00:00 MaxNodes=2 MaxMemPerNode=122880 Priority=1000 AllowQOS=gpu-shared State=UP

$ sacctmgr list assoc where user=tcooper format=user,qos
      User                  QOS
---------- --------------------
   tcooper    gpu-shared,normal

$ sacctmgr list qos where name=gpu-shared format=name,priority,flags,usagefactor,maxnodes,maxcpusperuser
      Name   Priority                Flags UsageFactor MaxNodes MaxCPUsPU
---------- ---------- -------------------- ----------- -------- ---------
gpu-shared          0          DenyOnLimit    2.000000        1        24

NOTE: Here we want to charge a premium for GPUs.

$ sbatch -p gpu-shared --qos=gpu-shared --nodes=1 --ntasks-per-node=1 --gres=gpu:1 -t 00:10:00 sleep_for_walltime.run
Submitted batch job 451721

NOTE: The job actually sleeps for the requested walltime less 5 seconds, to avoid being timed out; thus this job should have a walltime of 09:55 plus a second or two for prolog/epilog.

$ scontrol show job 451721
JobId=451721 JobName=sleep_for_walltime
   UserId=tcooper(500) GroupId=tcooper(500)
   Priority=653 Nice=0 Account=tcooper QOS=gpu-shared
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:15 TimeLimit=00:10:00 TimeMin=N/A
   SubmitTime=2015-05-06T10:51:02 EligibleTime=2015-05-06T10:51:02
   StartTime=2015-05-06T10:51:04 EndTime=2015-05-06T11:01:04
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=gpu-shared AllocNode:Sid=tux-ln1:18918
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=tux-1
   BatchHost=tux-1
   NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   Socks/Node=* NtasksPerN:B:S:C=1:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=5G MinTmpDiskNode=0
   Features=(null) Gres=gpu:1 Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/tcooper/jobs/sleep_for_walltime.run
   WorkDir=/home/tcooper/jobs
   StdErr=/home/tcooper/jobs/sleep_for_walltime.451721.%N.err
   StdIn=/dev/null
   StdOut=/home/tcooper/jobs/sleep_for_walltime.451721.%N.out

$ sacct --format=jobid,alloccpus,start,end,cputime,state,exitcode -j 451721
       JobID  AllocCPUS               Start                 End    CPUTime      State ExitCode
------------ ---------- ------------------- ------------------- ---------- ---------- --------
451721                1 2015-05-06T10:51:04 2015-05-06T11:01:00   00:09:56  COMPLETED      0:0
451721.batch          1 2015-05-06T10:51:04 2015-05-06T11:01:00   00:09:56  COMPLETED      0:0

NOTE: Here I expect the CPUTime to be multiplied (i.e. doubled) by the QOS->UsageFactor.
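In other words, with AllocCPUS=1 and an elapsed time of 00:09:56, the raw CPUTime is 1 x 00:09:56 = 00:09:56, so with the 2.0 UsageFactor applied I would expect sacct to show 00:19:52. It shows the unscaled 00:09:56 instead.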
$ sreport cluster UserUtilizationByAccount start=2015-05-06T10:00:00 end=now format=login,account,used
--------------------------------------------------------------------------------
Cluster/User/Account Utilization 2015-05-06T10:00:00 - 2015-05-06T11:59:59 (7200 secs)
Time reported in CPU Minutes
--------------------------------------------------------------------------------
    Login         Account       Used
--------- --------------- ----------
  tcooper          sys200         10

NOTE: Since the previous value represents the 'actual' job runtime, it is possible the QOS->UsageFactor could instead be applied in the rollup of usage. If that were the case, I would expect Used to be multiplied (i.e. doubled) by the QOS->UsageFactor.
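For reference, 00:09:56 of CPU time is ~9.93 CPU minutes, which matches the unscaled 10 reported above; had the 2.0 UsageFactor been applied at rollup, Used would be roughly 20 CPU Minutes.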