From "man sacctmgr":

SPECIFICATIONS FOR QOS

   Flags  Used by the slurmctld to override or enforce certain
          characteristics. Valid options are

          EnforceUsageThreshold
                 If set, and the QOS also has a UsageThreshold, any jobs
                 submitted with this QOS that fall below the UsageThreshold
                 will be held until their Fairshare Usage goes above the
                 Threshold.

          NoReserve
                 If this flag is set and backfill scheduling is used, jobs
                 using this QOS will not reserve resources in the backfill
                 schedule's map of resources allocated through time. This
                 flag is intended for use with a QOS that may be preempted
                 by jobs associated with all other QOS (e.g. use with a
                 "standby" QOS). If this flag is used with a QOS which
                 cannot be preempted by all other QOS, it could result in
                 starvation of larger jobs.

          PartitionMaxNodes
                 If set, jobs using this QOS will be able to override the
                 requested partition's MaxNodes limit.

          PartitionMinNodes
                 If set, jobs using this QOS will be able to override the
                 requested partition's MinNodes limit.

          PartitionTimeLimit
                 If set, jobs using this QOS will be able to override the
                 requested partition's TimeLimit.
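These flags are attached to a QOS record with sacctmgr. A minimal sketch of the syntax (the QOS names "standby" and "wide" are hypothetical, not from this thread):

```shell
# Let backfill schedule over a preemptable standby QOS without reserving for it
sacctmgr modify qos standby set Flags=NoReserve
# Let jobs in the "wide" QOS exceed the partition's node-count limits
sacctmgr modify qos wide set Flags=PartitionMaxNodes,PartitionMinNodes
```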
Quoting Lyn Gerner <[email protected]>:
Certainly; how about the Flags?
Thanks again,
Lyn
On Mon, Oct 31, 2011 at 1:08 PM, Moe Jette <[email protected]> wrote:
SLURM's QOS and resource limits web pages describe most of this:
http://www.schedmd.com/slurmdocs/qos.html
http://www.schedmd.com/slurmdocs/resource_limits.html
Quoting Lyn Gerner <[email protected]>:
PS: Moe, is there a related document? Couldn't find anything obvious.
Thanks,
Lyn
On Mon, Oct 31, 2011 at 12:59 PM, Lyn Gerner <[email protected]> wrote:
Great, thanks Moe.
On Mon, Oct 31, 2011 at 10:39 AM, Moe Jette <[email protected]> wrote:
This works for me.
What version of SLURM are you running?
You might want to look at your SlurmctldLogFile.
Lyn,
You can use the QOS mechanism as Matt is, with flags (e.g.
"Flags=PartitionTimeLimit"), to override partition time and/or size
limits.
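A minimal sketch of that suggestion (the QOS name "long" and the user name "someuser" are hypothetical placeholders):

```shell
# Create a QOS that may override the partition TimeLimit
sacctmgr add qos long Flags=PartitionTimeLimit MaxWall=7-0
# Grant it to a user's association, then verify the flag is in place
sacctmgr modify user name=someuser set qos+=long
sacctmgr show qos format=Name,Flags,MaxWall
```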
Quoting Matteo Guglielmi <[email protected]>:
Dear All,
I'm trying to create a simple qos called 1week which
I would like to associate to those users who do need
to run for one week instead of 2 days at maximum:
### slurm.conf ###
EnforcePartLimits=YES
TaskPlugin=task/affinity
TaskPluginParam=Sched
TopologyPlugin=topology/none
TrackWCKey=no
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityCalcPeriod=5
PriorityFavorSmall=YES
PriorityMaxAge=7-0
PriorityUsageResetPeriod=NONE
PriorityWeightAge=1000
PriorityWeightFairshare=1000
PriorityWeightJobSize=10000
PriorityWeightPartition=10000
PriorityWeightQOS=10000
AccountingStorageEnforce=limits,qos
AccountingStorageType=accounting_storage/slurmdbd
JobCompType=jobcomp/none
JobAcctGatherType=jobacct_gather/linux
PreemptMode=suspend,gang
PreemptType=preempt/partition_prio
NodeName=DEFAULT TmpDisk=16384 State=UNKNOWN
NodeName=foff[01-08] Procs=8 CoresPerSocket=4 Sockets=2
ThreadsPerCore=1 RealMemory=7000 Weight=1 Feature=X5482,foff,fofflm
NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4
ThreadsPerCore=1 RealMemory=127000 Weight=1 Feature=6176,foff,foffhm
PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED
MaxTime=2-0 PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
PartitionName=batch Nodes=foff[01-13] Default=YES
PartitionName=foff1 Nodes=foff[01-08] Priority=1000
PartitionName=foff2 Nodes=foff[09-13] Priority=1000
#################
sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
Account    Cluster    User       Share Partition  Def QOS                     QOS
---------- ---------- ---------- ----- ---------- ---------- --------------------
root       superb                    1                                    normal
root       superb     root           1                                    normal
sb         superb                    1                                    normal
sb         superb     belushki       1 batch                              normal
sb         superb     fiocco         1 batch                              normal
gr-fo      superb                    1                                    normal
gr-fo      superb     belushki       1 foff1                              normal
gr-fo      superb     belushki       1 foff2                              normal
gr-fo      superb     fiocco         1 foff1                              normal
gr-fo      superb     fiocco         1 foff2                              normal
sacctmgr add qos Name=1week MaxWall=7-0 Priority=100 PreemptMode=Cluster Flags=PartitionTimeLimit
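(An aside on the time syntax: MaxWall=7-0 and the partition's MaxTime=2-0 use Slurm's days-hours form, i.e. 7 days and 2 days. A hedged sketch, not Slurm's own code, of how such a spec maps to minutes:)

```python
def slurm_time_to_minutes(spec: str) -> int:
    """Normalize a Slurm time spec to whole minutes.

    Illustrative sketch covering the documented forms: "minutes",
    "minutes:seconds", "hours:minutes:seconds", "days-hours",
    "days-hours:minutes", and "days-hours:minutes:seconds".
    """
    days = 0
    if "-" in spec:
        # Leading "days-" prefix: the remaining fields start with hours
        d, spec = spec.split("-", 1)
        days = int(d)
        parts = [int(p) for p in spec.split(":")]
        hours = parts[0]
        minutes = parts[1] if len(parts) > 1 else 0
        seconds = parts[2] if len(parts) > 2 else 0
    else:
        parts = [int(p) for p in spec.split(":")]
        if len(parts) == 1:    # bare minutes
            hours, minutes, seconds = 0, parts[0], 0
        elif len(parts) == 2:  # minutes:seconds
            hours, minutes, seconds = 0, parts[0], parts[1]
        else:                  # hours:minutes:seconds
            hours, minutes, seconds = parts
    return days * 24 * 60 + hours * 60 + minutes + seconds // 60
```

So "7-0" is 10080 minutes and "2-0" is 2880, which is why a 7-0 request trips a 2-0 partition MaxTime unless the QOS flag overrides it.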
sacctmgr modify user name=belushki Account=gr-fo set qos+=1week
sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
Account    Cluster    User       Share Partition  Def QOS                     QOS
---------- ---------- ---------- ----- ---------- ---------- --------------------
root       superb                    1                                    normal
root       superb     root           1                                    normal
sb         superb                    1                                    normal
sb         superb     belushki       1 batch                              normal
sb         superb     fiocco         1 batch                              normal
gr-fo      superb                    1                                    normal
gr-fo      superb     belushki       1 foff1                        1week,normal
gr-fo      superb     belushki       1 foff2                        1week,normal
gr-fo      superb     fiocco         1 foff1                              normal
gr-fo      superb     fiocco         1 foff2                              normal
/etc/init.d/slurmd restart (same command was issued on all nodes too)
su - belushki
srun -p foff2 -A gr-fo --qos=1week -t 7-0 hostname
srun: error: Unable to allocate resources: Requested time limit is invalid (exceeds some limit)
Could you tell me what I am still missing in order to make this work
for user "belushki"?
Thanks,
--matt
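(One way to double-check which limit srun is hitting; a hedged sketch using standard Slurm commands:)

```shell
# It is the controller, not slurmd, that enforces these limits;
# reread the configuration in case slurmctld has stale state
scontrol reconfigure
# Show the partition limit the controller currently enforces
scontrol show partition foff2 | grep -i MaxTime
# Confirm the QOS flag and wall limit actually took effect
sacctmgr show qos 1week format=Name,Flags,MaxWall
```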