This works for me.
What version of SLURM are you running?
You might want to look at your SlurmctldLogFile.
Lyn,
You can use the QOS mechanism, as Matt is doing, with flags (e.g.
"Flags=PartitionTimeLimit") to override partition time and/or size
limits.
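For instance, something along these lines (the QOS name "long" and the
user name are just placeholders):

    # QOS whose MaxWall overrides the partition's MaxTime
    sacctmgr add qos long MaxWall=7-00:00:00 Flags=PartitionTimeLimit
    # attach it to the user's association
    sacctmgr modify user where name=someuser set qos+=long

Jobs then request it with "--qos=long".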
Quoting Matteo Guglielmi <[email protected]>:
Dear All,
I'm trying to create a simple QOS called 1week, which
I would like to associate with those users who need
to run for one week instead of the 2-day maximum:
### slurm.conf ###
EnforcePartLimits=YES
TaskPlugin=task/affinity
TaskPluginParam=Sched
TopologyPlugin=topology/none
TrackWCKey=no
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityCalcPeriod=5
PriorityFavorSmall=YES
PriorityMaxAge=7-0
PriorityUsageResetPeriod=NONE
PriorityWeightAge=1000
PriorityWeightFairshare=1000
PriorityWeightJobSize=10000
PriorityWeightPartition=10000
PriorityWeightQOS=10000
AccountingStorageEnforce=limits,qos
AccountingStorageType=accounting_storage/slurmdbd
JobCompType=jobcomp/none
JobAcctGatherType=jobacct_gather/linux
PreemptMode=suspend,gang
PreemptType=preempt/partition_prio
NodeName=DEFAULT TmpDisk=16384 State=UNKNOWN
NodeName=foff[01-08] Procs=8 CoresPerSocket=4 Sockets=2 ThreadsPerCore=1 RealMemory=7000 Weight=1 Feature=X5482,foff,fofflm
NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4 ThreadsPerCore=1 RealMemory=127000 Weight=1 Feature=6176,foff,foffhm
PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED MaxTime=2-0 PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
PartitionName=batch Nodes=foff[01-13] Default=YES
PartitionName=foff1 Nodes=foff[01-08] Priority=1000
PartitionName=foff2 Nodes=foff[09-13] Priority=1000
#################
sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd

   Account    Cluster       User     Share  Partition    Def QOS                  QOS
---------- ---------- ---------- --------- ---------- ---------- --------------------
      root     superb                    1                                    normal
      root     superb       root         1                                    normal
        sb     superb                    1                                    normal
        sb     superb   belushki         1      batch                         normal
        sb     superb     fiocco         1      batch                         normal
     gr-fo     superb                    1                                    normal
     gr-fo     superb   belushki         1      foff1                         normal
     gr-fo     superb   belushki         1      foff2                         normal
     gr-fo     superb     fiocco         1      foff1                         normal
     gr-fo     superb     fiocco         1      foff2                         normal
sacctmgr add qos Name=1week MaxWall=7-0 Priority=100 PreemptMode=Cluster Flags=PartitionTimeLimit
sacctmgr modify user name=belushki Account=gr-fo set qos+=1week
sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd

   Account    Cluster       User     Share  Partition    Def QOS                  QOS
---------- ---------- ---------- --------- ---------- ---------- --------------------
      root     superb                    1                                    normal
      root     superb       root         1                                    normal
        sb     superb                    1                                    normal
        sb     superb   belushki         1      batch                         normal
        sb     superb     fiocco         1      batch                         normal
     gr-fo     superb                    1                                    normal
     gr-fo     superb   belushki         1      foff1                   1week,normal
     gr-fo     superb   belushki         1      foff2                   1week,normal
     gr-fo     superb     fiocco         1      foff1                         normal
     gr-fo     superb     fiocco         1      foff2                         normal
/etc/init.d/slurmd restart (the same command was also issued on all nodes)
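(Since slurm.conf itself was changed, I believe the controller has to
re-read it as well; on the controller node something like

    scontrol reconfigure

or a restart of slurmctld should do it.)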
su - belushki
srun -p foff2 -A gr-fo --qos=1week -t 7-0 hostname
srun: error: Unable to allocate resources: Requested time limit is
invalid (exceeds some limit)
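For reference, this is how I double-check the effective limits (a
sketch; the format fields are just the ones that seemed relevant):

    sacctmgr show qos format=Name,Priority,MaxWall,Flags
    scontrol show partition foff2 | grep MaxTime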
Could you tell me what I'm still missing to make this work for
user "belushki"?
Thanks,
--matt