Dear All,
I'm trying to create a simple QOS called 1week, which
I would like to associate with those users who need
to run jobs for up to one week instead of the current 2-day maximum:
### slurm.conf ###
EnforcePartLimits=YES
TaskPlugin=task/affinity
TaskPluginParam=Sched
TopologyPlugin=topology/none
TrackWCKey=no
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityCalcPeriod=5
PriorityFavorSmall=YES
PriorityMaxAge=7-0
PriorityUsageResetPeriod=NONE
PriorityWeightAge=1000
PriorityWeightFairshare=1000
PriorityWeightJobSize=10000
PriorityWeightPartition=10000
PriorityWeightQOS=10000
AccountingStorageEnforce=limits,qos
AccountingStorageType=accounting_storage/slurmdbd
JobCompType=jobcomp/none
JobAcctGatherType=jobacct_gather/linux
PreemptMode=suspend,gang
PreemptType=preempt/partition_prio
NodeName=DEFAULT TmpDisk=16384 State=UNKNOWN
NodeName=foff[01-08] Procs=8 CoresPerSocket=4 Sockets=2 ThreadsPerCore=1 RealMemory=7000 Weight=1 Feature=X5482,foff,fofflm
NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4 ThreadsPerCore=1 RealMemory=127000 Weight=1 Feature=6176,foff,foffhm
PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED MaxTime=2-0 PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
PartitionName=batch Nodes=foff[01-13] Default=YES
PartitionName=foff1 Nodes=foff[01-08] Priority=1000
PartitionName=foff2 Nodes=foff[09-13] Priority=1000
#################
sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
Account              Cluster    User       Share     Partition  Def QOS   QOS
-------------------- ---------- ---------- --------- ---------- --------- --------------------
root                 superb                1                              normal
root                 superb     root       1                              normal
sb                   superb                1                              normal
sb                   superb     belushki   1         batch                normal
sb                   superb     fiocco     1         batch                normal
gr-fo                superb                1                              normal
gr-fo                superb     belushki   1         foff1                normal
gr-fo                superb     belushki   1         foff2                normal
gr-fo                superb     fiocco     1         foff1                normal
gr-fo                superb     fiocco     1         foff2                normal
sacctmgr add qos Name=1week MaxWall=7-0 Priority=100 PreemptMode=Cluster Flags=PartitionTimeLimit
sacctmgr modify user name=belushki Account=gr-fo set qos+=1week
sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
Account              Cluster    User       Share     Partition  Def QOS   QOS
-------------------- ---------- ---------- --------- ---------- --------- --------------------
root                 superb                1                              normal
root                 superb     root       1                              normal
sb                   superb                1                              normal
sb                   superb     belushki   1         batch                normal
sb                   superb     fiocco     1         batch                normal
gr-fo                superb                1                              normal
gr-fo                superb     belushki   1         foff1                1week,normal
gr-fo                superb     belushki   1         foff2                1week,normal
gr-fo                superb     fiocco     1         foff1                normal
gr-fo                superb     fiocco     1         foff2                normal
/etc/init.d/slurmd restart (same command was issued on all nodes too)
su - belushki
srun -p foff2 -A gr-fo --qos=1week -t 7-0 hostname
srun: error: Unable to allocate resources: Requested time limit is invalid (exceeds some limit)
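For reference, the time values involved can be compared with a small helper. This is only a sketch: to_minutes is a hypothetical function written for this post, not a Slurm tool. It does plain arithmetic on the time strings used above and makes no claim about how slurmctld itself combines the partition and QOS limits.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): convert a Slurm time string
# to whole minutes. Accepted forms: M, M:S, H:M:S, D-H, D-H:M, D-H:M:S.
# Any seconds field is ignored.
to_minutes() {
    local spec=$1 days=0 hours=0 mins=0
    local -a f
    if [[ $spec == *-* ]]; then
        # Forms with a day count: everything before the dash is days.
        days=${spec%%-*}
        IFS=: read -r -a f <<< "${spec#*-}"
        hours=${f[0]:-0}
        mins=${f[1]:-0}
    else
        IFS=: read -r -a f <<< "$spec"
        if (( ${#f[@]} == 3 )); then
            hours=${f[0]}
            mins=${f[1]}
        else
            mins=${f[0]}
        fi
    fi
    # 10# forces base 10 so values such as "08" are not read as octal.
    echo $(( 10#$days * 1440 + 10#$hours * 60 + 10#$mins ))
}

requested=$(to_minutes 7-0)      # srun -t 7-0
partition_max=$(to_minutes 2-0)  # PartitionName=DEFAULT MaxTime=2-0
qos_max=$(to_minutes 7-0)        # QOS 1week MaxWall=7-0

echo "requested=$requested partition_max=$partition_max qos_max=$qos_max"
(( requested > partition_max )) && echo "request exceeds the partition MaxTime"
(( requested <= qos_max )) && echo "request fits within the QOS MaxWall"
```

So the requested 7-0 fits the MaxWall of the 1week QOS but is well above the MaxTime=2-0 inherited by partition foff2.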
Could you tell me what I am still missing to make this work for user
"belushki"?
Thanks,
--matt