This works for me.
What version of SLURM are you running?
You might want to look at your SlurmctldLogFile.
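If it helps, one quick way to find and watch that log (paths here are illustrative; check your own slurm.conf for the actual location):

```shell
# Ask the controller for the configured log path
scontrol show config | grep -i SlurmctldLogFile

# Or read it straight from slurm.conf (the path below is an assumption)
grep -i '^SlurmctldLogFile' /etc/slurm/slurm.conf

# Then tail it while resubmitting the job to catch the rejection message
tail -f /var/log/slurm/slurmctld.log
```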

Lyn,
You can use the QOS mechanism, as Matt is, with flags (e.g. "Flags=PartitionTimeLimit") to override partition time and/or size limits.
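For reference, you can verify that a QOS actually carries the flag and the longer wall limit with something like (QOS name "1week" is taken from Matt's setup below):

```shell
# Show the QOS flags and limits as stored in the database
sacctmgr show qos 1week format=Name,Flags,MaxWall,Priority

# EnforcePartLimits interacts with partition limits at submit time;
# check what the running controller has
scontrol show config | grep -i EnforcePartLimits
```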

Quoting Matteo Guglielmi <[email protected]>:
Dear All,

I'm trying to create a simple qos called 1week which
I would like to associate to those users who do need
to run for one week instead of 2 days at maximum:

### slurm.conf ###
EnforcePartLimits=YES
TaskPlugin=task/affinity
TaskPluginParam=Sched
TopologyPlugin=topology/none
TrackWCKey=no
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityCalcPeriod=5
PriorityFavorSmall=YES
PriorityMaxAge=7-0
PriorityUsageResetPeriod=NONE
PriorityWeightAge=1000
PriorityWeightFairshare=1000
PriorityWeightJobSize=10000
PriorityWeightPartition=10000
PriorityWeightQOS=10000
AccountingStorageEnforce=limits,qos
AccountingStorageType=accounting_storage/slurmdbd
JobCompType=jobcomp/none
JobAcctGatherType=jobacct_gather/linux
PreemptMode=suspend,gang
PreemptType=preempt/partition_prio

NodeName=DEFAULT TmpDisk=16384 State=UNKNOWN

NodeName=foff[01-08] Procs=8 CoresPerSocket=4 Sockets=2 ThreadsPerCore=1 RealMemory=7000 Weight=1 Feature=X5482,foff,fofflm
NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4 ThreadsPerCore=1 RealMemory=127000 Weight=1 Feature=6176,foff,foffhm

PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED MaxTime=2-0 PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
PartitionName=batch   Nodes=foff[01-13] Default=YES
PartitionName=foff1   Nodes=foff[01-08] Priority=1000
PartitionName=foff2   Nodes=foff[09-13] Priority=1000
#################

sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd

             Account    Cluster       User     Share  Partition   Def QOS                  QOS
-------------------- ---------- ---------- --------- ---------- --------- --------------------
                root     superb                    1                                     normal
                root     superb       root         1                                     normal
                  sb     superb                    1                                     normal
                  sb     superb   belushki         1      batch                          normal
                  sb     superb     fiocco         1      batch                          normal
               gr-fo     superb                    1                                     normal
               gr-fo     superb   belushki         1      foff1                          normal
               gr-fo     superb   belushki         1      foff2                          normal
               gr-fo     superb     fiocco         1      foff1                          normal
               gr-fo     superb     fiocco         1      foff2                          normal


sacctmgr add qos Name=1week MaxWall=7-0 Priority=100 PreemptMode=Cluster Flags=PartitionTimeLimit

sacctmgr modify user name=belushki Account=gr-fo set qos+=1week

sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd

             Account    Cluster       User     Share  Partition   Def QOS                  QOS
-------------------- ---------- ---------- --------- ---------- --------- --------------------
                root     superb                    1                                     normal
                root     superb       root         1                                     normal
                  sb     superb                    1                                     normal
                  sb     superb   belushki         1      batch                          normal
                  sb     superb     fiocco         1      batch                          normal
               gr-fo     superb                    1                                     normal
               gr-fo     superb   belushki         1      foff1                    1week,normal
               gr-fo     superb   belushki         1      foff2                    1week,normal
               gr-fo     superb     fiocco         1      foff1                          normal
               gr-fo     superb     fiocco         1      foff2                          normal

/etc/init.d/slurmd restart (the same command was also issued on all nodes)

su - belushki

srun -p foff2 -A gr-fo --qos=1week -t 7-0 hostname
srun: error: Unable to allocate resources: Requested time limit is invalid (exceeds some limit)
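In case it helps with debugging, a few checks worth running at this point (a sketch only; the relevant behavior depends on your Slurm version):

```shell
# Confirm the association really picked up the QOS for that partition
sacctmgr show assoc user=belushki account=gr-fo format=User,Partition,QOS

# Confirm what the running controller enforces; note that restarting slurmd
# on the nodes does not reread slurm.conf on the controller side, so
# slurmctld (or "scontrol reconfigure") may also be needed
scontrol show config | grep -iE 'EnforcePartLimits|AccountingStorageEnforce'
```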


Could you tell me what I'm still missing to make this work for user "belushki"?

Thanks,

--matt



