Great, thanks Moe.

On Mon, Oct 31, 2011 at 10:39 AM, Moe Jette <[email protected]> wrote:
> This works for me.
> What version of SLURM are you running?
> You might want to look at your SlurmctldLogFile.
>
> Lyn,
> You can use the QOS mechanism as Matt is, with flags (e.g.
> "Flags=PartitionTimeLimit") to override partition time and/or size limits.
>
> Quoting Matteo Guglielmi <[email protected]>:
>
>> Dear All,
>>
>> I'm trying to create a simple QOS called 1week, which I would like to
>> associate with those users who need to run for one week instead of the
>> 2-day maximum:
>>
>> ### slurm.conf ###
>> EnforcePartLimits=YES
>> TaskPlugin=task/affinity
>> TaskPluginParam=Sched
>> TopologyPlugin=topology/none
>> TrackWCKey=no
>> SchedulerType=sched/backfill
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_Core_Memory
>> PriorityType=priority/multifactor
>> PriorityDecayHalfLife=7-0
>> PriorityCalcPeriod=5
>> PriorityFavorSmall=YES
>> PriorityMaxAge=7-0
>> PriorityUsageResetPeriod=NONE
>> PriorityWeightAge=1000
>> PriorityWeightFairshare=1000
>> PriorityWeightJobSize=10000
>> PriorityWeightPartition=10000
>> PriorityWeightQOS=10000
>> AccountingStorageEnforce=limits,qos
>> AccountingStorageType=accounting_storage/slurmdbd
>> JobCompType=jobcomp/none
>> JobAcctGatherType=jobacct_gather/linux
>> PreemptMode=suspend,gang
>> PreemptType=preempt/partition_prio
>>
>> NodeName=DEFAULT TmpDisk=16384 State=UNKNOWN
>>
>> NodeName=foff[01-08] Procs=8 CoresPerSocket=4 Sockets=2
>> ThreadsPerCore=1 RealMemory=7000 Weight=1 Feature=X5482,foff,fofflm
>> NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4
>> ThreadsPerCore=1 RealMemory=127000 Weight=1 Feature=6176,foff,foffhm
>>
>> PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED
>> MaxTime=2-0 PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
>> PartitionName=batch Nodes=foff[01-13] Default=YES
>> PartitionName=foff1 Nodes=foff[01-08] Priority=1000
>> PartitionName=foff2 Nodes=foff[09-13] Priority=1000
>> #################
>>
>> sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
>>
>>    Account    Cluster       User  Share  Partition  Def QOS           QOS
>> ---------- ---------- ---------- ------ ---------- -------- -------------
>>       root     superb                 1                            normal
>>       root     superb       root      1                            normal
>>         sb     superb                 1                            normal
>>         sb     superb   belushki      1      batch                 normal
>>         sb     superb     fiocco      1      batch                 normal
>>      gr-fo     superb                 1                            normal
>>      gr-fo     superb   belushki      1      foff1                 normal
>>      gr-fo     superb   belushki      1      foff2                 normal
>>      gr-fo     superb     fiocco      1      foff1                 normal
>>      gr-fo     superb     fiocco      1      foff2                 normal
>>
>> sacctmgr add qos Name=1week MaxWall=7-0 Priority=100 PreemptMode=Cluster Flags=PartitionTimeLimit
>>
>> sacctmgr modify user name=belushki Account=gr-fo set qos+=1week
>>
>> sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
>>
>>    Account    Cluster       User  Share  Partition  Def QOS           QOS
>> ---------- ---------- ---------- ------ ---------- -------- -------------
>>       root     superb                 1                            normal
>>       root     superb       root      1                            normal
>>         sb     superb                 1                            normal
>>         sb     superb   belushki      1      batch                 normal
>>         sb     superb     fiocco      1      batch                 normal
>>      gr-fo     superb                 1                            normal
>>      gr-fo     superb   belushki      1      foff1           1week,normal
>>      gr-fo     superb   belushki      1      foff2           1week,normal
>>      gr-fo     superb     fiocco      1      foff1                 normal
>>      gr-fo     superb     fiocco      1      foff2                 normal
>>
>> /etc/init.d/slurmd restart (the same command was issued on all nodes too)
>>
>> su - belushki
>>
>> srun -p foff2 -A gr-fo --qos=1week -t 7-0 hostname
>> srun: error: Unable to allocate resources: Requested time limit is
>> invalid (exceeds some limit)
>>
>> Could you tell me what I am still missing to make this work for user
>> "belushki"?
>>
>> Thanks,
>>
>> --matt
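[Editor's note: as a quick sanity check along the lines Moe suggests, the commands below sketch how to verify the QOS definition and the running controller's settings. These are standard sacctmgr/scontrol invocations; the `/etc/init.d/slurm` init-script path is an assumption and may differ per install.]

```shell
# Confirm the new QOS really carries the 7-day wall limit and the flag
sacctmgr show qos 1week format=Name,Priority,MaxWall,Flags

# Partition limits are enforced by slurmctld, so the controller (not only
# slurmd on the compute nodes) must be restarted after config changes.
# NOTE: init-script name/path is an assumption; adjust for your install.
/etc/init.d/slurm restart

# Confirm the running controller sees the expected enforcement settings
scontrol show config | grep -i -e EnforcePartLimits -e AccountingStorageEnforce
```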
