PS: Moe, is there a related document? Couldn't find anything obvious. Thanks, Lyn
On Mon, Oct 31, 2011 at 12:59 PM, Lyn Gerner <[email protected]> wrote:

> Great, thanks Moe.
>
> On Mon, Oct 31, 2011 at 10:39 AM, Moe Jette <[email protected]> wrote:
>
>> This works for me.
>> What version of SLURM are you running?
>> You might want to look at your SlurmctldLogFile.
>>
>> Lyn,
>> You can use the QOS mechanism, as Matt is, with flags (e.g.
>> "Flags=PartitionTimeLimit") to override partition time and/or size
>> limits.
>>
>> Quoting Matteo Guglielmi <[email protected]>:
>>
>>> Dear All,
>>>
>>> I'm trying to create a simple QOS called 1week, which I would like to
>>> associate with those users who need to run for one week instead of the
>>> two-day maximum:
>>>
>>> ### slurm.conf ###
>>> EnforcePartLimits=YES
>>> TaskPlugin=task/affinity
>>> TaskPluginParam=Sched
>>> TopologyPlugin=topology/none
>>> TrackWCKey=no
>>> SchedulerType=sched/backfill
>>> SelectType=select/cons_res
>>> SelectTypeParameters=CR_Core_Memory
>>> PriorityType=priority/multifactor
>>> PriorityDecayHalfLife=7-0
>>> PriorityCalcPeriod=5
>>> PriorityFavorSmall=YES
>>> PriorityMaxAge=7-0
>>> PriorityUsageResetPeriod=NONE
>>> PriorityWeightAge=1000
>>> PriorityWeightFairshare=1000
>>> PriorityWeightJobSize=10000
>>> PriorityWeightPartition=10000
>>> PriorityWeightQOS=10000
>>> AccountingStorageEnforce=limits,qos
>>> AccountingStorageType=accounting_storage/slurmdbd
>>> JobCompType=jobcomp/none
>>> JobAcctGatherType=jobacct_gather/linux
>>> PreemptMode=suspend,gang
>>> PreemptType=preempt/partition_prio
>>>
>>> NodeName=DEFAULT TmpDisk=16384 State=UNKNOWN
>>>
>>> NodeName=foff[01-08] Procs=8 CoresPerSocket=4 Sockets=2
>>> ThreadsPerCore=1 RealMemory=7000 Weight=1 Feature=X5482,foff,fofflm
>>> NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4
>>> ThreadsPerCore=1 RealMemory=127000 Weight=1 Feature=6176,foff,foffhm
>>>
>>> PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED
>>> MaxTime=2-0 PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
>>> PartitionName=batch Nodes=foff[01-13] Default=YES
>>> PartitionName=foff1 Nodes=foff[01-08] Priority=1000
>>> PartitionName=foff2 Nodes=foff[09-13] Priority=1000
>>> #################
>>>
>>> sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
>>>
>>>    Account   Cluster      User  Share  Partition  Def QOS           QOS
>>>   --------  --------  --------  -----  ---------  -------  ------------
>>>   root      superb                  1                            normal
>>>   root      superb    root          1                            normal
>>>   sb        superb                  1                            normal
>>>   sb        superb    belushki      1  batch                     normal
>>>   sb        superb    fiocco        1  batch                     normal
>>>   gr-fo     superb                  1                            normal
>>>   gr-fo     superb    belushki      1  foff1                     normal
>>>   gr-fo     superb    belushki      1  foff2                     normal
>>>   gr-fo     superb    fiocco        1  foff1                     normal
>>>   gr-fo     superb    fiocco        1  foff2                     normal
>>>
>>> sacctmgr add qos Name=1week MaxWall=7-0 Priority=100 PreemptMode=Cluster Flags=PartitionTimeLimit
>>>
>>> sacctmgr modify user name=belushki Account=gr-fo set qos+=1week
>>>
>>> sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
>>>
>>>    Account   Cluster      User  Share  Partition  Def QOS           QOS
>>>   --------  --------  --------  -----  ---------  -------  ------------
>>>   root      superb                  1                            normal
>>>   root      superb    root          1                            normal
>>>   sb        superb                  1                            normal
>>>   sb        superb    belushki      1  batch                     normal
>>>   sb        superb    fiocco        1  batch                     normal
>>>   gr-fo     superb                  1                            normal
>>>   gr-fo     superb    belushki      1  foff1               1week,normal
>>>   gr-fo     superb    belushki      1  foff2               1week,normal
>>>   gr-fo     superb    fiocco        1  foff1                     normal
>>>   gr-fo     superb    fiocco        1  foff2                     normal
>>>
>>> /etc/init.d/slurmd restart   (the same command was issued on all nodes too)
>>>
>>> su - belushki
>>>
>>> srun -p foff2 -A gr-fo --qos=1week -t 7-0 hostname
>>> srun: error: Unable to allocate resources: Requested time limit is
>>> invalid (exceeds some limit)
>>>
>>> Could you tell me what I am still missing to make this work for user
>>> "belushki"?
>>>
>>> Thanks,
>>>
>>> --matt
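For anyone following along, the QOS-based override Moe describes can be sketched as below. The `sacctmgr` and `srun` invocations are taken from the thread; the verification and `scontrol reconfigure` steps are additions I'd suggest checking, since QOS limits are enforced by the controller (slurmctld), and the thread only restarts the slurmd daemons.

```shell
# Sketch of the QOS override discussed in the thread (commands from the
# thread, plus hedged verification steps; adapt names to your site).

# 1. Create a QOS allowed to exceed the partition's MaxTime.
sacctmgr -i add qos Name=1week MaxWall=7-0 Priority=100 \
    Flags=PartitionTimeLimit

# 2. Grant the QOS to the user's association in the account.
sacctmgr -i modify user name=belushki Account=gr-fo set qos+=1week

# 3. Verify the flag and wall limit were actually stored.
sacctmgr show qos format=Name,MaxWall,Flags

# 4. Have the controller reread its configuration (assumption: restarting
#    only slurmd, as in the thread, does not refresh slurmctld).
scontrol reconfigure

# 5. Submit, requesting the QOS explicitly.
srun -p foff2 -A gr-fo --qos=1week -t 7-0 hostname
```

If the error persists, checking the SlurmctldLogFile (as Moe suggests) should show whether the limit being hit is the partition's or an association's.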
