Certainly; how about the Flags? Thanks again, Lyn
On Mon, Oct 31, 2011 at 1:08 PM, Moe Jette <[email protected]> wrote:

> SLURM's QOS and resource limits web pages describe most of this:
> http://www.schedmd.com/slurmdocs/qos.html
> http://www.schedmd.com/slurmdocs/resource_limits.html
>
> Quoting Lyn Gerner <[email protected]>:
>
>> PS: Moe, is there a related document? Couldn't find anything obvious.
>>
>> Thanks,
>> Lyn
>>
>> On Mon, Oct 31, 2011 at 12:59 PM, Lyn Gerner <[email protected]> wrote:
>>
>>> Great, thanks Moe.
>>>
>>> On Mon, Oct 31, 2011 at 10:39 AM, Moe Jette <[email protected]> wrote:
>>>
>>>> This works for me.
>>>> What version of SLURM are you running?
>>>> You might want to look at your SlurmctldLogFile.
>>>>
>>>> Lyn,
>>>> You can use the QOS mechanism as Matt is, with flags (e.g.
>>>> "Flags=PartitionTimeLimit") to override partition time and/or size
>>>> limits.
>>>>
>>>> Quoting Matteo Guglielmi <[email protected]>:
>>>>
>>>>> Dear All,
>>>>>
>>>>> I'm trying to create a simple QOS called 1week which
>>>>> I would like to associate with those users who need
>>>>> to run for one week instead of the 2-day maximum:
>>>>>
>>>>> ### slurm.conf ###
>>>>> EnforcePartLimits=YES
>>>>> TaskPlugin=task/affinity
>>>>> TaskPluginParam=Sched
>>>>> TopologyPlugin=topology/none
>>>>> TrackWCKey=no
>>>>> SchedulerType=sched/backfill
>>>>> SelectType=select/cons_res
>>>>> SelectTypeParameters=CR_Core_Memory
>>>>> PriorityType=priority/multifactor
>>>>> PriorityDecayHalfLife=7-0
>>>>> PriorityCalcPeriod=5
>>>>> PriorityFavorSmall=YES
>>>>> PriorityMaxAge=7-0
>>>>> PriorityUsageResetPeriod=NONE
>>>>> PriorityWeightAge=1000
>>>>> PriorityWeightFairshare=1000
>>>>> PriorityWeightJobSize=10000
>>>>> PriorityWeightPartition=10000
>>>>> PriorityWeightQOS=10000
>>>>> AccountingStorageEnforce=limits,qos
>>>>> AccountingStorageType=accounting_storage/slurmdbd
>>>>> JobCompType=jobcomp/none
>>>>> JobAcctGatherType=jobacct_gather/linux
>>>>> PreemptMode=suspend,gang
>>>>> PreemptType=preempt/partition_prio
>>>>>
>>>>> NodeName=DEFAULT TmpDisk=16384 State=UNKNOWN
>>>>> NodeName=foff[01-08] Procs=8 CoresPerSocket=4 Sockets=2
>>>>> ThreadsPerCore=1 RealMemory=7000 Weight=1 Feature=X5482,foff,fofflm
>>>>> NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4
>>>>> ThreadsPerCore=1 RealMemory=127000 Weight=1 Feature=6176,foff,foffhm
>>>>>
>>>>> PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED
>>>>> MaxTime=2-0 PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
>>>>> PartitionName=batch Nodes=foff[01-13] Default=YES
>>>>> PartitionName=foff1 Nodes=foff[01-08] Priority=1000
>>>>> PartitionName=foff2 Nodes=foff[09-13] Priority=1000
>>>>> #################
>>>>>
>>>>> sacctmgr list associations \
>>>>>   format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
>>>>>
>>>>>    Account    Cluster       User  Share  Partition    Def QOS           QOS
>>>>> ---------- ---------- ---------- ------ ---------- ---------- ------------
>>>>>       root     superb                1                              normal
>>>>>       root     superb       root    1                              normal
>>>>>         sb     superb                1                              normal
>>>>>         sb     superb   belushki    1       batch                   normal
>>>>>         sb     superb     fiocco    1       batch                   normal
>>>>>      gr-fo     superb                1                              normal
>>>>>      gr-fo     superb   belushki    1       foff1                   normal
>>>>>      gr-fo     superb   belushki    1       foff2                   normal
>>>>>      gr-fo     superb     fiocco    1       foff1                   normal
>>>>>      gr-fo     superb     fiocco    1       foff2                   normal
>>>>>
>>>>> sacctmgr add qos Name=1week MaxWall=7-0 Priority=100 \
>>>>>   PreemptMode=Cluster Flags=PartitionTimeLimit
>>>>>
>>>>> sacctmgr modify user name=belushki Account=gr-fo set qos+=1week
>>>>>
>>>>> sacctmgr list associations \
>>>>>   format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
>>>>>
>>>>>    Account    Cluster       User  Share  Partition    Def QOS           QOS
>>>>> ---------- ---------- ---------- ------ ---------- ---------- ------------
>>>>>       root     superb                1                              normal
>>>>>       root     superb       root    1                              normal
>>>>>         sb     superb                1                              normal
>>>>>         sb     superb   belushki    1       batch                   normal
>>>>>         sb     superb     fiocco    1       batch                   normal
>>>>>      gr-fo     superb                1                              normal
>>>>>      gr-fo     superb   belushki    1       foff1             1week,normal
>>>>>      gr-fo     superb   belushki    1       foff2             1week,normal
>>>>>      gr-fo     superb     fiocco    1       foff1                   normal
>>>>>      gr-fo     superb     fiocco    1       foff2                   normal
>>>>>
>>>>> /etc/init.d/slurmd restart   (the same command was issued on all nodes too)
>>>>>
>>>>> su - belushki
>>>>>
>>>>> srun -p foff2 -A gr-fo --qos=1week -t 7-0 hostname
>>>>> srun: error: Unable to allocate resources: Requested time limit is
>>>>> invalid (exceeds some limit)
>>>>>
>>>>> Could you tell me what I am still missing in order to make this work
>>>>> for user "belushki"?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --matt
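[Editor's note: for readers following along, the steps Matteo describes condense to the sketch below. The names (`1week`, `gr-fo`, `belushki`, `foff2`) come from the thread; this assumes a SLURM version whose QOS supports the `PartitionTimeLimit` flag, and the `sacctmgr` commands must be run with administrator privileges.]

```shell
# Create a QOS whose wall-time limit (7 days) exceeds the partition's
# MaxTime (2 days), and allow it to override the partition time limit.
sacctmgr add qos Name=1week MaxWall=7-0 Priority=100 \
    PreemptMode=Cluster Flags=PartitionTimeLimit

# Attach the QOS to the user's association under the gr-fo account.
sacctmgr modify user name=belushki Account=gr-fo set qos+=1week

# The job must request the QOS explicitly to receive the longer limit.
srun -p foff2 -A gr-fo --qos=1week -t 7-0 hostname
```

One detail worth noting: QOS definitions live in slurmdbd and are enforced by slurmctld, so restarting slurmd on the compute nodes (as done in the thread) is not what propagates an accounting change.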

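[Editor's note: if the "Requested time limit is invalid" error persists, a hedged troubleshooting sketch, not taken from the thread, is to confirm that the QOS really carries the expected flag and limit, and to check which enforcement options the controller is running with:]

```shell
# Confirm the QOS exists with the expected flag and wall-time limit.
sacctmgr show qos 1week format=Name,Flags,MaxWall

# Show the controller's limit-enforcement settings for comparison
# against the slurm.conf posted in the thread.
scontrol show config | grep -E 'AccountingStorageEnforce|EnforcePartLimits'
```

As Moe suggests, the SlurmctldLogFile is the place to see which specific limit rejected the request.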