Look at the sacctmgr man page on the subject

http://www.schedmd.com/slurmdocs/sacctmgr.html

Look for SPECIFICATIONS FOR QOS
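The same specifications can also be listed from the command line; a quick sketch (the QOS name "1week" is the one created later in this thread):

```shell
# List all QOS records with the fields most relevant here
sacctmgr show qos format=Name,Priority,MaxWall,Flags

# Show just the QOS discussed below
sacctmgr show qos name=1week format=Name,Priority,MaxWall,Flags
```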

On Monday October 31 2011 1:17:17 PM Lyn Gerner wrote:
> Certainly; how about the Flags?
> 
> Thanks again,
> Lyn
> 
> On Mon, Oct 31, 2011 at 1:08 PM, Moe Jette <[email protected]> wrote:
> 
> > SLURM's QOS and resource limits web pages describe most of this:
> > http://www.schedmd.com/slurmdocs/qos.html
> > http://www.schedmd.com/slurmdocs/resource_limits.html
> >
> >
> > Quoting Lyn Gerner <[email protected]>:
> >
> >  PS: Moe, is there a related document?  Couldn't find anything obvious.
> >>
> >> Thanks,
> >> Lyn
> >>
> >> On Mon, Oct 31, 2011 at 12:59 PM, Lyn Gerner <[email protected]> wrote:
> >>
> >>  Great, thanks Moe.
> >>>
> >>>
> >>> On Mon, Oct 31, 2011 at 10:39 AM, Moe Jette <[email protected]> wrote:
> >>>
> >>>  This works for me.
> >>>> What version of SLURM are you running?
> >>>> You might want to look at your SlurmctldLogFile.
> >>>>
> >>>> Lyn,
> >>>> You can use the QOS mechanism, as Matt is doing, with flags (e.g.
> >>>> "Flags=PartitionTimeLimit") to override partition time and/or size
> >>>> limits.
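A quick way to verify the setup Moe describes (assuming the "1week" QOS and user names used later in this thread) is to check that the flag and the association actually took effect:

```shell
# Confirm the QOS exists and carries the PartitionTimeLimit flag
sacctmgr show qos name=1week format=Name,MaxWall,Flags

# Confirm the user's association now lists the QOS
sacctmgr show associations where user=belushki format=User,Account,Partition,QOS
```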
> >>>>
> >>>>
> >>>> Quoting Matteo Guglielmi <[email protected]>:
> >>>>
> >>>>  Dear All,
> >>>>>
> >>>>> I'm trying to create a simple qos called 1week which
> >>>>> I would like to associate to those users who do need
> >>>>> to run for one week instead of 2 days at maximum:
> >>>>>
> >>>>> ### slurm.conf ###
> >>>>> EnforcePartLimits=YES
> >>>>> TaskPlugin=task/affinity
> >>>>> TaskPluginParam=Sched
> >>>>> TopologyPlugin=topology/none
> >>>>> TrackWCKey=no
> >>>>> SchedulerType=sched/backfill
> >>>>> SelectType=select/cons_res
> >>>>> SelectTypeParameters=CR_Core_Memory
> >>>>> PriorityType=priority/multifactor
> >>>>>
> >>>>> PriorityDecayHalfLife=7-0
> >>>>> PriorityCalcPeriod=5
> >>>>> PriorityFavorSmall=YES
> >>>>> PriorityMaxAge=7-0
> >>>>> PriorityUsageResetPeriod=NONE
> >>>>> PriorityWeightAge=1000
> >>>>> PriorityWeightFairshare=1000
> >>>>> PriorityWeightJobSize=10000
> >>>>> PriorityWeightPartition=10000
> >>>>> PriorityWeightQOS=10000
> >>>>> AccountingStorageEnforce=limits,qos
> >>>>> AccountingStorageType=accounting_storage/slurmdbd
> >>>>> JobCompType=jobcomp/none
> >>>>> JobAcctGatherType=jobacct_gather/linux
> >>>>> PreemptMode=suspend,gang
> >>>>> PreemptType=preempt/partition_prio
> >>>>>
> >>>>>
> >>>>> NodeName=DEFAULT TmpDisk=16384 State=UNKNOWN
> >>>>>
> >>>>> NodeName=foff[01-08] Procs=8  CoresPerSocket=4  Sockets=2
> >>>>> ThreadsPerCore=1 RealMemory=7000   Weight=1 Feature=X5482,foff,fofflm
> >>>>> NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4
> >>>>> ThreadsPerCore=1 RealMemory=127000 Weight=1 Feature=6176,foff,foffhm
> >>>>>
> >>>>> PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED
> >>>>> MaxTime=2-0 PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
> >>>>> PartitionName=batch   Nodes=foff[01-13] Default=YES
> >>>>> PartitionName=foff1   Nodes=foff[01-08] Priority=1000
> >>>>> PartitionName=foff2   Nodes=foff[09-13] Priority=1000
> >>>>> #################
> >>>>>
> >>>>> sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
> >>>>>
> >>>>>
> >>>>>              Account    Cluster       User     Share  Partition    Def QOS                  QOS
> >>>>> -------------------- ---------- ---------- --------- ---------- ---------- --------------------
> >>>>> root                     superb                    1                                     normal
> >>>>>  root                    superb       root         1                                     normal
> >>>>>  sb                      superb                    1                                     normal
> >>>>>  sb                      superb   belushki         1      batch                          normal
> >>>>>  sb                      superb     fiocco         1      batch                          normal
> >>>>>  gr-fo                   superb                    1                                     normal
> >>>>>  gr-fo                   superb   belushki         1      foff1                          normal
> >>>>>  gr-fo                   superb   belushki         1      foff2                          normal
> >>>>>  gr-fo                   superb     fiocco         1      foff1                          normal
> >>>>>  gr-fo                   superb     fiocco         1      foff2                          normal
> >>>>>
> >>>>>
> >>>>> sacctmgr add qos Name=1week MaxWall=7-0 Priority=100 PreemptMode=Cluster Flags=PartitionTimeLimit
> >>>>>
> >>>>> sacctmgr modify user name=belushki Account=gr-fo set qos+=1week
> >>>>>
> >>>>> sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
> >>>>>
> >>>>>
> >>>>>              Account    Cluster       User     Share  Partition    Def QOS                  QOS
> >>>>> -------------------- ---------- ---------- --------- ---------- ---------- --------------------
> >>>>> root                     superb                    1                                     normal
> >>>>>  root                    superb       root         1                                     normal
> >>>>>  sb                      superb                    1                                     normal
> >>>>>  sb                      superb   belushki         1      batch                          normal
> >>>>>  sb                      superb     fiocco         1      batch                          normal
> >>>>>  gr-fo                   superb                    1                                     normal
> >>>>>  gr-fo                   superb   belushki         1      foff1                    1week,normal
> >>>>>  gr-fo                   superb   belushki         1      foff2                    1week,normal
> >>>>>  gr-fo                   superb     fiocco         1      foff1                          normal
> >>>>>  gr-fo                   superb     fiocco         1      foff2                          normal
> >>>>>
> >>>>> /etc/init.d/slurmd restart (the same command was also issued on all nodes)
> >>>>>
> >>>>> su - belushki
> >>>>>
> >>>>> srun -p foff2 -A gr-fo --qos=1week -t 7-0 hostname
> >>>>> srun: error: Unable to allocate resources: Requested time limit is
> >>>>> invalid (exceeds some limit)
> >>>>>
> >>>>>
> >>>>> Could you tell me what I'm still missing in order to make this work
> >>>>> for user "belushki"?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> --matt