Perfect; thanks, Danny and Moe.

On Mon, Oct 31, 2011 at 1:19 PM, Danny Auble <[email protected]> wrote:

> Look at the sacctmgr man page on the subject
>
> http://www.schedmd.com/slurmdocs/sacctmgr.html
>
> Look for SPECIFICATIONS FOR QOS
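>
> For example, something like the following will show or change a QOS's
> flags (the QOS name is just illustrative):
>
>   sacctmgr show qos format=Name,Priority,MaxWall,Flags
>   sacctmgr modify qos 1week set Flags=PartitionTimeLimit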
>
> On Monday October 31 2011 1:17:17 PM Lyn Gerner wrote:
> > Certainly; how about the Flags?
> >
> > Thanks again,
> > Lyn
> >
> > On Mon, Oct 31, 2011 at 1:08 PM, Moe Jette <[email protected]> wrote:
> >
> > > SLURM's QOS and resource limits web pages describe most of this:
> > > http://www.schedmd.com/slurmdocs/qos.html
> > > http://www.schedmd.com/slurmdocs/resource_limits.html
> > >
> > >
> > > Quoting Lyn Gerner <[email protected]>:
> > >
> > >  PS: Moe, is there a related document?  Couldn't find anything obvious.
> > >>
> > >> Thanks,
> > >> Lyn
> > >>
> > >> On Mon, Oct 31, 2011 at 12:59 PM, Lyn Gerner <[email protected]>
> > >> wrote:
> > >>
> > >>  Great, thanks Moe.
> > >>>
> > >>>
> > >>> On Mon, Oct 31, 2011 at 10:39 AM, Moe Jette <[email protected]>
> wrote:
> > >>>
> > >>>  This works for me.
> > >>>> What version of SLURM are you running?
> > >>>> You might want to look at your SlurmctldLogFile.
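> > >>>> (If you're not sure where that is, something like
> > >>>> "scontrol show config | grep SlurmctldLogFile" should print its path.)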
> > >>>>
> > >>>> Lyn,
> > >>>> You can use the QOS mechanism, as Matt is doing, with flags (e.g.
> > >>>> "Flags=PartitionTimeLimit") to override partition time and/or size
> > >>>> limits.
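> > >>>>
> > >>>> Roughly, something like this (names are illustrative):
> > >>>>
> > >>>>   sacctmgr add qos Name=1week MaxWall=7-0 Flags=PartitionTimeLimit
> > >>>>   sacctmgr modify user name=someuser set qos+=1week
> > >>>>   srun --qos=1week -t 7-0 hostname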
> > >>>>
> > >>>>
> > >>>> Quoting Matteo Guglielmi <[email protected]>:
> > >>>>
> > >>>>  Dear All,
> > >>>>>
> > >>>>> I'm trying to create a simple QOS called 1week, which
> > >>>>> I would like to associate with those users who need
> > >>>>> to run for one week instead of the 2-day maximum:
> > >>>>>
> > >>>>> ### slurm.conf ###
> > >>>>> EnforcePartLimits=YES
> > >>>>> TaskPlugin=task/affinity
> > >>>>> TaskPluginParam=Sched
> > >>>>> TopologyPlugin=topology/none
> > >>>>> TrackWCKey=no
> > >>>>> SchedulerType=sched/backfill
> > >>>>> SelectType=select/cons_res
> > >>>>> SelectTypeParameters=CR_Core_Memory
> > >>>>> PriorityType=priority/multifactor
> > >>>>>
> > >>>>> PriorityDecayHalfLife=7-0
> > >>>>> PriorityCalcPeriod=5
> > >>>>> PriorityFavorSmall=YES
> > >>>>> PriorityMaxAge=7-0
> > >>>>> PriorityUsageResetPeriod=NONE
> > >>>>> PriorityWeightAge=1000
> > >>>>> PriorityWeightFairshare=1000
> > >>>>> PriorityWeightJobSize=10000
> > >>>>> PriorityWeightPartition=10000
> > >>>>> PriorityWeightQOS=10000
> > >>>>> AccountingStorageEnforce=limits,qos
> > >>>>> AccountingStorageType=accounting_storage/slurmdbd
> > >>>>> JobCompType=jobcomp/none
> > >>>>> JobAcctGatherType=jobacct_gather/linux
> > >>>>> PreemptMode=suspend,gang
> > >>>>> PreemptType=preempt/partition_prio
> > >>>>>
> > >>>>>
> > >>>>> NodeName=DEFAULT TmpDisk=16384 State=UNKNOWN
> > >>>>>
> > >>>>> NodeName=foff[01-08] Procs=8  CoresPerSocket=4  Sockets=2
> > >>>>> ThreadsPerCore=1 RealMemory=7000   Weight=1 Feature=X5482,foff,fofflm
> > >>>>> NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4
> > >>>>> ThreadsPerCore=1 RealMemory=127000 Weight=1 Feature=6176,foff,foffhm
> > >>>>>
> > >>>>> PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED
> > >>>>> MaxTime=2-0 PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
> > >>>>> PartitionName=batch   Nodes=foff[01-13] Default=YES
> > >>>>> PartitionName=foff1   Nodes=foff[01-08] Priority=1000
> > >>>>> PartitionName=foff2   Nodes=foff[09-13] Priority=1000
> > >>>>> #################
> > >>>>>
> > >>>>> sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
> > >>>>>
> > >>>>>
> > >>>>>            Account    Cluster       User     Share  Partition   Def QOS                  QOS
> > >>>>> -------------------- ---------- ---------- --------- ---------- --------- --------------------
> > >>>>> root                     superb                    1                                     normal
> > >>>>>  root                    superb       root         1                                     normal
> > >>>>>  sb                      superb                    1                                     normal
> > >>>>>  sb                      superb   belushki         1      batch                          normal
> > >>>>>  sb                      superb     fiocco         1      batch                          normal
> > >>>>>  gr-fo                   superb                    1                                     normal
> > >>>>>  gr-fo                   superb   belushki         1      foff1                          normal
> > >>>>>  gr-fo                   superb   belushki         1      foff2                          normal
> > >>>>>  gr-fo                   superb     fiocco         1      foff1                          normal
> > >>>>>  gr-fo                   superb     fiocco         1      foff2                          normal
> > >>>>>
> > >>>>>
> > >>>>> sacctmgr add qos Name=1week MaxWall=7-0 Priority=100
> > >>>>> PreemptMode=Cluster
> > >>>>> Flags=PartitionTimeLimit
> > >>>>>
> > >>>>> sacctmgr modify user name=belushki Account=gr-fo set qos+=1week
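> > >>>>>
> > >>>>> To double-check the QOS definition itself, something like this
> > >>>>> should show the limit and flag just set:
> > >>>>>
> > >>>>> sacctmgr show qos 1week format=Name,Priority,MaxWall,Flags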
> > >>>>>
> > >>>>> sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
> > >>>>>
> > >>>>>
> > >>>>>            Account    Cluster       User     Share  Partition   Def QOS                  QOS
> > >>>>> -------------------- ---------- ---------- --------- ---------- --------- --------------------
> > >>>>> root                     superb                    1                                     normal
> > >>>>>  root                    superb       root         1                                     normal
> > >>>>>  sb                      superb                    1                                     normal
> > >>>>>  sb                      superb   belushki         1      batch                          normal
> > >>>>>  sb                      superb     fiocco         1      batch                          normal
> > >>>>>  gr-fo                   superb                    1                                     normal
> > >>>>>  gr-fo                   superb   belushki         1      foff1               1week,normal
> > >>>>>  gr-fo                   superb   belushki         1      foff2               1week,normal
> > >>>>>  gr-fo                   superb     fiocco         1      foff1                          normal
> > >>>>>  gr-fo                   superb     fiocco         1      foff2                          normal
> > >>>>>
> > >>>>> /etc/init.d/slurmd restart (same command was issued on all nodes too)
> > >>>>>
> > >>>>> su - belushki
> > >>>>>
> > >>>>> srun -p foff2 -A gr-fo --qos=1week -t 7-0 hostname
> > >>>>> srun: error: Unable to allocate resources: Requested time limit is
> > >>>>> invalid (exceeds some limit)
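> > >>>>>
> > >>>>> To rule out the obvious I can dump the partition and enforcement
> > >>>>> settings with something like:
> > >>>>>
> > >>>>> scontrol show partition foff2 | grep MaxTime
> > >>>>> scontrol show config | grep -E 'EnforcePartLimits|AccountingStorageEnforce'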
> > >>>>>
> > >>>>>
> > >>>>> Could you tell me what I'm still missing in order to make this
> > >>>>> work for user "belushki"?
> > >>>>>
> > >>>>> Thanks,
> > >>>>>
> > >>>>> --matt
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> > >
> > >
> > >
>
