Certainly; how about the Flags? Thanks again, Lyn
On Mon, Oct 31, 2011 at 1:08 PM, Moe Jette <[email protected]> wrote:

> SLURM's QOS and resource limits web pages describe most of this:
> http://www.schedmd.com/slurmdocs/qos.html
> http://www.schedmd.com/slurmdocs/resource_limits.html
>
> Quoting Lyn Gerner <[email protected]>:
>
>> PS: Moe, is there a related document? Couldn't find anything obvious.
>>
>> Thanks,
>> Lyn
>>
>> On Mon, Oct 31, 2011 at 12:59 PM, Lyn Gerner <[email protected]> wrote:
>>
>>> Great, thanks Moe.
>>>
>>> On Mon, Oct 31, 2011 at 10:39 AM, Moe Jette <[email protected]> wrote:
>>>
>>>> This works for me.
>>>> What version of SLURM are you running?
>>>> You might want to look at your SlurmctldLogFile.
>>>>
>>>> Lyn,
>>>> You can use the QOS mechanism as Matt is, with flags (e.g.
>>>> "Flags=PartitionTimeLimit") to override partition time and/or size
>>>> limits.
>>>>
>>>> Quoting Matteo Guglielmi <[email protected]>:
>>>>
>>>>> Dear All,
>>>>>
>>>>> I'm trying to create a simple QOS called 1week which
>>>>> I would like to associate with those users who need
>>>>> to run for one week instead of the 2-day maximum:
>>>>>
>>>>> ### slurm.conf ###
>>>>> EnforcePartLimits=YES
>>>>> TaskPlugin=task/affinity
>>>>> TaskPluginParam=Sched
>>>>> TopologyPlugin=topology/none
>>>>> TrackWCKey=no
>>>>> SchedulerType=sched/backfill
>>>>> SelectType=select/cons_res
>>>>> SelectTypeParameters=CR_Core_Memory
>>>>> PriorityType=priority/multifactor
>>>>> PriorityDecayHalfLife=7-0
>>>>> PriorityCalcPeriod=5
>>>>> PriorityFavorSmall=YES
>>>>> PriorityMaxAge=7-0
>>>>> PriorityUsageResetPeriod=NONE
>>>>> PriorityWeightAge=1000
>>>>> PriorityWeightFairshare=1000
>>>>> PriorityWeightJobSize=10000
>>>>> PriorityWeightPartition=10000
>>>>> PriorityWeightQOS=10000
>>>>> AccountingStorageEnforce=limits,qos
>>>>> AccountingStorageType=accounting_storage/slurmdbd
>>>>> JobCompType=jobcomp/none
>>>>> JobAcctGatherType=jobacct_gather/linux
>>>>> PreemptMode=suspend,gang
>>>>> PreemptType=preempt/partition_prio
>>>>>
>>>>> NodeName=DEFAULT TmpDisk=16384 State=UNKNOWN
>>>>> NodeName=foff[01-08] Procs=8 CoresPerSocket=4 Sockets=2
>>>>> ThreadsPerCore=1 RealMemory=7000 Weight=1 Feature=X5482,foff,fofflm
>>>>> NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4
>>>>> ThreadsPerCore=1 RealMemory=127000 Weight=1 Feature=6176,foff,foffhm
>>>>>
>>>>> PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED
>>>>> MaxTime=2-0 PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
>>>>> PartitionName=batch Nodes=foff[01-13] Default=YES
>>>>> PartitionName=foff1 Nodes=foff[01-08] Priority=1000
>>>>> PartitionName=foff2 Nodes=foff[09-13] Priority=1000
>>>>> #################
>>>>>
>>>>> sacctmgr list associations \
>>>>>   format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
>>>>>
>>>>>    Account    Cluster       User  Share  Partition    Def QOS           QOS
>>>>> ---------- ---------- ---------- ------ ---------- ---------- ------------
>>>>>       root     superb                1                              normal
>>>>>       root     superb       root    1                              normal
>>>>>         sb     superb                1                              normal
>>>>>         sb     superb   belushki    1       batch                   normal
>>>>>         sb     superb     fiocco    1       batch                   normal
>>>>>      gr-fo     superb                1                              normal
>>>>>      gr-fo     superb   belushki    1       foff1                   normal
>>>>>      gr-fo     superb   belushki    1       foff2                   normal
>>>>>      gr-fo     superb     fiocco    1       foff1                   normal
>>>>>      gr-fo     superb     fiocco    1       foff2                   normal
>>>>>
>>>>> sacctmgr add qos Name=1week MaxWall=7-0 Priority=100 \
>>>>>   PreemptMode=Cluster Flags=PartitionTimeLimit
>>>>>
>>>>> sacctmgr modify user name=belushki Account=gr-fo set qos+=1week
>>>>>
>>>>> sacctmgr list associations \
>>>>>   format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
>>>>>
>>>>>    Account    Cluster       User  Share  Partition    Def QOS           QOS
>>>>> ---------- ---------- ---------- ------ ---------- ---------- ------------
>>>>>       root     superb                1                              normal
>>>>>       root     superb       root    1                              normal
>>>>>         sb     superb                1                              normal
>>>>>         sb     superb   belushki    1       batch                   normal
>>>>>         sb     superb     fiocco    1       batch                   normal
>>>>>      gr-fo     superb                1                              normal
>>>>>      gr-fo     superb   belushki    1       foff1             1week,normal
>>>>>      gr-fo     superb   belushki    1       foff2             1week,normal
>>>>>      gr-fo     superb     fiocco    1       foff1                   normal
>>>>>      gr-fo     superb     fiocco    1       foff2                   normal
>>>>>
>>>>> /etc/init.d/slurmd restart   (the same command was issued on all nodes too)
>>>>>
>>>>> su - belushki
>>>>>
>>>>> srun -p foff2 -A gr-fo --qos=1week -t 7-0 hostname
>>>>> srun: error: Unable to allocate resources: Requested time limit is
>>>>> invalid (exceeds some limit)
>>>>>
>>>>> Could you tell me what I am still missing in order to make this work
>>>>> for user "belushki"?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --matt
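[Editor's note: for readers following along, the steps Matteo describes condense to the sketch below. The names (`1week`, `gr-fo`, `belushki`, `foff2`) come from the thread; this assumes a SLURM version whose QOS supports the `PartitionTimeLimit` flag, and the `sacctmgr` commands must be run with administrator privileges.]

```shell
# Create a QOS whose wall-time limit (7 days) exceeds the partition's
# MaxTime (2 days), and allow it to override the partition time limit.
sacctmgr add qos Name=1week MaxWall=7-0 Priority=100 \
    PreemptMode=Cluster Flags=PartitionTimeLimit

# Attach the QOS to the user's association under the gr-fo account.
sacctmgr modify user name=belushki Account=gr-fo set qos+=1week

# The job must request the QOS explicitly to receive the longer limit.
srun -p foff2 -A gr-fo --qos=1week -t 7-0 hostname
```

One detail worth noting: QOS definitions live in slurmdbd and are enforced by slurmctld, so restarting slurmd on the compute nodes (as done in the thread) is not what propagates an accounting change.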

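[Editor's note: if the "Requested time limit is invalid" error persists, a hedged troubleshooting sketch, not taken from the thread, is to confirm that the QOS really carries the expected flag and limit, and to check which enforcement options the controller is running with:]

```shell
# Confirm the QOS exists with the expected flag and wall-time limit.
sacctmgr show qos 1week format=Name,Flags,MaxWall

# Show the controller's limit-enforcement settings for comparison
# against the slurm.conf posted in the thread.
scontrol show config | grep -E 'AccountingStorageEnforce|EnforcePartLimits'
```

As Moe suggests, the SlurmctldLogFile is the place to see which specific limit rejected the request.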