PS: Moe, is there a related document?  Couldn't find anything obvious.

Thanks,
Lyn

On Mon, Oct 31, 2011 at 12:59 PM, Lyn Gerner <[email protected]> wrote:

> Great, thanks Moe.
>
>
> On Mon, Oct 31, 2011 at 10:39 AM, Moe Jette <[email protected]> wrote:
>
>> This works for me.
>> What version of SLURM are you running?
>> You might want to look at your SlurmctldLogFile.
>>
>> Lyn,
>> You can use the QOS mechanism as Matt does, with flags (e.g.
>> "Flags=PartitionTimeLimit") to override partition time and/or size limits.
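A minimal sketch of that approach, reusing the `1week` QOS name that appears later in this thread (exact column names and behavior may vary by SLURM version):

```shell
# Create a QOS whose MaxWall overrides the partition's MaxTime;
# the PartitionTimeLimit flag tells the scheduler to ignore the
# partition time limit for jobs submitted under this QOS.
sacctmgr add qos Name=1week MaxWall=7-0 Flags=PartitionTimeLimit

# Verify that the flag and wall-clock limit actually took effect:
sacctmgr show qos where name=1week format=Name,MaxWall,Flags
```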
>>
>>
>> Quoting Matteo Guglielmi <[email protected]>:
>>
>>> Dear All,
>>>
>>> I'm trying to create a simple qos called 1week which
>>> I would like to associate to those users who do need
>>> to run for one week instead of 2 days at maximum:
>>>
>>> ### slurm.conf ###
>>> EnforcePartLimits=YES
>>> TaskPlugin=task/affinity
>>> TaskPluginParam=Sched
>>> TopologyPlugin=topology/none
>>> TrackWCKey=no
>>> SchedulerType=sched/backfill
>>> SelectType=select/cons_res
>>> SelectTypeParameters=CR_Core_Memory
>>> PriorityType=priority/multifactor
>>> PriorityDecayHalfLife=7-0
>>> PriorityCalcPeriod=5
>>> PriorityFavorSmall=YES
>>> PriorityMaxAge=7-0
>>> PriorityUsageResetPeriod=NONE
>>> PriorityWeightAge=1000
>>> PriorityWeightFairshare=1000
>>> PriorityWeightJobSize=10000
>>> PriorityWeightPartition=10000
>>> PriorityWeightQOS=10000
>>> AccountingStorageEnforce=limits,qos
>>> AccountingStorageType=accounting_storage/slurmdbd
>>> JobCompType=jobcomp/none
>>> JobAcctGatherType=jobacct_gather/linux
>>> PreemptMode=suspend,gang
>>> PreemptType=preempt/partition_prio
>>>
>>> NodeName=DEFAULT TmpDisk=16384 State=UNKNOWN
>>>
>>> NodeName=foff[01-08] Procs=8  CoresPerSocket=4  Sockets=2
>>> ThreadsPerCore=1 RealMemory=7000   Weight=1 Feature=X5482,foff,fofflm
>>> NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4
>>> ThreadsPerCore=1 RealMemory=127000 Weight=1 Feature=6176,foff,foffhm
>>>
>>> PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED
>>> MaxTime=2-0 PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
>>> PartitionName=batch   Nodes=foff[01-13] Default=YES
>>> PartitionName=foff1   Nodes=foff[01-08] Priority=1000
>>> PartitionName=foff2   Nodes=foff[09-13] Priority=1000
>>> #################
>>>
>>> sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
>>>
>>>              Account    Cluster       User     Share  Partition   Def QOS                  QOS
>>> -------------------- ---------- ---------- --------- ---------- --------- --------------------
>>> root                     superb                    1                                    normal
>>>  root                    superb       root         1                                    normal
>>>  sb                      superb                    1                                    normal
>>>  sb                      superb   belushki         1      batch                         normal
>>>  sb                      superb     fiocco         1      batch                         normal
>>>  gr-fo                   superb                    1                                    normal
>>>   gr-fo                  superb   belushki         1      foff1                         normal
>>>   gr-fo                  superb   belushki         1      foff2                         normal
>>>   gr-fo                  superb     fiocco         1      foff1                         normal
>>>   gr-fo                  superb     fiocco         1      foff2                         normal
>>>
>>>
>>> sacctmgr add qos Name=1week MaxWall=7-0 Priority=100 PreemptMode=Cluster
>>> Flags=PartitionTimeLimit
>>>
>>> sacctmgr modify user name=belushki Account=gr-fo set qos+=1week
>>>
>>> sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
>>>
>>>              Account    Cluster       User     Share  Partition   Def QOS                  QOS
>>> -------------------- ---------- ---------- --------- ---------- --------- --------------------
>>> root                     superb                    1                                    normal
>>>  root                    superb       root         1                                    normal
>>>  sb                      superb                    1                                    normal
>>>  sb                      superb   belushki         1      batch                         normal
>>>  sb                      superb     fiocco         1      batch                         normal
>>>  gr-fo                   superb                    1                                    normal
>>>   gr-fo                  superb   belushki         1      foff1                   1week,normal
>>>   gr-fo                  superb   belushki         1      foff2                   1week,normal
>>>   gr-fo                  superb     fiocco         1      foff1                         normal
>>>   gr-fo                  superb     fiocco         1      foff2                         normal
>>>
>>> /etc/init.d/slurmd restart (the same command was issued on all nodes as well)
>>>
>>> su - belushki
>>>
>>> srun -p foff2 -A gr-fo --qos=1week -t 7-0 hostname
>>> srun: error: Unable to allocate resources: Requested time limit is
>>> invalid (exceeds some limit)
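When this error persists even though the QOS grants a longer MaxWall, a few things are worth checking. This is only a sketch, using option names from the slurm.conf above; details may differ by SLURM version:

```shell
# 1. The controller, not just the slurmd daemons, enforces limits;
#    after editing slurm.conf, restart slurmctld as well (the thread
#    above only mentions restarting slurmd):
/etc/init.d/slurmctld restart

# 2. Confirm the running controller sees the enforcement settings:
scontrol show config | grep -E 'AccountingStorageEnforce|EnforcePartLimits'

# 3. Confirm the QOS itself carries the flag and the longer limit:
sacctmgr show qos where name=1week format=Name,MaxWall,Flags
```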
>>>
>>>
>>> Could you tell me what I'm still missing to make this work for user
>>> "belushki"?
>>>
>>> Thanks,
>>>
>>> --matt
>>>
>>>
>>
>>
>>
>
