Great, thanks Moe.
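For the archives, here is a quick way to verify that the QOS flag actually took effect (a sketch only; the `1week` QOS name is from the thread below, and whether a reconfigure is needed is an assumption, not something confirmed on this cluster):

```shell
# Show the QOS definition, including its Flags and MaxWall, to confirm
# that Flags=PartitionTimeLimit is really set on the 1week QOS
# (assumes the 1week QOS from the thread below and a running slurmdbd).
sacctmgr show qos 1week format=Name,Priority,MaxWall,Flags

# Restarting slurmd on the compute nodes does not touch the controller;
# if slurmctld was started before the QOS existed, re-reading the config
# may help (an assumption, not a confirmed fix).
scontrol reconfigure
```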

On Mon, Oct 31, 2011 at 10:39 AM, Moe Jette <[email protected]> wrote:

> This works for me.
> What version of SLURM are you running?
> You might want to look at your SlurmctldLogFile.
>
> Lyn,
> You can use the QOS mechanism, as Matt is, with flags (e.g.
> "Flags=PartitionTimeLimit") to override partition time and/or size limits.
>
>
> Quoting Matteo Guglielmi <[email protected]>:
>
>> Dear All,
>>
>> I'm trying to create a simple qos called 1week which
>> I would like to associate to those users who do need
>> to run for one week instead of 2 days at maximum:
>>
>> ### slurm.conf ###
>> EnforcePartLimits=YES
>> TaskPlugin=task/affinity
>> TaskPluginParam=Sched
>> TopologyPlugin=topology/none
>> TrackWCKey=no
>> SchedulerType=sched/backfill
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_Core_Memory
>> PriorityType=priority/multifactor
>> PriorityDecayHalfLife=7-0
>> PriorityCalcPeriod=5
>> PriorityFavorSmall=YES
>> PriorityMaxAge=7-0
>> PriorityUsageResetPeriod=NONE
>> PriorityWeightAge=1000
>> PriorityWeightFairshare=1000
>> PriorityWeightJobSize=10000
>> PriorityWeightPartition=10000
>> PriorityWeightQOS=10000
>> AccountingStorageEnforce=limits,qos
>> AccountingStorageType=accounting_storage/slurmdbd
>> JobCompType=jobcomp/none
>> JobAcctGatherType=jobacct_gather/linux
>> PreemptMode=suspend,gang
>> PreemptType=preempt/partition_prio
>>
>> NodeName=DEFAULT TmpDisk=16384 State=UNKNOWN
>>
>> NodeName=foff[01-08] Procs=8  CoresPerSocket=4  Sockets=2
>> ThreadsPerCore=1 RealMemory=7000   Weight=1 Feature=X5482,foff,fofflm
>> NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4
>> ThreadsPerCore=1 RealMemory=127000 Weight=1 Feature=6176,foff,foffhm
>>
>> PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED
>> MaxTime=2-0 PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
>> PartitionName=batch   Nodes=foff[01-13] Default=YES
>> PartitionName=foff1   Nodes=foff[01-08] Priority=1000
>> PartitionName=foff2   Nodes=foff[09-13] Priority=1000
>> #################
>>
>> sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
>>
>>              Account    Cluster       User     Share  Partition   Def QOS                  QOS
>> -------------------- ---------- ---------- --------- ---------- --------- --------------------
>> root                     superb                    1                                    normal
>>  root                    superb       root         1                                    normal
>>  sb                      superb                    1                                    normal
>>  sb                      superb   belushki         1      batch                         normal
>>  sb                      superb     fiocco         1      batch                         normal
>>  gr-fo                   superb                    1                                    normal
>>   gr-fo                  superb   belushki         1      foff1                         normal
>>   gr-fo                  superb   belushki         1      foff2                         normal
>>   gr-fo                  superb     fiocco         1      foff1                         normal
>>   gr-fo                  superb     fiocco         1      foff2                         normal
>>
>>
>> sacctmgr add qos Name=1week MaxWall=7-0 Priority=100 PreemptMode=Cluster
>> Flags=PartitionTimeLimit
>>
>> sacctmgr modify user name=belushki Account=gr-fo set qos+=1week
>>
>> sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd
>>
>>              Account    Cluster       User     Share  Partition   Def QOS                  QOS
>> -------------------- ---------- ---------- --------- ---------- --------- --------------------
>> root                     superb                    1                                    normal
>>  root                    superb       root         1                                    normal
>>  sb                      superb                    1                                    normal
>>  sb                      superb   belushki         1      batch                         normal
>>  sb                      superb     fiocco         1      batch                         normal
>>  gr-fo                   superb                    1                                    normal
>>   gr-fo                  superb   belushki         1      foff1                   1week,normal
>>   gr-fo                  superb   belushki         1      foff2                   1week,normal
>>   gr-fo                  superb     fiocco         1      foff1                         normal
>>   gr-fo                  superb     fiocco         1      foff2                         normal
>>
>> /etc/init.d/slurmd restart (the same command was issued on all nodes as well)
>>
>> su - belushki
>>
>> srun -p foff2 -A gr-fo --qos=1week -t 7-0 hostname
>> srun: error: Unable to allocate resources: Requested time limit is
>> invalid (exceeds some limit)
>>
>>
>> Could you tell me what I am still missing to make this work for user
>> "belushki"?
>>
>> Thanks,
>>
>> --matt
>>
>>
>
>
>
