SLURM's QOS and resource limits web pages describe most of this:
http://www.schedmd.com/slurmdocs/qos.html
http://www.schedmd.com/slurmdocs/resource_limits.html


Quoting Lyn Gerner <[email protected]>:

PS: Moe, is there a related document?  Couldn't find anything obvious.

Thanks,
Lyn

On Mon, Oct 31, 2011 at 12:59 PM, Lyn Gerner <[email protected]> wrote:

Great, thanks Moe.


On Mon, Oct 31, 2011 at 10:39 AM, Moe Jette <[email protected]> wrote:

This works for me.
What version of SLURM are you running?
You might want to look at your SlurmctldLogFile.

Lyn,
You can use the QOS mechanism as Matt is, with flags (e.g.
"Flags=PartitionTimeLimit") to override partition time and/or size limits.
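A minimal sketch of the approach described above. The QOS name "longrun" and the user name "someuser" are illustrative placeholders, not names from this thread; the sacctmgr subcommands and the PartitionTimeLimit flag are standard SLURM:

```shell
# Create a QOS whose MaxWall overrides the partition's MaxTime
# ("longrun" is a placeholder name):
sacctmgr add qos Name=longrun MaxWall=7-00:00:00 Flags=PartitionTimeLimit

# Grant it to a user in addition to their existing QOS list
# ("someuser" is a placeholder):
sacctmgr modify user name=someuser set qos+=longrun

# Jobs must then request the QOS explicitly to get the longer limit:
# srun --qos=longrun -t 7-0 ...
```

These commands require a running slurmdbd; they are shown as a configuration sketch, not run output.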


Quoting Matteo Guglielmi <[email protected]>:

Dear All,

I'm trying to create a simple qos called 1week which
I would like to associate to those users who do need
to run for one week instead of 2 days at maximum:

### slurm.conf ###
EnforcePartLimits=YES
TaskPlugin=task/affinity
TaskPluginParam=Sched
TopologyPlugin=topology/none
TrackWCKey=no
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityCalcPeriod=5
PriorityFavorSmall=YES
PriorityMaxAge=7-0
PriorityUsageResetPeriod=NONE
PriorityWeightAge=1000
PriorityWeightFairshare=1000
PriorityWeightJobSize=10000
PriorityWeightPartition=10000
PriorityWeightQOS=10000
AccountingStorageEnforce=limits,qos
AccountingStorageType=accounting_storage/slurmdbd
JobCompType=jobcomp/none
JobAcctGatherType=jobacct_gather/linux
PreemptMode=suspend,gang
PreemptType=preempt/partition_prio

NodeName=DEFAULT TmpDisk=16384 State=UNKNOWN

NodeName=foff[01-08] Procs=8  CoresPerSocket=4  Sockets=2
ThreadsPerCore=1 RealMemory=7000   Weight=1 Feature=X5482,foff,fofflm
NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4
ThreadsPerCore=1 RealMemory=127000 Weight=1 Feature=6176,foff,foffhm

PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED
MaxTime=2-0 PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
PartitionName=batch   Nodes=foff[01-13] Default=YES
PartitionName=foff1   Nodes=foff[01-08] Priority=1000
PartitionName=foff2   Nodes=foff[09-13] Priority=1000
#################

sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd

             Account    Cluster       User     Share  Partition   Def QOS                  QOS
-------------------- ---------- ---------- --------- ---------- --------- --------------------
root                     superb                    1                                    normal
 root                    superb       root         1                                    normal
 sb                      superb                    1                                    normal
  sb                     superb   belushki         1      batch                         normal
  sb                     superb     fiocco         1      batch                         normal
 gr-fo                   superb                    1                                    normal
  gr-fo                  superb   belushki         1      foff1                         normal
  gr-fo                  superb   belushki         1      foff2                         normal
  gr-fo                  superb     fiocco         1      foff1                         normal
  gr-fo                  superb     fiocco         1      foff2                         normal


sacctmgr add qos Name=1week MaxWall=7-0 Priority=100 PreemptMode=Cluster
Flags=PartitionTimeLimit

sacctmgr modify user name=belushki Account=gr-fo set qos+=1week

sacctmgr list associations format=Account,Cluster,User,Fairshare,Partition,defaultqos,qos tree withd

             Account    Cluster       User     Share  Partition   Def QOS                  QOS
-------------------- ---------- ---------- --------- ---------- --------- --------------------
root                     superb                    1                                    normal
 root                    superb       root         1                                    normal
 sb                      superb                    1                                    normal
  sb                     superb   belushki         1      batch                         normal
  sb                     superb     fiocco         1      batch                         normal
 gr-fo                   superb                    1                                    normal
  gr-fo                  superb   belushki         1      foff1                   1week,normal
  gr-fo                  superb   belushki         1      foff2                   1week,normal
  gr-fo                  superb     fiocco         1      foff1                         normal
  gr-fo                  superb     fiocco         1      foff2                         normal

/etc/init.d/slurmd restart (the same command was issued on all nodes too)

su - belushki

srun -p foff2 -A gr-fo --qos=1week -t 7-0 hostname
srun: error: Unable to allocate resources: Requested time limit is
invalid (exceeds some limit)
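When that error appears even though the QOS grants a longer MaxWall, a few checks are commonly useful. This is a hedged checklist, not a diagnosis of this particular cluster; the commands are standard SLURM tools, but their output depends on the site:

```shell
# Confirm the QOS really carries the override flag and the 7-day wall limit:
sacctmgr show qos name=1week format=Name,Priority,Flags,MaxWall

# Confirm the user's association actually lists the QOS:
sacctmgr show assoc user=belushki format=User,Account,Partition,QOS

# Confirm the controller's view of the relevant enforcement settings:
scontrol show config | grep -i -e AccountingStorageEnforce -e EnforcePartLimits
```

One further note: restarting slurmd on the compute nodes does not reload the controller's copy of slurm.conf; if slurm.conf was edited, `scontrol reconfigure` (or a slurmctld restart) is needed for the controller to pick up the change.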


Could you tell me what I'm still missing to make this work for user
"belushki"?

Thanks,

--matt
