Hi, I'm trying to get a feel for fairshare scheduling.

I've got 3 VMs: one scheduler and two worker nodes. I've created a
couple of accounts and assigned a user to each:

slurmtest-sched# sacctmgr list assoc tree format=account,user,share
             Account       User     Share 
-------------------- ---------- --------- 
root                                    1 
 atlas                                 80 
  atlas                  alexis    parent 
 belle                                 20 
  belle                 alexis2    parent 
slurmtest-sched# 
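
If I understand the normalisation correctly, the Norm Shares that show
up in sshare below should just be each account's raw share divided by
the siblings' total; a quick sketch of that assumption:

```python
# My understanding of Norm Shares: each account's raw share divided by
# the total of its sibling accounts' shares (here atlas=80, belle=20).
raw_shares = {"atlas": 80, "belle": 20}
total = sum(raw_shares.values())
norm_shares = {acct: share / total for acct, share in raw_shares.items()}
print(norm_shares)  # {'atlas': 0.8, 'belle': 0.2}
```

which does match the 0.800000 and 0.200000 in the sshare output below.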

Now I submit 20000 jobs alternately as the two users:

for X in {1..10000}; do
  su - alexis -c "sbatch slurmtest.sh"
  su - alexis2 -c "sbatch slurmtest.sh"
done

Both users run the same job, which is just:

alexis@slurmtest-sched:~$ cat ~/slurmtest.sh 
#!/bin/bash

#SBATCH --job-name=slurmtest
#SBATCH --mem=10
#SBATCH --output=/dev/null
#SBATCH --error=/dev/null
#SBATCH --mail-user=me@mine
#SBATCH --partition=normal

true
alexis@slurmtest-sched:~$ 

I made the job deliberately quick so that I would see results, and
Slurm's response, quickly too.

Then I just run 'watch sshare -al' and try to understand
what I see, which includes:

alexis2@slurmtest-sched:~$ sshare -al
             Account       User Raw Shares Norm Shares   Raw Usage  Norm Usage Effectv Usage  FairShare  GrpCPUMins      CPURunMins 
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ----------- --------------- 
root                                          1.000000        1555                  0.000000   1.000000                       14450 
 atlas                                  80    0.800000         752    0.476011      0.476011   0.662038                       11560 
  atlas                  alexis     parent    0.800000         752    0.476717      0.476717   0.661633                       11560 
 belle                                  20    0.200000         802    0.523989      0.523989   0.162674                        2890 
  belle                 alexis2     parent    0.200000         802    0.523283      0.523283   0.163073                        2890 
alexis2@slurmtest-sched:~$

and a few seconds later:

alexis2@slurmtest-sched:~$ sshare -al
             Account       User Raw Shares Norm Shares   Raw Usage  Norm Usage Effectv Usage  FairShare  GrpCPUMins      CPURunMins 
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ----------- --------------- 
root                                          1.000000        1578                  0.000000   1.000000                       20230 
 atlas                                  80    0.800000         763    0.476011      0.476011   0.662038                        8670 
  atlas                  alexis     parent    0.800000         763    0.476717      0.476717   0.661633                        8670 
 belle                                  20    0.200000         814    0.523989      0.523989   0.162674                       11560 
  belle                 alexis2     parent    0.200000         814    0.523283      0.523283   0.163073                       11560 
alexis2@slurmtest-sched:~$ 

and so on.

So now to my questions:

The fairshare values are weightings for the priorities of *future*
jobs based on *past* scheduling behaviour; as such, the fact that
their current values are close to an 80:20 split just means that
that's the prioritisation weighting for future jobs from the two
users that would be required to "redress the balance" (because
currently the two users have *not* yet had their fair shares of
CPU-time).

Is that a correct reading of the fairshare values?
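
As a sanity check on that reading, the FairShare column above does
seem to match what I take to be the classic formula from the
multifactor priority docs, F = 2^(-EffectvUsage/NormShares); a quick
check with the account-level numbers from my own sshare output:

```python
# Slurm's classic fairshare factor (as I understand the multifactor
# priority docs): F = 2 ** (-effective_usage / norm_shares).
def fairshare_factor(effectv_usage, norm_shares):
    return 2 ** (-effectv_usage / norm_shares)

# Account-level numbers copied from the sshare -al output above:
print(fairshare_factor(0.476011, 0.8))  # atlas: ~0.662038, as reported
print(fairshare_factor(0.523989, 0.2))  # belle: ~0.162674, as reported
```

so at least the arithmetic hangs together.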

If that is correct, then I would expect the effective usage and
CPURunMins to tend towards an 80:20 split in line with the fairshare
values, but this doesn't seem to be happening. Why? Am I just not
waiting long enough? Is there a way to make Slurm respond faster,
at least for testing purposes?

Any suggestions appreciated!

I'm running:

slurmtest-sched# dpkg -l | grep slurm
ii  slurm-llnl                          2.6.5-1                          amd64        Simple Linux Utility for Resource Management
ii  slurm-llnl-basic-plugins            2.6.5-1                          amd64        SLURM basic plugins
ii  slurm-llnl-slurmdbd                 2.6.5-1                          amd64        Secure enterprise-wide interface to a database for SLURM
slurmtest-sched# cat /etc/issue
Ubuntu 14.04.4 LTS \n \l

slurmtest-sched# uname -a
Linux slurmtest-sched 3.13.0-79-generic #123-Ubuntu SMP Fri Feb 19 14:27:58 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
slurmtest-sched# 

and the same on the two worker nodes.

slurm.conf contains:

slurmtest-sched# egrep -v '^(#| *$)' /etc/slurm-llnl/slurm.conf | sort
AccountingStorageEnforce=associations,limits
AccountingStorageHost=localhost
AccountingStorageType=accounting_storage/slurmdbd
AuthType=auth/munge
CacheGroups=0
ClusterName=slurmtest
ControlMachine=slurmtest-sched
CryptoType=crypto/munge
FastSchedule=2
InactiveLimit=0
JobAcctGatherType=jobacct_gather/linux
JobCompType=jobcomp/none
KillWait=30
MailProg=/usr/bin/mail
MinJobAge=300
MpiDefault=none
NodeName=slurmtest-wn[1-2] CPUS=4 Sockets=4 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=2048 State=UNKNOWN
PartitionName=normal Nodes=slurmtest-wn[1-2] MaxTime=48:10:00 State=UP AllowGroups=alexis,alexis2
PartitionName=short Nodes=slurmtest-wn[1-2] MaxTime=24:10:00 State=UP AllowGroups=alexis,alexis2
PriorityCalcPeriod=1
PriorityDecayHalfLife=14-0
PriorityFavorSmall=NO
PriorityMaxAge=14-0
PriorityType=priority/multifactor
PriorityWeightAge=1000
PriorityWeightFairshare=10000
PriorityWeightJobSize=1000
PriorityWeightPartition=1000
PriorityWeightQOS=0
Proctracktype=proctrack/linuxproc
ReturnToService=0
SchedulerPort=7321
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
SlurmUser=slurm
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmctldPort=6817
SlurmctldTimeout=300
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmdTimeout=300
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
Waittime=0
slurmtest-sched# 
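
(One thing I notice while writing this up: with
PriorityDecayHalfLife=14-0, historical usage should only halve every
14 days, so over a few minutes of testing essentially nothing decays.
A rough sketch, assuming simple exponential half-life behaviour:

```python
# Rough sketch of usage decay under PriorityDecayHalfLife=14-0 (14 days),
# assuming simple exponential half-life behaviour.
HALF_LIFE_MINUTES = 14 * 24 * 60  # 14 days expressed in minutes

def remaining_fraction(minutes_elapsed):
    """Fraction of recorded usage still counted after the given time."""
    return 0.5 ** (minutes_elapsed / HALF_LIFE_MINUTES)

print(remaining_fraction(5))                  # ~0.99983: 5 minutes barely decays
print(remaining_fraction(HALF_LIFE_MINUTES))  # 0.5 after exactly one half-life
```

which, if I've got that right, might mean a much shorter half-life is
sensible just for testing.)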

Thanks!

Alexis
