Hi Chad,

For us (also running Slurm 17.11), the crucial point was the balance between PriorityWeightFairshare, PriorityWeightAge and PriorityMaxAge.

We set PriorityWeightAge high (higher than PriorityWeightFairshare, in fact), so that even a power user's job will eventually reach the front of the queue and can't be sort of DDoS-ed by jobs from little-used accounts. The question then is: how long must that job already have been waiting in the queue?

Consider the following simplified account tree:
      root
     /    \
    A      B
   / \     |
  X   Y    Z

When the cluster is basically occupied by X, this also has an impact on Y's fair share value, because both belong to account A. This can lead to a situation where the difference between X's and Y's fair share values is pretty small, even though Y has hardly used any resources. With a low value of PriorityMaxAge, the situation is basically FIFO between X and Y, as X's jobs only need a couple of hours (or even less) in the queue to compensate for the difference in fair share priority.

We're currently running with the following settings, and since the increase of PriorityMaxAge to three weeks it works fine:

PriorityMaxAge=21-0
PriorityWeightAge=1500000
PriorityWeightFairshare=1000000
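
To put rough numbers on that, here is a small Python sketch. It is a simplified model, not Slurm's actual code: it assumes that only the age and fair share terms differ between two jobs, that the age factor grows linearly from 0 to 1 over PriorityMaxAge, and that the gap in fair share factor between X and Y is 0.01 (a made-up value for illustration). The function name is hypothetical.

```python
# Simplified model of the age vs. fair share trade-off in priority/multifactor.
# Assumptions: all other priority terms are equal for both jobs, and the age
# factor rises linearly from 0 to 1 over PriorityMaxAge, then stays at 1.

def hours_to_compensate(fairshare_gap, weight_age, weight_fairshare, max_age_days):
    """Head start in queue time (hours) needed to offset a fair share factor gap.

    Returns None if the gap cannot be made up by waiting, because the age
    term saturates once a job is older than PriorityMaxAge.
    """
    points_to_make_up = fairshare_gap * weight_fairshare
    age_points_per_hour = weight_age / (max_age_days * 24.0)
    hours = points_to_make_up / age_points_per_hour
    if hours > max_age_days * 24.0:
        return None
    return hours

# Hypothetical gap between X's and Y's fair share factors.
GAP = 0.01

# Same weights as in our settings, with a one-day and a three-week PriorityMaxAge.
short = hours_to_compensate(GAP, 1_500_000, 1_000_000, max_age_days=1)
long_ = hours_to_compensate(GAP, 1_500_000, 1_000_000, max_age_days=21)

print(f"PriorityMaxAge=1-0:  {short:.2f} h")
print(f"PriorityMaxAge=21-0: {long_:.2f} h")
```

Under these assumptions, X's job needs a head start of about ten minutes with PriorityMaxAge=1-0, but more than three hours with PriorityMaxAge=21-0. The required head start scales linearly with PriorityMaxAge.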


For the array jobs, you can set MaxArraySize in slurm.conf; that works in 17.11, so no upgrade is needed. But remember to increase MaxJobCount as well, since each array task counts as a job!

Best,
Christoph

On 29/05/2019 16.17, Julius, Chad wrote:
All,

We rushed our Slurm install due to a short timeframe and missed some important items. We are now looking to implement a better system than the first in, first out we have now. My question: are the defaults listed in the slurm.conf file a good start? Would anyone be willing to share the Scheduling section of their .conf? Also, we are looking to increase the maximum array size, but I don't see that in the slurm.conf in version 17. Am I looking at an upgrade of Slurm in the near future, or can I just add MaxArraySize=somenumber?

The defaults as of 17.11.8 are:

# SCHEDULING
#SchedulerAuth=
#SchedulerPort=
#SchedulerRootFilter=
#PriorityType=priority/multifactor
#PriorityDecayHalfLife=14-0
#PriorityUsageResetPeriod=14-0
#PriorityWeightFairshare=100000
#PriorityWeightAge=1000
#PriorityWeightPartition=10000
#PriorityWeightJobSize=1000
#PriorityMaxAge=1-0

*Chad Julius*

Cyberinfrastructure Engineer Specialist

*Division of Technology & Security*

SOHO 207, Box 2231

Brookings, SD 57007

Phone: 605-688-5767

www.sdstate.edu <http://www.sdstate.edu/>



--
Dr. Christoph Brüning
Universität Würzburg
Rechenzentrum
Am Hubland
D-97074 Würzburg
Tel.: +49 931 31-80499
