AFAIU the major optimization wrt. array job scheduling is that if the scheduler
finds that it cannot schedule a job in a job array, it skips over all the rest
of the jobs in the array. There are also some memory benefits: a pending job
array is stored as a single object in the job queue rather than being broken
up into a zillion separate jobs.
But, from the perspective of the various limits (MaxJobCount, the per-account
and per-user job number limits you can set if you use accounting, etc.), a job
array with N tasks counts as N jobs, not as one.
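To make that concrete, here's a hedged sketch; the --array syntax is standard sbatch, but the script name my_job.sh and the index range are placeholders, and this is a command illustration rather than something runnable outside a Slurm cluster:

```shell
# Submit a 10,000-task job array. It sits in the controller's queue as
# a single pending record, but it counts as 10,000 jobs against
# MaxJobCount and against any per-user/per-account job number limits.
sbatch --array=0-9999 my_job.sh
```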
Back in the Slurm 2.something days, we found that we had to keep
MaxJobCount somewhat reasonable (10k or so) lest the scheduler bog
down, but current versions are a lot better in this respect. Currently we have
MaxJobCount=300k and MaxArraySize=100k (similar to your case, we had some users
who wanted to run huge array jobs). To prevent individual users from
hogging the entire cluster, we use the GrpTRESRunMins limits (GrpCPURunMins if
you're stuck on an older Slurm version).
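For reference, a sketch of what that setup might look like. The option names (MaxJobCount, MaxArraySize, GrpTRESRunMins) are real slurm.conf/sacctmgr settings, but the values and the user name someuser are placeholder assumptions, and this is a config fragment, not something to paste verbatim:

```shell
# slurm.conf (controller-side limits; values are site-specific examples):
#   MaxJobCount=300000    # total jobs the controller will keep queued
#   MaxArraySize=100000   # array indices must be below this value

# Cap the CPU-minutes a user's *running* jobs may collectively consume,
# so one huge array can't hog the whole cluster
# (use GrpCPURunMins instead on older Slurm versions):
sacctmgr modify user someuser set GrpTRESRunMins=cpu=1000000
```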
From: Christopher Benjamin Coffey [chris.cof...@nau.edu]
Sent: Wednesday, September 21, 2016 7:14
Subject: [slurm-dev] Slurm array scheduling question
When Slurm is considering jobs to schedule out of all pending jobs, including
job arrays, does it consider only the job array as a whole, or does it
consider the child jobs behind it? I'm curious because to date I've limited
the size of job arrays to 4000, proportional to our max queue limit of
13,000. I've done this to keep the queue depth at a reasonable size for
efficient Slurm scheduling and backfilling (maybe not needed!). But to date,
I and the folks using our cluster have been pleased with the scheduling and
its speed; I don't want to change that! ☺
I now have a researcher wanting to process 100K+ inputs with Slurm arrays, and
my 4000 limit is becoming a burden; we've been looking into ways to work
around it. I've started rethinking my original 4000 number and am now wondering
whether it's really necessary to keep the array size so low.
The man page for slurm.conf gives the impression that if I raise the array
size, the max queue size has to be raised to a higher value as well. This
suggests to me that the change could impact scheduling significantly, since
for backfill there are potentially many more jobs to test before one starts.
I'd like to get some feedback on this from other sites and the developers,
if possible. Thank you!
Northern Arizona University