It seems that you have implemented something very similar to the
SchedulingParameters value of "bf_continue". It may be useful to
review this tutorial and adjust the tuning parameters. Many systems run
fine with far more jobs.
http://slurm.schedmd.com/SUG14/sched_tutorial.pdf
Quoting Magnus Jonsson:
Hi!
For the last couple of months I have used a patch for Slurm that,
instead of cancelling the backfill "round" on every little thing that
touches the job queue (start/stop of a job step, job modification,
start/stop of jobs, etc.), only cancels the "round" when a running job
ends.
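
A minimal sketch of the difference between the two policies, in Python. The event names and both functions are hypothetical and only illustrate the rule described above; they are not Slurm internals.

```python
from enum import Enum, auto


class QueueEvent(Enum):
    """Hypothetical kinds of events that touch the job queue."""
    JOB_SUBMITTED = auto()
    JOB_MODIFIED = auto()
    JOB_STARTED = auto()
    STEP_STARTED = auto()
    STEP_ENDED = auto()
    JOB_ENDED = auto()


def stock_should_cancel_round(events):
    """Stock behaviour as described above: any event cancels the round."""
    return len(events) > 0


def patched_should_cancel_round(events):
    """Patched behaviour as described above: only a job ending cancels it."""
    return QueueEvent.JOB_ENDED in events


if __name__ == "__main__":
    busy = [QueueEvent.STEP_STARTED, QueueEvent.JOB_MODIFIED]
    print(stock_should_cancel_round(busy))    # True
    print(patched_should_cancel_round(busy))  # False
```

With many short job steps in flight, the stock rule fires almost constantly, while the patched rule fires only when cores are actually released.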
This has worked very well for us. I have attached some graphs that show
before and after. They cover a pretty short period of time, but the
results are the same over longer periods.
The files starting with 19-20 use the stock backfill and those starting
with 18-19 use my patch.
In the 18-19 files there is a huge dip due to a large number of
small jobs that basically kill the backfill.
But as you can see, the scheduler goes deeper into the queue, and this
also leaves us with fewer idle cores in the system.
I have a patch for this. I can share it if somebody would like to test
it, but it is not complete in any way. I think that someone with more
knowledge of the internals of Slurm needs to look at its
consequences internally.
I have an idea on how to increase the depth a little bit more.
If I remember correctly, the backfill scheduler today tries several
different start positions for a job in order to make a reservation
for it.
If we only do this for the first N jobs of a user, and after that only
check whether the job can start right now, I think it might increase
the number of jobs that the scheduler actually looks at.
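
A toy sketch of this idea in Python. It assumes a single pool of cores, ignores time entirely, and does not model how a reservation constrains later starts. The function name, the job tuples, and `n_reserve` are made up for illustration and do not correspond to anything in Slurm.

```python
from collections import defaultdict


def plan_backfill(queue, free_cores, n_reserve):
    """Classify jobs in priority order.

    queue: list of (job_id, user, cores) tuples, highest priority first.
    Returns (started, reserved, skipped) lists of job ids.
    """
    seen = defaultdict(int)  # jobs examined so far, per user
    started, reserved, skipped = [], [], []
    for job_id, user, cores in queue:
        seen[user] += 1
        if cores <= free_cores:
            # Cheap check: the job fits right now, so start it.
            free_cores -= cores
            started.append(job_id)
        elif seen[user] <= n_reserve:
            # Among the user's first N jobs: this is where the expensive
            # search for a future start position would be done.
            reserved.append(job_id)
        else:
            # Beyond the user's first N jobs: no reservation search.
            skipped.append(job_id)
    return started, reserved, skipped


if __name__ == "__main__":
    queue = [(1, "a", 4), (2, "a", 8), (3, "a", 8), (4, "b", 2)]
    print(plan_backfill(queue, free_cores=6, n_reserve=2))
    # ([1, 4], [2], [3])
```

Only job 2 pays for the reservation search here; job 3 costs a single comparison, which is what would let the scheduler reach further down the queue.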
Another thing that is not working very well for us is that we have
some partitions with different priorities. To give all users the
possibility to backfill we use the "bf_max_job_user" option
(N=20 for the example).
If a user submits, let's say, 25 jobs to the high-priority queue and
25 jobs to the lower (normal) priority queue, Slurm will not even look
at the user's jobs that are in the normal priority queue. I have almost
completed a patch that adds a "bf max jobs per user per partition"
option. I will release it when I'm done.
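
The 25 + 25 example can be worked through with a small Python sketch. It assumes the limit is a plain counter applied while walking the queue in priority order; the function and its arguments are hypothetical and are not the patch itself.

```python
from collections import Counter


def jobs_considered(queue, limit, per_partition):
    """Return the ids of jobs the backfill pass would look at.

    queue: list of (job_id, user, partition) tuples, highest priority first.
    limit: maximum number of jobs counted per key.
    per_partition: count per (user, partition) instead of per user.
    """
    counts = Counter()
    considered = []
    for job_id, user, partition in queue:
        key = (user, partition) if per_partition else user
        if counts[key] < limit:
            counts[key] += 1
            considered.append(job_id)
    return considered


if __name__ == "__main__":
    # One user: 25 high-priority jobs (ids 0-24), then 25 normal (ids 25-49).
    queue = [(i, "u", "high") for i in range(25)]
    queue += [(i, "u", "normal") for i in range(25, 50)]

    per_user = jobs_considered(queue, limit=20, per_partition=False)
    per_part = jobs_considered(queue, limit=20, per_partition=True)
    print(len(per_user), any(j >= 25 for j in per_user))  # 20 False
    print(len(per_part), sum(j >= 25 for j in per_part))  # 40 20
```

With the per-user counter the 20 slots are used up by the high-priority jobs, so none of the normal-priority jobs are examined; counting per user and partition lets 20 from each queue through.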
For us, Slurm's backfill is on the edge of usable right now. Thanks to
the patch we actually get jobs backfilled from time to time. But this
will not work for very long. At this moment we have 1.2k jobs in the
queue, and that is low for us. We regularly have over 3k jobs in the queue.
Something major has to be done with the backfill scheduler for Slurm to
be usable for us in the future. Not just small patches that add some
tweaks to get us through the day.
Best regards,
Magnus
--
Magnus Jonsson, Developer, HPC2N, Umeå Universitet
--
Morris "Moe" Jette
CTO, SchedMD LLC
Commercial Slurm Development and Support