We have a similar setup; our current settings are below. Without tuning these, you are in a world of hurt with your job mix and backfill.

SchedulerParameters=default_queue_depth=50,bf_interval=120,bf_window=300,bf_max_job_user=60

The bf_max_job_user setting is key for us.

On Tue, May 21, 2013 at 3:10 PM, Carles Fenoy <[email protected]> wrote:

> Hi all,
> Use sdiag to see if the backfilling is too slow. If it is, tune the
> scheduler parameters. There is a bf_max_jobs or something like this that
> will limit the number of jobs evaluated and will decrease considerably the
> scheduling time
> Regards,
> Carles Fenoy
> Barcelona Supercomputing Center
> On 21/05/2013 23:15, "Bjørn-Helge Mevik" <[email protected]> wrote:
>
>> If you increase the log level, for instance set
>>
>> SlurmctldDebug=debug
>> DebugFlags=Backfill
>>
>> you might get more information about what happens. If it is the
>> backfilling that takes too long, you should see messages about backfill
>> "yielding locks". If I recall correctly, the backfill scheduler used to
>> time out after MessageTimeout/2 seconds, but looking at the code for
>> 2.5.6 this seems to have changed.
>>
>> Keep us posted about what you find. I'm planning to switch to 2.5.6
>> tomorrow, and have from time to time had problems getting the
>> backfilling to be fast enough.
>>
>> --
>> Regards,
>> Bjørn-Helge Mevik, dr. scient,
>> Department for Research Computing, University of Oslo
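
For reference, the settings mentioned in this thread can be gathered into one slurm.conf fragment. This is only a sketch: the values are the ones quoted above and are not recommendations, and the comments reflect my reading of the slurm.conf man page, so check the units and meanings against the documentation for your Slurm version.

```
# Scheduler tuning (values from this thread; adjust for your own job mix)
#   default_queue_depth=50  jobs the main scheduler tries per scheduling pass
#   bf_interval=120         seconds between backfill iterations
#   bf_window=300           minutes into the future that backfill looks
#   bf_max_job_user=60      max jobs per user that backfill will try to start
SchedulerParameters=default_queue_depth=50,bf_interval=120,bf_window=300,bf_max_job_user=60

# Temporary extra logging while diagnosing slow backfill
SlurmctldDebug=debug
DebugFlags=Backfill
```

After changing the scheduler parameters, running sdiag again (as suggested above) should show whether the backfill cycle got any faster. The debug settings make slurmctld log considerably more, so they are probably best removed once the problem is understood.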
