Hi,

After increasing the log level I could see lots of messages like:

    backfill: completed yielding locks

Also, sdiag reported that the backfill cycle was taking around 160 seconds. I'll try changing the SchedulerParameters as suggested and see if this helps.

Thanks,
Andrew.

________________________________
From: Tim Carlson [[email protected]]
Sent: Wednesday, May 22, 2013 5:51 PM
To: slurm-dev
Subject: [slurm-dev] Re: Problems when using sched/backfill

We have a similar setup, and these are our current settings. Without tuning these, you are in a world of hurt with your job mix and doing backfill.

    SchedulerParameters=default_queue_depth=50,bf_interval=120,bf_window=300,bf_max_job_user=60

The bf_max_job_user is key for us.

On Tue, May 21, 2013 at 3:10 PM, Carles Fenoy <[email protected]> wrote:

Hi all,

Use sdiag to see if the backfilling is too slow. If it is, tune the scheduler parameters. There is a bf_max_jobs or something like this that will limit the number of jobs evaluated and will considerably decrease the scheduling time.

Regards,
Carles Fenoy
Barcelona Supercomputing Center

On 21/05/2013 23:15, "Bjørn-Helge Mevik" <[email protected]> wrote:

If you increase the log level, for instance set

    SlurmctldDebug=debug
    DebugFlags=Backfill

you might get more information about what happens. If it is the backfilling that takes too long, you should see messages about backfill "yielding locks". If I recall correctly, the backfill scheduler used to time out after MessageTimeout/2 seconds, but looking at the code for 2.5.6 this seems to have changed.

Keep us posted about what you find. I'm planning to switch to 2.5.6 tomorrow, and have from time to time had problems getting the backfilling to be fast enough.

--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
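
For reference, the settings mentioned in this thread could be collected into one slurm.conf fragment roughly as follows. This is only a sketch: the values are copied from the messages above (Tim Carlson's site settings and Bjørn-Helge Mevik's debug suggestion), the SchedulerType line is assumed from the subject line, and none of it is a tuned recommendation for any other cluster.

```
# slurm.conf fragment (sketch; values taken from the thread, not recommendations)

# Assumed from the subject line: the backfill scheduler plugin is in use.
SchedulerType=sched/backfill

# Extra logging suggested in the thread, to see backfill "yielding locks" messages.
SlurmctldDebug=debug
DebugFlags=Backfill

# Tim Carlson's site settings, quoted as posted.
SchedulerParameters=default_queue_depth=50,bf_interval=120,bf_window=300,bf_max_job_user=60
```

After a change like this, running sdiag again and comparing the reported backfill cycle time against the earlier figure of about 160 seconds should show whether the tuning helped.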
