Thanks for the responses. After enabling the DebugFlags that show backfill information, it appeared that the backfill scheduler kept testing the same jobs for this one user, up to the limit defined by bf_max_job_user. I've since set the following, and backfill is now working as expected.

bf_max_job_start=100,bf_max_job_user=0,bf_max_job_test=400,bf_interval=60,sched_interval=120,default_queue_depth=10,partition_job_depth=100,bf_window=7200,bf_resolution=1800,bf_continue,max_sched_time=4,defer,preempt_strict_order

- Trey

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: [email protected]
Jabber: [email protected]

On Mon, Nov 24, 2014 at 10:46 PM, <[email protected]> wrote:

> There are also a couple of DebugFlags that show in great (i.e. very
> verbose) detail what the backfill scheduler is doing.
>
> Quoting Christopher Samuel <[email protected]>:
>
>> On 22/11/14 05:39, Trey Dockendorf wrote:
>>
>>> Currently this is one user who has the 1500 pending jobs and the reasons
>>> in squeue is either (Resources) , (Priority) with the vast majority
>>> being (None).
>>
>> To me that sounds like the backfill scheduler is not getting to the ones
>> labelled "None".
>>
>>> This is our current SchedulerParameters:
>>
>> This is what we use on our clusters and our BlueGene/Q, all of which can
>> have many thousands of jobs queued waiting to run - for example one of
>> our Intel clusters currently has over 1,400 jobs waiting and none are
>> labelled as "None".
>>
>> SchedulerParameters=bf_window=43200,bf_resolution=600,bf_max_job_user=5,max_job_bf=10000,bf_continue,defer
>>
>> Everything seems to perform well with those settings, slurmctld is at
>> around 8GB virtual and only ~35MB RSS for instance.
>>
>> Best of luck!
>> Chris
>> --
>> Christopher Samuel Senior Systems Administrator
>> VLSCI - Victorian Life Sciences Computation Initiative
>> Email: [email protected] Phone: +61 (0)3 903 55545
>> http://www.vlsci.org.au/ http://twitter.com/vlsci
>
> --
> Morris "Moe" Jette
> CTO, SchedMD LLC
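
P.S. In case it helps anyone comparing the two SchedulerParameters strings in this thread, here is a minimal Python sketch that splits each string into key/value pairs and prints the options that differ. The helper name parse_sched_params and the variable names are my own and are not part of Slurm; the sketch only does string handling and does not check whether an option is valid for a given Slurm version.

```python
def parse_sched_params(value):
    """Split a comma-separated SchedulerParameters value into a dict.

    Options given without '=' (for example bf_continue or defer) map to True.
    Purely numeric values are converted to int; anything else stays a string.
    """
    params = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, val = item.partition("=")
        if not sep:
            params[key] = True
        elif val.isdigit():
            params[key] = int(val)
        else:
            params[key] = val
    return params


# The settings posted at the top of this message.
trey = parse_sched_params(
    "bf_max_job_start=100,bf_max_job_user=0,bf_max_job_test=400,"
    "bf_interval=60,sched_interval=120,default_queue_depth=10,"
    "partition_job_depth=100,bf_window=7200,bf_resolution=1800,"
    "bf_continue,max_sched_time=4,defer,preempt_strict_order"
)

# The settings quoted from Chris's reply.
chris = parse_sched_params(
    "bf_window=43200,bf_resolution=600,bf_max_job_user=5,"
    "max_job_bf=10000,bf_continue,defer"
)

# Print every option that is missing on one side or has a different value.
for key in sorted(set(trey) | set(chris)):
    if trey.get(key) != chris.get(key):
        print(f"{key}: trey={trey.get(key)!r} chris={chris.get(key)!r}")
```

Options that appear in only one of the two strings show up with None on the other side.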
