Thanks for the responses.  After enabling the DebugFlags that show backfill
information, it appeared that the backfill scheduler kept testing the same
jobs for this one user, up to the limit defined by bf_max_job_user.  I've
since set the following SchedulerParameters, and backfill is now working as
expected.

bf_max_job_start=100,bf_max_job_user=0,bf_max_job_test=400,bf_interval=60,sched_interval=120,default_queue_depth=10,partition_job_depth=100,bf_window=7200,bf_resolution=1800,bf_continue,max_sched_time=4,defer,preempt_strict_order
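
For anyone who wants to sanity-check a long parameter string like this one
before putting it in slurm.conf, here is a minimal Python sketch.  The helper
name parse_sched_params is hypothetical and not part of Slurm; it only splits
the string, so it will not tell you whether slurmctld accepts a given option.

```python
def parse_sched_params(value):
    """Split a SchedulerParameters string into a dict.

    Options of the form key=number become ints, other key=value pairs
    stay as strings, and bare flags (e.g. bf_continue) map to True.
    """
    params = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, val = item.partition("=")
        if not sep:
            params[key] = True          # bare flag, no value
        elif val.isdigit():
            params[key] = int(val)      # numeric option
        else:
            params[key] = val           # anything else kept as text
    return params


# The string from this message, copied verbatim.
SCHED_PARAMS = (
    "bf_max_job_start=100,bf_max_job_user=0,bf_max_job_test=400,"
    "bf_interval=60,sched_interval=120,default_queue_depth=10,"
    "partition_job_depth=100,bf_window=7200,bf_resolution=1800,"
    "bf_continue,max_sched_time=4,defer,preempt_strict_order"
)

params = parse_sched_params(SCHED_PARAMS)

# Print one option per line so typos and duplicates are easy to spot.
for name in sorted(params):
    print(f"{name} = {params[name]}")
```

Listing the options one per line makes it easier to see that bf_max_job_user
is now 0 and to spot an option that was split or misspelled when the line was
pasted.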

- Trey

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: [email protected]
Jabber: [email protected]

On Mon, Nov 24, 2014 at 10:46 PM, <[email protected]> wrote:

>
> There are also a couple of DebugFlags that show in great (i.e. very
> verbose) detail what the backfill scheduler is doing.
>
> Quoting Christopher Samuel <[email protected]>:
>
>  On 22/11/14 05:39, Trey Dockendorf wrote:
>>
>>  Currently this is one user who has the 1500 pending jobs, and the reason
>>> shown in squeue is either (Resources) or (Priority), with the vast
>>> majority being (None).
>>>
>>
>> To me that sounds like the backfill scheduler is not getting to the ones
>> labelled "None".
>>
>>  This is our current SchedulerParameters:
>>>
>>
>> This is what we use on our clusters and our BlueGene/Q, all of which can
>> have many thousands of jobs queued waiting to run - for example one of
>> our Intel clusters currently has over 1,400 jobs waiting and none are
>> labelled as "None".
>>
>> SchedulerParameters=bf_window=43200,bf_resolution=600,bf_max_job_user=5,max_job_bf=10000,bf_continue,defer
>>
>> Everything seems to perform well with those settings, slurmctld is at
>> around 8GB virtual and only ~35MB RSS for instance.
>>
>> Best of luck!
>> Chris
>> --
>>  Christopher Samuel        Senior Systems Administrator
>>  VLSCI - Victorian Life Sciences Computation Initiative
>>  Email: [email protected] Phone: +61 (0)3 903 55545
>>  http://www.vlsci.org.au/      http://twitter.com/vlsci
>>
>
>
> --
> Morris "Moe" Jette
> CTO, SchedMD LLC
>
