We have a similar setup; our current scheduler settings are below. Without
tuning these, you are in a world of hurt with your job mix and backfill.

SchedulerParameters=default_queue_depth=50,bf_interval=120,bf_window=300,bf_max_job_user=60

bf_max_job_user is the key one for us: it caps how many jobs per user the
backfill scheduler will try to start.
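
For reference, here is the same line annotated as it might appear in
slurm.conf. The descriptions and units follow my reading of the
SchedulerParameters section of the slurm.conf man page, so treat them as a
sketch and check the man page for your Slurm version, since defaults and
behaviour have changed between releases.

```
# slurm.conf (fragment): same values as the line above, annotated.
# default_queue_depth=50  jobs the main scheduler considers on routine passes
# bf_interval=120         seconds between backfill passes
# bf_window=300           minutes into the future that backfill plans for
# bf_max_job_user=60      jobs per user that backfill will try to start
SchedulerParameters=default_queue_depth=50,bf_interval=120,bf_window=300,bf_max_job_user=60
```

The values are what works for our job mix, not recommendations; all the
options have to stay on one comma-separated SchedulerParameters line.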


On Tue, May 21, 2013 at 3:10 PM, Carles Fenoy <[email protected]> wrote:

> Hi all,
> Use sdiag to see whether the backfilling is too slow. If it is, tune the
> scheduler parameters. There is a bf_max_jobs option, or something like
> that, which limits the number of jobs evaluated and considerably
> decreases the scheduling time.
> Regards,
> Carles Fenoy
> Barcelona Supercomputing Center
> On 21/05/2013 23:15, "Bjørn-Helge Mevik" <[email protected]> wrote:
>
>
>>
>> If you increase the log level, for instance set
>>
>> SlurmctldDebug=debug
>> DebugFlags=Backfill
>>
>> you might get more information about what happens.  If it is the
>> backfilling that takes too long, you should see messages about backfill
>> "yielding locks".  If I recall correctly, the backfill scheduler used to
>> time out after MessageTimeout/2 seconds, but looking at the code for
>> 2.5.6 this seems to have changed.
>>
>> Keep us posted about what you find.  I'm planning to switch to 2.5.6
>> tomorrow, and have from time to time had problems getting the
>> backfilling to be fast enough.
>>
>> --
>> Regards,
>> Bjørn-Helge Mevik, dr. scient,
>> Department for Research Computing, University of Oslo
>
>
