[slurm-dev] Re: Debugging backfill in production - how?

jette Mon, 24 Nov 2014 20:46:04 -0800

There are also a couple of DebugFlags that show in great (i.e. very verbose) detail what the backfill scheduler is doing.
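
A minimal sketch of turning that logging on, assuming the flags meant here are Backfill and BackfillMap (the message does not name them, and BackfillMap is only available in newer releases):

```
# slurm.conf
# Assumed flags; the original message does not say which ones are meant.
DebugFlags=Backfill,BackfillMap
```

On a production system it may be preferable to toggle the flag on the running slurmctld instead of editing slurm.conf, for example with "scontrol setdebugflags +backfill" and later "scontrol setdebugflags -backfill", so the verbose output is only logged for as long as it is needed.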


Quoting Christopher Samuel <[email protected]>:

On 22/11/14 05:39, Trey Dockendorf wrote:

Currently one user has the 1500 pending jobs, and the reason shown
in squeue is either (Resources) or (Priority), with the vast majority
being (None).


To me that sounds like the backfill scheduler is not getting to the ones
labelled "None".

This is our current SchedulerParameters:


This is what we use on our clusters and our BlueGene/Q, all of which can
have many thousands of jobs queued waiting to run - for example one of
our Intel clusters currently has over 1,400 jobs waiting and none are
labelled as "None".

SchedulerParameters=bf_window=43200,bf_resolution=600,bf_max_job_user=5,max_job_bf=10000,bf_continue,defer
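
For reference, a commented reading of that line. The descriptions and units are taken from the slurm.conf documentation as I understand it, not from anything stated in this thread, so check them against the man page for the installed version:

```
# bf_window=43200     how far into the future backfill plans, in minutes
#                     (43200 minutes = 30 days)
# bf_resolution=600   time resolution of the backfill plan, in seconds
#                     (600 seconds = 10 minutes)
# bf_max_job_user=5   maximum jobs per user that backfill attempts to start
# max_job_bf=10000    maximum jobs backfill tests per cycle
#                     (later releases name this bf_max_job_test)
# bf_continue         continue the backfill pass after periodically
#                     releasing locks, rather than starting over
# defer               do not try to schedule each job individually at
#                     submit time
SchedulerParameters=bf_window=43200,bf_resolution=600,bf_max_job_user=5,max_job_bf=10000,bf_continue,defer
```

The documentation recommends a bf_window at least as long as the highest allowed time limit, so the 30-day value here should be adapted to the local limits rather than copied as-is.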

Everything seems to perform well with those settings; for instance,
slurmctld is at around 8GB virtual and only ~35MB RSS.

Best of luck!
Chris
--
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: [email protected] Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci



--
Morris "Moe" Jette
CTO, SchedMD LLC
