On 22/11/14 05:39, Trey Dockendorf wrote:

> Currently this is one user who has the 1500 pending jobs, and the reasons
> in squeue are either (Resources) or (Priority), with the vast majority
> being (None).

To me that sounds like the backfill scheduler is not getting to the ones
labelled "None".
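
A quick way to check this is to tally the pending reasons per user. The
following is only a rough sketch: it assumes the input is one reason per
line, as produced by something like `squeue -h -t PD -u <user> -o "%r"`,
and the function name and sample text are made up for illustration.

```python
from collections import Counter


def count_reasons(squeue_output):
    """Tally pending-job reasons, assuming one reason per input line."""
    reasons = (line.strip() for line in squeue_output.splitlines())
    # Skip blank lines so trailing newlines do not create an empty key.
    return Counter(r for r in reasons if r)


# Made-up sample text, not real cluster output.
sample = "Resources\nPriority\nNone\nNone\nNone\n"

for reason, count in count_reasons(sample).most_common():
    print(f"{count:6d}  {reason}")
```

If "None" dominates the tally, that would be consistent with the backfill
scheduler never reaching those jobs.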

> This is our current SchedulerParameters:

This is what we use on our clusters and our BlueGene/Q, all of which can
have many thousands of jobs queued waiting to run - for example one of
our Intel clusters currently has over 1,400 jobs waiting and none are
labelled as "None".

SchedulerParameters=bf_window=43200,bf_resolution=600,bf_max_job_user=5,max_job_bf=10000,bf_continue,defer
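
To make that line easier to read, here is a small sketch that splits it
into its options. The helper name is hypothetical, and the units in the
comments are my reading of the slurm.conf man page (bf_window in minutes,
bf_resolution in seconds), so check them against your Slurm version.

```python
def parse_scheduler_parameters(line):
    """Split a 'SchedulerParameters=...' line into {option: value or True}."""
    # Everything after the first '=' is the comma-separated option list.
    _, _, value = line.strip().partition("=")
    params = {}
    for item in value.split(","):
        if not item:
            continue
        key, sep, val = item.partition("=")
        if not sep:
            params[key] = True  # bare flag, e.g. bf_continue or defer
        else:
            params[key] = int(val) if val.isdigit() else val
    return params


line = ("SchedulerParameters=bf_window=43200,bf_resolution=600,"
        "bf_max_job_user=5,max_job_bf=10000,bf_continue,defer")
params = parse_scheduler_parameters(line)

# Assumed units: bf_window is in minutes, bf_resolution in seconds.
print("backfill window (days):", params["bf_window"] / 1440)
print("backfill resolution (minutes):", params["bf_resolution"] / 60)
print("flags:", sorted(k for k, v in params.items() if v is True))
```

Under those assumed units, the settings above look 30 days ahead at a
10 minute resolution.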

Everything seems to perform well with those settings; for instance,
slurmctld is at around 8GB virtual and only ~35MB RSS.

Best of luck!
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: [email protected] Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
