On 22/11/14 05:39, Trey Dockendorf wrote:

> Currently this is one user who has the 1500 pending jobs and the reasons
> in squeue is either (Resources) , (Priority) with the vast majority
> being (None).

To me that sounds like the backfill scheduler is not getting to the ones
labelled "None".

> This is our current SchedulerParameters:

This is what we use on our clusters and our BlueGene/Q, all of which can
have many thousands of jobs queued waiting to run - for example one of
our Intel clusters currently has over 1,400 jobs waiting and none are
labelled as "None".

SchedulerParameters=bf_window=43200,bf_resolution=600,bf_max_job_user=5,max_job_bf=10000,bf_continue,defer

Everything seems to perform well with those settings, slurmctld is at
around 8GB virtual and only ~35MB RSS for instance.

Best of luck!
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: [email protected] Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
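
For anyone reading the SchedulerParameters value above, here is a minimal Python sketch that splits it into its options and converts the two time-based ones into friendlier units. It assumes the units given in the slurm.conf man page (bf_window in minutes, bf_resolution in seconds); the function name parse_scheduler_parameters is made up for this illustration and is not part of Slurm.

```python
def parse_scheduler_parameters(value):
    """Split a SchedulerParameters value into a dict.

    Hypothetical helper for illustration only. Options of the form
    key=number become integers; bare flags such as bf_continue map to True.
    """
    params = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, val = item.partition("=")
        if not sep:
            params[key] = True          # bare flag, e.g. "defer"
        elif val.isdigit():
            params[key] = int(val)      # numeric option
        else:
            params[key] = val           # anything else is kept as text
    return params


value = ("bf_window=43200,bf_resolution=600,bf_max_job_user=5,"
         "max_job_bf=10000,bf_continue,defer")
params = parse_scheduler_parameters(value)

# Assumed units: bf_window is in minutes, bf_resolution is in seconds.
window_days = params["bf_window"] / (60 * 24)
resolution_minutes = params["bf_resolution"] / 60

print(f"backfill window:     {window_days:g} days")
print(f"backfill resolution: {resolution_minutes:g} minutes")
print(f"backfill jobs tried per user: {params['bf_max_job_user']}")
```

Under those assumptions the backfill scheduler plans 30 days ahead at 10 minute granularity and considers at most 5 jobs per user on each pass.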
