I'm running Slurm 2.6.9. I've got the backfill scheduler set up with
some pretty ridiculous parameters, as we have a large number of queued
jobs of various dimensions:

SchedulerParameters=default_queue_depth=10000,bf_continue,bf_interval=120,bf_max_job_user=10000,bf_resolution=600,bf_window=4320,bf_max_job_part=10000
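
To rule out a typo in that line, the values can be pulled apart and checked one by one. This is only a minimal sketch, assuming the line is copied verbatim from slurm.conf; parse_scheduler_parameters is a hypothetical helper written for illustration, not part of Slurm, and it says nothing about what the running controller has actually loaded.

```python
# Hypothetical helper (not part of Slurm): split a SchedulerParameters
# line from slurm.conf into a dict so each option can be inspected.
LINE = (
    "SchedulerParameters=default_queue_depth=10000,bf_continue,"
    "bf_interval=120,bf_max_job_user=10000,bf_resolution=600,"
    "bf_window=4320,bf_max_job_part=10000"
)


def parse_scheduler_parameters(line: str) -> dict:
    """Return {option: value}; bare flags such as bf_continue map to True."""
    prefix = "SchedulerParameters="
    text = line.strip()
    # Drop the leading key if the whole config line was passed in.
    if text.startswith(prefix):
        text = text[len(prefix):]

    params = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            # Option given without a value, e.g. bf_continue.
            params[key] = True
        elif value.isdigit():
            params[key] = int(value)
        else:
            params[key] = value
    return params


if __name__ == "__main__":
    for name, value in parse_scheduler_parameters(LINE).items():
        print(f"{name} = {value}")
```

Run against the line above, this should list default_queue_depth alongside the six bf_* options, which is at least consistent with the file being what I think it is.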

This has been working fine, with backfill effectively going through the
full queue, but today it appears to have stopped: jobs which should be
backfilled onto idle resources aren't being run.  The scheduler log
shows:

[2014-06-04T13:16:10.107] sched: Running job scheduler
[2014-06-04T13:16:10.111] sched: JobId=7060218. State=PENDING. Reason=Resources. Priority=10850. Partition=campus.
[2014-06-04T13:16:10.111] sched: JobId=7060219. State=PENDING. Reason=Priority(Priority), Priority=10850, Partition=campus.
[2014-06-04T13:16:10.111] sched: already tested 3 jobs, breaking out

My understanding is that it shouldn't hit that limit until it has
tested default_queue_depth jobs.  Has my controller lost its mind?
I've got a nearly identical test setup where this is working as I'd
expect.

Any hints appreciated... thanks much

Michael
