The message is definitely from the main scheduling loop rather than backfill. I would guess that three batch jobs were submitted since the last time the scheduler ran, and that it is testing only those three jobs for scheduling at that time.
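To make the main-loop versus backfill distinction concrete, here is a rough sketch in Python (not part of Slurm) that splits the SchedulerParameters value from the quoted message by option prefix. It assumes that options starting with bf_ are read by the backfill plugin and that everything else applies to the main scheduling loop. The function name split_scheduler_parameters is made up for illustration.

```python
def split_scheduler_parameters(value):
    """Split a SchedulerParameters value into (main_loop, backfill) dicts.

    Assumption: options prefixed with "bf_" belong to the backfill
    scheduler; all other options belong to the main scheduling loop.
    """
    main_loop, backfill = {}, {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # Flags such as bf_continue carry no "=value" part.
        key, sep, val = item.partition("=")
        target = backfill if key.startswith("bf_") else main_loop
        target[key] = val if sep else True
    return main_loop, backfill


# Value copied from the quoted message.
params = ("default_queue_depth=10000,bf_continue,bf_interval=120,"
          "bf_max_job_user=10000,bf_resolution=600,bf_window=4320,"
          "bf_max_job_part=10000")

main_loop, backfill = split_scheduler_parameters(params)
print("main loop:", main_loop)
print("backfill: ", backfill)
```

Under that assumption, default_queue_depth is the only option in this configuration that applies to the loop that wrote the "already tested 3 jobs" line.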
Quoting Michael Gutteridge <[email protected]>:
I'm running Slurm 2.6.9. I've got the backfill scheduler set up with some pretty ridiculous parameters, as we have a large number of queued jobs of various dimensions:

SchedulerParameters=default_queue_depth=10000,bf_continue,bf_interval=120,bf_max_job_user=10000,bf_resolution=600,bf_window=4320,bf_max_job_part=10000

This has been working fine, with backfill effectively going through the full queue, but today it appears to have stopped: jobs which should be backfilled onto idle resources aren't being run. The scheduler log shows:

[2014-06-04T13:16:10.107] sched: Running job scheduler
[2014-06-04T13:16:10.111] sched: JobId=7060218. State=PENDING. Reason=Resources. Priority=10850. Partition=campus.
[2014-06-04T13:16:10.111] sched: JobId=7060219. State=PENDING. Reason=Priority(Priority), Priority=10850, Partition=campus.
[2014-06-04T13:16:10.111] sched: already tested 3 jobs, breaking out

My understanding is that it shouldn't hit that limit until it reaches default_queue_depth. Has my controller lost its mind? I've got a nearly identical test setup where this is working as I'd expect.

Any hints appreciated... thanks much

Michael
