Hi.
I'm currently running Slurm 2.6.3; I was previously running the 2.4 series with no issues. Since switching to 2.6.3, I've noticed that jobs wanting more than 2 nodes never get to run unless nothing else wants the queue, while small jobs will always jump the line and run if enough nodes are free. My slurm.conf has not been changed since 2.1.11, when it was first deployed. I'm using sched/backfill and priority/basic, and the maximum run time on the single queue for this cluster is 48 hours.

As a concrete example, a job requesting 20 nodes for $max_time was submitted on 10/15 and has an estimated start time of 10/19, while 551 single-node jobs requesting $max_time have been submitted and have run or are running in the meantime. The small jobs are actually taking ~14 hours, but the submitter is requesting the full 48.
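For reference, here is a sketch of what I believe are the relevant lines in my slurm.conf (the partition and node names below are placeholders, but the scheduler and limit settings are as deployed):

    SchedulerType=sched/backfill
    PriorityType=priority/basic
    # Single queue for the whole cluster; 48-hour maximum run time
    PartitionName=main Nodes=node[001-NNN] Default=YES MaxTime=48:00:00 State=UP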
How can I tweak my slurm.conf so that, if a large job is at the top of the queue, it blocks everything with an equal or greater run time until it can run? That is the behavior my end users are used to and expect.
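To be concrete, the kind of change I'm imagining is a SchedulerParameters tweak along the lines below, though I'm only guessing that bf_window is the relevant knob (as I understand it, it is the backfill planning window in minutes, and its default of one day is shorter than my 48-hour limit):

    # 2880 minutes = 48 hours, matching the partition MaxTime
    SchedulerParameters=bf_window=2880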
--
Bob Healey
Systems Administrator
Biocomputation and Bioinformatics Constellation and Molecularium
[email protected]
(518) 276-4407
