We've been in production with SLURM 14.03.10 for a few weeks now and have
found our backfill configuration to be lacking. Currently there are 1500
jobs in a pending state, each requesting 4 CPUs and only 4 hours of
walltime. We have approximately 1500 idle CPUs in total, and looking at
the other pending jobs I cannot see why these are not being backfilled.
My theory is that our backfill limits are too low, so the same jobs keep
being reevaluated for backfill while the others are ignored during each
cycle.

Currently a single user owns all 1500 pending jobs. The reason shown in
squeue is (Resources) or (Priority) for some of them, with the vast
majority showing (None). This is our current SchedulerParameters:
SchedulerParameters=bf_max_job_user=35,bf_max_job_test=400,bf_interval=60,sched_interval=120,default_queue_depth=10,partition_job_depth=100,bf_window=7200,bf_resolution=1800,bf_continue,max_sched_time=4,defer,preempt_strict_order
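
Since that line is hard to read, here is a minimal Python sketch that splits it into one option per line. The helper name parse_scheduler_parameters is made up for illustration and is not part of Slurm. As far as I understand the slurm.conf documentation, bf_interval and bf_resolution are in seconds and bf_window is in minutes, so bf_window=7200 would be five days.

```python
# Value copied verbatim from the SchedulerParameters line above.
SCHEDULER_PARAMETERS = (
    "bf_max_job_user=35,bf_max_job_test=400,bf_interval=60,sched_interval=120,"
    "default_queue_depth=10,partition_job_depth=100,bf_window=7200,"
    "bf_resolution=1800,bf_continue,max_sched_time=4,defer,preempt_strict_order"
)


def parse_scheduler_parameters(value):
    """Split a comma-separated SchedulerParameters value into a dict.

    Hypothetical helper, not a Slurm API. Options written as key=value are
    stored as integers; bare options such as 'defer' are stored as True.
    """
    params = {}
    for item in value.split(","):
        key, sep, val = item.partition("=")
        params[key] = int(val) if sep else True
    return params


params = parse_scheduler_parameters(SCHEDULER_PARAMETERS)
for key, val in params.items():
    print(f"{key:22} {val}")

# Assumption from my reading of slurm.conf: bf_window is given in minutes.
bf_window_days = params["bf_window"] / (60 * 24)
print(f"bf_window in days: {bf_window_days}")
```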

My goal was to keep one user from overloading slurmctld with backfill
requests while still allowing efficient backfill when the cluster has idle
CPUs. I had been using bf_max_job_test=100 and noticed backfill was never
taking place, so I increased it to 300. This worked for about a week, and
now we see no backfill taking place. This is the sdiag output, which I am
unsure how to turn into useful information:

$ sdiag
*******************************************************
sdiag output at Fri Nov 21 12:32:01 2014
Data since Thu Nov 20 18:00:00 2014
*******************************************************
Server thread count: 3
Agent queue size: 0

Jobs submitted: 471
Jobs started: 338
Jobs completed: 320
Jobs canceled: 81
Jobs failed: 0

Main schedule statistics (microseconds):
    Last cycle: 77565
    Max cycle: 117189
    Total cycles: 907
    Mean cycle: 52777
    Mean depth cycle: 211
    Cycles per minute: 0
    Last queue length: 1898

Backfilling stats
    Total backfilled jobs (since last slurm start): 9529
    Total backfilled jobs (since last stats cycle start): 271
    Total cycles: 1108
    Last cycle when: Fri Nov 21 12:31:36 2014
    Last cycle: 502468
    Max cycle: 507446
    Mean cycle: 278070
    Last depth cycle: 1926
    Last depth cycle (try sched): 136
    Depth Mean: 1908
    Depth Mean (try depth): 96
    Last queue length: 1926
    Queue length mean: 1908
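
In case it helps anyone reading the numbers, here is a rough Python sketch that derives a few ratios from the output above. All inputs are copied from the sdiag output; the variable names are my own. My reading of the sdiag man page is that "Depth Mean (try depth)" counts only jobs that had a chance to start on available resources, so treat the last ratio as an interpretation rather than a definition.

```python
from datetime import datetime

# Timestamps and counters copied from the sdiag output above.
FMT = "%a %b %d %H:%M:%S %Y"
stats_start = datetime.strptime("Thu Nov 20 18:00:00 2014", FMT)
stats_end = datetime.strptime("Fri Nov 21 12:32:01 2014", FMT)

main_cycles = 907
bf_cycles = 1108
bf_mean_cycle_us = 278070      # backfill "Mean cycle", microseconds
bf_depth_mean = 1908           # backfill "Depth Mean"
bf_try_depth_mean = 96         # backfill "Depth Mean (try depth)"

elapsed_s = (stats_end - stats_start).total_seconds()

# Average spacing between cycles over the statistics window.
seconds_per_main_cycle = elapsed_s / main_cycles
seconds_per_bf_cycle = elapsed_s / bf_cycles

# Share of wall-clock time spent inside the backfill loop.
bf_busy_fraction = (bf_mean_cycle_us / 1e6) / seconds_per_bf_cycle

# Share of examined jobs that were actually tried against free resources
# (assumption: this is what "try depth" means).
tried_fraction = bf_try_depth_mean / bf_depth_mean

print(f"elapsed seconds:          {elapsed_s:.0f}")
print(f"seconds per main cycle:   {seconds_per_main_cycle:.1f}")
print(f"seconds per bf cycle:     {seconds_per_bf_cycle:.1f}")
print(f"backfill busy fraction:   {bf_busy_fraction:.4f}")
print(f"tried / examined jobs:    {tried_fraction:.3f}")
```

The backfill spacing works out to about 60 seconds, which matches bf_interval=60, and each backfill cycle takes well under a second on average. Only about 5% of the jobs examined per backfill cycle appear to be tried against available resources.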

The user with 1500 pending jobs has a fairshare value of 0.000000. Does
that mean this person's jobs are considered last for backfill based on
priority? (The sdiag man page seems to hint that the cycle goes through
jobs in priority order.)
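
To make the question concrete, here is a toy Python model of how I imagine the two limits interacting. It is not Slurm's algorithm. It assumes jobs are walked strictly in priority order, that jobs skipped by the per-user cap do not count against bf_max_job_test, and that the 398 jobs from other users (1898 in the queue minus the 1500) all outrank the zero-fairshare user's jobs. The split of those 398 jobs across 20 users and all user names are made up for illustration.

```python
from collections import Counter


def backfill_candidates(queue, bf_max_job_test, bf_max_job_user):
    """Toy model, NOT Slurm's implementation.

    Walk a priority-ordered list of (job_id, user) pairs and return the jobs
    that would be tested, stopping after bf_max_job_test tested jobs and
    skipping any user who already has bf_max_job_user jobs tested.
    """
    tested = []
    per_user = Counter()
    for job_id, user in queue:
        if len(tested) >= bf_max_job_test:
            break
        if per_user[user] >= bf_max_job_user:
            continue  # assumption: skipped jobs do not use up a test slot
        per_user[user] += 1
        tested.append((job_id, user))
    return tested


# Hypothetical queue: 398 higher-priority jobs spread over 20 made-up users,
# followed by the 1500 jobs of the user with fairshare 0.
queue = [(i, f"user{i % 20}") for i in range(398)]
queue += [(398 + i, "heavy_user") for i in range(1500)]

tested = backfill_candidates(queue, bf_max_job_test=400, bf_max_job_user=35)
heavy_tested = sum(1 for _, user in tested if user == "heavy_user")

print(f"jobs tested this cycle:        {len(tested)}")
print(f"heavy_user jobs tested:        {heavy_tested}")
```

Under these assumptions only 2 of the 1500 jobs are tested per cycle, because the other users' jobs use up 398 of the 400 test slots before bf_max_job_user=35 ever comes into play. Whether the real scheduler behaves this way is exactly what I am asking.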

The system running slurmctld is a virtual machine with 4 CPUs and 4GB of
memory. I'd be interested to hear others' experiences with tuning
backfill, especially in the context of not overloading slurmctld.

Thanks,
- Trey
=============================
Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: [email protected]
Jabber: [email protected]