On 2016-02-16 14:03, Ulf Markwardt wrote:
Dear all,
I have a problem: a large reservation starts in a few hours, ~1700
long-running jobs are waiting to start after it, and my short job
(srun -t 1 hostname), with a priority of 1, would fill any gap...
"sdiag" always shows a value of about 100 as "Last depth cycle" for
backfilling. Does that mean that it only looks at the first 100 jobs?
Yes, see bf_max_job_test:
bf_max_job_test=#
The maximum number of jobs to attempt backfill scheduling for (i.e.
the queue depth). Higher values result in more overhead
and less responsiveness. Until an attempt is made to backfill schedule
a job, its expected initiation time value will not be set. The
default value is 100. In the case of large clusters, configuring a
relatively small value may be desirable. This option applies only to
SchedulerType=sched/backfill.
I thought bf_continue should take care of this, so that the next
backfill pass starts where the last one finished.
At the moment we have 15.08.6 running with:
SchedulerParameters=bf_interval=30,bf_max_job_test=2000,bf_window=7200,default_queue_depth=5000,bf_continue,sched_interval=120,defer
(Some values might be too high for production, but I was desperate to
get my job running...)
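As a quick sanity check on the settings above, here is a minimal Python
sketch that splits the SchedulerParameters string into its options and
compares bf_max_job_test with the roughly 1700 pending jobs. The helper
name parse_scheduler_parameters and the pending_jobs value are
illustrative assumptions, not part of Slurm, and the sketch only inspects
the configured string, not what the running slurmctld actually uses.

```python
def parse_scheduler_parameters(value):
    """Split a SchedulerParameters value into a dict.

    Options of the form key=value map to strings; bare flags such as
    bf_continue map to True.
    """
    params = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, val = item.partition("=")
        params[key] = val if sep else True
    return params


line = ("bf_interval=30,bf_max_job_test=2000,bf_window=7200,"
        "default_queue_depth=5000,bf_continue,sched_interval=120,defer")
params = parse_scheduler_parameters(line)

pending_jobs = 1700  # assumption: approximate queue length from the message
depth = int(params.get("bf_max_job_test", 100))  # 100 is the documented default

print("bf_max_job_test:", depth)
print("bf_continue set:", params.get("bf_continue", False))
print("depth covers pending queue:", depth >= pending_jobs)
```

If the configured depth already exceeds the queue length, as it does
here, while sdiag still reports a depth of about 100, it may be worth
confirming that the running controller has picked up the change, for
example with "scontrol show config".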
Can anybody give me a hint on how to change this so that my low
priority job gets scheduled?
Thanks a lot,
Ulf
PS. As soon as I give this job a Nice=-200 it starts, but that is not
the way I want it :-)