On 02/27/2015 04:25 PM, Christopher B Coffey wrote:
Now and then I find that jobs get stuck and it doesn’t make sense. In
this recent scenario I have one job from a user that has the highest
priority yet it’s not starting. The job has a requirement of 2 cpus, and
100GB of memory. This is available now, yet the job doesn’t start. I can
create a job with the exact resource requirements and submit, and it
starts immediately.
Here are my scheduling parameters:
SchedulerParameters=bf_window=20160,bf_resolution=600,default_queue_depth=12968,bf_max_job_test=13000,bf_max_job_start=100,bf_interval=30,pack_serial_at_end
Slurm 14.11.4.
While having the backfill debug turned on I see something interesting.
Backfill says it tested 9234 jobs, but there are 10268 jobs in the queue.
Why didn’t backfill test all of the jobs? Maybe this is part of the
problem?
The only thing special about this user’s job was that it was part of a
chain of dependent jobs (which are all completed).
Is there any way to force a job to start? I’ve tried many things to get
the job to start but it won’t: release, requeue … etc.
Any help would be great, thanks!
Best,
Chris
Hi Chris,
A wild guess is that the dependencies for the job are not fulfilled. In that
case, the "Reason" for not starting is "DependencyNeverSatisfied", and the
cure for not keeping such jobs in queue is to include a "kill_invalid_depend,"
among the scheduling parameters. (The old default behaviour, before version
14.11 I think, was to automatically cancel jobs that were lacking the
asked-for dependencies.)
Otherwise, please tell us the output of "scontrol show job" for that job, to
give the readers more information about the patient (i.e. job).
Sometimes the backfill algorithm does not find time to reach the bottom of
the waiting queue. There is a quick fix for that problem: a scheduling
parameter "bf_continue", which allows the scheduler to continue down the
queue instead of (as is the most correct behaviour) restarting with an
inspection of the jobs with the highest priority.
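As a rough sketch, the two parameters mentioned above could be appended to the existing line in slurm.conf like this. It assumes the rest of the SchedulerParameters string from the original post stays unchanged, and that both options are available in the installed Slurm version (worth checking against the slurm.conf man page for 14.11.4):

```
# slurm.conf (sketch): existing parameters kept as posted, two options appended.
# kill_invalid_depend: cancel jobs whose dependencies can never be satisfied.
# bf_continue: let backfill continue down the queue instead of restarting at the top.
SchedulerParameters=bf_window=20160,bf_resolution=600,default_queue_depth=12968,bf_max_job_test=13000,bf_max_job_start=100,bf_interval=30,pack_serial_at_end,bf_continue,kill_invalid_depend
```

After editing the file, "scontrol reconfigure" should normally make the controller reread the configuration.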
Best wishes,
-- Lennart Karlsson, UPPMAX, Uppsala University, Sweden