Check out the kill_invalid_depend option of SchedulerParameters in
slurm.conf. This is probably what you are looking for. After setting
it, you can run "scontrol reconfigure" to read in the change.
From man slurm.conf:
kill_invalid_depend
    If a job has an invalid dependency and it can never run terminate
    it and set its state to be JOB_CANCELLED. By default the job stays
    pending with reason DependencyNeverSatisfied.
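As a rough sketch, the slurm.conf change would look something like the
fragment below. The bf_continue entry is only a placeholder for whatever
options you may already have set (an assumption, not something taken
from your configuration). SchedulerParameters takes a comma-separated
list, so kill_invalid_depend is appended to the existing values rather
than given on a second SchedulerParameters line.

```
# slurm.conf (fragment)
# bf_continue is a placeholder for any options already in use at your site.
SchedulerParameters=bf_continue,kill_invalid_depend
```

Going by the man page text, jobs whose dependencies can never be
satisfied should then be cancelled by the scheduler instead of staying
pending.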
On 02/27/2015 12:25 PM, Bill Wichser wrote:
Looking through waiting jobs, the list of jobs which will never run
due to dependency problems is ever growing. I have been notifying
users which of their jobs remain waiting, along with any other jobs
that probably depend on them in turn, and asking them to cancel
these.
My question is, who is supposed to ultimately deal with this?
Obviously the scheduler realizes that these jobs are doomed. But is
the scheduler the component which should be dealing with these jobs,
or will it always require some human intervention? Just thinking that
I must have missed something along the way here!
Thanks,
Bill