On 10/24/2013 03:32 PM, Pär Lindfors wrote:

Hi Lennart,

Lennart Karlsson <[email protected]> writes:
I have set the configuration parameter JobRequeue to zero, so failed
jobs should not automatically requeue and rerun:
# scontrol show config|grep -i requeue
JobRequeue              = 0
#

But still jobs are rerun:

I was curious whether we were affected by this bug, but it does not
seem to affect the rather old slurm 2.4.5 we still use at NSC.

I could reproduce the bug on our slurm 2.6 test installation. However,
JobRequeue is only ignored when a node reboots and returns to service
immediately.

When a node stops responding and is set to DOWN after SlurmdTimeout
seconds, the JobRequeue configuration works as expected for the jobs
that were running on it.

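Until the bug gets fixed, a temporary workaround for you could be to
decrease your SlurmdTimeout to something lower than the time your
nodes take to reboot. That way a rebooting node is set to DOWN before
it returns to service, which is the case where JobRequeue is honoured.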
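
A rough sketch of the corresponding slurm.conf lines. The 120-second
value is only a hypothetical example, assuming nodes that need several
minutes to reboot; pick something below your own nodes' reboot time:

    JobRequeue=0
    # Hypothetical value: must be shorter than the node reboot time,
    # so a rebooting node is marked DOWN before it is back in service.
    SlurmdTimeout=120

Once the new configuration is active, it can be checked the same way
as JobRequeue:

# scontrol show config|grep -i slurmdtimeout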

Regards,
Pär Lindfors, NSC

Hi Pär,

It is great to have you back in the SLURM world! Thanks for your
advice. At least one more of our users was bitten by the bug today.

Moe Jette says that a fix will be included in version 2.6.4,
so I will probably wait for that.

Cheers,
-- Lennart Karlsson, UPPMAX, Uppsala University, Sweden
