Hi Lennart,

Lennart Karlsson <[email protected]> writes:
> I have set the configuration parameter JobRequeue to zero, so failed
> jobs should not automatically requeue and rerun:
> # scontrol show config|grep -i requeue
> JobRequeue              = 0
> #
>
> But still jobs are rerun:

I was curious if we were affected by this bug, but it seems to not
affect the rather old slurm 2.4.5 we still use at NSC.

I could reproduce the bug on our slurm 2.6 test installation. However,
JobRequeue is only ignored when a node reboots and returns to service
immediately.

When a node stops responding and is set to DOWN after SlurmdTimeout
seconds, then the JobRequeue configuration works as expected for jobs
that were running.

Until the bug gets fixed, a temporary workaround for you could be to
decrease your SlurmdTimeout to something lower than the time your
nodes take to reboot.
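
As a rough sketch of that workaround, assuming your nodes need about
five minutes to reboot (a made-up figure, so measure your own), the
slurm.conf change might look something like this:

```
# slurm.conf (excerpt)
# Assumption: a node reboot takes roughly 300 seconds, so the timeout
# must be shorter than that. 120 is only an illustrative value.
SlurmdTimeout=120
```

After updating slurm.conf, "scontrol reconfigure" should pick up the
new value, and "scontrol show config | grep -i SlurmdTimeout" shows
what is currently in effect. Keep in mind that a lower timeout also
means nodes get set DOWN sooner after short network or load hiccups.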

Regards,
Pär Lindfors, NSC
