Hi Lennart,

Lennart Karlsson <[email protected]> writes:
> I have set the configuration parameter JobRequeue to zero, so failed
> jobs should not automatically requeue and rerun:
> # scontrol show config|grep -i requeue
> JobRequeue = 0
> #
>
> But still jobs are rerun:

I was curious whether we were affected by this bug, but it does not
seem to affect the rather old slurm 2.4.5 we still use at NSC.

I could reproduce the bug on our slurm 2.6 test installation. However,
JobRequeue is only ignored when a node reboots and returns to service
immediately. When a node stops responding and is set to DOWN after
SlurmdTimeout seconds, the JobRequeue configuration works as expected
for jobs that were running.

Until the bug gets fixed, a temporary workaround for you could be to
decrease your SlurmdTimeout to something lower than the time your
nodes take to reboot, so that a rebooting node is set to DOWN before
it returns to service.

Regards,
Pär Lindfors, NSC
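
For illustration, the workaround described above might look roughly like
this in slurm.conf. This is only a sketch: the value 60 is an assumed
example, not a recommendation, and it presumes your nodes need clearly
more than 60 seconds to reboot.

```
# slurm.conf (fragment)
# Assumed example value. Pick something shorter than the measured
# reboot time of your nodes, so a rebooting node is set to DOWN
# before it comes back.
SlurmdTimeout=60
```

After editing the file, "scontrol reconfigure" should normally make
slurmctld pick up the change, and "scontrol show config | grep -i
SlurmdTimeout" can be used to check the active value. Keep in mind
that a short timeout also means nodes get set to DOWN sooner when they
are merely slow to respond.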
