[slurm-dev] Re: KNL node down after reboot

2017-05-17 Thread nico.faerber
Subject: [slurm-dev] Re: KNL node down after reboot Hi, A few suggestions: 1) Try increasing the timeouts: SlurmctldTimeout=600 SlurmdTimeout=600 ResumeTimeout=600 2) Make sure that, when Slurm starts, the node has finished mounting file systems and the whole boot

[slurm-dev] Re: KNL node down after reboot

2017-05-16 Thread Costin Caramarcu
Hi, A few suggestions: 1) Try increasing the timeouts: SlurmctldTimeout=600 SlurmdTimeout=600 ResumeTimeout=600 2) Make sure that, when Slurm starts, the node has finished mounting file systems and the whole boot procedure is done. Regards, Costin On Tue, May 16, 2017 at

[slurm-dev] Re: KNL node down after reboot

2017-05-16 Thread Ryan Novosielski
SLURM has worked this way as long as I can remember. If you don't use scontrol reboot_nodes, nodes are "down" when they come back because SLURM wasn't notified about the reboot. This is configurable in slurm.conf. From: nico.faer...@id.unibe.ch
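
The excerpt does not name the slurm.conf setting Ryan refers to; it is most likely ReturnToService, which is an assumption here and not confirmed by the thread. A minimal sketch of the relevant lines, together with the timeouts suggested in the earlier replies:

```
# slurm.conf fragment (sketch; values come from the thread or are assumed)

# Timeouts suggested by Costin, in seconds
SlurmctldTimeout=600
SlurmdTimeout=600
ResumeTimeout=600

# Assumption: this is the option meant by "configurable in slurm.conf".
# 2 = a DOWN node returns to service once slurmd registers
#     with a valid configuration, e.g. after an unannounced reboot
ReturnToService=2
```

With the default ReturnToService=0, a node that comes back DOWN stays that way until an administrator resumes it, for example with scontrol update NodeName=<node> State=RESUME.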