Subject: [slurm-dev] Re: KNL node down after reboot
Hi,
A few suggestions:
1) Try increasing the timeouts:
SlurmctldTimeout=600
SlurmdTimeout=600
ResumeTimeout=600
2) Make sure that by the time Slurm starts, the node has finished mounting
file systems and the whole boot procedure is done.
Regards,
Costin
On Tue, May 16, 2017 at
SLURM has worked this way as long as I can remember. If you don't use scontrol
reboot_nodes, nodes are "down" when they come back because SLURM wasn't
notified about the reboot. This is configurable in slurm.conf.
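
The slurm.conf setting most likely meant here is ReturnToService; the message does not name it, so treat that as an assumption. A minimal sketch of the relevant fragment:

```
# slurm.conf (fragment)
# ReturnToService controls whether a DOWN node returns to service by itself.
#   0 = node stays DOWN until an administrator changes its state
#   1 = node returns only if it was set DOWN for being non-responsive
#   2 = node returns on registration with a valid configuration,
#       whatever the reason it was set DOWN (including an unexpected reboot)
ReturnToService=2
```

With a value of 0 or 1, a node that rebooted outside of Slurm's control typically has to be returned by hand, for example with "scontrol update NodeName=<node> State=RESUME", where <node> is a placeholder for the affected node name.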
From: nico.faer...@id.unibe.ch