This is controlled by the slurm.conf directive 'ReturnToService'.

From https://computing.llnl.gov/linux/slurm/slurm.conf.html

*ReturnToService* Controls when a DOWN node will be returned to service.
The default value is 0. Supported values include:

*0* A node will remain in the DOWN state until a system administrator
explicitly changes its state (even if the slurmd daemon registers and
resumes communications).

*1* A DOWN node will become available for use upon registration with a
valid configuration only if it was set DOWN due to being non-responsive.
If the node was set DOWN for any other reason (low memory, prolog failure,
epilog failure, unexpected reboot, etc.), its state will not automatically
be changed.

*2* A DOWN node will become available for use upon registration with a
valid configuration. The node could have been set DOWN for any reason.
(Disabled on Cray systems.)
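
Going by the text above, value 1 does not cover a node that was set DOWN
because of an unexpected reboot, so 2 looks like the setting for your case.
A minimal sketch of the change, assuming nothing else in your slurm.conf
already sets this directive (the comment lines are only illustrative):

```
# slurm.conf (excerpt)
# Return a DOWN node to service as soon as slurmd registers with a
# valid configuration, whatever the reason it was set DOWN.
ReturnToService=2
```

A slurm.conf change like this typically needs "scontrol reconfigure" or a
restart of slurmctld before it takes effect. Keep in mind that with 2,
nodes that went DOWN for other reasons will also come back on their own.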


Brian


On Tue, Sep 9, 2014 at 9:11 AM, Brian B <[email protected]> wrote:

> Greetings,
>
> Is it possible to have slurm compute nodes added back into a partition
> after they perform an unscheduled reboot? That is, we have some machines
> that fail on brownouts (we are working on solving that problem) and will
> reboot when this occurs. They come back fine, but slurm doesn’t add them
> back into the partition. I am able to do so using scontrol by updating their
> state to IDLE. Can this be automated?
>
> Regards,
> Brian
>
>
