Hi,

I'm using 14.03.0-0pre5 on SL6.4/RHEL6.4.

I want a node to reboot at the end of its current job under certain circumstances. To that end, the prologue issues

scontrol reboot_nodes `hostname`

from the allocated node when these circumstances arise (to cleanly reset a graphics card, as it happens).
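
For illustration, a minimal sketch of what such a prologue fragment might look like. The flag-file check, its path, and the SCONTROL override are hypothetical placeholders for this sketch, not the actual script; only the scontrol reboot_nodes call is taken from the description above.

```shell
#!/bin/bash
# Sketch of a prologue fragment that requests a reboot of the local node.
# ASSUMPTIONS (not from the real prologue): FLAG_FILE, its path, and the
# SCONTROL override used for dry runs are illustrative placeholders.

# Command used to talk to the controller; override to dry-run the script.
SCONTROL="${SCONTROL:-scontrol}"

# Hypothetical marker file meaning "the graphics card needs a clean reset".
FLAG_FILE="${FLAG_FILE:-/var/run/gpu_needs_reset}"

needs_reboot() {
    # Succeeds only if the (hypothetical) marker file exists.
    [ -e "$FLAG_FILE" ]
}

request_reboot() {
    # Ask the controller to reboot this node; for an allocated node the
    # reboot happens once the current job has finished.
    $SCONTROL reboot_nodes "$(hostname)"
}

if needs_reboot; then
    request_reboot
fi
```

Running the fragment with SCONTROL=echo prints the command that would be issued instead of contacting the controller.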

The node enters the MAINT state as expected and reboots as expected at the end of the job, but the MAINT state is not cleared when it comes back, so administrative intervention is required.

According to the man pages, this does not seem to be the correct behaviour. I'm also seeing "Reason=Node unexpectedly rebooted", which makes no sense since the reboot was indeed expected.

I see the same behaviour when scontrol reboot_nodes is issued from outside the prologue on a different node, if the target node of the reboot has been allocated to a job at the time of the command.

If I issue the same command while the target node is idle, so that the node reboots immediately, the MAINT state is cleared correctly.

Any advice on this would be welcome.

Best regards,

Stuart

--
Dr. Stuart Rankin

Senior System Administrator
High Performance Computing Service
University of Cambridge
Email: [email protected]
Tel: (+)44 1223 763517
