Hi folks

A heads-up for anyone using, or looking to use, node suspend/resume
(aka power saving, aka elastic computing) in Slurm 16.05.x.

slurmctld will lose your list of excluded nodes/partitions (the
SuspendExcNodes and SuspendExcParts settings in slurm.conf) when you run:

scontrol reconfigure

and will then treat all nodes as eligible for power control,
putting them into a bad state. :-(

This is Slurm bug:

https://bugs.schedmd.com/show_bug.cgi?id=3078

which has been hit separately by two friends of mine at different
places, one of whom I'm helping out with elastic computing/cloudburst.

Hopefully this saves someone else from losing sleep over this!

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
