Hi folks,

A heads up for those using, or looking to use, node suspend/resume (aka power saving, aka elastic computing) in 16.05.x.

The slurmctld will lose your list of excluded nodes/partitions (SuspendExcNodes/SuspendExcParts) on an "scontrol reconfigure" and will then treat all nodes as being eligible for power control, putting them into a bad state. :-(

This is Slurm bug:

https://bugs.schedmd.com/show_bug.cgi?id=3078

which has been hit separately by two friends of mine at different places, one of whom I'm helping out with elastic computing/cloudburst.

Hopefully this saves someone else from losing sleep over this!

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci