On 30/08/14 02:14, Lev Givon wrote:

> Is there a recommended way (on Ubuntu, at least) to ensure that slurmd isn't
> started before any GPU device files appear?

To be honest, my policy for many years has been to never start queuing
system daemons on boot. It's too easy to have a node go bad, reboot,
come back up, take a job, go bad, reboot, take a job, go bad, reboot,
and repeat until there are no jobs left.

DIMMs go bad, and IB & accelerator cards go bad and cause NMIs; for us
it's not worth the risk.

We rarely reboot nodes other than for hardware failure or a software
upgrade, so if one does go bad we want to go and find out why before we
let it back into the cluster.

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
