[slurm-dev] Re: starting slurmd only after GPUs are fully initialized

2014-08-31 Thread Lev Givon
Received from Andy Riebs on Fri, Aug 29, 2014 at 01:48:53PM EDT: On 08/29/2014 12:13 PM, Lev Givon wrote: I recently set up slurm 2.6.5 on a cluster of Ubuntu 14.04.1 systems hosting several NVIDIA GPUs configured as generic resources. When the compute nodes are rebooted, I noticed that …
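One common approach to the problem described in the thread is to gate the slurmd start on the appearance of the NVIDIA device files. The sketch below is not from the thread itself; the function name, device glob, and timeout are illustrative assumptions, intended for use from an init script's pre-start step.

```shell
# wait_for_gpus: poll until a device glob matches at least one path, or
# give up after a timeout. A sketch for an init-script pre-start hook;
# the default glob /dev/nvidia[0-9]* and 120 s timeout are assumptions.
wait_for_gpus() {
    glob="${1:-/dev/nvidia[0-9]*}"
    timeout="${2:-120}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # the unquoted glob expands here; ls exits 0 iff something matched
        if ls $glob >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "timed out after ${timeout}s waiting for $glob" >&2
    return 1
}
```

A pre-start stanza could then run something like `wait_for_gpus '/dev/nvidia[0-9]*' 120` and only proceed to launch slurmd if it returns 0, so the daemon never registers its GRES before the devices exist.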

[slurm-dev] Re: starting slurmd only after GPUs are fully initialized

2014-08-31 Thread Christopher Samuel
On 30/08/14 02:14, Lev Givon wrote: Is there a recommended way (on Ubuntu, at least) to ensure that slurmd isn't started before any GPU device files appear? To be honest, my policy for many years has been never to start queuing-system daemons at boot; it's too easy to have a node go bad, …

[slurm-dev] Re: starting slurmd only after GPUs are fully initialized

2014-08-29 Thread Andy Riebs
One way to work around this is to define the node(s) in slurm.conf with State=DOWN. That way, manual intervention is required when a node is rebooted, allowing the rest of the system to finish coming up. Andy On 08/29/2014 12:13 PM, Lev Givon wrote: I recently set up slurm …
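The State=DOWN workaround might look like the fragment below in slurm.conf; the node names and GPU count are illustrative, not taken from the thread.

```
# slurm.conf fragment -- node names and Gres counts are illustrative
GresTypes=gpu
NodeName=gpu-node[01-04] Gres=gpu:2 State=DOWN
```

After a reboot, once the administrator has confirmed the GPU device files are present, the node can be returned to service manually, e.g. with `scontrol update NodeName=gpu-node01 State=RESUME`.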