I recently set up slurm 2.6.5 on a cluster of Ubuntu 14.04.1 systems hosting several NVIDIA GPUs configured as generic resources. When the compute nodes are rebooted, I notice that they attempt to start slurmd before the device files created by the nvidia kernel module appear; that is, the following message shows up in syslog some number of lines before the GPU kernel driver load messages:
slurmd[1453]: fatal: can't stat gres.conf file /dev/nvidia0: No such file or directory

Is there a recommended way (on Ubuntu, at least) to ensure that slurmd isn't started before any GPU device files appear?

--
Lev Givon
Bionet Group | Neurokernel Project
http://www.columbia.edu/~lev/
http://lebedov.github.io/
http://neurokernel.github.io/