Re: [slurm-users] Building Slurm RPMs with NVIDIA GPU support?

2021-01-27 Thread Tina Friedrich
Yeah, I don't build against NVML either at the moment (it's filed under 'try when you've got some spare time'). I'm pretty much 'autodetecting' what my gres.conf file needs to look like on nodes via my config management, and that all seems to work just fine. CUDA_VISIBLE_DEVICES and cgroup
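
A minimal sketch of what generating gres.conf entries without NVML could look like, assuming the GPUs show up as /dev/nvidia0, /dev/nvidia1, and so on. This is not Tina's actual config-management code, which the message does not show; the function name gres_lines is hypothetical, and the output uses the plain "Name=gpu File=..." form of gres.conf.

```python
import glob
import re

# Matches only numbered GPU device nodes such as /dev/nvidia0;
# control nodes like /dev/nvidiactl or /dev/nvidia-uvm are skipped.
_GPU_DEVICE = re.compile(r"^/dev/nvidia(\d+)$")


def gres_lines(device_paths):
    """Return one gres.conf line per numbered NVIDIA device, in index order.

    Hypothetical helper: assumes every /dev/nvidiaN node is a GPU that
    should be offered to Slurm as a generic 'gpu' resource.
    """
    matches = (_GPU_DEVICE.match(path) for path in device_paths)
    indices = sorted(int(m.group(1)) for m in matches if m)
    return [f"Name=gpu File=/dev/nvidia{i}" for i in indices]


if __name__ == "__main__":
    # On a GPU node this prints lines suitable for review before
    # they are written to gres.conf by whatever tool manages the node.
    print("\n".join(gres_lines(glob.glob("/dev/nvidia*"))))
```

The slurm.conf node definition still has to declare a matching Gres= count for these entries to be usable.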

[slurm-users] Building Slurm RPMs with NVIDIA GPU support?

2021-01-26 Thread Ole Holm Nielsen
In another thread, On 26-01-2021 17:44, Prentice Bisbal wrote: Personally, I think it's good that Slurm RPMs are now available through EPEL, although I won't be able to use them, and I'm sure many people on the list won't be able to either, since licensing issues prevent them from providing

Re: [slurm-users] Building Slurm RPMs with NVIDIA GPU support?

2021-01-26 Thread Christopher Samuel
On 1/26/21 12:10 pm, Ole Holm Nielsen wrote: What I don't understand is, is it actually *required* to make the NVIDIA libraries available to Slurm?  I didn't do that, and I'm not aware of any problems with our GPU nodes so far.  Of course, our GPU nodes have the libraries installed and the

Re: [slurm-users] Building Slurm RPMs with NVIDIA GPU support?

2021-01-26 Thread Paul Raines
Yes, you need to check inside the job. This was a while ago now, but I am pretty sure I remember that, although from the SLURM accounting side the jobs were being assigned GPUs fine, as you would see in 'scontrol show job' or 'sacct --job', the CUDA_VISIBLE_DEVICES environment variable was not being
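
A small sketch of the "check inside the job" step, assuming Python is available on the compute node. The helper name visible_gpus and the script name used below are hypothetical; the script only reads the job's environment and does not query Slurm or the NVIDIA driver.

```python
import os


def visible_gpus(env=os.environ):
    """Return the IDs listed in CUDA_VISIBLE_DEVICES, or None if it is unset.

    An empty list means the variable is set but lists no devices.
    """
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        # Per the thread, this suggests Slurm is only doing generic
        # resource accounting for the GPUs on this node.
        print("CUDA_VISIBLE_DEVICES is not set in this job's environment")
    else:
        print("CUDA_VISIBLE_DEVICES lists:", ", ".join(gpus) or "(nothing)")
```

Saved as, say, check_gpus.py, it could be run within an allocation with something like: srun --gres=gpu:1 python3 check_gpus.py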

Re: [slurm-users] Building Slurm RPMs with NVIDIA GPU support?

2021-01-26 Thread Paul Edmon
That is correct.  I think NVML has some additional features but in terms of actually scheduling them what you have should work. They will just be treated as normal gres resources. -Paul Edmon- On 1/26/2021 3:55 PM, Ole Holm Nielsen wrote: On 26-01-2021 21:36, Paul Edmon wrote: You can

Re: [slurm-users] Building Slurm RPMs with NVIDIA GPU support?

2021-01-26 Thread Ole Holm Nielsen
On 26-01-2021 21:36, Paul Edmon wrote: You can include GPUs as gres in Slurm without compiling specifically against NVML.  You only really need to do that if you want to use the autodetection features that have been built into Slurm.  We don't really use any of those features at our

Re: [slurm-users] Building Slurm RPMs with NVIDIA GPU support?

2021-01-26 Thread Ole Holm Nielsen
Thanks Paul! On 26-01-2021 21:11, Paul Raines wrote: You should check your jobs that allocated GPUs and make sure CUDA_VISIBLE_DEVICES is being set in the environment.  If it is not, that is a sign your GPU support is not really there and SLURM is just doing "generic" resource assignment. Could you

Re: [slurm-users] Building Slurm RPMs with NVIDIA GPU support?

2021-01-26 Thread Paul Edmon
You can include GPUs as gres in Slurm without compiling specifically against NVML.  You only really need to do that if you want to use the autodetection features that have been built into Slurm.  We don't really use any of those features at our site, we only started building against NVML

Re: [slurm-users] Building Slurm RPMs with NVIDIA GPU support?

2021-01-26 Thread Robert Kudyba
You all might be interested in a patch to the SPEC file, to not make the slurm RPMs depend on libnvidia-ml.so, even if it's been enabled at configure time. See https://bugs.schedmd.com/show_bug.cgi?id=7919#c3 On Tue, Jan 26, 2021 at 3:17 PM Paul Raines wrote: > > You should check your jobs that

Re: [slurm-users] Building Slurm RPMs with NVIDIA GPU support?

2021-01-26 Thread Paul Raines
You should check your jobs that allocated GPUs and make sure CUDA_VISIBLE_DEVICES is being set in the environment. If it is not, that is a sign your GPU support is not really there and SLURM is just doing "generic" resource assignment. I have both GPU and non-GPU nodes. I build SLURM rpms twice. Once on a

Re: [slurm-users] Building Slurm RPMs with NVIDIA GPU support?

2021-01-26 Thread Ole Holm Nielsen
Thanks Paul! On 26-01-2021 20:50, Paul Edmon wrote: In the RPM spec we use to build Slurm, we do the following additional things for GPUs: BuildRequires: cuda-nvml-devel-11-1 then in the %build section we do: export CFLAGS="$CFLAGS -L/usr/local/cuda-11.1/targets/x86_64-linux/lib/stubs/

Re: [slurm-users] Building Slurm RPMs with NVIDIA GPU support?

2021-01-26 Thread Paul Edmon
In the RPM spec we use to build Slurm, we do the following additional things for GPUs: BuildRequires: cuda-nvml-devel-11-1 then in the %build section we do: export CFLAGS="$CFLAGS -L/usr/local/cuda-11.1/targets/x86_64-linux/lib/stubs/ -I/usr/local/cuda-11.1/targets/x86_64-linux/include/"