Re: [slurm-users] Which ports does slurm use?

2020-02-07 Thread Ole Holm Nielsen
On 06-02-2020 22:40, Dean Schulze wrote: I've moved two nodes to a different controller.  The nodes are wired and the controller is networked via wifi.  I had to open up ports 6817 and 6818 between the wired and wireless sides of our network to get any connectivity. Now when I do srun -N2
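
For context on the ports mentioned above: 6817 and 6818 are the default values of SlurmctldPort and SlurmdPort in slurm.conf. srun also listens on ports of its own, which are chosen at random unless SrunPortRange is set. A minimal sketch of the relevant slurm.conf lines follows; the SrunPortRange value is an illustrative assumption and does not come from this thread.

```
# slurm.conf (fragment)
SlurmctldPort=6817          # default port for slurmctld
SlurmdPort=6818             # default port for slurmd
SrunPortRange=60001-63000   # assumed example range; open the same range in the firewall
```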

Re: [slurm-users] How to use Autodetect=nvml in gres.conf

2020-02-07 Thread Dean Schulze
So this is related to the gpu/nvml plugin in the source code tree. That didn't get built because I didn't have the nvidia driver (really the library libnvidia-ml.so) installed when I built the code. I see in config.log where it tries to find -lnvidia-ml and it skips building the gpu/nvml plugin

Re: [slurm-users] How to use Autodetect=nvml in gres.conf

2020-02-07 Thread Christopher Samuel
Hi Dean, On 2/7/20 8:03 AM, dean.w.schu...@gmail.com wrote: I just checked the .deb package that I build from source and there is nothing in it that has nv or cuda in its name. Are you sure that slurm distributes nvidia binaries? SchedMD only distributes sources, it's up to distros how

Re: [slurm-users] problem running slurm

2020-02-07 Thread Brian Andrus
You're trying to run bash, which, without special configuration, needs a pty. Try: srun -v -p debug --pty bash Brian Andrus On 2/6/2020 10:28 PM, Hector Yuen wrote: Hello, I am setting up a very simple configuration: one node running slurmd and another one running slurmctld. In the slurmctld

Re: [slurm-users] How to use Autodetect=nvml in gres.conf

2020-02-07 Thread Stephan Roth
I didn't say slurm distributes nvidia binaries. But slurm's gpu_nvml.so links to libnvidia-ml.so if it was found at build time: $ ldd lib/slurm/gpu_nvml.so ... libnvidia-ml.so.1 => /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 (0x7f2d2bac8000) ... When I run configure
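
A quick way to tell whether a given build produced the plugin discussed here is to look for gpu_nvml.so under the install tree. The sketch below assumes the lib/slurm/ layout shown in the ldd output above; the helper name has_gpu_nvml and the prefix argument are hypothetical and not part of Slurm.

```shell
# Hypothetical helper: report whether a Slurm install tree contains the NVML GPU plugin.
# $1 is the install prefix (an assumption; adjust to your own layout).
has_gpu_nvml() {
    plugin="$1/lib/slurm/gpu_nvml.so"
    if [ -f "$plugin" ]; then
        echo "found: $plugin"
        return 0
    fi
    echo "missing: $plugin"
    return 1
}
```

If the file is present, running ldd on it as shown above indicates which libnvidia-ml.so it resolves to.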

Re: [slurm-users] How to use Autodetect=nvml in gres.conf

2020-02-07 Thread Stephan Roth
gpu_nvml.so links to libnvidia-ml.so: $ ldd lib/slurm/gpu_nvml.so ... libnvidia-ml.so.1 => /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 (0x7f2d2bac8000) ... When you run configure you'll see something along these lines: On 07.02.20 17:03, dean.w.schu...@gmail.com wrote:

Re: [slurm-users] How to use Autodetect=nvml in gres.conf

2020-02-07 Thread dean.w.schulze
I just checked the .deb package that I build from source and there is nothing in it that has nv or cuda in its name. Are you sure that slurm distributes nvidia binaries? -Original Message- From: slurm-users On Behalf Of Stephan Roth Sent: Friday, February 7, 2020 2:23 AM To:

Re: [slurm-users] How to use Autodetect=nvml in gres.conf

2020-02-07 Thread dean.w.schulze
I didn't know that slurm had nvml linked into it. I build slurm from source and didn't notice that nvml was part of the build. I'll check on that again. -Original Message- From: slurm-users On Behalf Of Stephan Roth Sent: Friday, February 7, 2020 2:23 AM To:

Re: [slurm-users] How to use Autodetect=nvml in gres.conf

2020-02-07 Thread Stephan Roth
On 05.02.20 21:06, Dean Schulze wrote:
> I need to dynamically configure gpus on my nodes. The gres.conf doc
> says to use
>
> Autodetect=nvml
That's all you need in gres.conf provided you don't configure any Gres=... entries for your nodes in your slurm.conf. If you do, make sure the string
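
As a minimal sketch of the setup described in this message, the gres.conf would contain only the autodetect line. This assumes Slurm was built with the gpu/nvml plugin, as discussed elsewhere in the thread, and that no Gres=... entries are set for the nodes in slurm.conf.

```
# gres.conf: let slurmd detect the GPUs through NVML
Autodetect=nvml
```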