Re: [slurm-users] [External] slurmd -C vs lscpu - which do I use to populate slurm.conf?

2021-04-28 Thread Michael Robbert
I think that you want to use the output of slurmd -C, but if that isn’t telling you the truth, then you may not have built Slurm with the correct libraries. I believe that you need to build with hwloc in order to get the most accurate details of the CPU topology. Make sure you have hwloc-devel
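For anyone following along, a rough sketch of getting hwloc into the build (package names assume an RPM-based distribution and are only illustrative; adjust for your environment):

    # install the hwloc development headers before (re)building Slurm
    yum install -y hwloc hwloc-devel

    # rebuild Slurm so configure detects hwloc, then compare again:
    slurmd -C
    lscpu | grep -E 'Thread|Core|Socket'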

[slurm-users] slurmd -C vs lscpu - which do I use to populate slurm.conf?

2021-04-28 Thread David Henkemeyer
I'm working on populating slurm.conf on my nodes, and I noticed that slurmd -C doesn't agree with lscpu in all cases, and I'm not sure why. Here is what lscpu reports:

    Thread(s) per core: 2
    Core(s) per socket: 2
    Socket(s): 1

And here is what slurmd -C is reporting:
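For reference, a topology like the lscpu one above would normally be written into slurm.conf along these lines (the node name is a placeholder; whether slurmd -C reports the same figures is the open question here):

    NodeName=node01 Sockets=1 CoresPerSocket=2 ThreadsPerCore=2 CPUs=4 State=UNKNOWN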

Re: [slurm-users] OpenMPI interactive change in behavior?

2021-04-28 Thread John DeSantis
Jürgen,

>> does it work with `srun --overlap ...` or if you do `export SLURM_OVERLAP=1`
>> before running your interactive job?

I performed testing yesterday while using the "--overlap" flag, but that didn't do anything. But exporting the variable instead seems to have corrected the issue:
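For anyone hitting the same behaviour, a minimal sketch of the workaround described (the allocation flags and program name are placeholders):

    # export the variable before requesting the interactive allocation
    export SLURM_OVERLAP=1
    salloc -N 1 -n 4
    # ...then launch MPI inside the allocation as usual
    mpirun ./my_mpi_program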

Re: [slurm-users] OpenMPI interactive change in behavior?

2021-04-28 Thread Juergen Salk
Hi John,

does it work with `srun --overlap ...` or if you do `export SLURM_OVERLAP=1` before running your interactive job?

Best regards
Jürgen

* John DeSantis [210428 09:41]:
> Hello all,
>
> Just an update, the following URL almost mirrors the issue we're seeing:
>

Re: [slurm-users] OpenMPI interactive change in behavior?

2021-04-28 Thread Paul Edmon
I haven't experienced this issue here. Then again we've been using PMIx for launching MPI for a while now, thus we may have circumvented this particular issue.

-Paul Edmon-

On 4/28/2021 9:41 AM, John DeSantis wrote:
Hello all, Just an update, the following URL almost mirrors the issue
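For context, a minimal sketch of the PMIx-based launch Paul mentions (this assumes Slurm and Open MPI were both built against a compatible PMIx; rank count and program name are placeholders):

    # check which MPI plugins this Slurm build knows about
    srun --mpi=list

    # launch the ranks directly with srun via PMIx
    srun --mpi=pmix -n 4 ./my_mpi_program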

Re: [slurm-users] OpenMPI interactive change in behavior?

2021-04-28 Thread John DeSantis
Hello all,

Just an update: the following URL almost mirrors the issue we're seeing:

https://github.com/open-mpi/ompi/issues/8378

But SLURM 20.11.3 shipped with the fix, and I've verified that the changes are in the source code. We don't want to have to downgrade SLURM to 20.02.x, but it

[slurm-users] CUDA vs OpenCL

2021-04-28 Thread Valerio Bellizzomi
Greetings,

I see at https://slurm.schedmd.com/gres.html#GPU_Management that CUDA_VISIBLE_DEVICES is available for NVIDIA GPUs; what about OpenCL GPUs? Is there an OPENCL_VISIBLE_DEVICES?

-- Valerio Bellizzomi
https://www.selroc.systems
http://www.selnet.org
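For context, the NVIDIA behaviour described on that page looks roughly like this from inside a job (a minimal sketch only; it does not answer the OpenCL question):

    #!/bin/bash
    #SBATCH --gres=gpu:1
    # Slurm sets CUDA_VISIBLE_DEVICES to the GPU(s) it allocated to this job
    echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"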

Re: [slurm-users] Questions about adding new nodes to Slurm

2021-04-28 Thread Ole Holm Nielsen
On 4/28/21 2:48 AM, Sid Young wrote:
I use SaltStack to push out the slurm.conf file to all nodes and do a "scontrol reconfigure" of the slurmd; this makes management much easier across the cluster. You can also do service restarts from one point, etc. Avoid NFS mounts for the config, if the
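A rough sketch of that workflow (the Salt state name is an assumption; any configuration-management tool that can push a file works the same way):

    # push the updated slurm.conf to every node, e.g. via a Salt state
    salt '*' state.apply slurm

    # then have slurmctld and all slurmd daemons re-read the configuration
    scontrol reconfigure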

Re: [slurm-users] [External] Re: PropagateResourceLimits

2021-04-28 Thread Diego Zuccato
On 27/04/2021 17:31, Prentice Bisbal wrote:
> I don't think PAM comes into play here. Since Slurm is starting the processes on the compute nodes as the user, etc., PAM is being bypassed.

Then maybe slurmd somehow goes through the PAM stack another way, since limits on the frontend got
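For reference, the slurm.conf knobs under discussion (a minimal sketch; ALL is the default, and the values shown are only illustrative):

    # do not copy the submit host's ulimits onto the job's compute nodes
    PropagateResourceLimits=NONE

    # or propagate everything except selected limits
    #PropagateResourceLimitsExcept=MEMLOCK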