I think you want to use the output of slurmd -C, but if that isn't telling
you the truth, then you may not have built Slurm against the right libraries.
I believe you need to build with hwloc to get the most accurate picture of
the CPU topology. Make sure you have hwloc-devel installed before building.
I'm working on populating slurm.conf on my nodes, and I noticed that slurmd
-C doesn't agree with lscpu in all cases, and I'm not sure why. Here is what
lscpu reports:
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
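For reference, lscpu's three topology numbers multiply out to the logical CPU count Slurm expects to see; a small sketch (the NodeName value is hypothetical):

```python
# Topology as reported by lscpu on the node in question
threads_per_core = 2
cores_per_socket = 2
sockets = 1

# Total logical CPUs: Sockets x CoresPerSocket x ThreadsPerCore
cpus = sockets * cores_per_socket * threads_per_core
print(cpus)  # 4

# The matching slurm.conf node definition would look like
# (hostname "node01" is illustrative):
print(f"NodeName=node01 CPUs={cpus} Sockets={sockets} "
      f"CoresPerSocket={cores_per_socket} ThreadsPerCore={threads_per_core}")
```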
And here is what slurmd -C is reporting:
Jürgen,
>> does it work with `srun --overlap ...` or if you do `export SLURM_OVERLAP=1`
>> before running your interactive job?
I tested yesterday using the "--overlap" flag, but that didn't change
anything. Exporting the variable instead, however, seems to have corrected
the issue:
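A minimal sketch of the environment-variable approach that worked (not the original session; the interactive command is illustrative):

```shell
# Setting SLURM_OVERLAP in the environment before launching the job
# succeeded where --overlap on the srun line alone did not
export SLURM_OVERLAP=1
srun --pty bash
```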
Hi John,
does it work with `srun --overlap ...` or if you do `export SLURM_OVERLAP=1`
before running your interactive job?
Best regards
Jürgen
* John DeSantis [210428 09:41]:
> Hello all,
>
> Just an update, the following URL almost mirrors the issue we're seeing:
>
I haven't experienced this issue here. Then again, we've been using PMIx
to launch MPI for a while now, so we may have sidestepped this particular
issue.
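For context, launching through PMIx rather than the legacy PMI plugins is just a flag on srun (a sketch; assumes Slurm was built with PMIx support, e.g. configured --with-pmix, and the app name is illustrative):

```shell
# Launch an MPI program via the PMIx plugin instead of pmi/pmi2
srun --mpi=pmix -n 4 ./mpi_app
```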
-Paul Edmon-
On 4/28/2021 9:41 AM, John DeSantis wrote:
Hello all,
Just an update, the following URL almost mirrors the issue we're seeing:
https://github.com/open-mpi/ompi/issues/8378
But SLURM 20.11.3 shipped with the fix; I've verified that the changes are
in the source code.
We don't want to have to downgrade SLURM to 20.02.x, but it
Greetings,
I see at https://slurm.schedmd.com/gres.html#GPU_Management that
CUDA_VISIBLE_DEVICES is available for NVIDIA GPUs; what about OpenCL
GPUs?
Is there an OPENCL_VISIBLE_DEVICES ?
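For what it's worth, the variable in question comes from Slurm's gres/gpu plugin; a minimal GPU gres configuration looks like this (hostname and device paths are assumptions):

```
# slurm.conf (fragment)
GresTypes=gpu
NodeName=gpu01 Gres=gpu:2

# gres.conf on gpu01
Name=gpu File=/dev/nvidia[0-1]
```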
--
Valerio Bellizzomi
https://www.selroc.systems
http://www.selnet.org
On 4/28/21 2:48 AM, Sid Young wrote:
I use SaltStack to push out the slurm.conf file to all nodes and then do a
"scontrol reconfigure" of the slurmd; this makes management much easier
across the cluster. You can also do service restarts from one point, etc.
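A sketch of that push-and-reconfigure flow with Salt (the target glob, fileserver path, and destination are assumptions for your environment):

```shell
# Distribute slurm.conf from the Salt fileserver to every minion
salt '*' cp.get_file salt://slurm/slurm.conf /etc/slurm/slurm.conf

# Then reconfigure once from any host that can reach slurmctld;
# the controller tells every slurmd to re-read its config
scontrol reconfigure
```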
Avoid NFS mounts for the config, if the
Il 27/04/2021 17:31, Prentice Bisbal ha scritto:
I don't think PAM comes into play here. Since Slurm is starting the
processes on the compute nodes as the user, etc., PAM is being bypassed.
Then maybe slurmd somehow goes through the PAM stack another way, since
limits on the frontend got