Hi,

When a user requests all of the GPUs on a system but fewer than the total number of CPUs, the CPU bindings aren't ideal:
[root@host ~]# nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    mlx5_3  mlx5_1  mlx5_2  mlx5_0  CPU Affinity
GPU0     X      PHB     SYS     SYS     SYS     PHB     SYS     PHB     0-0,2-2,4-4,6-6,8-8,10-10,12-12,14-14,16-16,18-18,20-20,22-22
GPU1    PHB      X      SYS     SYS     SYS     PHB     SYS     PHB     0-0,2-2,4-4,6-6,8-8,10-10,12-12,14-14,16-16,18-18,20-20,22-22
GPU2    SYS     SYS      X      PHB     PHB     SYS     PHB     SYS     1-1,3-3,5-5,7-7,9-9,11-11,13-13,15-15,17-17,19-19,21-21,23-23
GPU3    SYS     SYS     PHB      X      PHB     SYS     PHB     SYS     1-1,3-3,5-5,7-7,9-9,11-11,13-13,15-15,17-17,19-19,21-21,23-23
mlx5_3  SYS     SYS     PHB     PHB      X      SYS     PIX     SYS
mlx5_1  PHB     PHB     SYS     SYS     SYS      X      SYS     PIX
mlx5_2  SYS     SYS     PHB     PHB     PIX     SYS      X      SYS
mlx5_0  PHB     PHB     SYS     SYS     SYS     PIX     SYS      X

$ cat /usr/local/slurm/etc/gres.conf
NodeName=host Name=gpu Type=p100 File=/dev/nvidia0 Cores=0,2,4,6,8,10,12,14,16,18,20,22
NodeName=host Name=gpu Type=p100 File=/dev/nvidia1 Cores=0,2,4,6,8,10,12,14,16,18,20,22
NodeName=host Name=gpu Type=p100 File=/dev/nvidia2 Cores=1,3,5,7,9,11,13,15,17,19,21,23
NodeName=host Name=gpu Type=p100 File=/dev/nvidia3 Cores=1,3,5,7,9,11,13,15,17,19,21,23

[scrosby@thespian ~]$ sinteractive -n 20 --gres=gpu:p100:4
srun: job 612 queued and waiting for resources
srun: job 612 has been allocated resources
[scrosby@host ~]$ cat /sys/fs/cgroup/cpuset/slurm/uid_10255/job_612/cpuset.cpus
0-16,18,20,22

It should ideally be using CPUs 0-19 (split evenly across the two NUMA nodes).

I've tried forcing it with this:

[scrosby@thespian ~]$ sinteractive -n 20 --gres=gpu:p100:4 --cpu_bind=map_cpu:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
srun: job 614 queued and waiting for resources
srun: job 614 has been allocated resources

But the resultant CPU binding is still the same:

[scrosby@host ~]$ cat /sys/fs/cgroup/cpuset/slurm/uid_10255/job_614/cpuset.cpus
0-16,18,20,22

Is there any way to force the CPU bindings of a particular job?

Cheers,
Sean
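For what it's worth, the balanced selection I'd expect can be sketched in a few lines of Python, assuming the even/odd core-to-NUMA-node layout shown by nvidia-smi above (node 0 = even cores, node 1 = odd cores on this 24-CPU box); the function name and interface are just illustrative, not anything Slurm provides:

```python
def balanced_cpus(n_cpus, total=24):
    """Pick n_cpus spread evenly across two NUMA nodes,
    assuming even cores are on node 0 and odd cores on node 1."""
    node0 = [c for c in range(total) if c % 2 == 0]  # 0, 2, ..., 22
    node1 = [c for c in range(total) if c % 2 == 1]  # 1, 3, ..., 23
    half = n_cpus // 2
    # Take half the request from each node (node 0 gets the extra
    # core when the request is odd).
    picked = node0[:n_cpus - half] + node1[:half]
    return sorted(picked)

print(balanced_cpus(20))  # 10 even + 10 odd cores -> CPUs 0-19
```

For a 20-CPU request this yields CPUs 0-19 (ten cores per NUMA node), rather than the lopsided 0-16,18,20,22 that the jobs above ended up with.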