Yiannis,

We agree with the change you recommended to the allocation of CPUs when --cpu_bind=sockets is used, and we have merged your patch. You'll find the change in v2.2.3.
Thank you,
Don

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of yiannis georgiou
Sent: Wednesday, February 23, 2011 2:43 PM
To: slurm-dev
Subject: [slurm-dev] Problem with cpu binding to sockets option

Hello,

I think there is a problem in the current behaviour of the '--cpu_bind=sockets' option of the srun/sbatch/salloc commands. Let me explain my understanding.

In a cluster with 2 sockets, 6 cores per socket and 1 thread per core:

===========
[root@... ~]# numactl --hardware |grep cpus
node 0 cpus: 0 1 2 3 4 5
node 1 cpus: 6 7 8 9 10 11
===========

I get correct binding with '--cpu_bind=none':

===========
[root@... ~]$ srun -n4 --cpu_bind=none cat /proc/self/status | grep Cpus_allowed_list;
Cpus_allowed_list: 0-1,6-7
Cpus_allowed_list: 0-1,6-7
Cpus_allowed_list: 0-1,6-7
Cpus_allowed_list: 0-1,6-7
===========

and correct binding with '--cpu_bind=cores':

===========
[root@... ~]$ srun -n4 --cpu_bind=cores cat /proc/self/status | grep Cpus_allowed_list;
Cpus_allowed_list: 0
Cpus_allowed_list: 6
Cpus_allowed_list: 7
Cpus_allowed_list: 1
===========

but '--cpu_bind=sockets' binds each task to the whole socket, even though some of that socket's CPUs are not allocated to my job:

===========
[root@... ~]$ srun -n4 --cpu_bind=sockets cat /proc/self/status | grep Cpus_allowed_list;
Cpus_allowed_list: 0-5
Cpus_allowed_list: 6-11
Cpus_allowed_list: 0-5
Cpus_allowed_list: 6-11
===========

This should not be allowed. In my view, '--cpu_bind=sockets' should bind tasks to sockets but restrict the binding to the CPUs allocated to my job. This is what I would prefer it to do:

===========
[root@... ~]$ srun -n4 --cpu_bind=sockets cat /proc/self/status | grep Cpus_allowed_list;
Cpus_allowed_list: 0-1
Cpus_allowed_list: 6-7
Cpus_allowed_list: 0-1
Cpus_allowed_list: 6-7
===========

Attached is a patch that corrects this behaviour and produces the desired result.
We might need to change the explanation of the parameter in the man pages as well, so that it reflects this specific behaviour. Let me know if you agree or if you have different expectations of the '--cpu_bind=sockets' option.

Best Regards,
yiannis
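
For readers who want the requested behaviour spelled out, here is a minimal Python sketch of the idea described in the message: each task's mask is its socket's CPU set intersected with the CPUs allocated to the job. This is an illustration only and not the attached patch or Slurm's code. The function name `socket_masks`, its arguments, and the cyclic assignment of tasks to sockets are all assumptions made for the example.

```python
def socket_masks(socket_cpus, allocated, ntasks):
    """Return one CPU list per task (hypothetical helper, not Slurm code).

    socket_cpus: list of CPU-id lists, one per socket.
    allocated:   CPU ids allocated to the job on this node.
    ntasks:      number of tasks to bind.
    """
    allocated = set(allocated)
    # Restrict every socket to the CPUs the job actually owns.
    usable = [sorted(allocated & set(cpus)) for cpus in socket_cpus]
    # Sockets on which the job owns nothing cannot host a task.
    usable = [cpus for cpus in usable if cpus]
    if not usable:
        raise ValueError("job has no allocated CPUs on any socket")
    # Assumption: tasks are spread over the usable sockets cyclically.
    return [usable[i % len(usable)] for i in range(ntasks)]


# Topology and allocation taken from the example in the message.
sockets = [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]
masks = socket_masks(sockets, allocated=[0, 1, 6, 7], ntasks=4)
```

With the node layout and allocation from the message, the four tasks get CPUs 0-1, 6-7, 0-1 and 6-7, which matches the preferred output shown there.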
