Yiannis,

We agree with the change you recommended to the allocation of CPUs under
--cpu_bind=sockets and have merged your patch.  You'll find the change in
v2.2.3.

Thank you,
Don

-----Original Message-----
From: [email protected] [mailto:[email protected]] On 
Behalf Of yiannis georgiou
Sent: Wednesday, February 23, 2011 2:43 PM
To: slurm-dev
Subject: [slurm-dev] Problem with cpu binding to sockets option

Hello,

I think there is a problem in the current behaviour of the
'--cpu_bind=sockets' option of the srun/sbatch/salloc commands.

Let me explain my understanding.
On a cluster whose nodes have 2 sockets, 6 cores per socket and 1 thread
per core:

===========
[root@... ~]# numactl --hardware |grep cpus
node 0 cpus: 0 1 2 3 4 5
node 1 cpus: 6 7 8 9 10 11
===========

I get the correct binding with '--cpu_bind=none':

===========
[root@... ~]$ srun -n4 --cpu_bind=none cat /proc/self/status | grep Cpus_allowed_list;
Cpus_allowed_list:    0-1,6-7
Cpus_allowed_list:    0-1,6-7
Cpus_allowed_list:    0-1,6-7
Cpus_allowed_list:    0-1,6-7
===========

and the correct binding with '--cpu_bind=cores':

===========
[root@... ~]$ srun -n4 --cpu_bind=cores cat /proc/self/status | grep Cpus_allowed_list;
Cpus_allowed_list:    0
Cpus_allowed_list:    6
Cpus_allowed_list:    7
Cpus_allowed_list:    1
===========

but '--cpu_bind=sockets' binds each task to the whole socket, including
CPUs that are not allocated to my job:

===========
[root@... ~]$ srun -n4 --cpu_bind=sockets cat /proc/self/status | grep Cpus_allowed_list;
Cpus_allowed_list:    0-5
Cpus_allowed_list:    6-11
Cpus_allowed_list:    0-5
Cpus_allowed_list:    6-11
===========

This should not be allowed. In my view, '--cpu_bind=sockets' should bind
each task to a socket but restrict that binding to the CPUs allocated to
my job. This is what I would prefer it to do:

===========
[root@... ~]$ srun -n4 --cpu_bind=sockets cat /proc/self/status | grep Cpus_allowed_list;
Cpus_allowed_list:    0-1
Cpus_allowed_list:    6-7
Cpus_allowed_list:    0-1
Cpus_allowed_list:    6-7
===========
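
To make the rule I have in mind concrete, here is a minimal Python sketch
of it. This is only an illustration, not the attached patch and not
Slurm's code: the function names are hypothetical, and the cyclic
assignment of tasks to sockets is an assumption taken from the order of
the output shown above. Each task's mask is its socket's CPUs intersected
with the CPUs allocated to the job.

```python
# Illustrative sketch only: hypothetical helpers, not Slurm's implementation.

def socket_bind_masks(sockets, allocated, ntasks):
    """Return one CPU list per task: the task's socket restricted to
    the CPUs allocated to the job."""
    allocated = set(allocated)
    # Intersect every socket with the allocation; drop sockets left empty.
    usable = [sorted(set(s) & allocated) for s in sockets]
    usable = [s for s in usable if s]
    if not usable:
        raise ValueError("no allocated CPU belongs to any socket")
    # Assumption: tasks are spread over the usable sockets cyclically.
    return [usable[i % len(usable)] for i in range(ntasks)]


def to_list_format(cpus):
    """Format CPU ids in the style of Cpus_allowed_list, e.g. '0-1,6'."""
    ranges = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu          # extend the current run
        else:
            ranges.append([cpu, cpu])    # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


# Node layout from the numactl output; job allocation from --cpu_bind=none.
sockets = [range(0, 6), range(6, 12)]
allocated = [0, 1, 6, 7]

for mask in socket_bind_masks(sockets, allocated, ntasks=4):
    print("Cpus_allowed_list:   ", to_list_format(mask))
```

With the layout and allocation from this mail, the four tasks get the
masks 0-1, 6-7, 0-1 and 6-7, which matches the listing above.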

Attached is a patch that corrects the current behaviour and produces the
desired result.
We may also need to update the description of the option in the man
pages to reflect this behaviour.

Let me know if you agree or if you have different expectations of the
'--cpu_bind=sockets' option.


Best Regards,
yiannis



