Kevin,

We (the Slurm team at Bull) have also been looking at task affinity. In your example, you do not specify a binding type; I think you need to specify --cpu_bind=threads. When I adapt your example for our hyperthreaded test node here and specify thread binding, I get the expected results: tasks are distributed to CPUs cyclically across the sockets, which is the default distribution method. See below.

Regards,
Martin Perry
Bull Phoenix
Node sulu CPU layout reported by /proc/cpuinfo:

  Socket#   CPU#
  0         0,8
  0         1,9
  0         2,10
  0         3,11
  1         4,12
  1         5,13
  1         6,14
  1         7,15

[sulu] (slurm) etc> srun -p sulu-only -n8 --cpu_bind=threads,verbose ...
cpu_bind=MASK - sulu, task  4  4 [3306]: mask 0x4 set
cpu_bind=MASK - sulu, task  0  0 [3302]: mask 0x1 set
cpu_bind=MASK - sulu, task  2  2 [3304]: mask 0x2 set
cpu_bind=MASK - sulu, task  3  3 [3305]: mask 0x20 set
cpu_bind=MASK - sulu, task  7  7 [3309]: mask 0x80 set
cpu_bind=MASK - sulu, task  5  5 [3307]: mask 0x40 set
cpu_bind=MASK - sulu, task  6  6 [3308]: mask 0x8 set
cpu_bind=MASK - sulu, task  1  1 [3303]: mask 0x10 set

"Williams, Kevin E. (Federal SIP)" <[email protected]>
Sent by: [email protected]
04/04/2011 06:45 PM
Please respond to [email protected]
To: "[email protected]" <[email protected]>
cc:
Subject: [slurm-dev] Question on task affinity

Hello,

We are running SLURM 2.2.0 on a Red Hat 5.5 cluster. In an effort to understand the task/affinity plugin's operation, we have set up this simple test. Note that the target node is a 2-processor, 6-cores-per-processor, 2-threads-per-core hyperthreaded system. The partition is set up with only the one node to avoid any resource allocation issues. The actual program being run is not included in this message.
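As an aside, the thread-binding masks in Martin's verbose output above can be decoded against the sulu socket layout to confirm the cyclic distribution. This is a minimal sketch: the socket map below is transcribed from the /proc/cpuinfo table above rather than read from a live node.

```python
# Socket map transcribed from the sulu /proc/cpuinfo table above.
SOCKET_OF_CPU = {cpu: 0 for cpu in (0, 1, 2, 3, 8, 9, 10, 11)}
SOCKET_OF_CPU.update({cpu: 1 for cpu in (4, 5, 6, 7, 12, 13, 14, 15)})

def decode(mask):
    """Return the CPU numbers whose bits are set in an affinity bitmask."""
    return [bit for bit in range(mask.bit_length()) if mask >> bit & 1]

# Masks from the --cpu_bind=threads,verbose output, ordered by task rank.
masks = [0x1, 0x10, 0x2, 0x20, 0x4, 0x40, 0x8, 0x80]
for task, mask in enumerate(masks):
    cpus = decode(mask)
    sockets = sorted({SOCKET_OF_CPU[c] for c in cpus})
    print(f"task {task}: mask {mask:#x} -> CPU {cpus}, socket {sockets}")
# Tasks 0, 2, 4, 6 land on socket 0 and tasks 1, 3, 5, 7 on socket 1: cyclic.
```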
We run the following command:

srun -n12 --cpu_bind=verbose sh -c 'program and parameters'

By default, SLURM uses the following CPU masks:

cpu_bind=MASK - target, task  0  0 [29400]: mask 0x8 set
cpu_bind=MASK - target, task  1  1 [29401]: mask 0x2 set
cpu_bind=MASK - target, task  2  2 [29402]: mask 0x8000 set
cpu_bind=MASK - target, task  3  3 [29403]: mask 0x2000 set
cpu_bind=MASK - target, task  4  4 [29404]: mask 0x800 set
cpu_bind=MASK - target, task  5  5 [29405]: mask 0x200 set
cpu_bind=MASK - target, task  6  6 [29406]: mask 0x800000 set
cpu_bind=MASK - target, task  7  7 [29407]: mask 0x200000 set
cpu_bind=MASK - target, task  8  8 [29408]: mask 0x80 set
cpu_bind=MASK - target, task  9  9 [29409]: mask 0x20 set
cpu_bind=MASK - target, task 10 10 [29410]: mask 0x80000 set
cpu_bind=MASK - target, task 11 11 [29411]: mask 0x20000 set

With these default masks, the program takes about 25 seconds to finish.

Now we run it as follows:

srun -n12 --cpu_bind=rank,verbose sh -c 'program and parameters'

cpu_bind=RANK - target, task  0  0 [29815]: mask 0x1 set
cpu_bind=RANK - target, task  1  1 [29816]: mask 0x2 set
cpu_bind=RANK - target, task  2  2 [29817]: mask 0x4 set
cpu_bind=RANK - target, task  3  3 [29818]: mask 0x8 set
cpu_bind=RANK - target, task  4  4 [29819]: mask 0x10 set
cpu_bind=RANK - target, task  5  5 [29820]: mask 0x20 set
cpu_bind=RANK - target, task  6  6 [29821]: mask 0x40 set
cpu_bind=RANK - target, task  7  7 [29822]: mask 0x80 set
cpu_bind=RANK - target, task  8  8 [29823]: mask 0x100 set
cpu_bind=RANK - target, task  9  9 [29824]: mask 0x200 set
cpu_bind=RANK - target, task 10 10 [29825]: mask 0x400 set
cpu_bind=RANK - target, task 11 11 [29826]: mask 0x800 set

With the mask settings provided by --cpu_bind=rank, it takes only about 12 seconds to finish. We believe these masks represent processor numbers as listed in /proc/cpuinfo, with bit 0x1 for processor 0, 0x2 for processor 1, etc.
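This reading of the masks can be checked mechanically: bit n set in a mask means the task may run on processor n. The following sketch converts the two sets of masks above into processor lists (the mask values are copied from the verbose output; everything else is illustrative):

```python
def mask_to_procs(mask):
    """List the processors selected by an affinity bitmask (bit n <=> processor n)."""
    return [bit for bit in range(mask.bit_length()) if mask >> bit & 1]

# Masks from the default run, ordered by task rank 0..11.
default_masks = [0x8, 0x2, 0x8000, 0x2000, 0x800, 0x200,
                 0x800000, 0x200000, 0x80, 0x20, 0x80000, 0x20000]
# --cpu_bind=rank simply gives task r the single bit 1 << r.
rank_masks = [1 << r for r in range(12)]

print("default:", sorted(p for m in default_masks for p in mask_to_procs(m)))
print("rank:   ", sorted(p for m in rank_masks for p in mask_to_procs(m)))
# default selects the odd-numbered processors 1..23; rank selects 0..11.
```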
If that is the case, here is a table showing how the masks and processor numbers correspond to the apicid, physical id, and core id fields:

  apic  proc  phys  core  mask
     0     3     0     0  8
     1    15     0     0  8000
     2    11     0     1  800
     3    23     0     1  800000
     4     7     0     2  80
     5    19     0     2  80000
    16     1     0     8  2
    17    13     0     8  2000
    18     9     0     9  200
    19    21     0     9  200000
    20     5     0    10  20
    21    17     0    10  20000
    32     0     1     0  1
    33    12     1     0  1000
    34     8     1     1  100
    35    20     1     1  100000
    36     4     1     2  10
    37    16     1     2  10000
    48     2     1     8  4
    49    14     1     8  4000
    50    10     1     9  400
    51    22     1     9  400000
    52     6     1    10  40
    53    18     1    10  40000

As you can see, the default set of masks corresponds to the 12 threads on socket 0, rather than balancing the load between sockets 0 and 1. We would like to understand why SLURM is choosing these masks by default.

The relevant slurm.conf parameters are:

TaskEpilog      = (null)
TaskPlugin      = task/affinity
TaskPluginParam = (null type)
TaskProlog      = (null)

Please advise.

Kevin Williams
Hewlett Packard
[email protected]
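The claim that the default masks all land on socket 0 can be verified directly from the table. A small sketch, with the processor-to-physical-id mapping transcribed from the table above (not read from a node):

```python
# proc -> physical id (socket), transcribed from the table above.
PHYS_ID = {3: 0, 15: 0, 11: 0, 23: 0, 7: 0, 19: 0,
           1: 0, 13: 0, 9: 0, 21: 0, 5: 0, 17: 0,
           0: 1, 12: 1, 8: 1, 20: 1, 4: 1, 16: 1,
           2: 1, 14: 1, 10: 1, 22: 1, 6: 1, 18: 1}

# Default masks from the first run; each has a single bit set, so
# bit_length() - 1 recovers the processor number.
default_masks = [0x8, 0x2, 0x8000, 0x2000, 0x800, 0x200,
                 0x800000, 0x200000, 0x80, 0x20, 0x80000, 0x20000]

sockets = {PHYS_ID[m.bit_length() - 1] for m in default_masks}
print("sockets used by the default masks:", sockets)  # prints {0}: all on socket 0
```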
