Kevin,

We (the Slurm team at Bull) have also been looking at task affinity. In your example, you do not specify a binding type; I think you need to specify --cpu_bind=threads. When I adapt your example for our hyperthreaded test node here and specify thread binding, I get the expected results: tasks are distributed to CPUs cyclically across the sockets, which is the default distribution method. See below.

Regards,
Martin Perry
Bull Phoenix
Node sulu CPU layout reported by /proc/cpuinfo:

  Socket#   CPU#
  0         0,8
  0         1,9
  0         2,10
  0         3,11
  1         4,12
  1         5,13
  1         6,14
  1         7,15

[sulu] (slurm) etc> srun -p sulu-only -n8 --cpu_bind=threads,verbose ...
cpu_bind=MASK - sulu, task  4  4 [3306]: mask 0x4 set
cpu_bind=MASK - sulu, task  0  0 [3302]: mask 0x1 set
cpu_bind=MASK - sulu, task  2  2 [3304]: mask 0x2 set
cpu_bind=MASK - sulu, task  3  3 [3305]: mask 0x20 set
cpu_bind=MASK - sulu, task  7  7 [3309]: mask 0x80 set
cpu_bind=MASK - sulu, task  5  5 [3307]: mask 0x40 set
cpu_bind=MASK - sulu, task  6  6 [3308]: mask 0x8 set
cpu_bind=MASK - sulu, task  1  1 [3303]: mask 0x10 set

"Williams, Kevin E. (Federal SIP)" <[email protected]>
Sent by: [email protected]
04/04/2011 06:45 PM
Please respond to [email protected]
To: "[email protected]" <[email protected]>
cc:
Subject: [slurm-dev] Question on task affinity

Hello,

We are running SLURM 2.2.0 on a Red Hat 5.5 cluster. In an effort to understand the task/affinity plugin's operation, we have set up this simple test. Note that the target node is a 2-processor, 6-cores-per-processor, 2-threads-per-core hyperthreaded system. The partition is set up with only the one node to avoid any resource allocation issues. The actual program being run is not included in this message.
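As an aside, the thread-binding masks in Martin's verbose output above can be decoded against the sulu socket layout to confirm the cyclic distribution. This is a minimal sketch: the socket map below is transcribed from the /proc/cpuinfo table above rather than read from a live node.

```python
# Socket map transcribed from the sulu /proc/cpuinfo table above.
SOCKET_OF_CPU = {cpu: 0 for cpu in (0, 1, 2, 3, 8, 9, 10, 11)}
SOCKET_OF_CPU.update({cpu: 1 for cpu in (4, 5, 6, 7, 12, 13, 14, 15)})

def decode(mask):
    """Return the CPU numbers whose bits are set in an affinity bitmask."""
    return [bit for bit in range(mask.bit_length()) if mask >> bit & 1]

# Masks from the --cpu_bind=threads,verbose output, ordered by task rank.
masks = [0x1, 0x10, 0x2, 0x20, 0x4, 0x40, 0x8, 0x80]
for task, mask in enumerate(masks):
    cpus = decode(mask)
    sockets = sorted({SOCKET_OF_CPU[c] for c in cpus})
    print(f"task {task}: mask {mask:#x} -> CPU {cpus}, socket {sockets}")
# Tasks 0, 2, 4, 6 land on socket 0 and tasks 1, 3, 5, 7 on socket 1: cyclic.
```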
We run the following command:

srun -n12 --cpu_bind=verbose sh -c 'program and parameters'

By default, SLURM uses the following CPU masks:

cpu_bind=MASK - target, task  0  0 [29400]: mask 0x8 set
cpu_bind=MASK - target, task  1  1 [29401]: mask 0x2 set
cpu_bind=MASK - target, task  2  2 [29402]: mask 0x8000 set
cpu_bind=MASK - target, task  3  3 [29403]: mask 0x2000 set
cpu_bind=MASK - target, task  4  4 [29404]: mask 0x800 set
cpu_bind=MASK - target, task  5  5 [29405]: mask 0x200 set
cpu_bind=MASK - target, task  6  6 [29406]: mask 0x800000 set
cpu_bind=MASK - target, task  7  7 [29407]: mask 0x200000 set
cpu_bind=MASK - target, task  8  8 [29408]: mask 0x80 set
cpu_bind=MASK - target, task  9  9 [29409]: mask 0x20 set
cpu_bind=MASK - target, task 10 10 [29410]: mask 0x80000 set
cpu_bind=MASK - target, task 11 11 [29411]: mask 0x20000 set

With these default masks, the program takes about 25 seconds to finish.

Now we run it as follows:

srun -n12 --cpu_bind=rank,verbose sh -c 'program and parameters'

cpu_bind=RANK - target, task  0  0 [29815]: mask 0x1 set
cpu_bind=RANK - target, task  1  1 [29816]: mask 0x2 set
cpu_bind=RANK - target, task  2  2 [29817]: mask 0x4 set
cpu_bind=RANK - target, task  3  3 [29818]: mask 0x8 set
cpu_bind=RANK - target, task  4  4 [29819]: mask 0x10 set
cpu_bind=RANK - target, task  5  5 [29820]: mask 0x20 set
cpu_bind=RANK - target, task  6  6 [29821]: mask 0x40 set
cpu_bind=RANK - target, task  7  7 [29822]: mask 0x80 set
cpu_bind=RANK - target, task  8  8 [29823]: mask 0x100 set
cpu_bind=RANK - target, task  9  9 [29824]: mask 0x200 set
cpu_bind=RANK - target, task 10 10 [29825]: mask 0x400 set
cpu_bind=RANK - target, task 11 11 [29826]: mask 0x800 set

With the mask settings provided by --cpu_bind=rank, it takes only about 12 seconds to finish. We believe these masks represent processor numbers as listed in /proc/cpuinfo, with bit 0x1 for processor 0, 0x2 for processor 1, etc.
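This reading of the masks can be checked mechanically: bit n set in a mask means the task may run on processor n. The following sketch converts the two sets of masks above into processor lists (the mask values are copied from the verbose output; everything else is illustrative):

```python
def mask_to_procs(mask):
    """List the processors selected by an affinity bitmask (bit n <=> processor n)."""
    return [bit for bit in range(mask.bit_length()) if mask >> bit & 1]

# Masks from the default run, ordered by task rank 0..11.
default_masks = [0x8, 0x2, 0x8000, 0x2000, 0x800, 0x200,
                 0x800000, 0x200000, 0x80, 0x20, 0x80000, 0x20000]
# --cpu_bind=rank simply gives task r the single bit 1 << r.
rank_masks = [1 << r for r in range(12)]

print("default:", sorted(p for m in default_masks for p in mask_to_procs(m)))
print("rank:   ", sorted(p for m in rank_masks for p in mask_to_procs(m)))
# default selects the odd-numbered processors 1..23; rank selects 0..11.
```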
If that is the case, here is a table showing how the masks and processor numbers correspond to the apicid, physical id, and core id fields:

  apic  proc  phys  core  mask
     0     3     0     0  8
     1    15     0     0  8000
     2    11     0     1  800
     3    23     0     1  800000
     4     7     0     2  80
     5    19     0     2  80000
    16     1     0     8  2
    17    13     0     8  2000
    18     9     0     9  200
    19    21     0     9  200000
    20     5     0    10  20
    21    17     0    10  20000
    32     0     1     0  1
    33    12     1     0  1000
    34     8     1     1  100
    35    20     1     1  100000
    36     4     1     2  10
    37    16     1     2  10000
    48     2     1     8  4
    49    14     1     8  4000
    50    10     1     9  400
    51    22     1     9  400000
    52     6     1    10  40
    53    18     1    10  40000

As you can see, the default set of masks corresponds to the 12 threads on socket 0, rather than balancing the load between sockets 0 and 1. We would like to understand why SLURM is choosing these masks by default.

The relevant slurm.conf parameters are:

TaskEpilog      = (null)
TaskPlugin      = task/affinity
TaskPluginParam = (null type)
TaskProlog      = (null)

Please advise.

Kevin Williams
Hewlett Packard
[email protected]
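The claim that the default masks all land on socket 0 can be verified directly from the table. A small sketch, with the processor-to-physical-id mapping transcribed from the table above (not read from a node):

```python
# proc -> physical id (socket), transcribed from the table above.
PHYS_ID = {3: 0, 15: 0, 11: 0, 23: 0, 7: 0, 19: 0,
           1: 0, 13: 0, 9: 0, 21: 0, 5: 0, 17: 0,
           0: 1, 12: 1, 8: 1, 20: 1, 4: 1, 16: 1,
           2: 1, 14: 1, 10: 1, 22: 1, 6: 1, 18: 1}

# Default masks from the first run; each has a single bit set, so
# bit_length() - 1 recovers the processor number.
default_masks = [0x8, 0x2, 0x8000, 0x2000, 0x800, 0x200,
                 0x800000, 0x200000, 0x80, 0x20, 0x80000, 0x20000]

sockets = {PHYS_ID[m.bit_length() - 1] for m in default_masks}
print("sockets used by the default masks:", sockets)  # prints {0}: all on socket 0
```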
