Matteo,
On your nodes, numactl reports that there are 8 NUMA nodes with 6 cpus each,
but your node definition specifies 4 sockets with 12 cpus each. A NUMA
node on your system is not equivalent to a socket as defined in your node
definition. So I'm not sure your results here tell us anything useful. To
reliably determine which Linux logical cpu numbers (as reported by
Cpus_allowed_list) are on which physical socket, I think you need to look
at /proc/cpuinfo. Logical cpu numbers are labeled "processor" and socket
numbers are labeled "physical id". This will tell us for sure how your
tasks are being distributed across the sockets.
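That lookup can be scripted; here is a minimal sketch that parses /proc/cpuinfo text into a {logical cpu: socket} map using the "processor" and "physical id" fields. The sample input below is hypothetical; on a real node, read the file itself.

```python
# Build a {logical cpu: physical socket} map from /proc/cpuinfo text.
def cpu_to_socket(cpuinfo_text):
    mapping = {}
    cpu = None
    for line in cpuinfo_text.splitlines():
        if line.startswith("processor"):
            cpu = int(line.split(":")[1])
        elif line.startswith("physical id") and cpu is not None:
            mapping[cpu] = int(line.split(":")[1])
    return mapping

# Hypothetical three-cpu sample; a real node has one stanza per logical cpu.
sample = """processor\t: 0
physical id\t: 0
processor\t: 1
physical id\t: 0
processor\t: 6
physical id\t: 1
"""
print(cpu_to_socket(sample))  # {0: 0, 1: 0, 6: 1}
```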
I only have a single 4-socket node, so I can't run any tests with -N
greater than 1. But that shouldn't matter. Your results show the
expected distribution across nodes. It is only the distribution across
sockets within each node that appears to be incorrect.
Regards,
Martin
Matteo Guglielmi<[email protected]>
Sent by: [email protected]
10/26/2011 02:18 PM
Please respond to
[email protected]
To
[email protected]
cc
Subject
Re: [slurm-dev] Fat Nodes (48 cores) Job Allocation & Distribution (A much more complicated example)
An update to the complicated example:
(1) upgraded to slurm 2.3.1
(2) did not change slurm conf (same conf used with slurm 2.2.7)
TaskPlugin=task/affinity
TaskPluginParam=Sched
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
JobAcctGatherType=jobacct_gather/linux
PreemptMode=suspend,gang
PreemptType=preempt/partition_prio
#
NodeName=DEFAULT TmpDisk=16384 State=UNKNOWN
#
NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4 ThreadsPerCore=1 RealMemory=127000 Weight=1 Feature=6176,foff,foffhm
#
PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED MaxTime=UNLIMITED PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
#
PartitionName=batch Nodes=foff[09-13] Default=YES
#
PartitionName=foff2 Nodes=foff[09-13] Priority=1000
(3) [software@foff10:~]$ numactl --hardware | grep cpus
node 0 cpus: 0 1 2 3 4 5
node 1 cpus: 6 7 8 9 10 11
node 2 cpus: 12 13 14 15 16 17
node 3 cpus: 18 19 20 21 22 23
node 4 cpus: 24 25 26 27 28 29
node 5 cpus: 30 31 32 33 34 35
node 6 cpus: 36 37 38 39 40 41
node 7 cpus: 42 43 44 45 46 47
(4) --exclusive & slurm 2.3.1 give the same distribution as before (slurm
2.2.7 & --exclusive), and --exclusive DOES MATTER (as before).
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:cyclic
--ntasks-per-node=4 --cpu_bind=core --exclusive -l hostname | sort
00: foff10
01: foff11
02: foff12
03: foff13
04: foff10
05: foff11
06: foff12
07: foff13
08: foff10
09: foff11
10: foff12
11: foff13
12: foff10
13: foff11
14: foff12
15: foff13
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:cyclic
--ntasks-per-node=4 --cpu_bind=core --exclusive -l cat /proc/self/status | grep Cpus_allowed_list | sort
00: Cpus_allowed_list: 0,12
01: Cpus_allowed_list: 0,12
02: Cpus_allowed_list: 0,12
03: Cpus_allowed_list: 0,12
04: Cpus_allowed_list: 24,36
05: Cpus_allowed_list: 24,36
06: Cpus_allowed_list: 24,36
07: Cpus_allowed_list: 24,36
08: Cpus_allowed_list: 6,18
09: Cpus_allowed_list: 6,18
10: Cpus_allowed_list: 6,18
11: Cpus_allowed_list: 6,18
12: Cpus_allowed_list: 30,42
13: Cpus_allowed_list: 30,42
14: Cpus_allowed_list: 30,42
15: Cpus_allowed_list: 30,42
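Assuming the numactl layout above (NUMA node n owns cpus 6n through 6n+5), the masks can at least be translated into NUMA nodes; a small sketch:

```python
# Translate Cpus_allowed_list masks into NUMA nodes, assuming the layout
# printed by numactl above: NUMA node n owns cpus 6n .. 6n+5.
def numa_nodes(cpus):
    return sorted({cpu // 6 for cpu in cpus})

# The four distinct masks seen in the --exclusive run above:
for mask in [(0, 12), (24, 36), (6, 18), (30, 42)]:
    print(mask, "-> NUMA nodes", numa_nodes(mask))
```

Whether node pairs such as (0, 2) sit on the same physical socket cannot be read off numactl alone, which is exactly Martin's point about /proc/cpuinfo.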
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:cyclic
--ntasks-per-node=4 --cpu_bind=core -l hostname | sort
00: foff10
01: foff11
02: foff12
03: foff13
04: foff10
05: foff11
06: foff12
07: foff13
08: foff10
09: foff11
10: foff12
11: foff13
12: foff10
13: foff11
14: foff12
15: foff13
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:cyclic
--ntasks-per-node=4 --cpu_bind=core -l cat /proc/self/status | grep Cpus_allowed_list | sort
00: Cpus_allowed_list: 0,6
01: Cpus_allowed_list: 0,6
02: Cpus_allowed_list: 0,6
03: Cpus_allowed_list: 0,6
04: Cpus_allowed_list: 1,7
05: Cpus_allowed_list: 1,7
06: Cpus_allowed_list: 1,7
07: Cpus_allowed_list: 1,7
08: Cpus_allowed_list: 2,8
09: Cpus_allowed_list: 2,8
10: Cpus_allowed_list: 2,8
11: Cpus_allowed_list: 2,8
12: Cpus_allowed_list: 3,9
13: Cpus_allowed_list: 3,9
14: Cpus_allowed_list: 3,9
15: Cpus_allowed_list: 3,9
REMARK1: No difference using CR_Core or CR_Core_Memory.
REMARK2: Can you try something like this too???
srun -p mns0-only -N 4 -n 16 --ntasks-per-node=4 -c 2
--distribution=cyclic:cyclic (--exclusive) -l ....
--matt
On 10/24/11 22:46, [email protected] wrote:
Matteo,
It should not be necessary to specify --exclusive to get cyclic binding
across all the sockets of a node. I was able to run a test on a 4-socket
node today. I cannot reproduce the problem you are seeing. Slurm is
correctly allocating and binding across all 4 sockets, both with and
without --exclusive. See the example below. Do you get the same results
if you specify CR_Core instead of CR_Core_Memory?
Martin
SelectType=select/cons_res
SelectTypeParameters=CR_Core
TaskPlugin=task/affinity
NodeName=n0 NodeHostname=mns0 NodeAddr=XXX.XXX.XX.XX Sockets=4 CoresPerSocket=8 ThreadsPerCore=1 Procs=32 State=IDLE
PartitionName=mns0-only Nodes=n0 State=UP
[slurm@mns0 /]$ numactl --hardware | grep cpus
node 0 cpus: 0 4 8 12 16 20 24 28   <-- NUMA node # here is equivalent to socket #
node 1 cpus: 1 5 9 13 17 21 25 29
node 2 cpus: 2 6 10 14 18 22 26 30
node 3 cpus: 3 7 11 15 19 23 27 31
[slurm@mns0 /]$ srun -p mns0-only -n 8 --distribution=cyclic:cyclic -l --cpu_bind=core --exclusive cat /proc/self/status | grep Cpus_allowed_list | sort
0: Cpus_allowed_list: 0
1: Cpus_allowed_list: 1
2: Cpus_allowed_list: 2
3: Cpus_allowed_list: 3
4: Cpus_allowed_list: 16
5: Cpus_allowed_list: 17
6: Cpus_allowed_list: 18
7: Cpus_allowed_list: 19
[slurm@mns0 /]$ srun -p mns0-only -n 8 --distribution=cyclic:cyclic -l
--cpu_bind=core cat /proc/self/status | grep Cpus_allowed_list | sort
0: Cpus_allowed_list: 0
1: Cpus_allowed_list: 1
2: Cpus_allowed_list: 2
3: Cpus_allowed_list: 3
4: Cpus_allowed_list: 16
5: Cpus_allowed_list: 17
6: Cpus_allowed_list: 18
7: Cpus_allowed_list: 19
Matteo Guglielmi<[email protected]>
Sent by: [email protected]
10/21/2011 04:10 AM
Please respond to
[email protected]
To
[email protected]
cc
Subject
Re: [slurm-dev] Fat Nodes (48 cores) Job Allocation & Distribution (A much more complicated example)
That makes a huge difference in all cases actually:
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:block
--ntasks-per-node=4 --cpu_bind=core -l --exclusive hostname | sort
00: foff09
01: foff09
02: foff09
03: foff09
04: foff10
05: foff10
06: foff10
07: foff10
08: foff11
09: foff11
10: foff11
11: foff11
12: foff12
13: foff12
14: foff12
15: foff12
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:block
--ntasks-per-node=4 --cpu_bind=core --exclusive -l cat /proc/self/status | grep Cpus_allowed_list | sort
00: Cpus_allowed_list: 14,20
01: Cpus_allowed_list: 15,21
02: Cpus_allowed_list: 16,22
03: Cpus_allowed_list: 17,23
04: Cpus_allowed_list: 14,20
05: Cpus_allowed_list: 15,21
06: Cpus_allowed_list: 16,22
07: Cpus_allowed_list: 17,23
08: Cpus_allowed_list: 14,20
09: Cpus_allowed_list: 15,21
10: Cpus_allowed_list: 16,22
11: Cpus_allowed_list: 17,23
12: Cpus_allowed_list: 14,20
13: Cpus_allowed_list: 15,21
14: Cpus_allowed_list: 16,22
15: Cpus_allowed_list: 17,23
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:cyclic
--ntasks-per-node=4 --cpu_bind=core -l --exclusive hostname | sort
00: foff09
01: foff09
02: foff09
03: foff09
04: foff10
05: foff10
06: foff10
07: foff10
08: foff11
09: foff11
10: foff11
11: foff11
12: foff12
13: foff12
14: foff12
15: foff12
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:cyclic
--ntasks-per-node=4 --cpu_bind=core --exclusive -l cat /proc/self/status | grep Cpus_allowed_list | sort
00: Cpus_allowed_list: 0,12
01: Cpus_allowed_list: 24,36
02: Cpus_allowed_list: 6,18
03: Cpus_allowed_list: 30,42
04: Cpus_allowed_list: 0,12
05: Cpus_allowed_list: 24,36
06: Cpus_allowed_list: 6,18
07: Cpus_allowed_list: 30,42
08: Cpus_allowed_list: 0,12
09: Cpus_allowed_list: 24,36
10: Cpus_allowed_list: 6,18
11: Cpus_allowed_list: 30,42
12: Cpus_allowed_list: 0,12
13: Cpus_allowed_list: 24,36
14: Cpus_allowed_list: 6,18
15: Cpus_allowed_list: 30,42
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:block
--ntasks-per-node=4 --cpu_bind=core -l --exclusive hostname | sort
00: foff09
01: foff10
02: foff11
03: foff12
04: foff09
05: foff10
06: foff11
07: foff12
08: foff09
09: foff10
10: foff11
11: foff12
12: foff09
13: foff10
14: foff11
15: foff12
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:block
--ntasks-per-node=4 --cpu_bind=core --exclusive -l cat /proc/self/status | grep Cpus_allowed_list | sort
00: Cpus_allowed_list: 14,20
01: Cpus_allowed_list: 14,20
02: Cpus_allowed_list: 14,20
03: Cpus_allowed_list: 14,20
04: Cpus_allowed_list: 15,21
05: Cpus_allowed_list: 15,21
06: Cpus_allowed_list: 15,21
07: Cpus_allowed_list: 15,21
08: Cpus_allowed_list: 16,22
09: Cpus_allowed_list: 16,22
10: Cpus_allowed_list: 16,22
11: Cpus_allowed_list: 16,22
12: Cpus_allowed_list: 17,23
13: Cpus_allowed_list: 17,23
14: Cpus_allowed_list: 17,23
15: Cpus_allowed_list: 17,23
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:cyclic
--ntasks-per-node=4 --cpu_bind=core -l --exclusive hostname | sort
00: foff09
01: foff10
02: foff11
03: foff12
04: foff09
05: foff10
06: foff11
07: foff12
08: foff09
09: foff10
10: foff11
11: foff12
12: foff09
13: foff10
14: foff11
15: foff12
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:cyclic
--ntasks-per-node=4 --cpu_bind=core --exclusive -l cat /proc/self/status | grep Cpus_allowed_list | sort
00: Cpus_allowed_list: 0,12
01: Cpus_allowed_list: 0,12
02: Cpus_allowed_list: 0,12
03: Cpus_allowed_list: 0,12
04: Cpus_allowed_list: 24,36
05: Cpus_allowed_list: 24,36
06: Cpus_allowed_list: 24,36
07: Cpus_allowed_list: 24,36
08: Cpus_allowed_list: 6,18
09: Cpus_allowed_list: 6,18
10: Cpus_allowed_list: 6,18
11: Cpus_allowed_list: 6,18
12: Cpus_allowed_list: 30,42
13: Cpus_allowed_list: 30,42
14: Cpus_allowed_list: 30,42
15: Cpus_allowed_list: 30,42
On 10/21/11 09:04, Carles Fenoy wrote:
Hi,
Have you tried with --exclusive? As far as I understand, you are not
asking for all the resources of a node, so slurm won't give you access
to all the sockets.
Maybe that's not the point, but I'd try that first.
Carles
On Fri, Oct 21, 2011 at 1:36 AM, Matteo Guglielmi
<[email protected]>wrote:
OK, so...
(1) is there any way to map slurm core numbers to physical cores?
I've added the --cpu_bind=core option... things got better... the new
outputs are reported here below... but to me it still seems that cores
are not assigned cyclically... but because of (1) I cannot tell, of
course... unless I stick my fingers on all the cpus and see if they get
evenly hot :-)
What would you say looking at these numbers?
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:block
--ntasks-per-node=4 --cpu_bind=core -l hostname | sort
00: foff09
01: foff09
02: foff09
03: foff09
04: foff10
05: foff10
06: foff10
07: foff10
08: foff11
09: foff11
10: foff11
11: foff11
12: foff12
13: foff12
14: foff12
15: foff12
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:block
--ntasks-per-node=4 --cpu_bind=core -l cat /proc/self/status | grep Cpus_allowed_list | sort
00: Cpus_allowed_list: 0,6
01: Cpus_allowed_list: 1,7
02: Cpus_allowed_list: 2,8
03: Cpus_allowed_list: 3,9
04: Cpus_allowed_list: 0,6
05: Cpus_allowed_list: 1,7
06: Cpus_allowed_list: 2,8
07: Cpus_allowed_list: 3,9
08: Cpus_allowed_list: 0,6
09: Cpus_allowed_list: 1,7
10: Cpus_allowed_list: 2,8
11: Cpus_allowed_list: 3,9
12: Cpus_allowed_list: 0,6
13: Cpus_allowed_list: 1,7
14: Cpus_allowed_list: 2,8
15: Cpus_allowed_list: 3,9
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:block
--ntasks-per-node=4 --cpu_bind=core -l hostname | sort
00: foff09
01: foff10
02: foff11
03: foff12
04: foff09
05: foff10
06: foff11
07: foff12
08: foff09
09: foff10
10: foff11
11: foff12
12: foff09
13: foff10
14: foff11
15: foff12
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:block
--ntasks-per-node=4 --cpu_bind=core -l cat /proc/self/status | grep Cpus_allowed_list | sort
00: Cpus_allowed_list: 0,6
01: Cpus_allowed_list: 0,6
02: Cpus_allowed_list: 0,6
03: Cpus_allowed_list: 0,6
04: Cpus_allowed_list: 1,7
05: Cpus_allowed_list: 1,7
06: Cpus_allowed_list: 1,7
07: Cpus_allowed_list: 1,7
08: Cpus_allowed_list: 2,8
09: Cpus_allowed_list: 2,8
10: Cpus_allowed_list: 2,8
11: Cpus_allowed_list: 2,8
12: Cpus_allowed_list: 3,9
13: Cpus_allowed_list: 3,9
14: Cpus_allowed_list: 3,9
15: Cpus_allowed_list: 3,9
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:cyclic
--ntasks-per-node=4 --cpu_bind=core -l hostname | sort
00: foff09
01: foff09
02: foff09
03: foff09
04: foff10
05: foff10
06: foff10
07: foff10
08: foff11
09: foff11
10: foff11
11: foff11
12: foff12
13: foff12
14: foff12
15: foff12
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:cyclic
--ntasks-per-node=4 --cpu_bind=core -l cat /proc/self/status | grep Cpus_allowed_list | sort
00: Cpus_allowed_list: 0,6
01: Cpus_allowed_list: 1,7
02: Cpus_allowed_list: 2,8
03: Cpus_allowed_list: 3,9
04: Cpus_allowed_list: 0,6
05: Cpus_allowed_list: 1,7
06: Cpus_allowed_list: 2,8
07: Cpus_allowed_list: 3,9
08: Cpus_allowed_list: 0,6
09: Cpus_allowed_list: 1,7
10: Cpus_allowed_list: 2,8
11: Cpus_allowed_list: 3,9
12: Cpus_allowed_list: 0,6
13: Cpus_allowed_list: 1,7
14: Cpus_allowed_list: 2,8
15: Cpus_allowed_list: 3,9
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:cyclic
--ntasks-per-node=4 --cpu_bind=core -l hostname | sort
00: foff09
01: foff10
02: foff11
03: foff12
04: foff09
05: foff10
06: foff11
07: foff12
08: foff09
09: foff10
10: foff11
11: foff12
12: foff09
13: foff10
14: foff11
15: foff12
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:cyclic
--ntasks-per-node=4 --cpu_bind=core -l cat /proc/self/status | grep Cpus_allowed_list | sort
00: Cpus_allowed_list: 0,6
01: Cpus_allowed_list: 0,6
02: Cpus_allowed_list: 0,6
03: Cpus_allowed_list: 0,6
04: Cpus_allowed_list: 1,7
05: Cpus_allowed_list: 1,7
06: Cpus_allowed_list: 1,7
07: Cpus_allowed_list: 1,7
08: Cpus_allowed_list: 2,8
09: Cpus_allowed_list: 2,8
10: Cpus_allowed_list: 2,8
11: Cpus_allowed_list: 2,8
12: Cpus_allowed_list: 3,9
13: Cpus_allowed_list: 3,9
14: Cpus_allowed_list: 3,9
15: Cpus_allowed_list: 3,9
On 10/20/11 23:51, [email protected] wrote:
Matteo,
When task affinity is configured but no binding unit is specified
(sockets, cores or threads), each task is bound to all cpus on the node
that are allocated to the job/step. That is why the results show each
task bound to all 8 allocated cpus. To bind each task to two cpus
(cores), per the option "-c 2", add the option "--cpu_bind=core" to
your srun command.
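A toy model (illustration only, not Slurm's actual algorithm) of the difference described above, for one node of a 4-task, -c 2 step:

```python
# One node of the job step: 4 tasks, 2 cpus each, 8 cpus allocated in total.
allocated = [0, 1, 2, 3, 6, 7, 8, 9]   # cpus allocated to the step on this node
ntasks = 4

# No binding unit: every task is allowed on every allocated cpu.
unbound = {t: allocated for t in range(ntasks)}

# --cpu_bind=core: each task pinned to its own pair of cores; this pairing
# reproduces the "0,6 / 1,7 / 2,8 / 3,9" masks reported elsewhere in the thread.
bound = {t: [allocated[t], allocated[t + ntasks]] for t in range(ntasks)}

print(unbound[0])          # [0, 1, 2, 3, 6, 7, 8, 9]
print(bound[0], bound[3])  # [0, 6] [3, 9]
```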
Martin
Matteo Guglielmi<[email protected]>
Sent by: [email protected]
10/20/2011 04:11 AM
Please respond to
[email protected]
To
SLURM<[email protected]>
cc
Subject
[slurm-dev] Fat Nodes (48 cores) Job Allocation & Distribution (A much more complicated example)
Topology of AMD 6176 SE (likwid-topology -g), condensed:
Socket 0: cores 0-11
Socket 1: cores 12-23
Socket 2: cores 24-35
Socket 3: cores 36-47
Per core: 64kB L1 cache, 512kB L2 cache.
Per socket: two 5MB L3 caches, each shared by 6 cores.
### slurm.conf (2.2.7) ###
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
TaskPlugin=task/affinity
TopologyPlugin=topology/none
SchedulerType=sched/backfill
PreemptMode=suspend,gang
PreemptType=preempt/partition_prio
NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4 ThreadsPerCore=1 RealMemory=127000 Weight=1 Feature=6176,foff
PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED MaxTime=UNLIMITED PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
PartitionName=batch Nodes=foff[09-13] Priority=1 Default=YES
PartitionName=foff2 Nodes=foff[09-13] Priority=1000
###########################
Now, what I need to run is a hybrid MPI/OpenMP code.
I would like to obtain the following distribution:
4 nodes
4 tasks per node (one per socket)
each MPI task will use 2 cores (OpenMP part)
See "EXPECTED BEHAVIOR", which is what I would expect from the slurm
allocation & distribution steps.
Please comment as much as you can on this.
My final question is: given such a fat-node topology, how do I submit
(batch script) OpenMP and hybrid OpenMP/MPI codes for best performance?
Apparently the thing is not that obvious.
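For the "how do I submit" question, a hedged batch-script sketch built only from options already discussed in this thread (the partition name is taken from the configs above; hybrid_app is a hypothetical program name, and this is a config fragment, not a verified recipe):

```bash
#!/bin/bash
#SBATCH -p foff2                  # partition from the slurm.conf above
#SBATCH -N 4                      # 4 nodes
#SBATCH --ntasks-per-node=4       # one MPI task per socket (intended)
#SBATCH -c 2                      # 2 OpenMP threads per MPI task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun --distribution=cyclic:cyclic --cpu_bind=core ./hybrid_app   # hybrid_app is hypothetical
```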
Thanks a lot,
--matt
1st) BLOCK:BLOCK
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:block
--ntasks-per-node=4 -l hostname | sort
00: foff09
01: foff09
02: foff09
03: foff09
04: foff10
05: foff10
06: foff10
07: foff10
08: foff11
09: foff11
10: foff11
11: foff11
12: foff12
13: foff12
14: foff12
15: foff12
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:block
--ntasks-per-node=4 -l cat /proc/self/status | grep Cpus_allowed_list | sort
00: Cpus_allowed_list: 0-3,6-9
01: Cpus_allowed_list: 0-3,6-9
02: Cpus_allowed_list: 0-3,6-9
03: Cpus_allowed_list: 0-3,6-9
04: Cpus_allowed_list: 0-3,6-9
05: Cpus_allowed_list: 0-3,6-9
06: Cpus_allowed_list: 0-3,6-9
07: Cpus_allowed_list: 0-3,6-9
08: Cpus_allowed_list: 0-3,6-9
09: Cpus_allowed_list: 0-3,6-9
10: Cpus_allowed_list: 0-3,6-9
11: Cpus_allowed_list: 0-3,6-9
12: Cpus_allowed_list: 0-3,6-9
13: Cpus_allowed_list: 0-3,6-9
14: Cpus_allowed_list: 0-3,6-9
15: Cpus_allowed_list: 0-3,6-9
EXPECTED BEHAVIOR (or something very similar)
00: Cpus_allowed_list: 0-1
01: Cpus_allowed_list: 6-7
02: Cpus_allowed_list: 3-4
03: Cpus_allowed_list: 8-9
04: Cpus_allowed_list: 0-1
05: Cpus_allowed_list: 6-7
06: Cpus_allowed_list: 3-4
07: Cpus_allowed_list: 8-9
08: Cpus_allowed_list: 0-1
09: Cpus_allowed_list: 6-7
10: Cpus_allowed_list: 3-4
11: Cpus_allowed_list: 8-9
12: Cpus_allowed_list: 0-1
13: Cpus_allowed_list: 6-7
14: Cpus_allowed_list: 3-4
15: Cpus_allowed_list: 8-9
2nd) BLOCK:CYCLIC
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:cyclic
--ntasks-per-node=4 -l hostname | sort
00: foff09
01: foff09
02: foff09
03: foff09
04: foff10
05: foff10
06: foff10
07: foff10
08: foff11
09: foff11
10: foff11
11: foff11
12: foff12
13: foff12
14: foff12
15: foff12
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:cyclic
--ntasks-per-node=4 -l cat /proc/self/status | grep Cpus_allowed_list | sort
00: Cpus_allowed_list: 0-3,6-9
01: Cpus_allowed_list: 0-3,6-9
02: Cpus_allowed_list: 0-3,6-9
03: Cpus_allowed_list: 0-3,6-9
04: Cpus_allowed_list: 0-3,6-9
05: Cpus_allowed_list: 0-3,6-9
06: Cpus_allowed_list: 0-3,6-9
07: Cpus_allowed_list: 0-3,6-9
08: Cpus_allowed_list: 0-3,6-9
09: Cpus_allowed_list: 0-3,6-9
10: Cpus_allowed_list: 0-3,6-9
11: Cpus_allowed_list: 0-3,6-9
12: Cpus_allowed_list: 0-3,6-9
13: Cpus_allowed_list: 0-3,6-9
14: Cpus_allowed_list: 0-3,6-9
15: Cpus_allowed_list: 0-3,6-9
EXPECTED BEHAVIOR (or something very similar)
00: Cpus_allowed_list: 0-1
01: Cpus_allowed_list: 12-13
02: Cpus_allowed_list: 24-25
03: Cpus_allowed_list: 36-37
04: Cpus_allowed_list: 0-1
05: Cpus_allowed_list: 12-13
06: Cpus_allowed_list: 24-25
07: Cpus_allowed_list: 36-37
08: Cpus_allowed_list: 0-1
09: Cpus_allowed_list: 12-13
10: Cpus_allowed_list: 24-25
11: Cpus_allowed_list: 36-37
12: Cpus_allowed_list: 0-1
13: Cpus_allowed_list: 12-13
14: Cpus_allowed_list: 24-25
15: Cpus_allowed_list: 36-37
3rd) CYCLIC:BLOCK
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:block
--ntasks-per-node=4 -l hostname | sort
00: foff09
01: foff10
02: foff11
03: foff12
04: foff09
05: foff10
06: foff11
07: foff12
08: foff09
09: foff10
10: foff11
11: foff12
12: foff09
13: foff10
14: foff11
15: foff12
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:block
--ntasks-per-node=4 -l cat /proc/self/status | grep Cpus_allowed_list | sort
00: Cpus_allowed_list: 0-3,6-9
01: Cpus_allowed_list: 0-3,6-9
02: Cpus_allowed_list: 0-3,6-9
03: Cpus_allowed_list: 0-3,6-9
04: Cpus_allowed_list: 0-3,6-9
05: Cpus_allowed_list: 0-3,6-9
06: Cpus_allowed_list: 0-3,6-9
07: Cpus_allowed_list: 0-3,6-9
08: Cpus_allowed_list: 0-3,6-9
09: Cpus_allowed_list: 0-3,6-9
10: Cpus_allowed_list: 0-3,6-9
11: Cpus_allowed_list: 0-3,6-9
12: Cpus_allowed_list: 0-3,6-9
13: Cpus_allowed_list: 0-3,6-9
14: Cpus_allowed_list: 0-3,6-9
15: Cpus_allowed_list: 0-3,6-9
EXPECTED BEHAVIOR (or something very similar)
00: Cpus_allowed_list: 0-1
01: Cpus_allowed_list: 0-1
02: Cpus_allowed_list: 0-1
03: Cpus_allowed_list: 0-1
04: Cpus_allowed_list: 6-7
05: Cpus_allowed_list: 6-7
06: Cpus_allowed_list: 6-7
07: Cpus_allowed_list: 6-7
08: Cpus_allowed_list: 2-3
09: Cpus_allowed_list: 2-3
10: Cpus_allowed_list: 2-3
11: Cpus_allowed_list: 2-3
12: Cpus_allowed_list: 8-9
13: Cpus_allowed_list: 8-9
14: Cpus_allowed_list: 8-9
15: Cpus_allowed_list: 8-9
4th) CYCLIC:CYCLIC
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:cyclic
--ntasks-per-node=4 -l hostname | sort
00: foff09
01: foff10
02: foff11
03: foff12
04: foff09
05: foff10
06: foff11
07: foff12
08: foff09
09: foff10
10: foff11
11: foff12
12: foff09
13: foff10
14: foff11
15: foff12
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:cyclic
--ntasks-per-node=4 -l cat /proc/self/status | grep Cpus_allowed_list | sort
00: Cpus_allowed_list: 0-3,6-9
01: Cpus_allowed_list: 0-3,6-9
02: Cpus_allowed_list: 0-3,6-9
03: Cpus_allowed_list: 0-3,6-9
04: Cpus_allowed_list: 0-3,6-9
05: Cpus_allowed_list: 0-3,6-9
06: Cpus_allowed_list: 0-3,6-9
07: Cpus_allowed_list: 0-3,6-9
08: Cpus_allowed_list: 0-3,6-9
09: Cpus_allowed_list: 0-3,6-9
10: Cpus_allowed_list: 0-3,6-9
11: Cpus_allowed_list: 0-3,6-9
12: Cpus_allowed_list: 0-3,6-9
13: Cpus_allowed_list: 0-3,6-9
14: Cpus_allowed_list: 0-3,6-9
15: Cpus_allowed_list: 0-3,6-9
EXPECTED BEHAVIOR (or something very similar)
00: Cpus_allowed_list: 0-1
01: Cpus_allowed_list: 0-1
02: Cpus_allowed_list: 0-1
03: Cpus_allowed_list: 0-1
04: Cpus_allowed_list: 12-13
05: Cpus_allowed_list: 12-13
06: Cpus_allowed_list: 12-13
07: Cpus_allowed_list: 12-13
08: Cpus_allowed_list: 24-25
09: Cpus_allowed_list: 24-25
10: Cpus_allowed_list: 24-25
11: Cpus_allowed_list: 24-25
12: Cpus_allowed_list: 36-37
13: Cpus_allowed_list: 36-37
14: Cpus_allowed_list: 36-37
15: Cpus_allowed_list: 36-37
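The expected cyclic:cyclic pattern above can be modeled in a few lines (an illustration under the stated 4-socket topology, not Slurm code):

```python
# Illustrative model of cyclic:cyclic for -N 4 -n 16 --ntasks-per-node=4 -c 2:
# tasks round-robin over nodes, then each node's tasks round-robin over sockets.
nodes = ["foff09", "foff10", "foff11", "foff12"]
socket_first_core = [0, 12, 24, 36]   # first core of each of the 4 sockets

for task in range(16):
    node = nodes[task % len(nodes)]       # first "cyclic": round-robin over nodes
    local = task // len(nodes)            # this task's index within its node
    base = socket_first_core[local % 4]   # second "cyclic": round-robin over sockets
    print(f"{task:02d}: {node}  Cpus_allowed_list: {base}-{base + 1}")
```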