Topology of AMD 6176 SE (likwid-topology -g):
*************************************************************
Graphical:
*************************************************************
Socket 0:
+-------------------------------------------------------------------------------------------------------------------------+
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| |   0   | |   1   | |   2   | |   3   | |   4   | |   5   | |   6   | |   7   | |   8   | |   9   | |  10   | |  11   | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +---------------------------------------------------------+ +---------------------------------------------------------+ |
| |                           5MB                           | |                           5MB                           | |
| +---------------------------------------------------------+ +---------------------------------------------------------+ |
+-------------------------------------------------------------------------------------------------------------------------+
Socket 1:
+-------------------------------------------------------------------------------------------------------------------------+
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| |  12   | |  13   | |  14   | |  15   | |  16   | |  17   | |  18   | |  19   | |  20   | |  21   | |  22   | |  23   | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +---------------------------------------------------------+ +---------------------------------------------------------+ |
| |                           5MB                           | |                           5MB                           | |
| +---------------------------------------------------------+ +---------------------------------------------------------+ |
+-------------------------------------------------------------------------------------------------------------------------+
Socket 2:
+-------------------------------------------------------------------------------------------------------------------------+
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| |  24   | |  25   | |  26   | |  27   | |  28   | |  29   | |  30   | |  31   | |  32   | |  33   | |  34   | |  35   | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +---------------------------------------------------------+ +---------------------------------------------------------+ |
| |                           5MB                           | |                           5MB                           | |
| +---------------------------------------------------------+ +---------------------------------------------------------+ |
+-------------------------------------------------------------------------------------------------------------------------+
Socket 3:
+-------------------------------------------------------------------------------------------------------------------------+
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| |  36   | |  37   | |  38   | |  39   | |  40   | |  41   | |  42   | |  43   | |  44   | |  45   | |  46   | |  47   | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | | 64kB  | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +---------------------------------------------------------+ +---------------------------------------------------------+ |
| |                           5MB                           | |                           5MB                           | |
| +---------------------------------------------------------+ +---------------------------------------------------------+ |
+-------------------------------------------------------------------------------------------------------------------------+
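To read the masks in the srun output below, it helps to know which socket and which 5MB L3 group a logical CPU belongs to. The sketch below assumes the numbering shown by likwid-topology above: 12 consecutive CPUs per socket, split into two groups of 6 cores that each share one 5MB L3 box (called a "die" here). `cpu_location` is a hypothetical helper. It is not part of likwid or Slurm.

```bash
#!/usr/bin/env bash
# Map a logical CPU id to its socket and L3 group ("die"), assuming the likwid numbering
# above: CPUs 0-11 on socket 0, 12-23 on socket 1, and so on, with the first 6 and the
# last 6 CPUs of each socket sharing one 5MB L3 each. cpu_location is hypothetical.
CORES_PER_SOCKET=12
CORES_PER_DIE=6

cpu_location() {
  local cpu=$1
  local socket=$(( cpu / CORES_PER_SOCKET ))
  local die=$(( (cpu % CORES_PER_SOCKET) / CORES_PER_DIE ))
  echo "socket $socket die $die"
}
```

For example, `cpu_location 7` prints `socket 0 die 1`. A mask such as 0-3,6-9 therefore touches both L3 groups of socket 0.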
### slurm.conf (2.2.7) ###
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
TaskPlugin=task/affinity
TopologyPlugin=topology/none
SchedulerType=sched/backfill
PreemptMode=suspend,gang
PreemptType=preempt/partition_prio
NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4 ThreadsPerCore=1 RealMemory=127000 Weight=1 Feature=6176,foff
PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED MaxTime=UNLIMITED PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO
PartitionName=batch Nodes=foff[09-13] Priority=1 Default=YES
PartitionName=foff2 Nodes=foff[09-13] Priority=1000
###########################
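A quick consistency check between this node definition and the likwid topology is that, for these nodes, one would expect Procs to equal Sockets × CoresPerSocket × ThreadsPerCore (here 4 × 12 × 1 = 48). The following is a minimal sketch of that check on a NodeName line. `check_node_line` is hypothetical and only looks at those four keys.

```bash
#!/usr/bin/env bash
# Compare Procs with Sockets * CoresPerSocket * ThreadsPerCore on one slurm.conf NodeName line.
# check_node_line is a hypothetical helper; other keys on the line are ignored.
check_node_line() {
  local procs=0 sockets=0 cores=0 threads=1   # ThreadsPerCore treated as 1 if absent
  local word
  local -a words
  read -r -a words <<< "$1"                   # split on whitespace without glob expansion
  for word in "${words[@]}"; do
    case $word in
      Procs=*)          procs=${word#Procs=} ;;
      Sockets=*)        sockets=${word#Sockets=} ;;
      CoresPerSocket=*) cores=${word#CoresPerSocket=} ;;
      ThreadsPerCore=*) threads=${word#ThreadsPerCore=} ;;
    esac
  done
  local product=$(( sockets * cores * threads ))
  if (( procs == product )); then
    echo "ok: Procs=$procs = $sockets x $cores x $threads"
  else
    echo "mismatch: Procs=$procs, but $sockets x $cores x $threads = $product"
  fi
}
```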
Now, what I need to run is a hybrid MPI/OpenMP code.
I would like to obtain the following distribution:
4 nodes
4 tasks per node (one per socket)
each MPI task will use 2 cores (OpenMP part)
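The desired layout is small enough to work out by hand. It uses 4 tasks × 2 CPUs = 8 of the 48 cores on each node, which is 32 cores in total across the 4 nodes. Assume that task i on a node goes to socket i and takes that socket's first two cores. The per-task CPU ranges then follow from the likwid numbering, as in the sketch below. `socket_first_layout` is a hypothetical helper, not a Slurm option.

```bash
#!/usr/bin/env bash
# Print the CPU range of each task on one node when task i is placed on socket i and
# takes the first CPUS_PER_TASK cores of that socket. Assumes consecutive CPU numbering
# per socket, as reported by likwid-topology. socket_first_layout is hypothetical.
socket_first_layout() {
  local tasks_per_node=$1 cpus_per_task=$2 cores_per_socket=$3
  local i first last
  local -a out=()
  for (( i = 0; i < tasks_per_node; i++ )); do
    first=$(( i * cores_per_socket ))         # first core of socket i
    last=$(( first + cpus_per_task - 1 ))
    out+=("$first-$last")
  done
  echo "${out[*]}"
}
```

`socket_first_layout 4 2 12` prints `0-1 12-13 24-25 36-37`, which is the per-node pattern of the block:cyclic "EXPECTED BEHAVIOR" listing.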
The "EXPECTED BEHAVIOR" listings below show what I would expect from Slurm's allocation and distribution steps.
Please comment as much as you can on this.
My final question is: given such a fat-node topology, how should I submit (in a batch script) OpenMP and hybrid OpenMP/MPI codes for best performance?
Apparently this is not that obvious.
Thanks a lot,
--matt
1st) BLOCK:BLOCK
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:block --ntasks-per-node=4 -l hostname | sort
00: foff09
01: foff09
02: foff09
03: foff09
04: foff10
05: foff10
06: foff10
07: foff10
08: foff11
09: foff11
10: foff11
11: foff11
12: foff12
13: foff12
14: foff12
15: foff12
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:block --ntasks-per-node=4 -l cat /proc/self/status | grep Cpus_allowed_list | sort
00: Cpus_allowed_list: 0-3,6-9
01: Cpus_allowed_list: 0-3,6-9
02: Cpus_allowed_list: 0-3,6-9
03: Cpus_allowed_list: 0-3,6-9
04: Cpus_allowed_list: 0-3,6-9
05: Cpus_allowed_list: 0-3,6-9
06: Cpus_allowed_list: 0-3,6-9
07: Cpus_allowed_list: 0-3,6-9
08: Cpus_allowed_list: 0-3,6-9
09: Cpus_allowed_list: 0-3,6-9
10: Cpus_allowed_list: 0-3,6-9
11: Cpus_allowed_list: 0-3,6-9
12: Cpus_allowed_list: 0-3,6-9
13: Cpus_allowed_list: 0-3,6-9
14: Cpus_allowed_list: 0-3,6-9
15: Cpus_allowed_list: 0-3,6-9
EXPECTED BEHAVIOR (or something very similar)
00: Cpus_allowed_list: 0-1
01: Cpus_allowed_list: 6-7
02: Cpus_allowed_list: 3-4
03: Cpus_allowed_list: 8-9
04: Cpus_allowed_list: 0-1
05: Cpus_allowed_list: 6-7
06: Cpus_allowed_list: 3-4
07: Cpus_allowed_list: 8-9
08: Cpus_allowed_list: 0-1
09: Cpus_allowed_list: 6-7
10: Cpus_allowed_list: 3-4
11: Cpus_allowed_list: 8-9
12: Cpus_allowed_list: 0-1
13: Cpus_allowed_list: 6-7
14: Cpus_allowed_list: 3-4
15: Cpus_allowed_list: 8-9
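In the observed output, every task reports the same 8-CPU mask, 0-3,6-9. Since 8 is exactly 4 tasks × 2 CPUs, each task seems to be allowed on the node's whole allocation rather than on its own pair of CPUs. The sketch below checks this kind of listing mechanically: it expands each mask and tests whether the tasks on one node share any CPU. Both function names are hypothetical, and the associative array needs bash 4 or later.

```bash
#!/usr/bin/env bash
# Expand a Cpus_allowed_list value such as "0-3,6-9" into "0 1 2 3 6 7 8 9".
expand_cpulist() {
  local part lo hi c
  local -a parts out=()
  IFS=',' read -r -a parts <<< "$1"
  for part in "${parts[@]}"; do
    if [[ $part == *-* ]]; then
      lo=${part%-*} hi=${part#*-}     # a range "lo-hi"
    else
      lo=$part hi=$part               # a single CPU
    fi
    for (( c = lo; c <= hi; c++ )); do
      out+=("$c")
    done
  done
  echo "${out[*]}"
}

# Given the masks of all tasks on one node, report the first CPU claimed by two tasks.
masks_overlap() {
  local -A seen=()
  local mask cpu
  for mask in "$@"; do
    for cpu in $(expand_cpulist "$mask"); do
      if [[ -n ${seen[$cpu]:-} ]]; then
        echo "overlap on CPU $cpu"
        return 0
      fi
      seen[$cpu]=1
    done
  done
  echo "disjoint"
}
```

With the observed masks, `masks_overlap 0-3,6-9 0-3,6-9 0-3,6-9 0-3,6-9` prints `overlap on CPU 0`. With the expected ones, `masks_overlap 0-1 6-7 3-4 8-9` prints `disjoint`.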
2nd) BLOCK:CYCLIC
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:cyclic --ntasks-per-node=4 -l hostname | sort
00: foff09
01: foff09
02: foff09
03: foff09
04: foff10
05: foff10
06: foff10
07: foff10
08: foff11
09: foff11
10: foff11
11: foff11
12: foff12
13: foff12
14: foff12
15: foff12
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:cyclic --ntasks-per-node=4 -l cat /proc/self/status | grep Cpus_allowed_list | sort
00: Cpus_allowed_list: 0-3,6-9
01: Cpus_allowed_list: 0-3,6-9
02: Cpus_allowed_list: 0-3,6-9
03: Cpus_allowed_list: 0-3,6-9
04: Cpus_allowed_list: 0-3,6-9
05: Cpus_allowed_list: 0-3,6-9
06: Cpus_allowed_list: 0-3,6-9
07: Cpus_allowed_list: 0-3,6-9
08: Cpus_allowed_list: 0-3,6-9
09: Cpus_allowed_list: 0-3,6-9
10: Cpus_allowed_list: 0-3,6-9
11: Cpus_allowed_list: 0-3,6-9
12: Cpus_allowed_list: 0-3,6-9
13: Cpus_allowed_list: 0-3,6-9
14: Cpus_allowed_list: 0-3,6-9
15: Cpus_allowed_list: 0-3,6-9
EXPECTED BEHAVIOR (or something very similar)
00: Cpus_allowed_list: 0-1
01: Cpus_allowed_list: 12-13
02: Cpus_allowed_list: 24-25
03: Cpus_allowed_list: 36-37
04: Cpus_allowed_list: 0-1
05: Cpus_allowed_list: 12-13
06: Cpus_allowed_list: 24-25
07: Cpus_allowed_list: 36-37
08: Cpus_allowed_list: 0-1
09: Cpus_allowed_list: 12-13
10: Cpus_allowed_list: 24-25
11: Cpus_allowed_list: 36-37
12: Cpus_allowed_list: 0-1
13: Cpus_allowed_list: 12-13
14: Cpus_allowed_list: 24-25
15: Cpus_allowed_list: 36-37
3rd) CYCLIC:BLOCK
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:block --ntasks-per-node=4 -l hostname | sort
00: foff09
01: foff10
02: foff11
03: foff12
04: foff09
05: foff10
06: foff11
07: foff12
08: foff09
09: foff10
10: foff11
11: foff12
12: foff09
13: foff10
14: foff11
15: foff12
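The hostname listings for block:* and cyclic:* match a simple model of the first part of --distribution when every node gets the same number of tasks. Under block, one node is filled before moving on to the next. Under cyclic, tasks are dealt round-robin across the nodes. The sketch below encodes that model. `node_of_rank` is hypothetical and covers only this even case (16 tasks, 4 nodes), with node index 0..3 standing for foff09..foff12.

```bash
#!/usr/bin/env bash
# Node index that task RANK lands on for the node part of --distribution, assuming every
# node receives NTASKS / NNODES tasks. node_of_rank is a hypothetical helper.
node_of_rank() {
  local dist=$1 rank=$2 ntasks=$3 nnodes=$4
  local per_node=$(( ntasks / nnodes ))
  case $dist in
    block)  echo $(( rank / per_node )) ;;   # fill node 0, then node 1, ...
    cyclic) echo $(( rank % nnodes )) ;;     # deal ranks round-robin over nodes
    *)      echo "unknown distribution: $dist" >&2; return 1 ;;
  esac
}
```

For rank 2, `node_of_rank block 2 16 4` prints 0 (foff09) and `node_of_rank cyclic 2 16 4` prints 2 (foff11), as in the two hostname listings.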
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:block --ntasks-per-node=4 -l cat /proc/self/status | grep Cpus_allowed_list | sort
00: Cpus_allowed_list: 0-3,6-9
01: Cpus_allowed_list: 0-3,6-9
02: Cpus_allowed_list: 0-3,6-9
03: Cpus_allowed_list: 0-3,6-9
04: Cpus_allowed_list: 0-3,6-9
05: Cpus_allowed_list: 0-3,6-9
06: Cpus_allowed_list: 0-3,6-9
07: Cpus_allowed_list: 0-3,6-9
08: Cpus_allowed_list: 0-3,6-9
09: Cpus_allowed_list: 0-3,6-9
10: Cpus_allowed_list: 0-3,6-9
11: Cpus_allowed_list: 0-3,6-9
12: Cpus_allowed_list: 0-3,6-9
13: Cpus_allowed_list: 0-3,6-9
14: Cpus_allowed_list: 0-3,6-9
15: Cpus_allowed_list: 0-3,6-9
EXPECTED BEHAVIOR (or something very similar)
00: Cpus_allowed_list: 0-1
01: Cpus_allowed_list: 0-1
02: Cpus_allowed_list: 0-1
03: Cpus_allowed_list: 0-1
04: Cpus_allowed_list: 6-7
05: Cpus_allowed_list: 6-7
06: Cpus_allowed_list: 6-7
07: Cpus_allowed_list: 6-7
08: Cpus_allowed_list: 2-3
09: Cpus_allowed_list: 2-3
10: Cpus_allowed_list: 2-3
11: Cpus_allowed_list: 2-3
12: Cpus_allowed_list: 8-9
13: Cpus_allowed_list: 8-9
14: Cpus_allowed_list: 8-9
15: Cpus_allowed_list: 8-9
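With cyclic node distribution, ranks 0-3 are each the first task on a different node. This is why the expected list gives all of them 0-1: what selects the core pair is a rank's position among the tasks on its own node, not the global rank. The sketch below computes that position under the same even-distribution assumption. `local_index` is hypothetical.

```bash
#!/usr/bin/env bash
# Position (0-based) of RANK among the tasks placed on its node, assuming every node
# receives NTASKS / NNODES tasks. local_index is a hypothetical helper.
local_index() {
  local dist=$1 rank=$2 ntasks=$3 nnodes=$4
  local per_node=$(( ntasks / nnodes ))
  case $dist in
    block)  echo $(( rank % per_node )) ;;   # consecutive ranks share a node
    cyclic) echo $(( rank / nnodes )) ;;     # one rank per node in each round
    *)      echo "unknown distribution: $dist" >&2; return 1 ;;
  esac
}
```

For example, rank 9 under cyclic is the third task (index 2) on foff10, which matches the 2-3 pair expected for it.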
4th) CYCLIC:CYCLIC
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:cyclic --ntasks-per-node=4 -l hostname | sort
00: foff09
01: foff10
02: foff11
03: foff12
04: foff09
05: foff10
06: foff11
07: foff12
08: foff09
09: foff10
10: foff11
11: foff12
12: foff09
13: foff10
14: foff11
15: foff12
srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:cyclic --ntasks-per-node=4 -l cat /proc/self/status | grep Cpus_allowed_list | sort
00: Cpus_allowed_list: 0-3,6-9
01: Cpus_allowed_list: 0-3,6-9
02: Cpus_allowed_list: 0-3,6-9
03: Cpus_allowed_list: 0-3,6-9
04: Cpus_allowed_list: 0-3,6-9
05: Cpus_allowed_list: 0-3,6-9
06: Cpus_allowed_list: 0-3,6-9
07: Cpus_allowed_list: 0-3,6-9
08: Cpus_allowed_list: 0-3,6-9
09: Cpus_allowed_list: 0-3,6-9
10: Cpus_allowed_list: 0-3,6-9
11: Cpus_allowed_list: 0-3,6-9
12: Cpus_allowed_list: 0-3,6-9
13: Cpus_allowed_list: 0-3,6-9
14: Cpus_allowed_list: 0-3,6-9
15: Cpus_allowed_list: 0-3,6-9
EXPECTED BEHAVIOR (or something very similar)
00: Cpus_allowed_list: 0-1
01: Cpus_allowed_list: 0-1
02: Cpus_allowed_list: 0-1
03: Cpus_allowed_list: 0-1
04: Cpus_allowed_list: 12-13
05: Cpus_allowed_list: 12-13
06: Cpus_allowed_list: 12-13
07: Cpus_allowed_list: 12-13
08: Cpus_allowed_list: 24-25
09: Cpus_allowed_list: 24-25
10: Cpus_allowed_list: 24-25
11: Cpus_allowed_list: 24-25
12: Cpus_allowed_list: 36-37
13: Cpus_allowed_list: 36-37
14: Cpus_allowed_list: 36-37
15: Cpus_allowed_list: 36-37
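You can also compare a run with the cyclic:cyclic expectation without reading 16 lines by eye. One way is to generate the expected listing in the same layout and compare it with the sorted srun output, for instance with `diff -w`. This assumes that srun's label format matches the one shown in this post. The sketch below assumes one task per socket in rank order on each node and the likwid CPU numbering. `expected_listing` is hypothetical.

```bash
#!/usr/bin/env bash
# Print "NN: Cpus_allowed_list: first-last" for each rank under cyclic:cyclic, assuming
# ranks are dealt round-robin over nodes and the k-th task on a node uses the first
# CPUS_PER_TASK cores of socket k. expected_listing is a hypothetical helper.
expected_listing() {
  local ntasks=$1 nnodes=$2 cpus_per_task=$3 cores_per_socket=$4
  local rank idx first
  for (( rank = 0; rank < ntasks; rank++ )); do
    idx=$(( rank / nnodes ))                # position of this rank on its node
    first=$(( idx * cores_per_socket ))     # first core of socket idx
    printf '%02d: Cpus_allowed_list: %d-%d\n' "$rank" "$first" "$(( first + cpus_per_task - 1 ))"
  done
}
```

`expected_listing 16 4 2 12` reproduces the 16 expected lines, from `00: Cpus_allowed_list: 0-1` to `15: Cpus_allowed_list: 36-37`.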