Topology of the AMD Opteron 6176 SE (likwid-topology -g):

*************************************************************
Graphical:
*************************************************************
Socket 0:
+-------------------------------------------------------------------------------------------------------------------------+
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| |   0   | |   1   | |   2   | |   3   | |   4   | |   5   | |   6   | |   7   | |   8   | |   9   | |   10  | |   11  | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +---------------------------------------------------------+ +---------------------------------------------------------+ |
| |                           5MB                           | |                           5MB                           | |
| +---------------------------------------------------------+ +---------------------------------------------------------+ |
+-------------------------------------------------------------------------------------------------------------------------+
Socket 1:
+-------------------------------------------------------------------------------------------------------------------------+
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| |   12  | |   13  | |   14  | |   15  | |   16  | |   17  | |   18  | |   19  | |   20  | |   21  | |   22  | |   23  | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +---------------------------------------------------------+ +---------------------------------------------------------+ |
| |                           5MB                           | |                           5MB                           | |
| +---------------------------------------------------------+ +---------------------------------------------------------+ |
+-------------------------------------------------------------------------------------------------------------------------+
Socket 2:
+-------------------------------------------------------------------------------------------------------------------------+
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| |   24  | |   25  | |   26  | |   27  | |   28  | |   29  | |   30  | |   31  | |   32  | |   33  | |   34  | |   35  | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +---------------------------------------------------------+ +---------------------------------------------------------+ |
| |                           5MB                           | |                           5MB                           | |
| +---------------------------------------------------------+ +---------------------------------------------------------+ |
+-------------------------------------------------------------------------------------------------------------------------+
Socket 3:
+-------------------------------------------------------------------------------------------------------------------------+
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| |   36  | |   37  | |   38  | |   39  | |   40  | |   41  | |   42  | |   43  | |   44  | |   45  | |   46  | |   47  | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |  64kB | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | |
| +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ |
| +---------------------------------------------------------+ +---------------------------------------------------------+ |
| |                           5MB                           | |                           5MB                           | |
| +---------------------------------------------------------+ +---------------------------------------------------------+ |
+-------------------------------------------------------------------------------------------------------------------------+
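Reading the diagram above: 4 sockets of 12 cores each, with each socket split into two 6-core dies that share a 5MB L3 cache. Assuming Linux numbers the cores consecutively per socket (as likwid-topology shows here), the mapping from CPU id to socket and L3 domain can be sketched as:

```python
# Map a Linux CPU id to its socket and L3 (cache/NUMA) domain for the
# layout above: 4 sockets x 12 cores, two 6-core L3 domains per socket.
# This assumes consecutive per-socket core numbering, as in the diagram.
CORES_PER_SOCKET = 12
CORES_PER_L3 = 6

def locate(cpu):
    """Return (socket, l3_domain) for a CPU id in 0..47."""
    return cpu // CORES_PER_SOCKET, cpu // CORES_PER_L3

print(locate(0))   # socket 0, first L3 domain
print(locate(7))   # same socket as CPU 0, but the other L3 domain
print(locate(12))  # socket 1
```

This is why a 2-core task should ideally land on two cores of the same L3 domain: CPUs 5 and 6, for instance, sit on the same socket but do not share a cache.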

### slurm.conf (2.2.7) ###
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
TaskPlugin=task/affinity
TopologyPlugin=topology/none
SchedulerType=sched/backfill
PreemptMode=suspend,gang
PreemptType=preempt/partition_prio

NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4 ThreadsPerCore=1 RealMemory=127000 Weight=1 Feature=6176,foff

PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED MaxTime=UNLIMITED PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO

PartitionName=batch Nodes=foff[09-13] Priority=1 Default=YES
PartitionName=foff2 Nodes=foff[09-13] Priority=1000
###########################

Now, what I need to run is a hybrid MPI/OpenMP code.

I would like to obtain the following distribution:

4 nodes

4 tasks per node (one per socket)

each MPI task will use 2 cores (OpenMP part)
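The target placement can be spelled out as a small sketch (node names and the "first two cores of each socket" rule are my illustration of the layout requested above, not something Slurm produced):

```python
# Sketch of the target placement: 16 MPI tasks on 4 nodes,
# 4 tasks per node (one per socket), 2 OpenMP cores per task.
# The node list and per-socket base-core rule are assumptions
# chosen to illustrate the desired layout.
NODES = ["foff09", "foff10", "foff11", "foff12"]
CORES_PER_SOCKET = 12

def placement(task):
    node = NODES[task // 4]           # block over nodes: 4 consecutive ranks per node
    socket = task % 4                 # one task per socket within the node
    base = socket * CORES_PER_SOCKET  # first core of that socket
    return node, (base, base + 1)     # two cores for the OpenMP threads

print(placement(0))   # rank 0 -> foff09, cores 0-1
print(placement(5))   # rank 5 -> foff10, cores 12-13
```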



See "EXPECTED BEHAVIOR" below, which is what I would expect from Slurm's allocation and distribution steps.

Please comment as much as you can on this.


My final question is:

given such a fat-node topology, how do I submit (batch script) OpenMP and
hybrid OpenMP/MPI codes for best performance?


Apparently it is not as obvious as it looks.


Thanks a lot,

--matt
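For reference, one sketch of a batch script that might produce the desired layout, assuming srun's --cpu_bind=mask_cpu option (task/affinity plugin) behaves as the man page describes; the application name hybrid_app is hypothetical, and the masks are untested here:

```shell
#!/bin/bash
# Hypothetical submission for the 4-node x 4-task x 2-core hybrid layout.
# --cpu_bind=mask_cpu is taken from the srun man page (task/affinity);
# whether it interacts cleanly with cons_res allocation needs testing.
#SBATCH -N 4
#SBATCH --ntasks-per-node=4
#SBATCH -c 2
#SBATCH -p foff2
#SBATCH -A gr-fo

export OMP_NUM_THREADS=2   # one OpenMP thread per allocated core

# Per-node task k gets cores (12k, 12k+1), the first two cores of socket k:
# 0x3 = cores 0-1, 0x3000 = 12-13, 0x3000000 = 24-25, 0x3000000000 = 36-37.
srun --cpu_bind=mask_cpu:0x3,0x3000,0x3000000,0x3000000000 ./hybrid_app
```

The masks only make sense if the job actually gets whole nodes (or at least those specific cores), so this is a sketch of the idea rather than a validated recipe.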

1st) BLOCK:BLOCK

srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:block --ntasks-per-node=4 -l hostname | sort

00: foff09
01: foff09
02: foff09
03: foff09
04: foff10
05: foff10
06: foff10
07: foff10
08: foff11
09: foff11
10: foff11
11: foff11
12: foff12
13: foff12
14: foff12
15: foff12

srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:block --ntasks-per-node=4 -l cat /proc/self/status | grep Cpus_allowed_list | sort

00: Cpus_allowed_list:  0-3,6-9
01: Cpus_allowed_list:  0-3,6-9
02: Cpus_allowed_list:  0-3,6-9
03: Cpus_allowed_list:  0-3,6-9
04: Cpus_allowed_list:  0-3,6-9
05: Cpus_allowed_list:  0-3,6-9
06: Cpus_allowed_list:  0-3,6-9
07: Cpus_allowed_list:  0-3,6-9
08: Cpus_allowed_list:  0-3,6-9
09: Cpus_allowed_list:  0-3,6-9
10: Cpus_allowed_list:  0-3,6-9
11: Cpus_allowed_list:  0-3,6-9
12: Cpus_allowed_list:  0-3,6-9
13: Cpus_allowed_list:  0-3,6-9
14: Cpus_allowed_list:  0-3,6-9
15: Cpus_allowed_list:  0-3,6-9

EXPECTED BEHAVIOR (or something very similar)

00: Cpus_allowed_list:  0-1
01: Cpus_allowed_list:  6-7
02: Cpus_allowed_list:  3-4
03: Cpus_allowed_list:  8-9
04: Cpus_allowed_list:  0-1
05: Cpus_allowed_list:  6-7
06: Cpus_allowed_list:  3-4
07: Cpus_allowed_list:  8-9
08: Cpus_allowed_list:  0-1
09: Cpus_allowed_list:  6-7
10: Cpus_allowed_list:  3-4
11: Cpus_allowed_list:  8-9
12: Cpus_allowed_list:  0-1
13: Cpus_allowed_list:  6-7
14: Cpus_allowed_list:  3-4
15: Cpus_allowed_list:  8-9

2nd) BLOCK:CYCLIC

srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:cyclic --ntasks-per-node=4 -l hostname | sort

00: foff09
01: foff09
02: foff09
03: foff09
04: foff10
05: foff10
06: foff10
07: foff10
08: foff11
09: foff11
10: foff11
11: foff11
12: foff12
13: foff12
14: foff12
15: foff12

srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:cyclic --ntasks-per-node=4 -l cat /proc/self/status | grep Cpus_allowed_list | sort

00: Cpus_allowed_list:  0-3,6-9
01: Cpus_allowed_list:  0-3,6-9
02: Cpus_allowed_list:  0-3,6-9
03: Cpus_allowed_list:  0-3,6-9
04: Cpus_allowed_list:  0-3,6-9
05: Cpus_allowed_list:  0-3,6-9
06: Cpus_allowed_list:  0-3,6-9
07: Cpus_allowed_list:  0-3,6-9
08: Cpus_allowed_list:  0-3,6-9
09: Cpus_allowed_list:  0-3,6-9
10: Cpus_allowed_list:  0-3,6-9
11: Cpus_allowed_list:  0-3,6-9
12: Cpus_allowed_list:  0-3,6-9
13: Cpus_allowed_list:  0-3,6-9
14: Cpus_allowed_list:  0-3,6-9
15: Cpus_allowed_list:  0-3,6-9

EXPECTED BEHAVIOR (or something very similar)

00: Cpus_allowed_list:  0-1
01: Cpus_allowed_list:  12-13
02: Cpus_allowed_list:  24-25
03: Cpus_allowed_list:  36-37
04: Cpus_allowed_list:  0-1
05: Cpus_allowed_list:  12-13
06: Cpus_allowed_list:  24-25
07: Cpus_allowed_list:  36-37
08: Cpus_allowed_list:  0-1
09: Cpus_allowed_list:  12-13
10: Cpus_allowed_list:  24-25
11: Cpus_allowed_list:  36-37
12: Cpus_allowed_list:  0-1
13: Cpus_allowed_list:  12-13
14: Cpus_allowed_list:  24-25
15: Cpus_allowed_list:  36-37

3rd) CYCLIC:BLOCK

srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:block --ntasks-per-node=4 -l hostname | sort

00: foff09
01: foff10
02: foff11
03: foff12
04: foff09
05: foff10
06: foff11
07: foff12
08: foff09
09: foff10
10: foff11
11: foff12
12: foff09
13: foff10
14: foff11
15: foff12

srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:block --ntasks-per-node=4 -l cat /proc/self/status | grep Cpus_allowed_list | sort

00: Cpus_allowed_list:  0-3,6-9
01: Cpus_allowed_list:  0-3,6-9
02: Cpus_allowed_list:  0-3,6-9
03: Cpus_allowed_list:  0-3,6-9
04: Cpus_allowed_list:  0-3,6-9
05: Cpus_allowed_list:  0-3,6-9
06: Cpus_allowed_list:  0-3,6-9
07: Cpus_allowed_list:  0-3,6-9
08: Cpus_allowed_list:  0-3,6-9
09: Cpus_allowed_list:  0-3,6-9
10: Cpus_allowed_list:  0-3,6-9
11: Cpus_allowed_list:  0-3,6-9
12: Cpus_allowed_list:  0-3,6-9
13: Cpus_allowed_list:  0-3,6-9
14: Cpus_allowed_list:  0-3,6-9
15: Cpus_allowed_list:  0-3,6-9

EXPECTED BEHAVIOR (or something very similar)

00: Cpus_allowed_list:  0-1
01: Cpus_allowed_list:  0-1
02: Cpus_allowed_list:  0-1
03: Cpus_allowed_list:  0-1
04: Cpus_allowed_list:  6-7
05: Cpus_allowed_list:  6-7
06: Cpus_allowed_list:  6-7
07: Cpus_allowed_list:  6-7
08: Cpus_allowed_list:  2-3
09: Cpus_allowed_list:  2-3
10: Cpus_allowed_list:  2-3
11: Cpus_allowed_list:  2-3
12: Cpus_allowed_list:  8-9
13: Cpus_allowed_list:  8-9
14: Cpus_allowed_list:  8-9
15: Cpus_allowed_list:  8-9

4th) CYCLIC:CYCLIC

srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:cyclic --ntasks-per-node=4 -l hostname | sort

00: foff09
01: foff10
02: foff11
03: foff12
04: foff09
05: foff10
06: foff11
07: foff12
08: foff09
09: foff10
10: foff11
11: foff12
12: foff09
13: foff10
14: foff11
15: foff12

srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:cyclic --ntasks-per-node=4 -l cat /proc/self/status | grep Cpus_allowed_list | sort

00: Cpus_allowed_list:  0-3,6-9
01: Cpus_allowed_list:  0-3,6-9
02: Cpus_allowed_list:  0-3,6-9
03: Cpus_allowed_list:  0-3,6-9
04: Cpus_allowed_list:  0-3,6-9
05: Cpus_allowed_list:  0-3,6-9
06: Cpus_allowed_list:  0-3,6-9
07: Cpus_allowed_list:  0-3,6-9
08: Cpus_allowed_list:  0-3,6-9
09: Cpus_allowed_list:  0-3,6-9
10: Cpus_allowed_list:  0-3,6-9
11: Cpus_allowed_list:  0-3,6-9
12: Cpus_allowed_list:  0-3,6-9
13: Cpus_allowed_list:  0-3,6-9
14: Cpus_allowed_list:  0-3,6-9
15: Cpus_allowed_list:  0-3,6-9

EXPECTED BEHAVIOR (or something very similar)

00: Cpus_allowed_list:  0-1
01: Cpus_allowed_list:  0-1
02: Cpus_allowed_list:  0-1
03: Cpus_allowed_list:  0-1
04: Cpus_allowed_list:  12-13
05: Cpus_allowed_list:  12-13
06: Cpus_allowed_list:  12-13
07: Cpus_allowed_list:  12-13
08: Cpus_allowed_list:  24-25
09: Cpus_allowed_list:  24-25
10: Cpus_allowed_list:  24-25
11: Cpus_allowed_list:  24-25
12: Cpus_allowed_list:  36-37
13: Cpus_allowed_list:  36-37
14: Cpus_allowed_list:  36-37
15: Cpus_allowed_list:  36-37
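To summarize the four cases: the node-level (first) component of --distribution does behave as expected in the hostname outputs above; only the per-node core binding is missing. The node-level part can be sketched as (node names as in the outputs above):

```python
# Reproduce the node-level component of the observed distributions:
# "block" fills each node with consecutive task ranks,
# "cyclic" deals ranks round-robin across the nodes.
NODES = ["foff09", "foff10", "foff11", "foff12"]

def node_of(task, mode, tasks_per_node=4):
    if mode == "block":
        return NODES[task // tasks_per_node]
    return NODES[task % len(NODES)]   # cyclic

print([node_of(t, "block") for t in range(8)])
# first eight ranks: foff09 x4 then foff10 x4 (matches the 1st/2nd cases)
print([node_of(t, "cyclic") for t in range(8)])
# foff09..foff12 repeating (matches the 3rd/4th cases)
```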
