Matteo,

When task affinity is configured but no binding unit is specified (sockets, cores, or threads), each task is bound to all of the CPUs on the node that are allocated to the job/step. That is why your results show each task bound to all 8 allocated CPUs. To bind each task to two CPUs (cores), per the option "-c 2", add the option "--cpu_bind=cores" to your srun command.

Martin
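[Editor's note: applying Martin's suggestion to the binding test from the original post below would look something like this. This is a sketch only; the account, partition, and geometry are taken from Matteo's mail, and it obviously needs a live SLURM cluster to run.]

```shell
# Same geometry as the tests below, but with the binding unit made explicit:
# --cpu_bind=cores asks the task/affinity plugin to bind each task to its own
# cores, so with -c 2 each task should report a 2-core Cpus_allowed_list.
srun -A gr-fo -p foff2 -N 4 -n 16 -c 2 --ntasks-per-node=4 \
     --cpu_bind=cores --distribution=block:block \
     -l cat /proc/self/status | grep Cpus_allowed_list | sort
```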
Matteo Guglielmi <[email protected]> Sent by: [email protected] 10/20/2011 04:11 AM Please respond to [email protected] To SLURM <[email protected]> cc Subject [slurm-dev] Fat Nodes (48 cores) Job Allocation & Distribution (A much more complicated example) Topology of AMD 6176 SE (likwid-topology -g): ************************************************************* Graphical: ************************************************************* Socket 0: +-------------------------------------------------------------------------------------------------------------------------+ | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | | 0 | | 1 | | 2 | | 3 | | 4 | | 5 | | 6 | | 7 | | 8 | | 9 | | 10 | | 11 | | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | +---------------------------------------------------------+ +---------------------------------------------------------+ | | | 5MB | | 5MB | | | +---------------------------------------------------------+ +---------------------------------------------------------+ | 
+-------------------------------------------------------------------------------------------------------------------------+ Socket 1: +-------------------------------------------------------------------------------------------------------------------------+ | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | | 12 | | 13 | | 14 | | 15 | | 16 | | 17 | | 18 | | 19 | | 20 | | 21 | | 22 | | 23 | | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | +---------------------------------------------------------+ +---------------------------------------------------------+ | | | 5MB | | 5MB | | | +---------------------------------------------------------+ +---------------------------------------------------------+ | +-------------------------------------------------------------------------------------------------------------------------+ Socket 2: +-------------------------------------------------------------------------------------------------------------------------+ | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | | 24 | | 25 | | 26 | 
| 27 | | 28 | | 29 | | 30 | | 31 | | 32 | | 33 | | 34 | | 35 | | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | +---------------------------------------------------------+ +---------------------------------------------------------+ | | | 5MB | | 5MB | | | +---------------------------------------------------------+ +---------------------------------------------------------+ | +-------------------------------------------------------------------------------------------------------------------------+ Socket 3: +-------------------------------------------------------------------------------------------------------------------------+ | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | | 36 | | 37 | | 38 | | 39 | | 40 | | 41 | | 42 | | 43 | | 44 | | 45 | | 46 | | 47 | | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | | 64kB | 
| 64kB | | 64kB | | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | 512kB | | | +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ | | +---------------------------------------------------------+ +---------------------------------------------------------+ | | | 5MB | | 5MB | | | +---------------------------------------------------------+ +---------------------------------------------------------+ | +-------------------------------------------------------------------------------------------------------------------------+ ### slurm.conf (2.2.7) ### SelectType=select/cons_res SelectTypeParameters=CR_Core_Memory TaskPlugin=task/affinity TopologyPlugin=topology/none SchedulerType=sched/backfill PreemptMode=suspend,gang PreemptType=preempt/partition_prio NodeName=foff[09-13] Procs=48 CoresPerSocket=12 Sockets=4 ThreadsPerCore=1 RealMemory=127000 Weight=1 Feature=6176,foff PartitionName=DEFAULT DefaultTime=60 MinNodes=1 MaxNodes=UNLIMITED MaxTime=UNLIMITED PreemptMode=SUSPEND Shared=FORCE:1 State=UP Default=NO PartitionName=batch Nodes=foff[09-13] Priority=1 Default=YES PartitionName=foff2 Nodes=foff[09-13] Priority=1000 ########################### Now, what I'd need to run is a hybrid MPI/OpenMP. I would like to obtain the following distribution: 4 nodes 4 tasks per node (one per socket) each MPI task will use 2 cores (OpenMP part) See "EXPECTED BEHAVIOR" which is what I would expect from slurm allocation & distribution steps. Please comment as much as you can on this. 
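[Editor's note: the layout requested above (4 nodes x 4 tasks x 2 cores) might be expressed as a batch script along these lines. This is a hedged sketch, not a verified recipe: --cpu_bind=cores is an assumption about what makes the task/affinity plugin bind to cores, and ./hybrid_app is a placeholder for the actual hybrid MPI/OpenMP binary.]

```shell
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=2
#SBATCH --partition=foff2

# One OpenMP thread per allocated core in each MPI task.
export OMP_NUM_THREADS=2

# Bind each 2-core task to its own cores; block:cyclic distributes the
# node-local tasks round-robin across sockets.
srun --cpu_bind=cores --distribution=block:cyclic ./hybrid_app
```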
My final question is: given such a fat-node topology, how do I submit (batch script) OpenMP and hybrid OpenMP/MPI codes for best performance? Apparently this is not that obvious.

Thanks a lot,

--matt

1st) BLOCK:BLOCK

srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:block --ntasks-per-node=4 -l hostname | sort
00: foff09
01: foff09
02: foff09
03: foff09
04: foff10
05: foff10
06: foff10
07: foff10
08: foff11
09: foff11
10: foff11
11: foff11
12: foff12
13: foff12
14: foff12
15: foff12

srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:block --ntasks-per-node=4 -l cat /proc/self/status | grep Cpus_allowed_list | sort
00-15: Cpus_allowed_list: 0-3,6-9 (identical for all 16 tasks)

EXPECTED BEHAVIOR (or something very similar)
00: Cpus_allowed_list: 0-1
01: Cpus_allowed_list: 6-7
02: Cpus_allowed_list: 3-4
03: Cpus_allowed_list: 8-9
04: Cpus_allowed_list: 0-1
05: Cpus_allowed_list: 6-7
06: Cpus_allowed_list: 3-4
07: Cpus_allowed_list: 8-9
08: Cpus_allowed_list: 0-1
09: Cpus_allowed_list: 6-7
10: Cpus_allowed_list: 3-4
11: Cpus_allowed_list: 8-9
12: Cpus_allowed_list: 0-1
13: Cpus_allowed_list: 6-7
14: Cpus_allowed_list: 3-4
15: Cpus_allowed_list: 8-9

2nd) BLOCK:CYCLIC

srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:cyclic --ntasks-per-node=4 -l hostname | sort
00: foff09
01: foff09
02: foff09
03: foff09
04: foff10
05: foff10
06: foff10
07: foff10
08: foff11
09: foff11
10: foff11
11: foff11
12: foff12
13: foff12
14: foff12
15: foff12

srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=block:cyclic --ntasks-per-node=4 -l cat /proc/self/status | grep Cpus_allowed_list | sort
00-15: Cpus_allowed_list: 0-3,6-9 (identical for all 16 tasks)

EXPECTED BEHAVIOR (or something very similar)
00: Cpus_allowed_list: 0-1
01: Cpus_allowed_list: 12-13
02: Cpus_allowed_list: 24-25
03: Cpus_allowed_list: 36-37
04: Cpus_allowed_list: 0-1
05: Cpus_allowed_list: 12-13
06: Cpus_allowed_list: 24-25
07: Cpus_allowed_list: 36-37
08: Cpus_allowed_list: 0-1
09: Cpus_allowed_list: 12-13
10: Cpus_allowed_list: 24-25
11: Cpus_allowed_list: 36-37
12: Cpus_allowed_list: 0-1
13: Cpus_allowed_list: 12-13
14: Cpus_allowed_list: 24-25
15: Cpus_allowed_list: 36-37

3rd) CYCLIC:BLOCK

srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:block --ntasks-per-node=4 -l hostname | sort
00: foff09
01: foff10
02: foff11
03: foff12
04: foff09
05: foff10
06: foff11
07: foff12
08: foff09
09: foff10
10: foff11
11: foff12
12: foff09
13: foff10
14: foff11
15: foff12

srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:block --ntasks-per-node=4 -l cat /proc/self/status | grep Cpus_allowed_list | sort
00-15: Cpus_allowed_list: 0-3,6-9 (identical for all 16 tasks)

EXPECTED BEHAVIOR (or something very similar)
00: Cpus_allowed_list: 0-1
01: Cpus_allowed_list: 0-1
02: Cpus_allowed_list: 0-1
03: Cpus_allowed_list: 0-1
04: Cpus_allowed_list: 6-7
05: Cpus_allowed_list: 6-7
06: Cpus_allowed_list: 6-7
07: Cpus_allowed_list: 6-7
08: Cpus_allowed_list: 2-3
09: Cpus_allowed_list: 2-3
10: Cpus_allowed_list: 2-3
11: Cpus_allowed_list: 2-3
12: Cpus_allowed_list: 8-9
13: Cpus_allowed_list: 8-9
14: Cpus_allowed_list: 8-9
15: Cpus_allowed_list: 8-9

4th) CYCLIC:CYCLIC

srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:cyclic --ntasks-per-node=4 -l hostname | sort
00: foff09
01: foff10
02: foff11
03: foff12
04: foff09
05: foff10
06: foff11
07: foff12
08: foff09
09: foff10
10: foff11
11: foff12
12: foff09
13: foff10
14: foff11
15: foff12

srun -A gr-fo -p foff2 -n 16 -c 2 -N 4 --distribution=cyclic:cyclic --ntasks-per-node=4 -l cat /proc/self/status | grep Cpus_allowed_list | sort
00-15: Cpus_allowed_list: 0-3,6-9 (identical for all 16 tasks)

EXPECTED BEHAVIOR (or something very similar)
00: Cpus_allowed_list: 0-1
01: Cpus_allowed_list: 0-1
02: Cpus_allowed_list: 0-1
03: Cpus_allowed_list: 0-1
04: Cpus_allowed_list: 12-13
05: Cpus_allowed_list: 12-13
06: Cpus_allowed_list: 12-13
07: Cpus_allowed_list: 12-13
08: Cpus_allowed_list: 24-25
09: Cpus_allowed_list: 24-25
10: Cpus_allowed_list: 24-25
11: Cpus_allowed_list: 24-25
12: Cpus_allowed_list: 36-37
13: Cpus_allowed_list: 36-37
14: Cpus_allowed_list: 36-37
15: Cpus_allowed_list: 36-37
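[Editor's note: the cyclic:cyclic expectation in 4th) follows a simple rule on these 4-socket, 12-cores-per-socket nodes: node-local task i lands on socket i and takes that socket's first two cores. A small sketch of that arithmetic (this just illustrates the mapping the post expects, it is not SLURM code):]

```shell
# Expected per-node cyclic binding on a 4-socket node, 12 cores per socket:
# node-local task i -> first two cores of socket i.
cores_per_socket=12
for i in 0 1 2 3; do
  first=$((i * cores_per_socket))
  echo "task $i: Cpus_allowed_list: ${first}-$((first + 1))"
done
# -> task 0: Cpus_allowed_list: 0-1
#    task 1: Cpus_allowed_list: 12-13
#    task 2: Cpus_allowed_list: 24-25
#    task 3: Cpus_allowed_list: 36-37
```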
