Hmmm…let me see if I can remember :-)

Procs-per-object is what it does, of course, but I honestly forget what that 
last “r” stands for!

So what your command line is telling us is:

map 2 processes onto each socket, binding each process to 7 CPUs (“pe” = 
processing element)

In this case, we have defaulted to using cores as the CPUs (or “pe”s). You could 
also set --use-hwthread-cpus, in which case “pe” would indicate the number of 
hardware threads (HTs) to use for each process.
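
If you want to double-check the placement before a real run, --report-bindings 
prints the mask each rank is bound to. A quick sketch (hostname is just a 
stand-in for your application):

# show where each of the 4 ranks lands, without running the real app
mpirun -np 4 --map-by ppr:2:socket:pe=7 --report-bindings hostname

# same mapping, but counting hardware threads as the processing elements
mpirun -np 4 --use-hwthread-cpus --map-by ppr:2:socket:pe=7 --report-bindings hostname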

For a hybrid app, you probably want to ensure you have at least one pe for each 
thread you intend to spawn - but that is something you can experiment with to 
find the best option. I had hoped to someday make that a little easier by 
integrating OpenMP with OMPI a little better, but that ran into some 
controversy and so I abandoned it.
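
As a starting point on the 28-core node discussed below, here is a minimal 
sketch of the one-thread-per-pe setup (OMP_PROC_BIND is the standard OpenMP 
binding control; the numbers are just the ones from this thread):

# 4 ranks x 7 cores = 28 cores, one OpenMP thread per core (pe)
export OMP_NUM_THREADS=7
export OMP_PROC_BIND=true   # ask the OpenMP runtime to keep threads inside each rank's mask
mpirun -np 4 --map-by ppr:2:socket:pe=7 ./hello-hybrid.x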

HTH
Ralph

> On Jan 6, 2016, at 12:33 PM, Matt Thompson <fort...@gmail.com> wrote:
> 
> Aha! The Gurus know all. The -map-by option was the magic sauce:
> 
> (1176) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4 -map-by ppr:2:socket:pe=7 ./hello-hybrid.x | sort -g -k 18
> Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 0
> Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 1
> Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 2
> Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 3
> Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 4
> Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 5
> Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 6
> Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 7
> Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 8
> Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 9
> Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 10
> Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 11
> Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 12
> Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 13
> Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 14
> Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 15
> Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 16
> Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 17
> Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 18
> Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 19
> Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 20
> Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 21
> Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 22
> Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 23
> Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 24
> Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 25
> Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 26
> Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 27
> 
> So, a question: what does "ppr" mean? The man page seems to accept it as an 
> axiom of Open MPI:
> 
>        --map-by <foo>
>               Map to the specified object, defaults to socket. Supported
>               options include slot, hwthread, core, L1cache, L2cache,
>               L3cache, socket, numa, board, node, sequential, distance,
>               and ppr. Any object can include modifiers by adding a : and
>               any combination of PE=n (bind n processing elements to each
>               proc), SPAN (load balance the processes across the
>               allocation), OVERSUBSCRIBE (allow more processes on a node
>               than processing elements), and NOOVERSUBSCRIBE. This
>               includes PPR, where the pattern would be terminated by
>               another colon to separate it from the modifiers.
> 
> Is it an acronym or initialism? From some experimenting, it seems that 
> ppr:2:socket means 2 processes per socket? And pe=7 means leave 7 processors 
> between them? Is that about right?
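> 
> A sketch of how the working command seems to decompose, going by the man page 
> excerpt above (pe=7 looks like the PE=n modifier, i.e. bind n processing 
> elements to each proc):
> 
>     #            --map-by  ppr : <N> : <object> : <modifiers>
>     mpirun -np 4 -map-by ppr:2:socket:pe=7 ./hello-hybrid.x
>     # ppr:2:socket -> 2 processes mapped onto each socket
>     # pe=7         -> each process bound to 7 processing elements (cores here)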
> 
> Matt
> 
> On Wed, Jan 6, 2016 at 3:19 PM, Ralph Castain <r...@open-mpi.org> wrote:
> I believe he wants two procs/socket, so you’d need ppr:2:socket:pe=7
> 
> 
>> On Jan 6, 2016, at 12:14 PM, Nick Papior <nickpap...@gmail.com> wrote:
>> 
>> I do not think KMP_AFFINITY should affect anything in Open MPI; it is an MKL 
>> env setting? Or am I wrong?
>> 
>> Note that these flags are used in an environment where Open MPI automatically 
>> gets the hostfile, hence no hostfile options appear here.
>> With Intel MKL and Open MPI I got the best performance using these rather 
>> long flags:
>> 
>> export KMP_AFFINITY=verbose,compact,granularity=core
>> export KMP_STACKSIZE=62M
>> export KMP_SETTINGS=1
>> 
>> def_flags="--bind-to core -x OMP_PROC_BIND=true --report-bindings"
>> def_flags="$def_flags -x KMP_AFFINITY=$KMP_AFFINITY"
>> 
>> # in your case, 7 threads per process:
>> ONP=7
>> flags="$def_flags -x MKL_NUM_THREADS=$ONP -x MKL_DYNAMIC=FALSE"
>> flags="$flags -x OMP_NUM_THREADS=$ONP -x OMP_DYNAMIC=FALSE"
>> flags="$flags -x KMP_STACKSIZE=$KMP_STACKSIZE"
>> flags="$flags --map-by ppr:1:socket:pe=7"
>> 
>> then run your program:
>> 
>> mpirun $flags <app> 
>> 
>> A lot of the option flags are duplicated (and strictly speaking not needed), 
>> but I provide them to make it easy to test changes.
>> This is of course application dependent, but in my case it performed really 
>> well.
>> 
>> 
>> 2016-01-06 20:48 GMT+01:00 Erik Schnetter <schnet...@gmail.com>:
>> Setting KMP_AFFINITY will probably override anything that Open MPI
>> sets. Can you try without it?
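>> 
>> A sketch of that test, i.e. the same run with KMP_AFFINITY simply left out:
>> 
>>   env OMP_NUM_THREADS=7 mpirun -np 4 ./hello-hybrid.x | sort -g -k 18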
>> 
>> -erik
>> 
>> On Wed, Jan 6, 2016 at 2:46 PM, Matt Thompson <fort...@gmail.com> wrote:
>> > Hello Open MPI Gurus,
>> >
>> > As I explore MPI-OpenMP hybrid codes, I'm trying to figure out how to get
>> > the same behavior out of various MPI stacks. For example, I have a 28-core
>> > node (2 x 14-core Haswells), and I'd like to run 4 MPI processes with 7
>> > OpenMP threads each. That is, I'd like 2 processes per socket, with each
>> > process's OpenMP threads laid out on that socket. Using a "hybrid Hello
>> > World" program, I can achieve this with Intel MPI (after a lot of testing):
>> >
>> > (1097) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4 ./hello-hybrid.x | sort -g -k 18
>> > srun.slurm: cluster configuration lacks support for cpu binding
>> > Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 0
>> > Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 1
>> > Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 2
>> > Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 3
>> > Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 4
>> > Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 5
>> > Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 6
>> > Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 7
>> > Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 8
>> > Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 9
>> > Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 10
>> > Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 11
>> > Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 12
>> > Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 13
>> > Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 14
>> > Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 15
>> > Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 16
>> > Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 17
>> > Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 18
>> > Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 19
>> > Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 20
>> > Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 21
>> > Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 22
>> > Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 23
>> > Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 24
>> > Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 25
>> > Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 26
>> > Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 27
>> >
>> > Other than the odd fact that Process #0 seemed to start on Socket #1 (which
>> > might be an artifact of how I'm detecting the CPU I'm on), this looks
>> > reasonable: 14 threads on each socket, and each process lays out its threads
>> > in a nice, orderly fashion.
>> >
>> > I'm trying to figure out how to do this with Open MPI (version 1.10.0) and
>> > apparently I am just not quite good enough to figure it out. The closest
>> > I've gotten is:
>> >
>> > (1155) $ env OMP_NUM_THREADS=7 KMP_AFFINITY=compact mpirun -np 4 -map-by ppr:2:socket ./hello-hybrid.x | sort -g -k 18
>> > Hello from thread 0 out of 7 from process 0 out of 4 on borgo035 on CPU 0
>> > Hello from thread 0 out of 7 from process 1 out of 4 on borgo035 on CPU 0
>> > Hello from thread 1 out of 7 from process 0 out of 4 on borgo035 on CPU 1
>> > Hello from thread 1 out of 7 from process 1 out of 4 on borgo035 on CPU 1
>> > Hello from thread 2 out of 7 from process 0 out of 4 on borgo035 on CPU 2
>> > Hello from thread 2 out of 7 from process 1 out of 4 on borgo035 on CPU 2
>> > Hello from thread 3 out of 7 from process 0 out of 4 on borgo035 on CPU 3
>> > Hello from thread 3 out of 7 from process 1 out of 4 on borgo035 on CPU 3
>> > Hello from thread 4 out of 7 from process 0 out of 4 on borgo035 on CPU 4
>> > Hello from thread 4 out of 7 from process 1 out of 4 on borgo035 on CPU 4
>> > Hello from thread 5 out of 7 from process 0 out of 4 on borgo035 on CPU 5
>> > Hello from thread 5 out of 7 from process 1 out of 4 on borgo035 on CPU 5
>> > Hello from thread 6 out of 7 from process 0 out of 4 on borgo035 on CPU 6
>> > Hello from thread 6 out of 7 from process 1 out of 4 on borgo035 on CPU 6
>> > Hello from thread 0 out of 7 from process 2 out of 4 on borgo035 on CPU 14
>> > Hello from thread 0 out of 7 from process 3 out of 4 on borgo035 on CPU 14
>> > Hello from thread 1 out of 7 from process 2 out of 4 on borgo035 on CPU 15
>> > Hello from thread 1 out of 7 from process 3 out of 4 on borgo035 on CPU 15
>> > Hello from thread 2 out of 7 from process 2 out of 4 on borgo035 on CPU 16
>> > Hello from thread 2 out of 7 from process 3 out of 4 on borgo035 on CPU 16
>> > Hello from thread 3 out of 7 from process 2 out of 4 on borgo035 on CPU 17
>> > Hello from thread 3 out of 7 from process 3 out of 4 on borgo035 on CPU 17
>> > Hello from thread 4 out of 7 from process 2 out of 4 on borgo035 on CPU 18
>> > Hello from thread 4 out of 7 from process 3 out of 4 on borgo035 on CPU 18
>> > Hello from thread 5 out of 7 from process 2 out of 4 on borgo035 on CPU 19
>> > Hello from thread 5 out of 7 from process 3 out of 4 on borgo035 on CPU 19
>> > Hello from thread 6 out of 7 from process 2 out of 4 on borgo035 on CPU 20
>> > Hello from thread 6 out of 7 from process 3 out of 4 on borgo035 on CPU 20
>> >
>> > Obviously not right. Any ideas on how to help me learn? The mpirun man page
>> > is a bit formidable in the pinning section, so maybe I've missed an obvious
>> > answer.
>> >
>> > Matt
>> > --
>> > Matt Thompson
>> >
>> > Man Among Men
>> > Fulcrum of History
>> >
>> >
>> 
>> 
>> 
>> --
>> Erik Schnetter <schnet...@gmail.com>
>> http://www.perimeterinstitute.ca/personal/eschnetter/
>> 
>> 
>> 
>> -- 
>> Kind regards Nick
> 
> 
> -- 
> Matt Thompson
> Man Among Men
> Fulcrum of History
