Re: [OMPI users] Launching hybrid MPI/OpenMP jobs on a cluster: correct OpenMPI flags?

2016-10-28 Thread r...@open-mpi.org
FWIW: I’ll be presenting “Mapping, Ranking, and Binding - Oh My!” at the OMPI 
BoF meeting at SC’16, for those who can attend


> On Oct 11, 2016, at 8:16 AM, Dave Love  wrote:
> 
> Wirawan Purwanto  writes:
> 
>> Instead of the scenario above, I was trying to get the MPI processes
>> side-by-side (more like "fill_up" policy in SGE scheduler), i.e. fill
>> node 0 first, then fill node 1, and so on. How do I do this properly?
>> 
>> I tried a few attempts that fail:
>> 
>> $ export OMP_NUM_THREADS=2
>> $ mpirun -np 16 -map-by core:PE=2 ./EXECUTABLE
> 
> ...
> 
>> Clearly I am not understanding how this map-by works. Could somebody
>> help me? There was a wiki article partially written:
>> 
>> https://github.com/open-mpi/ompi/wiki/ProcessPlacement
>> 
>> but unfortunately it is also not clear to me.
> 
> Me neither; this stuff has traditionally been quite unclear and really
> needs documenting/explaining properly.
> 
> This sort of thing from my local instructions for OMPI 1.8 probably does
> what you want for OMP_NUM_THREADS=2 (where the qrsh options just get me
> a couple of small nodes):
> 
>  $ qrsh -pe mpi 24 -l num_proc=12 \
>      mpirun -n 12 --map-by slot:PE=2 --bind-to core --report-bindings true |&
>      sort -k 4 -n
>  [comp544:03093] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././.][./././././.]
>  [comp544:03093] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./.][./././././.]
>  [comp544:03093] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B][./././././.]
>  [comp544:03093] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././././.][B/B/./././.]
>  [comp544:03093] MCW rank 4 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]]: [./././././.][././B/B/./.]
>  [comp544:03093] MCW rank 5 bound to socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][././././B/B]
>  [comp527:03056] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././.][./././././.]
>  [comp527:03056] MCW rank 7 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./.][./././././.]
>  [comp527:03056] MCW rank 8 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B][./././././.]
>  [comp527:03056] MCW rank 9 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././././.][B/B/./././.]
>  [comp527:03056] MCW rank 10 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]]: [./././././.][././B/B/./.]
>  [comp527:03056] MCW rank 11 bound to socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][././././B/B]
> 
> I don't remember how I found that out.

Re: [OMPI users] Launching hybrid MPI/OpenMP jobs on a cluster: correct OpenMPI flags?

2016-10-11 Thread Dave Love
Wirawan Purwanto  writes:

> Instead of the scenario above, I was trying to get the MPI processes
> side-by-side (more like "fill_up" policy in SGE scheduler), i.e. fill
> node 0 first, then fill node 1, and so on. How do I do this properly?
>
> I tried a few attempts that fail:
>
> $ export OMP_NUM_THREADS=2
> $ mpirun -np 16 -map-by core:PE=2 ./EXECUTABLE

...

> Clearly I am not understanding how this map-by works. Could somebody
> help me? There was a wiki article partially written:
>
> https://github.com/open-mpi/ompi/wiki/ProcessPlacement
>
> but unfortunately it is also not clear to me.

Me neither; this stuff has traditionally been quite unclear and really
needs documenting/explaining properly.

This sort of thing from my local instructions for OMPI 1.8 probably does
what you want for OMP_NUM_THREADS=2 (where the qrsh options just get me
a couple of small nodes):

  $ qrsh -pe mpi 24 -l num_proc=12 \
      mpirun -n 12 --map-by slot:PE=2 --bind-to core --report-bindings true |&
      sort -k 4 -n
  [comp544:03093] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././.][./././././.]
  [comp544:03093] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./.][./././././.]
  [comp544:03093] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B][./././././.]
  [comp544:03093] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././././.][B/B/./././.]
  [comp544:03093] MCW rank 4 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]]: [./././././.][././B/B/./.]
  [comp544:03093] MCW rank 5 bound to socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][././././B/B]
  [comp527:03056] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././.][./././././.]
  [comp527:03056] MCW rank 7 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./.][./././././.]
  [comp527:03056] MCW rank 8 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B][./././././.]
  [comp527:03056] MCW rank 9 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././././.][B/B/./././.]
  [comp527:03056] MCW rank 10 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]]: [./././././.][././B/B/./.]
  [comp527:03056] MCW rank 11 bound to socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][././././B/B]

I don't remember how I found that out.
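
Binding each rank to two cores like this still leaves the placement of the two
OpenMP threads to the OpenMP runtime. A minimal sketch of one way to pin them as
well, assuming GCC's libgomp (the thread mentions GCC 4.9.3) and assuming the
environment variables need forwarding to the remote ranks with mpirun's -x option:

  $ export OMP_NUM_THREADS=2
  $ mpirun -n 12 --map-by slot:PE=2 --bind-to core \
        -x OMP_NUM_THREADS -x OMP_PROC_BIND=true \
        --report-bindings ./EXECUTABLE

With OMP_PROC_BIND set, the threads should stay within the two-core mask each
rank is bound to; whether they additionally settle on separate cores depends on
the runtime's default places.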


Re: [OMPI users] Launching hybrid MPI/OpenMP jobs on a cluster: correct OpenMPI flags?

2016-10-03 Thread r...@open-mpi.org
FWIW: the socket option seems to work fine for me:

$ mpirun -n 12 -map-by socket:pe=2 -host rhc001 --report-bindings hostname
[rhc001:200408] MCW rank 1 bound to socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]]: [../../../../../../../../../../../..][BB/BB/../../../../../../../../../..]
[rhc001:200408] MCW rank 2 bound to socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]]: [../../BB/BB/../../../../../../../..][../../../../../../../../../../../..]
[rhc001:200408] MCW rank 3 bound to socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../../../../../..][../../BB/BB/../../../../../../../..]
[rhc001:200408] MCW rank 4 bound to socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: [../../../../BB/BB/../../../../../..][../../../../../../../../../../../..]
[rhc001:200408] MCW rank 5 bound to socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]]: [../../../../../../../../../../../..][../../../../BB/BB/../../../../../..]
[rhc001:200408] MCW rank 6 bound to socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [../../../../../../BB/BB/../../../..][../../../../../../../../../../../..]
[rhc001:200408] MCW rank 7 bound to socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../BB/BB/../../../..]
[rhc001:200408] MCW rank 8 bound to socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [../../../../../../../../BB/BB/../..][../../../../../../../../../../../..]
[rhc001:200408] MCW rank 9 bound to socket 1[core 20[hwt 0-1]], socket 1[core 21[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../BB/BB/../..]
[rhc001:200408] MCW rank 10 bound to socket 0[core 10[hwt 0-1]], socket 0[core 11[hwt 0-1]]: [../../../../../../../../../../BB/BB][../../../../../../../../../../../..]
[rhc001:200408] MCW rank 11 bound to socket 1[core 22[hwt 0-1]], socket 1[core 23[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../../../BB/BB]
[rhc001:200408] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]]: [BB/BB/../../../../../../../../../..][../../../../../../../../../../../..]
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
$

I know that isn’t the pattern you are seeking - will have to ponder that one a 
bit. Is it possible that mpirun is not sitting on the same topology as your 
compute nodes?
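
One way to check that, assuming hwloc's lstopo is available both where mpirun
runs and on the compute nodes (the node name below is only a placeholder, and
the output file names are arbitrary):

$ lstopo --of console > mpirun_node.txt
$ mpirun -np 1 -host <compute-node> lstopo --of console > compute_node.txt
$ diff mpirun_node.txt compute_node.txt

Any difference (core count, hyperthreading enabled on only one side) could
explain mapping behaviour that looks wrong.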


> On Oct 3, 2016, at 2:22 PM, Wirawan Purwanto  wrote:
> 
> Hi,
> 
> I have been trying to understand how to correctly launch hybrid
> MPI/OpenMP (i.e. multi-threaded MPI) jobs with mpirun. I am quite
> puzzled as to which command-line options are correct. The description
> on the mpirun man page is very confusing and I could not get what I
> wanted.
> 
> Some background: the cluster uses SGE, and I am using Open MPI 1.10.2
> compiled with and for GCC 4.9.3. The MPI library was configured with
> SGE support. The compute nodes have 32 cores each, i.e. two sockets of
> Xeon E5-2698 v3 (16-core Haswell).
> 
> A colleague told me the following:
> 
> $ export OMP_NUM_THREADS=2
> $ mpirun -np 16 -map-by node:PE=2 ./EXECUTABLE
> 
> I could see the executable using 200% of CPU per process--that's good.
> There is one catch in the general case, though: "-map-by node" assigns
> the MPI processes in a round-robin fashion (MPI rank 0 goes to node 0,
> MPI rank 1 to node 1, and so on until every node has one process, then
> it wraps back to node 0, 1, ...).
> 
> Instead of the scenario above, I was trying to get the MPI processes
> side-by-side (more like "fill_up" policy in SGE scheduler), i.e. fill
> node 0 first, then fill node 1, and so on. How do I do this properly?
> 
> I tried a few attempts that fail:
> 
> $ export OMP_NUM_THREADS=2
> $ mpirun -np 16 -map-by core:PE=2 ./EXECUTABLE
> 
> or
> 
> $ export OMP_NUM_THREADS=2
> $ mpirun -np 16 -map-by socket:PE=2 ./EXECUTABLE
> 
> Both failed with this error message:
> 
> --
> A request for multiple cpus-per-proc was given, but a directive
> was also give to map to an object level that cannot support that
> directive.
> 
> Please specify a mapping level that has more than one cpu, or
> else let us define a default mapping that will allow multiple
> cpus-per-proc.
> --
> 
> Another attempt was:
> 
> $ export OMP_NUM_THREADS=2
> $ mpirun -np 16 -map-by socket:PE=2 -bind-to socket ./EXECUTABLE
> 
> Here's the error message:
> 
> --
> A request for multiple cpus-per-proc was given, but a conflicting binding
> policy was specified:
> 
>  #cpus-per-proc:  2
>  type of cpus:cores as cpus
>  binding policy given: SOCKET
> 
> The correct binding policy for the given type of cpu is:
> 
>  correct binding policy:  bind-to core
> 
> This is the binding policy we would apply by default for this
> situation, so no binding need be specified. Please correct the
> situation and try again.
> --

[OMPI users] Launching hybrid MPI/OpenMP jobs on a cluster: correct OpenMPI flags?

2016-10-03 Thread Wirawan Purwanto
Hi,

I have been trying to understand how to correctly launch hybrid
MPI/OpenMP (i.e. multi-threaded MPI) jobs with mpirun. I am quite
puzzled as to which command-line options are correct. The description
on the mpirun man page is very confusing and I could not get what I
wanted.

Some background: the cluster uses SGE, and I am using Open MPI 1.10.2
compiled with and for GCC 4.9.3. The MPI library was configured with
SGE support. The compute nodes have 32 cores each, i.e. two sockets of
Xeon E5-2698 v3 (16-core Haswell).

A colleague told me the following:

$ export OMP_NUM_THREADS=2
$ mpirun -np 16 -map-by node:PE=2 ./EXECUTABLE

I could see the executable using 200% of CPU per process--that's good.
There is one catch in the general case, though: "-map-by node" assigns
the MPI processes in a round-robin fashion (MPI rank 0 goes to node 0,
MPI rank 1 to node 1, and so on until every node has one process, then
it wraps back to node 0, 1, ...).
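
The resulting placement can be checked without running the real code by
substituting a cheap command and asking for the bindings (a sketch using only
flags that already appear in this thread):

$ export OMP_NUM_THREADS=2
$ mpirun -np 16 -map-by node:PE=2 --report-bindings hostname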

Instead of the scenario above, I was trying to get the MPI processes
side-by-side (more like "fill_up" policy in SGE scheduler), i.e. fill
node 0 first, then fill node 1, and so on. How do I do this properly?

I tried a few attempts that fail:

$ export OMP_NUM_THREADS=2
$ mpirun -np 16 -map-by core:PE=2 ./EXECUTABLE

or

$ export OMP_NUM_THREADS=2
$ mpirun -np 16 -map-by socket:PE=2 ./EXECUTABLE

Both failed with this error message:

--
A request for multiple cpus-per-proc was given, but a directive
was also give to map to an object level that cannot support that
directive.

Please specify a mapping level that has more than one cpu, or
else let us define a default mapping that will allow multiple
cpus-per-proc.
--

Another attempt was:

$ export OMP_NUM_THREADS=2
$ mpirun -np 16 -map-by socket:PE=2 -bind-to socket ./EXECUTABLE

Here's the error message:

--
A request for multiple cpus-per-proc was given, but a conflicting binding
policy was specified:

  #cpus-per-proc:  2
  type of cpus:cores as cpus
  binding policy given: SOCKET

The correct binding policy for the given type of cpu is:

  correct binding policy:  bind-to core

This is the binding policy we would apply by default for this
situation, so no binding need be specified. Please correct the
situation and try again.
--

Clearly I am not understanding how this map-by works. Could somebody
help me? There was a wiki article partially written:

https://github.com/open-mpi/ompi/wiki/ProcessPlacement

but unfortunately it is also not clear to me.
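
For what it's worth, Dave Love's reply above gets exactly this fill-up pattern
by mapping by slot rather than by core or socket; a minimal sketch adapted from
that example (assuming its PE=2, bind-to-core scheme carries over to these
16-core sockets):

$ export OMP_NUM_THREADS=2
$ mpirun -np 16 -map-by slot:PE=2 -bind-to core --report-bindings ./EXECUTABLE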

-- 
Wirawan Purwanto
Computational Scientist, HPC Group
Information Technology Services
Old Dominion University
Norfolk, VA 23529
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users