Re: [OMPI users] Launching hybrid MPI/OpenMP jobs on a cluster: correct OpenMPI flags?
FWIW: I’ll be presenting “Mapping, Ranking, and Binding - Oh My!” at the OMPI BoF meeting at SC’16, for those who can attend.

> On Oct 11, 2016, at 8:16 AM, Dave Love wrote:
>
> Wirawan Purwanto writes:
>
>> Instead of the scenario above, I was trying to get the MPI processes
>> side-by-side (more like "fill_up" policy in SGE scheduler), i.e. fill
>> node 0 first, then fill node 1, and so on. How do I do this properly?
>>
>> I tried a few attempts that fail:
>>
>> $ export OMP_NUM_THREADS=2
>> $ mpirun -np 16 -map-by core:PE=2 ./EXECUTABLE
>
> ...
>
>> Clearly I am not understanding how this map-by works. Could somebody
>> help me? There was a wiki article partially written:
>>
>> https://github.com/open-mpi/ompi/wiki/ProcessPlacement
>>
>> but unfortunately it is also not clear to me.
>
> Me neither; this stuff has traditionally been quite unclear and really
> needs documenting/explaining properly.
>
> This sort of thing from my local instructions for OMPI 1.8 probably does
> what you want for OMP_NUM_THREADS=2 (where the qrsh options just get me
> a couple of small nodes):
>
> $ qrsh -pe mpi 24 -l num_proc=12 \
>     mpirun -n 12 --map-by slot:PE=2 --bind-to core --report-bindings true |&
>     sort -k 4 -n
> [comp544:03093] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././.][./././././.]
> [comp544:03093] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./.][./././././.]
> [comp544:03093] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B][./././././.]
> [comp544:03093] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././././.][B/B/./././.]
> [comp544:03093] MCW rank 4 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]]: [./././././.][././B/B/./.]
> [comp544:03093] MCW rank 5 bound to socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][././././B/B]
> [comp527:03056] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././.][./././././.]
> [comp527:03056] MCW rank 7 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./.][./././././.]
> [comp527:03056] MCW rank 8 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B][./././././.]
> [comp527:03056] MCW rank 9 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././././.][B/B/./././.]
> [comp527:03056] MCW rank 10 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]]: [./././././.][././B/B/./.]
> [comp527:03056] MCW rank 11 bound to socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][././././B/B]
>
> I don't remember how I found that out.

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
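The fill-up pattern in the --report-bindings output above (ranks 0-5 on comp544, ranks 6-11 on comp527, each rank on two consecutive cores) can be sketched as a toy model. This is not Open MPI's actual mapper, just an illustration of what "--map-by slot:PE=2" produces here; the function name is made up.

```python
# Toy model of "--map-by slot:PE=2": fill each node's cores in order,
# giving every rank a contiguous block of PE cores, before moving on
# to the next node. Not Open MPI's real mapper -- just an illustration
# of the placement pattern shown in the --report-bindings output above.

def fill_up_map(nnodes, cores_per_node, nranks, pe):
    """Return {rank: (node_index, [core indices on that node])}."""
    placement = {}
    ranks_per_node = cores_per_node // pe
    for rank in range(nranks):
        node = rank // ranks_per_node      # fill node 0 first, then node 1, ...
        slot = rank % ranks_per_node       # position within that node
        placement[rank] = (node, list(range(slot * pe, slot * pe + pe)))
    return placement

if __name__ == "__main__":
    # Two 12-core nodes, 12 ranks, 2 cores (OpenMP threads) per rank,
    # matching the qrsh/mpirun example above.
    for rank, (node, cores) in fill_up_map(2, 12, 12, 2).items():
        print(f"rank {rank:2d} -> node {node}, cores {cores}")
```

Rank 5 lands on node 0 cores 10-11 and rank 6 wraps to node 1 cores 0-1, matching the comp544/comp527 transition above.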
Re: [OMPI users] Launching hybrid MPI/OpenMP jobs on a cluster: correct OpenMPI flags?
Wirawan Purwanto writes:

> Instead of the scenario above, I was trying to get the MPI processes
> side-by-side (more like "fill_up" policy in SGE scheduler), i.e. fill
> node 0 first, then fill node 1, and so on. How do I do this properly?
>
> I tried a few attempts that fail:
>
> $ export OMP_NUM_THREADS=2
> $ mpirun -np 16 -map-by core:PE=2 ./EXECUTABLE

...

> Clearly I am not understanding how this map-by works. Could somebody
> help me? There was a wiki article partially written:
>
> https://github.com/open-mpi/ompi/wiki/ProcessPlacement
>
> but unfortunately it is also not clear to me.

Me neither; this stuff has traditionally been quite unclear and really needs documenting/explaining properly.

This sort of thing from my local instructions for OMPI 1.8 probably does what you want for OMP_NUM_THREADS=2 (where the qrsh options just get me a couple of small nodes):

$ qrsh -pe mpi 24 -l num_proc=12 \
    mpirun -n 12 --map-by slot:PE=2 --bind-to core --report-bindings true |&
    sort -k 4 -n
[comp544:03093] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././.][./././././.]
[comp544:03093] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./.][./././././.]
[comp544:03093] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B][./././././.]
[comp544:03093] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././././.][B/B/./././.]
[comp544:03093] MCW rank 4 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]]: [./././././.][././B/B/./.]
[comp544:03093] MCW rank 5 bound to socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][././././B/B]
[comp527:03056] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././.][./././././.]
[comp527:03056] MCW rank 7 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./.][./././././.]
[comp527:03056] MCW rank 8 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B][./././././.]
[comp527:03056] MCW rank 9 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././././.][B/B/./././.]
[comp527:03056] MCW rank 10 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]]: [./././././.][././B/B/./.]
[comp527:03056] MCW rank 11 bound to socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][././././B/B]

I don't remember how I found that out.
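The arithmetic behind the example above (24 SGE slots requested via "qrsh -pe mpi 24", 2 OpenMP threads per rank, hence "mpirun -n 12") can be captured in a tiny helper. The function name is my own, not an SGE or Open MPI API:

```python
# Given a total SGE slot count and an OpenMP thread count per rank,
# compute the "mpirun -n" value so that ranks * threads fills the
# allocation exactly. Helper name is made up for illustration.

def ranks_for_allocation(total_slots, omp_num_threads):
    if total_slots % omp_num_threads != 0:
        raise ValueError("slot count must be a multiple of the thread count")
    return total_slots // omp_num_threads

if __name__ == "__main__":
    # 24 slots with OMP_NUM_THREADS=2 -> mpirun -n 12, as in the example.
    print(ranks_for_allocation(24, 2))
```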
Re: [OMPI users] Launching hybrid MPI/OpenMP jobs on a cluster: correct OpenMPI flags?
FWIW: the socket option seems to work fine for me:

$ mpirun -n 12 -map-by socket:pe=2 -host rhc001 --report-bindings hostname
[rhc001:200408] MCW rank 1 bound to socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]]: [../../../../../../../../../../../..][BB/BB/../../../../../../../../../..]
[rhc001:200408] MCW rank 2 bound to socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]]: [../../BB/BB/../../../../../../../..][../../../../../../../../../../../..]
[rhc001:200408] MCW rank 3 bound to socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../../../../../..][../../BB/BB/../../../../../../../..]
[rhc001:200408] MCW rank 4 bound to socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: [../../../../BB/BB/../../../../../..][../../../../../../../../../../../..]
[rhc001:200408] MCW rank 5 bound to socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]]: [../../../../../../../../../../../..][../../../../BB/BB/../../../../../..]
[rhc001:200408] MCW rank 6 bound to socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]]: [../../../../../../BB/BB/../../../..][../../../../../../../../../../../..]
[rhc001:200408] MCW rank 7 bound to socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../BB/BB/../../../..]
[rhc001:200408] MCW rank 8 bound to socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [../../../../../../../../BB/BB/../..][../../../../../../../../../../../..]
[rhc001:200408] MCW rank 9 bound to socket 1[core 20[hwt 0-1]], socket 1[core 21[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../BB/BB/../..]
[rhc001:200408] MCW rank 10 bound to socket 0[core 10[hwt 0-1]], socket 0[core 11[hwt 0-1]]: [../../../../../../../../../../BB/BB][../../../../../../../../../../../..]
[rhc001:200408] MCW rank 11 bound to socket 1[core 22[hwt 0-1]], socket 1[core 23[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../../../BB/BB]
[rhc001:200408] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]]: [BB/BB/../../../../../../../../../..][../../../../../../../../../../../..]
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
$

I know that isn’t the pattern you are seeking - will have to ponder that one a bit. Is it possible that mpirun is not sitting on the same topology as your compute nodes?

> On Oct 3, 2016, at 2:22 PM, Wirawan Purwanto wrote:
>
> Hi,
>
> I have been trying to understand how to correctly launch hybrid
> MPI/OpenMP (i.e. multi-threaded MPI) jobs with mpirun. I am quite
> puzzled about the correct command-line options to use. The
> description on the mpirun man page is very confusing and I could not
> get what I wanted.
>
> A background: the cluster is using SGE, and I am using OpenMPI 1.10.2
> compiled with & for gcc 4.9.3. The MPI library was configured with SGE
> support. The compute nodes have 32 cores, which are basically 2
> sockets of Xeon E5-2698 v3 (16-core Haswell).
>
> A colleague told me the following:
>
> $ export OMP_NUM_THREADS=2
> $ mpirun -np 16 -map-by node:PE=2 ./EXECUTABLE
>
> I could see the executable using 200% of CPU per process--that's good.
> There is one catch in the general case: "-map-by node" will assign the
> MPI processes in a round-robin fashion (so MPI rank 0 gets node 0, MPI
> rank 1 gets node 1, and so on until all nodes are given 1 process,
> then it will go back to node 0, 1, ...).
>
> Instead of the scenario above, I was trying to get the MPI processes
> side-by-side (more like the "fill_up" policy in the SGE scheduler), i.e.
> fill node 0 first, then fill node 1, and so on. How do I do this properly?
>
> I tried a few attempts that fail:
>
> $ export OMP_NUM_THREADS=2
> $ mpirun -np 16 -map-by core:PE=2 ./EXECUTABLE
>
> or
>
> $ export OMP_NUM_THREADS=2
> $ mpirun -np 16 -map-by socket:PE=2 ./EXECUTABLE
>
> Both failed with an error message:
>
> --
> A request for multiple cpus-per-proc was given, but a directive
> was also give to map to an object level that cannot support that
> directive.
>
> Please specify a mapping level that has more than one cpu, or
> else let us define a default mapping that will allow multiple
> cpus-per-proc.
> --
>
> Another attempt was:
>
> $ export OMP_NUM_THREADS=2
> $ mpirun -np 16 -map-by socket:PE=2 -bind-to socket ./EXECUTABLE
>
> Here's the error message:
>
> --
> A request for multiple cpus-per-proc was given, but a conflicting binding
> policy was specified:
>
> #cpus-per-proc: 2
> type of cpus: cores as cpus
> binding policy given: SOCKET
>
> The correct binding policy for the given type of cpu is:
>
> correct binding policy: bind-to core
>
> This is the binding policy we would apply by default for this situation,
> so no binding need be specified. Please correct the situation and try again.
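The round-robin-by-socket pattern in the rhc001 bindings earlier in this message (-map-by socket:pe=2: consecutive ranks alternate sockets, each rank taking the next pair of cores on its socket) can also be sketched as a toy model. Again, this is only an illustration of the observed pattern, not Open MPI's mapper, and the function name is invented:

```python
# Toy model of "--map-by socket:PE=2" on one node with two sockets:
# consecutive ranks alternate between sockets, and each rank is given
# the next contiguous pair of cores on its socket. Not Open MPI's real
# mapper -- just a sketch of the pattern in the bindings shown above.

def by_socket_map(nranks, cores_per_socket, pe, nsockets=2):
    """Return {rank: (socket, [physical core indices])}."""
    placement = {}
    for rank in range(nranks):
        socket = rank % nsockets            # round-robin over sockets
        slot = rank // nsockets             # position within that socket
        first = socket * cores_per_socket + slot * pe
        placement[rank] = (socket, list(range(first, first + pe)))
    return placement

if __name__ == "__main__":
    # 12 ranks on a 2 x 12-core node, as in the rhc001 output above.
    for rank, (socket, cores) in by_socket_map(12, 12, 2).items():
        print(f"rank {rank:2d} -> socket {socket}, cores {cores}")
```

This reproduces rank 0 on socket 0 cores 0-1, rank 1 on socket 1 cores 12-13, and so on up to rank 11 on socket 1 cores 22-23.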
[OMPI users] Launching hybrid MPI/OpenMP jobs on a cluster: correct OpenMPI flags?
Hi,

I have been trying to understand how to correctly launch hybrid MPI/OpenMP (i.e. multi-threaded MPI) jobs with mpirun. I am quite puzzled about the correct command-line options to use. The description on the mpirun man page is very confusing and I could not get what I wanted.

A background: the cluster is using SGE, and I am using OpenMPI 1.10.2 compiled with & for gcc 4.9.3. The MPI library was configured with SGE support. The compute nodes have 32 cores, which are basically 2 sockets of Xeon E5-2698 v3 (16-core Haswell).

A colleague told me the following:

$ export OMP_NUM_THREADS=2
$ mpirun -np 16 -map-by node:PE=2 ./EXECUTABLE

I could see the executable using 200% of CPU per process--that's good. There is one catch in the general case: "-map-by node" will assign the MPI processes in a round-robin fashion (so MPI rank 0 gets node 0, MPI rank 1 gets node 1, and so on until all nodes are given 1 process, then it will go back to node 0, 1, ...).

Instead of the scenario above, I was trying to get the MPI processes side-by-side (more like the "fill_up" policy in the SGE scheduler), i.e. fill node 0 first, then fill node 1, and so on. How do I do this properly?

I tried a few attempts that fail:

$ export OMP_NUM_THREADS=2
$ mpirun -np 16 -map-by core:PE=2 ./EXECUTABLE

or

$ export OMP_NUM_THREADS=2
$ mpirun -np 16 -map-by socket:PE=2 ./EXECUTABLE

Both failed with an error message:

--
A request for multiple cpus-per-proc was given, but a directive
was also give to map to an object level that cannot support that
directive.

Please specify a mapping level that has more than one cpu, or
else let us define a default mapping that will allow multiple
cpus-per-proc.
--

Another attempt was:

$ export OMP_NUM_THREADS=2
$ mpirun -np 16 -map-by socket:PE=2 -bind-to socket ./EXECUTABLE

Here's the error message:

--
A request for multiple cpus-per-proc was given, but a conflicting binding
policy was specified:

#cpus-per-proc: 2
type of cpus: cores as cpus
binding policy given: SOCKET

The correct binding policy for the given type of cpu is:

correct binding policy: bind-to core

This is the binding policy we would apply by default for this situation,
so no binding need be specified. Please correct the situation and try again.
--

Clearly I am not understanding how this map-by works. Could somebody help me? There was a wiki article partially written:

https://github.com/open-mpi/ompi/wiki/ProcessPlacement

but unfortunately it is also not clear to me.

--
Wirawan Purwanto
Computational Scientist, HPC Group
Information Technology Services
Old Dominion University
Norfolk, VA 23529
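The second error above follows a simple rule: with PE=2, cores are treated as the cpus, so two cores per process only makes sense with "bind-to core", and any other explicit binding policy conflicts. A toy version of that check (function name and message text are illustrative only, not Open MPI internals):

```python
# Toy version of the cpus-per-proc / binding-policy conflict reported
# above: when PE > 1 (multiple cores per process), Open MPI expects
# "bind-to core"; any other explicit binding target is rejected.
# Function name and wording are illustrative, not OMPI internals.

def check_pe_binding(pe, bind_to):
    if pe > 1 and bind_to != "core":
        raise ValueError(
            f"#cpus-per-proc: {pe} conflicts with binding policy "
            f"{bind_to.upper()}; the correct binding policy is: bind-to core")
    return True

if __name__ == "__main__":
    print(check_pe_binding(2, "core"))    # OK: PE=2 with bind-to core
    try:
        check_pe_binding(2, "socket")     # mirrors the failing command above
    except ValueError as e:
        print("error:", e)
```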