I’m not entirely sure I understand your reference to “real cores”. When we bind 
you to a core, we bind you to all of the HTs that comprise that core. So yes, 
with HT enabled the binding report will list things by HT, but you will always 
be bound to the full core if you tell us “--bind-to core”.

The default binding directive is bind-to socket when there are more than two 
processes in the job, and that is what you are seeing. You can override it by 
adding "--bind-to core" to your command line if that is what you want.

If you want to use the individual HTs as independent processors, then 
“--use-hwthread-cpus --bind-to hwthread” would indeed be the right combination.
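
In full, again as an untested sketch built on your command line, that would 
look something like:

  mpirun -np 4 --use-hwthread-cpus --map-by ppr:2:node --bind-to hwthread --mca plm_rsh_agent "qrsh" --report-bindings ./myid

With --use-hwthread-cpus, the HTs are treated as independent cpus for mapping 
and binding, so each rank should end up bound to a single HT rather than to a 
full core.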

> On Apr 10, 2017, at 3:55 AM, Heinz-Ado Arnolds <arno...@mpa-garching.mpg.de> 
> wrote:
> 
> Dear OpenMPI users & developers,
> 
> I'm trying to distribute my jobs (with SGE) to a machine with a certain 
> number of nodes, each node having 2 sockets and each socket having 10 cores & 
> 10 hyperthreads. I would like to use only the real cores, no hyperthreading.
> 
> lscpu -a -e
> 
> CPU NODE SOCKET CORE L1d:L1i:L2:L3
> 0   0    0      0    0:0:0:0      
> 1   1    1      1    1:1:1:1      
> 2   0    0      2    2:2:2:0      
> 3   1    1      3    3:3:3:1      
> 4   0    0      4    4:4:4:0      
> 5   1    1      5    5:5:5:1      
> 6   0    0      6    6:6:6:0      
> 7   1    1      7    7:7:7:1      
> 8   0    0      8    8:8:8:0      
> 9   1    1      9    9:9:9:1      
> 10  0    0      10   10:10:10:0   
> 11  1    1      11   11:11:11:1   
> 12  0    0      12   12:12:12:0   
> 13  1    1      13   13:13:13:1   
> 14  0    0      14   14:14:14:0   
> 15  1    1      15   15:15:15:1   
> 16  0    0      16   16:16:16:0   
> 17  1    1      17   17:17:17:1   
> 18  0    0      18   18:18:18:0   
> 19  1    1      19   19:19:19:1   
> 20  0    0      0    0:0:0:0      
> 21  1    1      1    1:1:1:1      
> 22  0    0      2    2:2:2:0      
> 23  1    1      3    3:3:3:1      
> 24  0    0      4    4:4:4:0      
> 25  1    1      5    5:5:5:1      
> 26  0    0      6    6:6:6:0      
> 27  1    1      7    7:7:7:1      
> 28  0    0      8    8:8:8:0      
> 29  1    1      9    9:9:9:1      
> 30  0    0      10   10:10:10:0   
> 31  1    1      11   11:11:11:1   
> 32  0    0      12   12:12:12:0   
> 33  1    1      13   13:13:13:1   
> 34  0    0      14   14:14:14:0   
> 35  1    1      15   15:15:15:1   
> 36  0    0      16   16:16:16:0   
> 37  1    1      17   17:17:17:1   
> 38  0    0      18   18:18:18:0   
> 39  1    1      19   19:19:19:1   
> 
> Which options & parameters of mpirun do I have to choose to achieve this 
> behavior?
> 
> mpirun -np 4 --map-by ppr:2:node --mca plm_rsh_agent "qrsh" -report-bindings 
> ./myid
> 
> distributes to
> 
> [pascal-1-04:35735] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 
> 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], 
> socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 
> 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 
> 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
> [pascal-1-04:35735] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 
> 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], 
> socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 
> 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 
> 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
> [pascal-1-03:00787] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 
> 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], 
> socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 
> 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 
> 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
> [pascal-1-03:00787] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 
> 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], 
> socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 
> 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 
> 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
> MPI Instance 0001 of 0004 is on pascal-1-04,pascal-1-04.MPA-Garching.MPG.DE, 
> Cpus_allowed_list:       
> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
> MPI Instance 0002 of 0004 is on pascal-1-04,pascal-1-04.MPA-Garching.MPG.DE, 
> Cpus_allowed_list:       
> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
> MPI Instance 0003 of 0004 is on pascal-1-03,pascal-1-03.MPA-Garching.MPG.DE, 
> Cpus_allowed_list:       
> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
> MPI Instance 0004 of 0004 is on pascal-1-03,pascal-1-03.MPA-Garching.MPG.DE, 
> Cpus_allowed_list:       
> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
> 
> i.e. 2 nodes: ok, 2 sockets: ok, different sets of cores: ok, but it uses all 
> hwthreads.
> 
> I have tried several combinations of --use-hwthread-cpus, --bind-to 
> hwthreads, but didn't find the right combination.
> 
> It would be great to get any hints.
> 
> Thanks a lot in advance,
> 
> Heinz-Ado Arnolds
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
