Hi,

On 28.08.2014 at 20:50, McGrattan, Kevin B. Dr. wrote:

> My institute recently purchased a Linux cluster with 20 nodes; 2 sockets per 
> node; 6 cores per socket. Open MPI v1.8.1 is installed. I want to run 15 
> jobs, each of which requires 16 MPI processes. For each job, I want to use 
> two cores on each node, mapping by socket. If I use these options:
>  
> #PBS -l nodes=8:ppn=2
> mpirun --report-bindings --bind-to core --map-by socket:PE=1 -np 16 <executable file name>
>  
> The reported bindings are:
>  
> [burn001:09186] MCW rank 0 bound to socket 0[core 0[hwt 0]]: 
> [B/././././.][./././././.]
> [burn001:09186] MCW rank 1 bound to socket 1[core 6[hwt 0]]: 
> [./././././.][B/././././.]
> [burn004:07113] MCW rank 6 bound to socket 0[core 0[hwt 0]]: 
> [B/././././.][./././././.]
> [burn004:07113] MCW rank 7 bound to socket 1[core 6[hwt 0]]: 
> [./././././.][B/././././.]
> and so on…
>  
> These bindings appear to be OK, but when I do a "top -H" on each node, I see 
> that all 15 jobs use core 0 and core 6 on each node. This means, I believe, 
> that I am only using 1/6 of my resources. I want to use 100%. So I try this:
>  
> #PBS -l nodes=8:ppn=2
> mpirun --report-bindings --bind-to socket --map-by socket:PE=1 -np 16 <executable file name>
>  
> Now it appears that I am getting 100% usage of all cores on all nodes. The 
> bindings are:
>  
> [burn004:07244] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 
> 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 
> 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.]
> [burn004:07244] MCW rank 1 bound to socket 1[core 6[hwt 0]], socket 1[core 
> 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 
> 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B]
> [burn008:07256] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 
> 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 
> 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B]
> [burn008:07256] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 
> 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 
> 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.]
> and so on…
>  
> The problem now is that some of my jobs hang. They all start running fine 
> and produce output, but at some point about 4 of the 15 jobs are lost to 
> hangs. I suspect that an MPI message is sent but never received. The number 
> of jobs that hang, and the time at which they hang, varies from test to 
> test. We have run these cases successfully on our old cluster dozens of 
> times; they are part of our benchmark suite.
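
To see where a hung job is actually stuck, one could attach gdb to each of its 
processes and dump the stacks; ranks sitting in MPI_Recv or MPI_Wait while 
their partners have moved on usually point at the lost message. A minimal 
sketch (assumes gdb is available and <executable> is the process name, as in 
your commands above):

    # run on each node hosting ranks of the hung job
    for pid in $(pgrep -u $USER <executable>); do
        gdb -batch -p $pid -ex "thread apply all bt" > bt.$(hostname).$pid 2>&1
    done

Comparing the backtraces across the ranks should show which communication call 
never completes.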
>  
> When I run these jobs using a map-by-core strategy (that is, the MPI 
> processes are simply mapped by core, and each job only uses 16 cores on two 
> nodes), I do not see as much hanging. It still occurs, but less often. This 
> leads me to suspect that the increased network traffic of the map-by-socket 
> approach is the cause of the problem, but I do not know what to do about it. 
> I think that the map-by-socket approach is the right one, but I do not know 
> if I have the Open MPI options just right.
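
If the network traffic is the suspect, one cheap A/B test would be to force 
Open MPI onto a different transport and see whether the hangs follow. A sketch 
(much slower, but it tells you whether the default byte-transfer layers are 
implicated):

    # force plain TCP instead of the defaults (e.g. openib/sm)
    mpirun --mca btl tcp,self --report-bindings --bind-to socket \
           --map-by socket:PE=1 -np 16 <executable file name>

If the hangs disappear over TCP, I would look at the interconnect stack rather 
than at the mapping options.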
>  
> Can you tell me what Open MPI options to use, and how I might debug the 
> hanging issue?
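
One point first: each mpirun invocation computes its bindings independently 
and knows nothing about the other 14 jobs sharing the nodes, so with 
--bind-to core every job lands on the same "first" core of each socket. That 
is exactly the core 0 / core 6 pile-up you saw in top. Unless the batch system 
confines each job to its own cpuset, you have to hand each job disjoint cores 
yourself, e.g. with a rankfile. A minimal sketch, assuming a hypothetical 
per-job index JOB in 0..5, coordinated so that jobs sharing a node get 
different values, with $PBS_NODEFILE providing the allocated hosts:

    JOB=3   # hypothetical per-job core index (0..5); must differ between jobs sharing a node
    i=0
    for h in $(sort -u $PBS_NODEFILE); do
        echo "rank $i=$h slot=0:$JOB"; i=$((i+1))   # socket 0, core $JOB
        echo "rank $i=$h slot=1:$JOB"; i=$((i+1))   # socket 1, core $JOB
    done > job$JOB.rf
    mpirun --report-bindings --rankfile job$JOB.rf -np 16 <executable file name>

This keeps the map-by-socket placement (one rank per socket per node) while 
giving every job its own pair of cores on each node.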

BTW: In modern systems the NIC(s) may be attached directly to one CPU, so that 
processes on the other CPU first have to forward their data across the 
inter-socket link to reach the NIC (and integrated NICs may instead be 
attached to the chipset).
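
Whether this applies to a given machine can at least be inspected: hwloc shows 
under which socket a PCI device hangs, and the kernel exports the NUMA node 
per interface (eth0 below is an assumed interface name):

    lstopo --of console                        # PCI devices appear under their socket
    cat /sys/class/net/eth0/device/numa_node   # -1 means no specific NUMA node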

Has anyone ever benchmarked whether it makes a difference which CPU in the 
system is used, i.e. the one to which the network adapter is attached versus 
the other one, or even a chipset-attached NIC?
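
A simple way to measure it would be to pin one rank per node first to the 
NIC-local socket and then to the other one and compare, e.g. with osu_latency 
from the OSU micro-benchmarks (assumed to be installed; nodeA/nodeB are 
placeholders):

    # s0.rf - both ranks on socket 0, core 0
    rank 0=nodeA slot=0:0
    rank 1=nodeB slot=0:0

    mpirun --rankfile s0.rf -np 2 osu_latency
    # repeat with slot=1:0 in the rankfile and compare the two runs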

-- Reuti


> Kevin McGrattan
> National Institute of Standards and Technology
> 100 Bureau Drive, Mail Stop 8664
> Gaithersburg, Maryland 20899
>  
> 301 975 2712
>  
