I am testing a new cluster that we just bought, which is why I am loading things this way: I am deliberately increasing network traffic. In general, though, we submit jobs intermittently with varying numbers of MPI processes. I have read that a good strategy is to map by socket, which in our case means assigning 2 MPI processes to node1 (which has two sockets), 2 MPI processes to node2, and so on. Each of my test cases has 16 MPI processes, so each job is spread over 8 nodes. Yes, if I always loaded up the entire cluster, I could map the way you suggest, but I am looking for a strategy that gives optimum performance for both light and heavy cluster loads.
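[For reference, the spread-out layout described above (1 rank per socket, i.e. 2 ranks per dual-socket node, across 8 nodes) can be requested with Open MPI's ppr ("processes per resource") mapping. This is a sketch; the executable name is a placeholder:]

```shell
# 1 rank per socket = 2 ranks per dual-socket node, so 16 ranks
# spread over 8 nodes; --report-bindings prints the actual placement.
# ./my_app is a placeholder for the real executable.
mpirun --map-by ppr:1:socket --bind-to core --report-bindings -np 16 ./my_app
```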
Can anyone confirm whether or not it is best to map by socket in cases where you have a light load on your cluster?

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres)
Sent: Friday, September 05, 2014 10:37 AM
To: Open MPI User's List
Subject: Re: [OMPI users] How does binding option affect network traffic?

I'm confused, then: why wouldn't you want to minimize the number of servers that a single job runs on? I ask because it sounds to me like you're running 12 jobs, each with 1 process per server, and therefore all 12 jobs are running on each server, like this:

[inline image: diagram of the 12-jobs-per-server layout]

With this layout, you're thrashing the server networking resources: you're forcing the maximum use of the network. Why don't you pack the jobs into as few servers as possible, and therefore use shared memory as much as possible and the network as little as possible? This is the conventional wisdom.

...perhaps I'm missing something in your setup?

On Sep 3, 2014, at 10:02 AM, McGrattan, Kevin B. Dr. <kevin.mcgrat...@nist.gov<mailto:kevin.mcgrat...@nist.gov>> wrote:

No, there are 12 cores per node, and 12 MPI processes are assigned to each node. Total RAM usage is about 10% of what is available. We suspect that the problem might be the combination of MPI message passing and disk I/O to the master node, both of which are handled by InfiniBand. But I do not know how to monitor the traffic, and I do not know how much is too much. Ganglia reports Gigabit Ethernet usage, but we're primarily using IB.

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres)
Sent: Tuesday, September 02, 2014 5:41 PM
To: Open MPI User's List
Subject: Re: [OMPI users] How does binding option affect network traffic?

Ah, ok -- I think I missed this part of the thread: each of your individual MPI processes sucks up huge gobs of memory.
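[The packed layout Jeff suggests, filling each node's cores before spilling onto the next node, can be requested along these lines. This is a sketch; ./my_app is a placeholder:]

```shell
# Fill the cores of the first allocated node before moving to the next,
# so ranks on the same node communicate via shared memory rather than
# the network. ./my_app is a placeholder for the real executable.
mpirun --map-by core --bind-to core --report-bindings -np 16 ./my_app
```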
So just to be clear, in general: you don't intend to run more MPI processes than cores per server, *and* you intend to run few enough MPI processes per server that they don't consume the entire amount of RAM. Are both of those always correct (at the same time)? If so, it sounds like the first runs that you posted about were heavily overloading the servers in terms of RAM usage. Specifically: if you were running out of (registered) RAM, I can understand why Open MPI would hang. We have a few known issues where the openib BTL will hang if it runs out of registered memory -- but this is such a small corner case (because no one runs that way) that we've honestly never bothered to fix the issue (it's actually a really complicated resource exhaustion issue -- it's kinda hard to know what the Right Thing to do is when you've run out of memory...).

On Sep 2, 2014, at 9:37 AM, McGrattan, Kevin B. Dr. <kevin.mcgrat...@nist.gov<mailto:kevin.mcgrat...@nist.gov>> wrote:

Thanks for the advice. Our jobs vary in size, from just a few MPI processes to about 64. Jobs are submitted at random times, which is why I want to map by socket. If the cluster is empty and someone submits a job with 16 MPI processes, I would think it would run most efficiently if it used 8 nodes, 2 processes per node. If we just fill up two nodes as you suggest, we overload the RAM on those two nodes.

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of tmish...@jcity.maeda.co.jp<mailto:tmish...@jcity.maeda.co.jp>
Sent: Friday, August 29, 2014 5:24 PM
To: Open MPI Users
Subject: Re: [OMPI users] How does binding option affect network traffic?

Hi,

Your cluster is very similar to ours, where Torque and Open MPI are installed. I would use this command line:

    #PBS -l nodes=2:ppn=12
    mpirun --report-bindings -np 16 <executable file name>

Here --map-by socket:pe=1 and -bind-to core are assumed as the default settings. Then you can run 10 jobs independently and simultaneously because you have 20 nodes in total.
While each node in your cluster has 12 cores, only 8 processes run on each node, which means 66.7% utilization, not 100%. I think this loss cannot be avoided as long as you use 16*N MPI processes per job; it is a kind of mismatch with your cluster, which has 12 cores per node. If you can use 12*N MPI processes per job, that would be most efficient. Is there any reason why you use 16*N MPI processes per job?

Tetsuya

_______________________________________________
users mailing list
us...@open-mpi.org<mailto:us...@open-mpi.org>
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2014/08/25201.php

--
Jeff Squyres
jsquy...@cisco.com<mailto:jsquy...@cisco.com>
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
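[Tetsuya's utilization arithmetic above can be checked with a quick sketch; the node and core counts are taken from the thread:]

```python
import math

cores_per_node = 12   # cores on each node (from the thread)
procs_per_job = 16    # MPI ranks per job (the 16*N pattern)

# Minimum nodes needed when packing, and the resulting per-node load:
nodes_needed = math.ceil(procs_per_job / cores_per_node)   # 2 nodes
procs_per_node = procs_per_job // nodes_needed             # 8 ranks per node
utilization = 100.0 * procs_per_node / cores_per_node      # ~66.7 %

print(f"{nodes_needed} nodes, {procs_per_node} ranks/node, "
      f"{utilization:.1f}% core use")
# → 2 nodes, 8 ranks/node, 66.7% core use
```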