I am testing a new cluster that we just bought, which is why I am loading things this way: I am deliberately increasing network traffic. In general, though, we submit jobs intermittently with varying numbers of MPI processes. I have read that a good strategy is to map by socket, which in our case means assigning 2 MPI processes to node1 (which has two sockets), 2 MPI processes to node2, and so on. Each of my test cases has 16 MPI processes, so each job is spread over 8 nodes. Yes, if I always loaded up the entire cluster, I could map the way you suggest, but I am looking for a strategy that gives optimum performance for both light and heavy cluster loads.
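[For reference, the spread-out layout described above (1 rank per socket, i.e. 2 ranks per dual-socket node, across 8 nodes) can be requested with Open MPI's ppr ("processes per resource") mapping. This is a sketch; the executable name is a placeholder:]

```shell
# 1 rank per socket = 2 ranks per dual-socket node, so 16 ranks
# spread over 8 nodes; --report-bindings prints the actual placement.
# ./my_app is a placeholder for the real executable.
mpirun --map-by ppr:1:socket --bind-to core --report-bindings -np 16 ./my_app
```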
Can anyone confirm whether or not it is best to map by socket in cases where you have a light load on your cluster?

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres)
Sent: Friday, September 05, 2014 10:37 AM
To: Open MPI User's List
Subject: Re: [OMPI users] How does binding option affect network traffic?

I'm confused, then: why wouldn't you want to minimize the number of servers that a single job runs on? I ask because it sounds to me like you're running 12 jobs, each with 1 process per server, and therefore all 12 jobs are running on each server, like this:

[inline image: diagram of the 12-jobs-per-server layout]

With this layout, you're thrashing the server networking resources: you're forcing the maximum use of the network. Why don't you pack the jobs into as few servers as possible, and therefore use shared memory as much as possible and the network as little as possible? This is the conventional wisdom.

...perhaps I'm missing something in your setup?

On Sep 3, 2014, at 10:02 AM, McGrattan, Kevin B. Dr. <kevin.mcgrat...@nist.gov<mailto:kevin.mcgrat...@nist.gov>> wrote:

No, there are 12 cores per node, and 12 MPI processes are assigned to each node. Total RAM usage is about 10% of what is available. We suspect that the problem might be the combination of MPI message passing and disk I/O to the master node, both of which are handled by InfiniBand. But I do not know how to monitor the traffic, and I do not know how much is too much. Ganglia reports Gigabit Ethernet usage, but we're primarily using IB.

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres)
Sent: Tuesday, September 02, 2014 5:41 PM
To: Open MPI User's List
Subject: Re: [OMPI users] How does binding option affect network traffic?

Ah, ok -- I think I missed this part of the thread: each of your individual MPI processes sucks up huge gobs of memory.
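[The packed layout Jeff suggests, filling each node's cores before spilling onto the next node, can be requested along these lines. This is a sketch; ./my_app is a placeholder:]

```shell
# Fill the cores of the first allocated node before moving to the next,
# so ranks on the same node communicate via shared memory rather than
# the network. ./my_app is a placeholder for the real executable.
mpirun --map-by core --bind-to core --report-bindings -np 16 ./my_app
```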
So just to be clear, in general: you don't intend to run more MPI processes than cores per server, *and* you intend to run few enough MPI processes per server that they don't consume the entire amount of RAM. Are both of those always correct (at the same time)? If so, it sounds like the first runs that you posted about were heavily overloading the servers in terms of RAM usage. Specifically: if you were running out of (registered) RAM, I can understand why Open MPI would hang. We have a few known issues where the openib BTL will hang if it runs out of registered memory -- but this is such a small corner case (because no one runs that way) that we've honestly never bothered to fix the issue (it's actually a really complicated resource exhaustion issue -- it's kinda hard to know what the Right Thing to do is when you've run out of memory...).

On Sep 2, 2014, at 9:37 AM, McGrattan, Kevin B. Dr. <kevin.mcgrat...@nist.gov<mailto:kevin.mcgrat...@nist.gov>> wrote:

Thanks for the advice. Our jobs vary in size, from just a few MPI processes to about 64. Jobs are submitted at random times, which is why I want to map by socket. If the cluster is empty and someone submits a job with 16 MPI processes, I would think it would run most efficiently if it used 8 nodes, 2 processes per node. If we just fill up two nodes as you suggest, we overload the RAM on those two nodes.

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of tmish...@jcity.maeda.co.jp<mailto:tmish...@jcity.maeda.co.jp>
Sent: Friday, August 29, 2014 5:24 PM
To: Open MPI Users
Subject: Re: [OMPI users] How does binding option affect network traffic?

Hi,

Your cluster is very similar to ours, where Torque and Open MPI are installed. I would use this command line:

    #PBS -l nodes=2:ppn=12
    mpirun --report-bindings -np 16 <executable file name>

Here --map-by socket:pe=1 and -bind-to core are assumed as the default settings. Then you can run 10 jobs independently and simultaneously because you have 20 nodes in total.
While each node in your cluster has 12 cores, only 8 processes run on each node, which means 66.7% utilization, not 100%. I think this loss cannot be avoided as long as you use 16*N MPI processes per job; it is a kind of mismatch with your cluster, which has 12 cores per node. If you can use 12*N MPI processes per job, that would be most efficient. Is there any reason why you use 16*N MPI processes per job?

Tetsuya

_______________________________________________
users mailing list
us...@open-mpi.org<mailto:us...@open-mpi.org>
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2014/08/25201.php

--
Jeff Squyres
jsquy...@cisco.com<mailto:jsquy...@cisco.com>
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
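[Tetsuya's utilization arithmetic above can be checked with a quick sketch; the node and core counts are taken from the thread:]

```python
import math

cores_per_node = 12   # cores on each node (from the thread)
procs_per_job = 16    # MPI ranks per job (the 16*N pattern)

# Minimum nodes needed when packing, and the resulting per-node load:
nodes_needed = math.ceil(procs_per_job / cores_per_node)   # 2 nodes
procs_per_node = procs_per_job // nodes_needed             # 8 ranks per node
utilization = 100.0 * procs_per_node / cores_per_node      # ~66.7 %

print(f"{nodes_needed} nodes, {procs_per_node} ranks/node, "
      f"{utilization:.1f}% core use")
# → 2 nodes, 8 ranks/node, 66.7% core use
```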