Hi,

On 09.11.2014 at 05:38, Ralph Castain wrote:

> FWIW: during MPI_Init, each process “publishes” all of its interfaces. Each 
> process receives a complete map of that info for every process in the job. So 
> when the TCP btl sets itself up, it attempts to connect across -all- the 
> interfaces published by the other end.
> 
> So it doesn’t matter what hostname is provided by the RM. We discover and 
> “share” all of the interface info for every node, and then use them for 
> load balancing.
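> 
> (One quick way to watch this happening - a sketch only, assuming a trivial 
> test binary ./a.out and that these MCA parameter names match your installed 
> version - is:
> 
>   mpirun -np 2 --mca btl tcp,self --mca btl_base_verbose 30 ./a.out
> 
> which logs the addresses each peer advertised and the connection attempts 
> the TCP BTL makes.)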

does this lead to any time delay during startup? I stayed with Open MPI 1.6.5 
for some time and have now tried Open MPI 1.8.3. As there was a delay when the 
application started with my first build of 1.8.3, I even dropped all my extra 
options and ran it outside of any queuing system - the delay remains - on two 
different clusters.

I tracked it down: up to 1.8.1 it works fine, but 1.8.2 already shows this 
delay when starting up a simple mpihello. I assume it may lie in the way other 
machines are reached, as with one single machine there is no delay. But using 
one (and only one - no tree spawn involved) additional machine already 
triggers this delay.

Did anyone else notice it?

-- Reuti


> HTH
> Ralph
> 
> 
>> On Nov 8, 2014, at 8:13 PM, Brock Palen <bro...@umich.edu> wrote:
>> 
>> Ok, I figured; I'm going to have to read some more for my own curiosity. The 
>> reason I mention the resource manager we use, and that the hostnames given 
>> by PBS/Torque match the 1gig-e interfaces, is that I'm curious what path it 
>> would take to get to a peer node when the node list entries all match the 
>> 1gig interfaces, yet data is being sent out the 10gig eoib0/ib0 interfaces.  
>> 
>> I'll go do some measurements and see.
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>>> On Nov 8, 2014, at 8:30 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>>> wrote:
>>> 
>>> Ralph is right: OMPI aggressively uses all Ethernet interfaces by default.  
>>> 
>>> This short FAQ has links to 2 other FAQs that provide detailed information 
>>> about reachability:
>>> 
>>>  http://www.open-mpi.org/faq/?category=tcp#tcp-multi-network
>>> 
>>> The usNIC BTL uses UDP for its wire transport and actually does a much more 
>>> standards-conformant peer reachability determination (i.e., it actually 
>>> checks routing tables to see if it can reach a given peer, which has all 
>>> kinds of caching benefits, kernel controls if you want them, etc.).  We 
>>> haven't back-ported this to the TCP BTL because a) most people who use TCP 
>>> for MPI still use a single L2 address space, and b) no one has asked for 
>>> it.  :-)
>>> 
>>> As for the round robin scheduling, there's no indication from the Linux TCP 
>>> stack what the bandwidth is on a given IP interface.  So unless you use the 
>>> btl_tcp_bandwidth_<IP_INTERFACE_NAME> (e.g., btl_tcp_bandwidth_eth0) MCA 
>>> params, OMPI will round-robin across them equally.
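>>> 
>>> For example (the interface names and relative bandwidth values here are 
>>> only placeholders):
>>> 
>>>   mpirun -np 2 \
>>>     --mca btl_tcp_bandwidth_eth0 1000 \
>>>     --mca btl_tcp_bandwidth_eth1 10000 \
>>>     ./a.out
>>> 
>>> should make OMPI stripe roughly 1:10 across the two interfaces instead of 
>>> splitting the traffic evenly.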
>>> 
>>> If you have multiple IP interfaces sharing a single physical link, there 
>>> will likely be no benefit from having Open MPI use more than one of them.  
>>> You should probably use btl_tcp_if_include / btl_tcp_if_exclude to select 
>>> just one.
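>>> 
>>> E.g., to limit the TCP BTL to a single interface (eth0 is just an example 
>>> name here):
>>> 
>>>   mpirun -np 2 --mca btl_tcp_if_include eth0 ./a.out
>>> 
>>> or, conversely, exclude one of the two interfaces that share a physical 
>>> port with something like --mca btl_tcp_if_exclude lo,eoib0 (keep the 
>>> loopback in the exclude list when you override it).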
>>> 
>>> 
>>> 
>>> 
>>> On Nov 7, 2014, at 2:53 PM, Brock Palen <bro...@umich.edu> wrote:
>>> 
>>>> I was doing a test on our IB-based cluster, where I was disabling IB
>>>> 
>>>> --mca btl ^openib --mca mtl ^mxm
>>>> 
>>>> I was sending very large messages >1GB and I was surprised by the speed.
>>>> 
>>>> I noticed then that of all our ethernet interfaces
>>>> 
>>>> eth0  (1gig-e)
>>>> ib0  (IP over IB, for Lustre configuration at vendor request)
>>>> eoib0  (Ethernet over IB interface for the IB -> Ethernet gateway, for some 
>>>> external storage support at >1Gig speed)
>>>> 
>>>> I saw all three were getting traffic.
>>>> 
>>>> We use Torque as our resource manager with TM support; the hostnames 
>>>> given by Torque match the eth0 interfaces.
>>>> 
>>>> How does OMPI figure out that it can also talk over the others?  How does 
>>>> it choose to load balance?
>>>> 
>>>> BTW that is fine, but we will use if_exclude on one of the IB ones, as ib0 
>>>> and eoib0 are the same physical device and may screw with load balancing 
>>>> if anyone ever falls back to TCP.
>>>> 
>>>> Brock Palen
>>>> www.umich.edu/~brockp
>>>> CAEN Advanced Computing
>>>> XSEDE Campus Champion
>>>> bro...@umich.edu
>>>> (734)936-1985
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to: 
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>> 
> 
