Hi,

On 10.11.2014 at 16:39, Ralph Castain wrote:

> That is indeed bizarre - we haven’t heard of anything similar from other 
> users. What is your network configuration? If you use oob_tcp_if_include or 
> exclude, can you resolve the problem?

Thanks - this option got it working.

These tests were made, for the sake of simplicity, between the headnode of the 
cluster and one (idle) compute node. I then tried it between the (identical) 
compute nodes, and there it worked fine. The headnode of the cluster and the 
compute node are slightly different though (e.g. in the number of cores), and 
they use eth1 and eth0, respectively, for the internal network of the cluster.

I tried --hetero-nodes with no change.

Then I turned to:

reuti@annemarie:~> date; mpiexec -mca btl self,tcp --mca oob_tcp_if_include 
192.168.154.0/26 -n 4 --hetero-nodes --hostfile machines ./mpihello; date

and the application started instantly. On another cluster, where the headnode 
is identical to the compute nodes but with the same network setup as above, I 
observed a delay of "only" 30 seconds. Nevertheless, on this cluster too, it 
was adding the correct "oob_tcp_if_include" that solved the issue.
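
As a side note, the option can also be put persistently into an MCA params 
file, so it doesn't have to be given on every command line - e.g. with the 
same subnet as above:

$ cat $HOME/.openmpi/mca-params.conf
oob_tcp_if_include = 192.168.154.0/26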

The questions that remain: a) is this intended behavior, and b) what changed 
in this respect between 1.8.1 and 1.8.2?

-- Reuti


> 
>> On Nov 10, 2014, at 4:50 AM, Reuti <re...@staff.uni-marburg.de> wrote:
>> 
>> On 10.11.2014 at 12:50, Jeff Squyres (jsquyres) wrote:
>> 
>>> Wow, that's pretty terrible!  :(
>>> 
>>> Is the behavior BTL-specific, perchance?  E.g., if you only use certain 
>>> BTLs, does the delay disappear?
>> 
>> You mean something like:
>> 
>> reuti@annemarie:~> date; mpiexec -mca btl self,tcp -n 4 --hostfile machines 
>> ./mpihello; date
>> Mon Nov 10 13:44:34 CET 2014
>> Hello World from Node 1.
>> Total: 4
>> Universe: 4
>> Hello World from Node 0.
>> Hello World from Node 3.
>> Hello World from Node 2.
>> Mon Nov 10 13:46:42 CET 2014
>> 
>> (the above was even the latest v1.8.3-186-g978f61d)
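>> 
>> For reference, ./mpihello is just a minimal MPI hello world - a sketch of 
>> what it presumably looks like (the exact source wasn't posted; the 
>> "Node"/"Total"/"Universe" labels match the output above):
>> 
>> #include <stdio.h>
>> #include <mpi.h>
>> 
>> int main(int argc, char **argv)
>> {
>>     int rank, size, *universe, flag;
>> 
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* "Node" in the output  */
>>     MPI_Comm_size(MPI_COMM_WORLD, &size);  /* "Total" in the output */
>> 
>>     printf("Hello World from Node %d.\n", rank);
>>     if (rank == 0) {
>>         printf("Total: %d\n", size);
>>         /* "Universe" is the MPI_UNIVERSE_SIZE attribute, where set */
>>         MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE,
>>                           &universe, &flag);
>>         if (flag)
>>             printf("Universe: %d\n", *universe);
>>     }
>> 
>>     MPI_Finalize();
>>     return 0;
>> }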
>> 
>> Falling back to 1.8.1 gives (as expected):
>> 
>> reuti@annemarie:~> date; mpiexec -mca btl self,tcp -n 4 --hostfile machines 
>> ./mpihello; date
>> Mon Nov 10 13:49:51 CET 2014
>> Hello World from Node 1.
>> Total: 4
>> Universe: 4
>> Hello World from Node 0.
>> Hello World from Node 2.
>> Hello World from Node 3.
>> Mon Nov 10 13:49:53 CET 2014
>> 
>> 
>> -- Reuti
>> 
>>> FWIW: the use-all-IP interfaces approach has been in OMPI forever. 
>>> 
>>> Sent from my phone. No type good. 
>>> 
>>>> On Nov 10, 2014, at 6:42 AM, Reuti <re...@staff.uni-marburg.de> wrote:
>>>> 
>>>>> On 10.11.2014 at 12:24, Reuti wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>>> On 09.11.2014 at 05:38, Ralph Castain wrote:
>>>>>> 
>>>>>> FWIW: during MPI_Init, each process “publishes” all of its interfaces. 
>>>>>> Each process receives a complete map of that info for every process in 
>>>>>> the job. So when the TCP btl sets itself up, it attempts to connect 
>>>>>> across -all- the interfaces published by the other end.
>>>>>> 
>>>>>> So it doesn’t matter what hostname is provided by the RM. We discover 
>>>>>> and “share” all of the interface info for every node, and then use them 
>>>>>> for load balancing.
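>>>>>> 
>>>>>> If you want to watch which interfaces the TCP btl actually tries, you 
>>>>>> can raise its verbosity - e.g. something like:
>>>>>> 
>>>>>> mpiexec --mca btl self,tcp --mca btl_base_verbose 100 -n 4 ./mpihello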
>>>>> 
>>>>> does this lead to any time delay during startup? I stayed with Open MPI 
>>>>> 1.6.5 for some time and tried Open MPI 1.8.3 now. As there was a delay 
>>>>> when the application started with my first compilation of 1.8.3, I even 
>>>>> dropped all my extra options and ran it outside of any queuing system - 
>>>>> the delay remains - on two different clusters.
>>>> 
>>>> I forgot to mention: the delay is almost exactly 2 minutes from the time 
>>>> I issue `mpiexec` until `mpihello` starts up (there is no delay for the 
>>>> initial `ssh` to reach the other node though).
>>>> 
>>>> -- Reuti
>>>> 
>>>> 
>>>>> I tracked it down: up to 1.8.1 it works fine, but 1.8.2 already shows 
>>>>> this delay when starting a simple mpihello. I assume it may lie in the 
>>>>> way other machines are reached, as with one single machine there is no 
>>>>> delay. But using one (and only one - no tree spawn involved) additional 
>>>>> machine already triggers this delay.
>>>>> 
>>>>> Did anyone else notice it?
>>>>> 
>>>>> -- Reuti
>>>>> 
>>>>> 
>>>>>> HTH
>>>>>> Ralph
>>>>>> 
>>>>>> 
>>>>>>> On Nov 8, 2014, at 8:13 PM, Brock Palen <bro...@umich.edu> wrote:
>>>>>>> 
>>>>>>> OK, I figured - I'm going to have to read some more for my own 
>>>>>>> curiosity. The reason I mention the Resource Manager we use, and that 
>>>>>>> the hostnames given by PBS/Torque match the 1gig-e interfaces, is that 
>>>>>>> I'm curious what path it would take to get to a peer node when the 
>>>>>>> entries in the given node list all match the 1gig interfaces, yet data 
>>>>>>> is being sent out the 10gig eoib0/ib0 interfaces.
>>>>>>> 
>>>>>>> I'll go do some measurements and see.
>>>>>>> 
>>>>>>> Brock Palen
>>>>>>> www.umich.edu/~brockp
>>>>>>> CAEN Advanced Computing
>>>>>>> XSEDE Campus Champion
>>>>>>> bro...@umich.edu
>>>>>>> (734)936-1985
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Nov 8, 2014, at 8:30 AM, Jeff Squyres (jsquyres) 
>>>>>>>> <jsquy...@cisco.com> wrote:
>>>>>>>> 
>>>>>>>> Ralph is right: OMPI aggressively uses all Ethernet interfaces by 
>>>>>>>> default.  
>>>>>>>> 
>>>>>>>> This short FAQ has links to 2 other FAQs that provide detailed 
>>>>>>>> information about reachability:
>>>>>>>> 
>>>>>>>> http://www.open-mpi.org/faq/?category=tcp#tcp-multi-network
>>>>>>>> 
>>>>>>>> The usNIC BTL uses UDP for its wire transport and actually does a much 
>>>>>>>> more standards-conformant peer reachability determination (i.e., it 
>>>>>>>> actually checks routing tables to see if it can reach a given peer, 
>>>>>>>> which has all kinds of caching benefits, kernel controls if you want 
>>>>>>>> them, etc.).  We haven't back-ported this to the TCP BTL because a) 
>>>>>>>> most people who use TCP for MPI still use a single L2 address space, 
>>>>>>>> and b) no one has asked for it.  :-)
>>>>>>>> 
>>>>>>>> As for the round robin scheduling, there's no indication from the 
>>>>>>>> Linux TCP stack what the bandwidth is on a given IP interface.  So 
>>>>>>>> unless you use the btl_tcp_bandwidth_<IP_INTERFACE_NAME> (e.g., 
>>>>>>>> btl_tcp_bandwidth_eth0) MCA params, OMPI will round-robin across them 
>>>>>>>> equally.
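>>>>>>>> 
>>>>>>>> For example (illustrative numbers; the values act as relative weights 
>>>>>>>> for the round robin):
>>>>>>>> 
>>>>>>>> mpiexec --mca btl_tcp_bandwidth_eth0 1000 \
>>>>>>>>         --mca btl_tcp_bandwidth_eoib0 10000 ...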
>>>>>>>> 
>>>>>>>> If you have multiple IP interfaces sharing a single physical link, 
>>>>>>>> there will likely be no benefit from having Open MPI use more than one 
>>>>>>>> of them.  You should probably use btl_tcp_if_include / 
>>>>>>>> btl_tcp_if_exclude to select just one.
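>>>>>>>> 
>>>>>>>> E.g. something along the lines of:
>>>>>>>> 
>>>>>>>> mpiexec --mca btl_tcp_if_include eth0 ...
>>>>>>>> 
>>>>>>>> (or the corresponding btl_tcp_if_exclude list for the interfaces you 
>>>>>>>> do not want)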
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Nov 7, 2014, at 2:53 PM, Brock Palen <bro...@umich.edu> wrote:
>>>>>>>>> 
>>>>>>>>> I was doing a test on our IB based cluster, where I was disabling IB:
>>>>>>>>> 
>>>>>>>>> --mca btl ^openib --mca mtl ^mxm
>>>>>>>>> 
>>>>>>>>> I was sending very large messages (>1GB) and I was surprised by the 
>>>>>>>>> speed.
>>>>>>>>> 
>>>>>>>>> I noticed then that of all our Ethernet interfaces
>>>>>>>>> 
>>>>>>>>> eth0  (1gig-e)
>>>>>>>>> ib0  (IP over IB, for Lustre configuration at vendor request)
>>>>>>>>> eoib0  (Ethernet over IB interface for an IB -> Ethernet gateway for 
>>>>>>>>> some external storage support at >1Gig speed)
>>>>>>>>> 
>>>>>>>>> all three were getting traffic.
>>>>>>>>> 
>>>>>>>>> We use Torque as our Resource Manager, with TM support; the 
>>>>>>>>> hostnames given by Torque match the eth0 interfaces.
>>>>>>>>> 
>>>>>>>>> How does OMPI figure out that it can also talk over the others?  How 
>>>>>>>>> does it choose to load balance?
>>>>>>>>> 
>>>>>>>>> BTW, that is fine, but we will use if_exclude on one of the IB ones, 
>>>>>>>>> as ib0 and eoib0 are the same physical device and may screw with load 
>>>>>>>>> balancing if anyone ever falls back to TCP.
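>>>>>>>>> 
>>>>>>>>> (I.e. probably something like --mca btl_tcp_if_exclude lo,ib0 - note 
>>>>>>>>> that setting btl_tcp_if_exclude replaces the default exclude list, 
>>>>>>>>> so lo should be listed again explicitly.)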
>>>>>>>>> 
>>>>>>>>> Brock Palen
>>>>>>>>> www.umich.edu/~brockp
>>>>>>>>> CAEN Advanced Computing
>>>>>>>>> XSEDE Campus Champion
>>>>>>>>> bro...@umich.edu
>>>>>>>>> (734)936-1985
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -- 
>>>>>>>> Jeff Squyres
>>>>>>>> jsquy...@cisco.com
>>>>>>>> For corporate legal information go to: 
>>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 