Re: [OMPI users] How OMPI picks ethernet interfaces
On 13.11.2014 at 00:55, Gilles Gouaillardet wrote:

> Could you please send the output of netstat -nr on both head and compute
> node?

Head node:

annemarie:~ # netstat -nr
Kernel IP routing table
Destination      Gateway          Genmask          Flags  MSS  Window  irtt  Iface
0.0.0.0          137.248.x.y      0.0.0.0          UG     0    0       0     eth0
127.0.0.0        0.0.0.0          255.0.0.0        U      0    0       0     lo
137.248.x.0      0.0.0.0          255.255.255.0    U      0    0       0     eth0
169.254.0.0      0.0.0.0          255.255.0.0      U      0    0       0     eth0
192.168.151.80   0.0.0.0          255.255.255.255  UH     0    0       0     eth1
192.168.154.0    0.0.0.0          255.255.255.192  U      0    0       0     eth1
192.168.154.128  0.0.0.0          255.255.255.192  U      0    0       0     eth3

Compute node with the (wrong) entry for the non-existing gateway:

node28:~ # netstat -nr
Kernel IP routing table
Destination      Gateway          Genmask          Flags  MSS  Window  irtt  Iface
0.0.0.0          192.168.154.60   0.0.0.0          UG     0    0       0     eth0
127.0.0.0        0.0.0.0          255.0.0.0        U      0    0       0     lo
169.254.0.0      0.0.0.0          255.255.0.0      U      0    0       0     eth0
192.168.154.0    0.0.0.0          255.255.255.192  U      0    0       0     eth0
192.168.154.64   0.0.0.0          255.255.255.192  U      0    0       0     eth1

As said: when I remove the "default" entry for the gateway, it starts up
instantly.

-- Reuti

> No problem obfuscating the IP of the head node; I am only interested in the
> netmasks and routes.
>
> Ralph Castain wrote:
>>
>>> On Nov 12, 2014, at 2:45 PM, Reuti wrote:
>>>
>>> On 12.11.2014 at 17:27, Reuti wrote:
>>>
>>>> On 11.11.2014 at 02:25, Ralph Castain wrote:
>>>>
>>>>> Another thing you can do is (a) ensure you built with --enable-debug,
>>>>> and then (b) run it with -mca oob_base_verbose 100 (without the
>>>>> tcp_if_include option) so we can watch the connection handshake and see
>>>>> what it is doing. The --hetero-nodes will have no effect here and can
>>>>> be ignored.
>>>>
>>>> Done. It really tries to connect to the outside interface of the head
>>>> node. But firewall or not: the nodes have no clue how to reach
>>>> 137.248.0.0 - they have no gateway to this network at all.
>>>
>>> I have to revert this. They think that there is a gateway although there
>>> isn't. When I remove the entry for the gateway from the routing table by
>>> hand, it starts up instantly too.
>>>
>>> While I can do this on my own cluster, I still have the 30-second delay
>>> on a cluster where I'm not root, though there it may be caused by the
>>> firewall. The gateway on that cluster does indeed lead to the outside
>>> world.
>>>
>>> Personally I find this behavior of using all interfaces a little too
>>> aggressive. If you don't check this carefully beforehand and start a
>>> long-running application, one might not even notice the delay during
>>> startup.
>>
>> Agreed - do you have any suggestions on how we should choose the order in
>> which to try them? I haven't been able to come up with anything yet. Jeff
>> has some fancy algo in his usnic BTL that we are going to discuss after SC
>> that I'm hoping will help, but I'd be open to doing something better in
>> the interim for 1.8.4
>>
>>> -- Reuti
>>>
>>>> It tries this regardless of whether the internal or external name of the
>>>> head node is given in the machinefile - I hit ^C then. I attached the
>>>> output of Open MPI 1.8.1 for this setup too.
>>>>
>>>> -- Reuti

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2014/11/25777.php
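The 30-second hang follows directly from the kernel's longest-prefix-match
route lookup: on the compute node the head node's public address matches
only the bogus default route, so connect() is attempted via the phantom
gateway and runs into the TCP timeout. A minimal Python sketch of that
lookup, modeled on the compute node's table above (137.248.10.1 is a
made-up stand-in for the obfuscated head-node address):

```python
import ipaddress

# Compute node's routing table from netstat -nr above:
# (destination network, gateway or None, interface).
# 192.168.154.60 is the gateway that does not actually exist.
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.154.60", "eth0"),
    (ipaddress.ip_network("169.254.0.0/16"), None, "eth0"),
    (ipaddress.ip_network("192.168.154.0/26"), None, "eth0"),
    (ipaddress.ip_network("192.168.154.64/26"), None, "eth1"),
]

def lookup(addr):
    """Longest-prefix match, as the kernel route lookup does."""
    addr = ipaddress.ip_address(addr)
    matches = [r for r in routes if addr in r[0]]
    if not matches:
        return None
    return max(matches, key=lambda r: r[0].prefixlen)

# The head node's public address matches only the default route, so the
# node tries the non-existing gateway and hangs until the TCP timeout.
print(lookup("137.248.10.1"))  # default route via 192.168.154.60

# With the bogus default route removed there is no match at all, so a
# connect() fails immediately ("network unreachable") instead of hanging.
routes.pop(0)
print(lookup("137.248.10.1"))  # None
```

This mirrors Reuti's observation: deleting the default route does not make
the head node reachable, but it converts a long timeout into an instant
failure, letting Open MPI move on to the next address at once.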
Re: [OMPI users] How OMPI picks ethernet interfaces
Could you please send the output of netstat -nr on both head and compute
node? No problem obfuscating the IP of the head node; I am only interested
in the netmasks and routes.
Re: [OMPI users] How OMPI picks ethernet interfaces
Right, I understand those are TCP interfaces; I was just showing that I have
two TCP interfaces over one physical interface, which is why I was asking
how TCP interfaces are selected. It rarely if ever will matter to us.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985

> On Nov 7, 2014, at 8:38 PM, Gilles Gouaillardet wrote:
>
> Ralph,
>
> IIRC there is load balancing across all the BTLs, for example between
> vader and scif. So load balancing between ib0 and eoib0 is just a
> particular case that might not necessarily be handled by the tcp btl.
>
> Cheers,
>
> Gilles
>
> Ralph Castain wrote:
>> OMPI discovers all active interfaces and automatically considers them
>> available for its use unless instructed otherwise via the params. I'd
>> have to look at the TCP BTL code to see the load-balancing algo - I
>> thought we didn't have that "on" by default across BTLs, but I don't
>> know if the TCP one automatically uses all available Ethernet interfaces
>> by default. Sounds like it must.
>>
>>> On Nov 7, 2014, at 11:53 AM, Brock Palen wrote:
>>>
>>> I was doing a test on our IB-based cluster, where I was disabling IB:
>>>
>>> --mca btl ^openib --mca mtl ^mxm
>>>
>>> I was sending very large messages (>1 GB) and I was surprised by the
>>> speed.
>>>
>>> I noticed then that of all our ethernet interfaces
>>>
>>> eth0 (1gig-e)
>>> ib0 (IP over IB, for Lustre configuration at vendor request)
>>> eoib0 (Ethernet over IB interface for an IB -> Ethernet gateway for
>>> some external storage support at >1Gig speed)
>>>
>>> I saw all three were getting traffic.
>>>
>>> We use Torque for our resource manager and use TM support; the
>>> hostnames given by Torque match the eth0 interfaces.
>>>
>>> How does OMPI figure out that it can also talk over the others? How
>>> does it choose to load balance?
>>>
>>> BTW that is fine, but we will use if_exclude on one of the IB ones, as
>>> ib0 and eoib0 are the same physical device and may screw with load
>>> balancing if anyone ever falls back to TCP.
>>>
>>> Brock Palen
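As rough intuition for the load-balancing question: when several endpoints
are eligible, a BTL can stripe a large message across them in proportion to
link bandwidth. The sketch below is only an illustration of that idea, not
Open MPI's actual algorithm, and the per-interface speeds are assumed
numbers, not measurements:

```python
# Hypothetical link speeds in Gb/s for the three interfaces Brock lists -
# illustrative assumptions only, not measured values.
interfaces = {"eth0": 1, "ib0": 40, "eoib0": 10}

def stripe(msg_len, links):
    """Split msg_len bytes across links proportionally to bandwidth."""
    total_bw = sum(links.values())
    shares = {}
    remaining = msg_len
    for i, (name, bw) in enumerate(links.items()):
        if i == len(links) - 1:
            shares[name] = remaining  # last link takes the rounding rest
        else:
            shares[name] = msg_len * bw // total_bw
            remaining -= shares[name]
    return shares

shares = stripe(1 << 30, interfaces)  # a 1 GiB message
print(shares)
```

Note how ib0 carries the bulk of the bytes, which matches Brock's point:
if ib0 and eoib0 are the same physical device, a bandwidth-proportional
scheme double-counts that link, so excluding one of them is sensible.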
Re: [OMPI users] How OMPI picks ethernet interfaces
Ralph,

IIRC there is load balancing across all the BTLs, for example between vader
and scif. So load balancing between ib0 and eoib0 is just a particular case
that might not necessarily be handled by the tcp btl.

Cheers,

Gilles
Re: [OMPI users] How OMPI picks ethernet interfaces
Brock,

Is your post related to ib0/eoib0 being used at all, or being used with
load balancing?

Let me clarify this: --mca btl ^openib disables the openib btl, aka
*native* InfiniBand. This does not disable ib0 and eoib0, which are handled
by the tcp btl. As you already figured out, btl_tcp_if_include (or
btl_tcp_if_exclude) can be used for that purpose.

Cheers,

Gilles
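The effect of btl_tcp_if_include / btl_tcp_if_exclude can be modeled as a
simple filter over the discovered interfaces. This is a rough sketch only:
the real MCA parameters also accept CIDR subnets, must not be combined, and
exclude lo by default:

```python
def select_interfaces(available, if_include=None, if_exclude=None):
    """Rough model of btl_tcp_if_include / btl_tcp_if_exclude:
    an include list wins; otherwise everything not excluded is used."""
    if if_include:
        return [i for i in available if i in if_include]
    if if_exclude:
        return [i for i in available if i not in if_exclude]
    return list(available)

nics = ["eth0", "ib0", "eoib0"]

# In the spirit of: mpirun --mca btl_tcp_if_exclude eoib0,lo ...
# which drops one of the two interfaces sharing the physical IB device.
print(select_interfaces(nics, if_exclude=["eoib0", "lo"]))  # ['eth0', 'ib0']
```

With no include/exclude list at all, every discovered interface stays
eligible, which is exactly the "uses all available interfaces by default"
behavior discussed above.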