Hi,

On 09.11.2014 at 05:38, Ralph Castain wrote:
> FWIW: during MPI_Init, each process “publishes” all of its interfaces. Each
> process receives a complete map of that info for every process in the job.
> So when the TCP btl sets itself up, it attempts to connect across -all- the
> interfaces published by the other end.
>
> So it doesn’t matter what hostname is provided by the RM. We discover and
> “share” all of the interface info for every node, and then use them for
> load balancing.

Does this lead to any time delay when starting up? I stayed with Open MPI
1.6.5 for some time and tried to use Open MPI 1.8.3 now. As there is a delay
when the application starts with my first build of 1.8.3, I disregarded even
all my extra options and ran it outside of any queuing system - the delay
remains - on two different clusters.

I tracked it down: up to 1.8.1 it works fine, but 1.8.2 already shows this
delay when starting up a simple mpihello. I assume it may lie in the way
other machines are reached, as with one single machine there is no delay. But
using one (and only one - no tree spawn involved) additional machine already
triggers this delay.

Did anyone else notice it?
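For reference, a minimal sketch of the comparison, assuming two nodes
node01/node02, a trivial mpihello binary, and a side-by-side 1.8.1
installation under /opt/openmpi-1.8.1 - all of these names are placeholders
for your own setup:

  # Time the startup of a trivial MPI hello-world across two nodes,
  # outside of any queuing system:
  time mpirun -np 2 --host node01,node02 ./mpihello

  # Compare against an older Open MPI installation, e.g. a 1.8.1 build:
  time /opt/openmpi-1.8.1/bin/mpirun -np 2 --host node01,node02 ./mpihello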
-- Reuti

> HTH
> Ralph
>
>
>> On Nov 8, 2014, at 8:13 PM, Brock Palen <bro...@umich.edu> wrote:
>>
>> Ok I figured, I'm going to have to read some more for my own curiosity. The
>> reason I mention the resource manager we use, and that the hostnames given
>> by PBS/Torque match the 1gig-e interfaces, is that I'm curious what path it
>> would take to get to a peer node when the node list given all matches the
>> 1gig interfaces, yet data is being sent out the 10gig eoib0/ib0 interfaces.
>>
>> I'll go do some measurements and see.
>>
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>>
>>
>>> On Nov 8, 2014, at 8:30 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>>>
>>> Ralph is right: OMPI aggressively uses all Ethernet interfaces by default.
>>>
>>> This short FAQ has links to 2 other FAQs that provide detailed information
>>> about reachability:
>>>
>>> http://www.open-mpi.org/faq/?category=tcp#tcp-multi-network
>>>
>>> The usNIC BTL uses UDP for its wire transport and actually does a much
>>> more standards-conformant peer reachability determination (i.e., it
>>> actually checks routing tables to see if it can reach a given peer, which
>>> has all kinds of caching benefits, kernel controls if you want them,
>>> etc.). We haven't back-ported this to the TCP BTL because a) most people
>>> who use TCP for MPI still use a single L2 address space, and b) no one has
>>> asked for it. :-)
>>>
>>> As for the round-robin scheduling, there's no indication from the Linux
>>> TCP stack what the bandwidth is on a given IP interface. So unless you use
>>> the btl_tcp_bandwidth_<IP_INTERFACE_NAME> (e.g., btl_tcp_bandwidth_eth0)
>>> MCA params, OMPI will round-robin across them equally.
>>>
>>> If you have multiple IP interfaces sharing a single physical link, there
>>> will likely be no benefit from having Open MPI use more than one of them.
>>> You should probably use btl_tcp_if_include / btl_tcp_if_exclude to select
>>> just one.
>>>
>>>
>>> On Nov 7, 2014, at 2:53 PM, Brock Palen <bro...@umich.edu> wrote:
>>>
>>>> I was doing a test on our IB-based cluster, where I was disabling IB:
>>>>
>>>>   --mca btl ^openib --mca mtl ^mxm
>>>>
>>>> I was sending very large messages (>1GB) and I was surprised by the
>>>> speed.
>>>>
>>>> I noticed then that all of our Ethernet interfaces
>>>>
>>>>   eth0  (1gig-e)
>>>>   ib0   (IP over IB, for the Lustre configuration at vendor request)
>>>>   eoib0 (Ethernet over IB, for an IB -> Ethernet gateway for some
>>>>          external storage support at >1gig speed)
>>>>
>>>> were getting traffic.
>>>>
>>>> We use Torque for our resource manager with TM support; the hostnames
>>>> given by Torque match the eth0 interfaces.
>>>>
>>>> How does OMPI figure out that it can also talk over the others? How does
>>>> it choose to load balance?
>>>>
>>>> BTW that is fine, but we will use if_exclude on one of the IB ones, as
>>>> ib0 and eoib0 are the same physical device and may screw with load
>>>> balancing if anyone ever falls back to TCP.
>>>>
>>>> Brock Palen
>>>> www.umich.edu/~brockp
>>>> CAEN Advanced Computing
>>>> XSEDE Campus Champion
>>>> bro...@umich.edu
>>>> (734)936-1985
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
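Following up on Jeff's btl_tcp_if_include / btl_tcp_if_exclude and
btl_tcp_bandwidth_* suggestions above, a minimal sketch of the two approaches
(the interface names are the ones from this thread; the bandwidth values are
illustrative relative weights, nominally in Mbit/s, not measurements):

  # Restrict MPI TCP traffic to eth0 only:
  mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include eth0 -np 16 ./mpihello

  # Or exclude only the duplicated IB device (keep lo excluded as well):
  mpirun --mca btl_tcp_if_exclude lo,ib0 -np 16 ./mpihello

  # Or keep both links and weight the round-robin by hand:
  mpirun --mca btl_tcp_bandwidth_eth0 1000 \
         --mca btl_tcp_bandwidth_eoib0 10000 -np 16 ./mpihello

With ib0 and eoib0 sitting on the same physical port, the exclude variant is
probably the safer default.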