> On Nov 13, 2014, at 9:20 AM, Reuti <re...@staff.uni-marburg.de> wrote:
> 
> On 13.11.2014 at 17:14, Ralph Castain wrote:
> 
>> Hmmm… I'm beginning to grok the issue. It is a tad unusual for people to
>> assign different hostnames to their interfaces - I've seen it in the Hadoop
>> world, but not in HPC. Still, no law against it.
> 
> Maybe it depends on the background one comes from. At one point in the past
> I read this howto:
> 
> https://arc.liv.ac.uk/SGE/howto/multi_intrfcs.html
> 
> and appreciated the idea of routing different services to different
> interfaces - a large file copy won't hurt the MPI communication this way.
> As SGE handles it well to contact the qmaster or execds on the correct
> interface of the machines (which might be eth0, eth1 or any other one),
> I've been doing it this way for a decade now, and according to the mails
> on the SGE lists others are doing it too. Hence I don't find it that
> unusual.

Understood - just not something we see in many installations. Most of the
places we deal with route services by explicitly specifying interfaces as
opposed to hostnames. Like I said, though, no law against it.
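For illustration only - the addresses are invented, and the extra hostnames
just follow the pattern suggested further down - such a setup amounts to an
/etc/hosts along these lines, with each service pointed at the name of the
interface it should use:

    192.168.1.10   node01              # eth0 - SGE qmaster/execd and out-of-band traffic
    192.168.2.10   node01-extra-eth1   # eth1 - MPI traffic
    192.168.3.10   node01-extra-eth2   # eth2 - large file copies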
> 
>> This will take a little thought to figure out a solution. One problem that
>> immediately occurs is if someone includes a hostfile that has lines which
>> refer to the same physical server, but using different interface names.
> 
> Yes, I see this point too. Therefore I had the idea to list all the
> interfaces one wants to use in one line. In case they put it in different
> lines, they would do it the wrong way - their fault. One line = one machine,
> unless the list of interfaces is exactly the same in multiple lines; then
> they could be added up like now.
> 
> (Under SGE there is the [now correctly working] setup to get the same
> machine a couple of times in case the entries originate from several queues.
> But this would still fit with the above interpretation: the interface name
> is the same, so although the entries are coming from different queues they
> can just be added up like now in the GridEngine MCA.)
> 

Makes sense - I'll look into it. Thanks!

> 
>> We'll think those are completely distinct servers, and so the process
>> placement will be totally messed up.
>> 
>> We'll also encounter issues with the daemon when it reports back, as the
>> hostname it gets will almost certainly differ from the hostname we were
>> expecting. Not as critical, but need to check to see where that will
>> impact the code base.
> 
> Hence I prefer to use eth0 for Open MPI (for now). But I remember that
> there was a time when it could be set up to route the MPI traffic
> exclusively to eth1, although it was for MPICH(1):

You can do the same here:

    -mca oob_tcp_if_include eth0 -mca btl_tcp_if_include eth1

or the equivalent. Main point is that you can separate the out-of-band
traffic from the MPI traffic.

> 
> https://arc.liv.ac.uk/SGE/howto/mpich-integration.html
> => Wrong interface selected for the back channel of the MPICH-tasks with
> the ch_p4-device
> 
>> We can look at the hostfile changes at that time - no real objection to
>> them, but would need to figure out how to pass that info to the
>> appropriate subsystems. I assume you want this to apply to both the oob
>> and tcp/btl?
> 
> Yes.
> 
>> Obviously, this won't make it for 1.8 as it is going to be fairly
>> intrusive, but we can probably do something for 1.9.
>> 
>>> On Nov 13, 2014, at 4:23 AM, Reuti <re...@staff.uni-marburg.de> wrote:
>>> 
>>> On 13.11.2014 at 00:34, Ralph Castain wrote:
>>> 
>>>>> On Nov 12, 2014, at 2:45 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>>>>> 
>>>>> On 12.11.2014 at 17:27, Reuti wrote:
>>>>> 
>>>>>> On 11.11.2014 at 02:25, Ralph Castain wrote:
>>>>>> 
>>>>>>> Another thing you can do is (a) ensure you built with --enable-debug,
>>>>>>> and then (b) run it with -mca oob_base_verbose 100 (without the
>>>>>>> tcp_if_include option) so we can watch the connection handshake and
>>>>>>> see what it is doing. The --hetero-nodes will have no effect here and
>>>>>>> can be ignored.
>>>>>> 
>>>>>> Done. It really tries to connect to the outside interface of the
>>>>>> headnode. But firewall or not: the nodes have no clue how to reach
>>>>>> 137.248.0.0 - they have no gateway to this network at all.
>>>>> 
>>>>> I have to revert this. They think that there is a gateway although
>>>>> there isn't one. When I remove the entry by hand for the gateway in
>>>>> the routing table, it starts up instantly too.
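As an aside, for anyone hitting the same startup delay: it is quick to check
what the nodes believe about their routes. The addresses below are
placeholders only - 137.248.0.1 stands in for the headnode's external
address, 10.0.0.254 for the bogus gateway - and of course only delete an
entry you know is wrong:

    ip route show                          # any unexpected "default via ..." entry?
    ip route get 137.248.0.1               # which gateway would be used to reach the outside address?
    ip route del default via 10.0.0.254    # drop the stray gateway entry (placeholder address)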
>>>>> 
>>>>> While I can do this on my own cluster, I still have the 30-second delay
>>>>> on a cluster where I'm not root, though this may be because of the
>>>>> firewall there. The gateway on this cluster is indeed going to the
>>>>> outside world.
>>>>> 
>>>>> Personally I find this behavior of using all interfaces a little bit
>>>>> too aggressive. If you don't check this carefully beforehand and start
>>>>> a long-running application, you might not even notice the delay during
>>>>> startup.
>>>> 
>>>> Agreed - do you have any suggestions on how we should choose the order
>>>> in which to try them? I haven't been able to come up with anything yet.
>>>> Jeff has some fancy algo in his usnic BTL that we are going to discuss
>>>> after SC that I'm hoping will help, but I'd be open to doing something
>>>> better in the interim for 1.8.4.
>>> 
>>> A plain `mpiexec` should just use the specified interface it finds in the
>>> hostfile, be it hand-crafted or prepared by any queuing system.
>>> 
>>> Option: could a single entry for a machine in the hostfile contain a list
>>> of interfaces? I mean something like:
>>> 
>>> node01,node01-extra-eth1,node01-extra-eth2 slots=4
>>> 
>>> or
>>> 
>>> node01* slots=4
>>> 
>>> Means: use exactly these interfaces, or even try to find all available
>>> interfaces on/between the machines.
>>> 
>>> In case all interfaces have the same name, it's up to the admin to
>>> correct this.
>>> 
>>> -- Reuti
>>> 
>>>>> -- Reuti
>>>>> 
>>>>>> It tries to do so independent of whether the internal or external name
>>>>>> of the headnode is given in the machinefile - I hit ^C then. I
>>>>>> attached the output of Open MPI 1.8.1 for this setup too.
>>>>>> 
>>>>>> -- Reuti
>>>>>> 
>>>>>> <openmpi1.8.3.txt><openmpi1.8.1.txt>
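To pull the concrete pieces of this thread together: with the current
releases the traffic split can already be done per run via MCA parameters.
The hostfile contents, hostnames and application name below are placeholders
only:

    $ cat myhosts
    node01 slots=4
    node02 slots=4

    $ mpiexec --hostfile myhosts \
          -mca oob_tcp_if_include eth0 \
          -mca btl_tcp_if_include eth1 \
          ./my_mpi_app

For debugging a hang, adding -mca oob_base_verbose 100 (on a build
configured with --enable-debug) shows the connection handshake. The
comma-separated list of interfaces per hostfile line, as proposed above,
would be something for a later release.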