> On Nov 13, 2014, at 9:20 AM, Reuti <re...@staff.uni-marburg.de> wrote:
> 
> Am 13.11.2014 um 17:14 schrieb Ralph Castain:
> 
>> Hmmm…I’m beginning to grok the issue. It is a tad unusual for people to 
>> assign different hostnames to their interfaces - I’ve seen it in the Hadoop 
>> world, but not in HPC. Still, no law against it.
> 
> Maybe it depends on one's background whether to do it this way. At some point
> in the past I read this Howto:
> 
> https://arc.liv.ac.uk/SGE/howto/multi_intrfcs.html
> 
> and appreciated the idea of routing different services to different interfaces
> - a large file copy won't hurt the MPI communication this way. As SGE handles
> contacting the qmaster and execds on the correct interface of each machine
> (which might be eth0, eth1 or any other) just fine, I've been doing it this way
> for a decade now, and judging by the mails on the SGE lists others are doing it
> too. Hence I don't find it that unusual.

Understood - just not something we see in many installations. Most of the 
places we deal with route services by explicitly specifying interfaces as 
opposed to hostnames. Like I said, though, no law against it.
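
For example, with Open MPI that kind of selection can be done by interface name
or by CIDR subnet rather than by hostname (the subnet below is only an
illustrative value, not a recommendation for your network):

  -mca btl_tcp_if_include 10.1.0.0/16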

> 
> 
>> This will take a little thought to figure out a solution. One problem that
>> immediately comes to mind is if someone includes a hostfile that has lines
>> which refer to the same physical server, but using different interface names.
> 
> Yes, I see this point too. Hence my idea to list all the interfaces one wants
> to use on a single line. If someone puts them on different lines, they are
> doing it the wrong way - their fault. One line = one machine, unless the list
> of interfaces is exactly the same on multiple lines, in which case they could
> be added up as they are now.
> 
> (Under SGE there is the [now correctly working] setup where the same machine
> appears several times in case the slots originate from several queues. But
> this would still fit the above interpretation: the interface name is the same,
> and although the entries come from different queues they can just be added up,
> as the GridEngine MCA does now.)
> 

Makes sense - I’ll look into it. Thanks!

> 
>> We’ll think those are completely distinct servers, and so the process 
>> placement will be totally messed up.
>> 
>> We’ll also encounter issues with the daemon when it reports back, as the
>> hostname it gets will almost certainly differ from the hostname we were
>> expecting. Not as critical, but we need to check where that will impact
>> the code base.
> 
> Hence I prefer to use eth0 for Open MPI (for now). But I remember that at one
> time it could be set up to dedicate the MPI traffic to eth1, although that was
> for MPICH(1):

You can do the same here:

-mca oob_tcp_if_include eth0 -mca btl_tcp_if_include eth1

or the equivalent. Main point is that you can separate the out-of-band traffic 
from the MPI traffic.
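
For instance, a full command line along those lines could look like the
following (just a sketch: the process count, hostfile name, and application
binary are placeholders, and it assumes eth0 carries the service/out-of-band
network while eth1 carries the MPI network):

  mpirun -np 16 --hostfile myhosts \
      -mca oob_tcp_if_include eth0 \
      -mca btl_tcp_if_include eth1 \
      ./my_mpi_app

The same settings can also go into an MCA parameter file (e.g.
$HOME/.openmpi/mca-params.conf) as "oob_tcp_if_include = eth0" and
"btl_tcp_if_include = eth1", so they apply to every run without retyping them.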

> 
> https://arc.liv.ac.uk/SGE/howto/mpich-integration.html => Wrong interface
> selected for the back channel of the MPICH-tasks with the ch_p4-device
> 
> 
>> We can look at the hostfile changes at that time - no real objection to 
>> them, but would need to figure out how to pass that info to the appropriate 
>> subsystems. I assume you want this to apply to both the oob and tcp/btl?
> 
> Yes.
> 
> 
>> Obviously, this won’t make it for 1.8 as it is going to be fairly intrusive,
>> but we can probably do something for 1.9.
>> 
>>> On Nov 13, 2014, at 4:23 AM, Reuti <re...@staff.uni-marburg.de> wrote:
>>> 
>>> Am 13.11.2014 um 00:34 schrieb Ralph Castain:
>>> 
>>>>> On Nov 12, 2014, at 2:45 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>>>>> 
>>>>> Am 12.11.2014 um 17:27 schrieb Reuti:
>>>>> 
>>>>>> Am 11.11.2014 um 02:25 schrieb Ralph Castain:
>>>>>> 
>>>>>>> Another thing you can do is (a) ensure you built with --enable-debug,
>>>>>>> and then (b) run it with -mca oob_base_verbose 100 (without the
>>>>>>> tcp_if_include option) so we can watch the connection handshake and see
>>>>>>> what it is doing. The --hetero-nodes option will have no effect here and
>>>>>>> can be ignored.
>>>>>> 
>>>>>> Done. It really tries to connect to the outside interface of the
>>>>>> headnode. But firewall or not: the nodes have no clue how to reach
>>>>>> 137.248.0.0 - they have no gateway to this network at all.
>>>>> 
>>>>> I have to revert this. The nodes think that there is a gateway although
>>>>> there isn't one. When I remove the gateway entry from the routing table
>>>>> by hand, it starts up instantly too.
>>>>> 
>>>>> While I can do this on my own cluster, I still have the 30-second delay
>>>>> on a cluster where I'm not root, though that may be due to the firewall
>>>>> there. The gateway on that cluster does indeed lead to the outside world.
>>>>> 
>>>>> Personally I find the behavior of using all interfaces a little too
>>>>> aggressive. If you don't check this carefully beforehand and start a
>>>>> long-running application, you might not even notice the delay during
>>>>> startup.
>>>> 
>>>> Agreed - do you have any suggestions on how we should choose the order in
>>>> which to try them? I haven’t been able to come up with anything yet. Jeff
>>>> has some fancy algo in his usnic BTL that we are going to discuss after SC,
>>>> which I’m hoping will help, but I’d be open to doing something better in
>>>> the interim for 1.8.4.
>>> 
>>> A plain `mpiexec` should just use the interface specified in the hostfile,
>>> be it hand-crafted or prepared by a queuing system.
>>> 
>>> 
>>> Option: could a single entry for a machine in the hostfile contain a list 
>>> of interfaces? I mean something like:
>>> 
>>> node01,node01-extra-eth1,node01-extra-eth2 slots=4
>>> 
>>> or
>>> 
>>> node01* slots=4
>>> 
>>> Meaning: use exactly these interfaces, or even try to find all available
>>> interfaces on/between the machines.
>>> 
>>> In case all interfaces have the same name, then it's up to the admin to 
>>> correct this.
>>> 
>>> -- Reuti
>>> 
>>> 
>>>>> -- Reuti
>>>>> 
>>>>> 
>>>>>> It tries to do so regardless of whether the internal or external name of
>>>>>> the headnode is given in the machinefile - I hit ^C then. I attached the
>>>>>> output of Open MPI 1.8.1 for this setup too.
>>>>>> 
>>>>>> -- Reuti
>>>>>> 
>>>>>> <openmpi1.8.3.txt><openmpi1.8.1.txt>