I'll ignore the rest of this thread as it kinda diverged from your original
question. I've been reviewing the code, and I think I'm getting a handle on
the issue.

Just to be clear: your hostname resolves to the 127.0.1.1 address? And you
are on Linux (not one of the BSD flavors)?

If the answer to both is "yes", then the problem is that we ignore loopback
devices if any other interface is present. When we check whether the
hostname we were given is the local node, we resolve the name to an address
and then look for that address in our list of interfaces. The loopback
device is ignored and therefore not on the list. So if your hostname
resolves to a 127.x.x.x address, we will decide this is a different node
than the one we are on.
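
To make sure we are describing the same thing, here is a rough sketch of
that check in Python. It is not the actual Open MPI code: the function
names, the drop_loopback flag, the rule that any address inside the
loopback network matches the loopback device, and the 127.0.0.1/8 address
on lo are all assumptions made for illustration. The other addresses are
the ones from your /etc/hosts.

```python
import ipaddress


def candidate_interfaces(interfaces, drop_loopback=True):
    """Return the interfaces that take part in the local-node check.

    Loopback interfaces are dropped only when at least one
    non-loopback interface is present.
    """
    parsed = [ipaddress.ip_interface(i) for i in interfaces]
    others = [i for i in parsed if not i.ip.is_loopback]
    if drop_loopback and others:
        return others
    return parsed


def is_local_node(resolved, interfaces, drop_loopback=True):
    """Decide whether a resolved address belongs to this node."""
    addr = ipaddress.ip_address(resolved)
    for iface in candidate_interfaces(interfaces, drop_loopback):
        if addr == iface.ip:
            return True
        # Assumption: the whole loopback network counts as the loopback device.
        if iface.ip.is_loopback and addr in iface.network:
            return True
    return False


# lo is assumed to carry 127.0.0.1/8; the rest comes from /etc/hosts.
interfaces = ["127.0.0.1/8", "10.1.255.201", "192.168.255.206"]

print(is_local_node("127.0.1.1", interfaces))     # False
print(is_local_node("10.1.255.201", interfaces))  # True
```

With the loopback device filtered out, 127.0.1.1 matches nothing on the
list, which would explain why removing that line from /etc/hosts makes
things work. Passing drop_loopback=False in the sketch makes the first call
return True.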

I can modify that logic, but want to ensure this accurately captures the
problem. I'll also have to discuss the change with the other developers to
ensure we don't shoot ourselves in the foot if we make it.



On Thu, Jun 20, 2013 at 2:56 AM, Riccardo Murri <riccardo.mu...@uzh.ch> wrote:

> On 20 June 2013 06:33, Ralph Castain <r...@open-mpi.org> wrote:
> > Been trying to decipher this problem, and think maybe I'm beginning to
> > understand it. Just to clarify:
> >
> > * when you execute "hostname", you get the <name>.local response?
>
> Yes:
>
>     [rmurri@nh64-2-11 ~]$ hostname
>     nh64-2-11.local
>
>     [rmurri@nh64-2-11 ~]$ uname -n
>     nh64-2-11.local
>
>     [rmurri@nh64-2-11 ~]$ hostname -s
>     nh64-2-11
>
>     [rmurri@nh64-2-11 ~]$ hostname -f
>     nh64-2-11.local
>
>
> > * you somewhere have it setup so that 10.x.x.x resolves to <name>, with
> > no ".local" extension?
>
> No. Host name resolution is correct, but the hostname resolves to the
> 127.0.1.1 address:
>
>     [rmurri@nh64-2-11 ~]$ getent hosts `hostname`
>     127.0.1.1    nh64-2-11.local nh64-2-11
>
> Note that `/etc/hosts` also lists a 10.x.x.x address, which is the one
> actually assigned to the ethernet interface:
>
>     [rmurri@nh64-2-11 ~]$ fgrep `hostname -s` /etc/hosts
>     127.0.1.1       nh64-2-11.local nh64-2-11
>     10.1.255.201    nh64-2-11.local nh64-2-11
>     192.168.255.206 nh64-2-11-myri0
>
> If we remove the `127.0.1.1` line from `/etc/hosts`, then everything
> works again.  Also, everything works if we use only FQDNs in the
> hostfile.
>
> So it seems that the 127.0.1.1 address is treated specially.
>
> Thanks,
> Riccardo
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
