Can you verify that for all 4 nodes?  I.e., something like this:

# csh/tcsh syntax: try every ordered pair of nodes
foreach node (Node1 Node2 Node3 Node4)
  foreach other (Node1 Node2 Node3 Node4)
     echo from $node to $other
     ssh $node ssh $other hostname
  end
end
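
If your login shell is not csh/tcsh, the same all-pairs check could be sketched in POSIX sh roughly as below. The function name pair_checks is made up for illustration, and the host names are placeholders for whatever is in your hostfile. The function only prints the commands so you can review them before running anything; the -n option keeps the outer ssh from reading stdin, which matters if you pipe the list into sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI or OpenSSH): print one
# nested-ssh check for every ordered pair of the hosts given as arguments.
pair_checks() {
    for node in "$@"; do
        for other in "$@"; do
            # -n: redirect the outer ssh's stdin from /dev/null
            printf 'ssh -n %s ssh %s hostname\n' "$node" "$other"
        done
    done
}
```

Something like "pair_checks node1 node2 node3 node4 | sh -x" would then run each check in turn and show the command before its output, so a failing pair is easy to spot.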



On Mar 12, 2014, at 7:34 AM, Victor <victor.ma...@gmail.com> wrote:

> Yes they are. Can resolve and log into each node, from each node, using their 
> "friendly" name, not IP.
> 
> 
> On 12 March 2014 18:15, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> Are all names resolvable from all servers?
> 
> I.e., if you "ssh Node4" from Node1, Node2, and Node3, does it work?
> 
> 
> On Mar 12, 2014, at 4:07 AM, Victor <victor.ma...@gmail.com> wrote:
> 
> > Hostname.... no I use lower case, but for some reason while I was writing 
> > the email I thought that upper case is clearer...
> >
> > The same version of Ubuntu (12.04 x64) is on all nodes and openmpi and the 
> > executable are shared via nfs.
> >
> >
> > On 12 March 2014 16:01, Reuti <re...@staff.uni-marburg.de> wrote:
> > Hi,
> >
> > Am 12.03.2014 um 07:37 schrieb Victor:
> >
> > > I am using openmpi 1.7.4 on Ubuntu 12.04 x64 and I have a very odd 
> > > problem.
> > >
> > > I have 4 nodes, all of which are defined in the hostfile and in 
> > > /etc/hosts.
> > >
> > > I can log into each node using ssh and certificate method from the shell 
> > > > that is running the mpi job, by using their name as defined in /etc/hosts.
> > >
> > > I can run an mpi job if I include only 3 nodes in the hostfile, for 
> > > example:
> > >
> > > Node1 slots=8 max-slots=8
> > > Node2 slots=8 max-slots=8
> > > Node3 slots=8 max-slots=8
> >
> > Are you using an uppercase name here intentionally? Is this the name the 
> > host returns from `hostname`? Although uppercase is allowed and case should 
> > be ignored (or folded to lowercase) for hostname resolution, I have found 
> > that not all programs do so. In my experience it is best to use only 
> > lowercase characters.
> >
> > The same version of your Ubuntu Linux is installed on all machines?
> >
> > -- Reuti
> >
> >
> > > But if I add a fourth node into the hostfile eg:
> > >
> > > Node1 slots=8 max-slots=8
> > > Node2 slots=8 max-slots=8
> > > Node3 slots=8 max-slots=8
> > > Node4 slots=8 max-slots=8
> > >
> > > I get this error after attempting mpirun -np 32 --hostfile hostfile a.out:
> > >
> > > ssh: Could not resolve hostname Node4: Name or service not known.
> > >
> > > But, I can log into Node4 using ssh from the same shell by using ssh 
> > > Node4.
> > >
> > > Also if I mix up the hostfile like this for example and place Node1 to 
> > > the last spot:
> > >
> > > Node4 slots=8 max-slots=8
> > > Node2 slots=8 max-slots=8
> > > Node3 slots=8 max-slots=8
> > > Node1 slots=8 max-slots=8
> > >
> > > The error becomes
> > >
> > > ssh: Could not resolve hostname Node1: Name or service not known.
> > >
> > > If I then go back to the three node hostfile like this:
> > >
> > > Node1 slots=8 max-slots=8
> > > Node4 slots=8 max-slots=8
> > > Node2 slots=8 max-slots=8
> > >
> > > There is no error with three nodes even though both Node1 and Node4 
> > > "cannot be found" if they are present in a 4 node hostfile in the last 
> > > spot. The last slot seems to be bugged.
> > >
> > > What is going on? How do I fix this?
> > > _______________________________________________
> > > users mailing list
> > > us...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
