Re: [OMPI users] Problem with connecting to 3 or more nodes

2015-01-16 Thread Jeff Squyres (jsquyres)
It's because Open MPI uses a tree-based ssh startup pattern.

(amusingly enough, I'm literally halfway through writing up a blog entry about 
this exact same issue :-) )

That is, not only does Open MPI ssh from your mpirun server to host1, but it 
may also ssh from host1 to host2 (or from host1 to host3).
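To make the tree idea concrete, here is a purely hypothetical sketch, not Open MPI's actual routing code. It assumes a simple radix-k tree in which the mpirun server is node 0 and each host i is launched by node (i - 1) / k, and it prints the resulting ssh edges. The function name `launch_tree` and the radix-tree rule are illustrative assumptions; the tree Open MPI really builds may differ, but the effect is the same: with enough hosts, some daemons are started by another compute node rather than by mpirun itself.

```shell
#!/bin/sh
# Hypothetical illustration: print the "who launches whom" ssh edges of a
# radix-k launch tree. Node 0 is the mpirun server; node i (i >= 1) is
# launched by node (i - 1) / k. Open MPI's real routing tree may differ.
launch_tree() {
    k=$1
    shift
    # The remaining positional parameters are host names; prepend the
    # mpirun server so that it becomes node 0 (i.e. "$1").
    set -- mpirun-server "$@"
    n=$#
    i=1
    while [ "$i" -lt "$n" ]; do
        parent=$(( (i - 1) / k ))
        # eval indexes the positional parameters portably in POSIX sh:
        # node j lives in positional parameter j + 1.
        eval "p=\${$((parent + 1))}"
        eval "c=\${$((i + 1))}"
        echo "$p -> $c"
        i=$((i + 1))
    done
}
```

For example, `launch_tree 2 host1 host2 host3` prints `mpirun-server -> host1`, `mpirun-server -> host2` and `host1 -> host3`, while `launch_tree 2 host2 host3` prints only edges that start at mpirun-server. Under this assumed rule, that pattern would be consistent with two hosts working and three failing when only the mpirun server can log in non-interactively.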

In short, if you're not using a resource manager (such as Torque or SLURM), 
then you can't predict the ssh pattern, so you need passwordless/passphraseless 
ssh logins from every server to every other server.
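One way to check that requirement before running mpirun is sketched below; it is an illustration, not an Open MPI tool. For every ordered pair of hosts it logs in to the first host and, from there, attempts a non-interactive ssh to the second. `BatchMode=yes` is a standard OpenSSH option that makes ssh fail instead of prompting for a password or passphrase. The function name `check_all_pairs` and the `SSH` override variable are hypothetical. Assuming you run it from the mpirun server with your usual ssh-agent loaded, the outer ssh can use the agent but the inner ssh on the remote host normally cannot (unless agent forwarding is enabled), so it should surface the same second-hop "Permission denied" failures.

```shell
#!/bin/sh
# Sketch: verify that every host can ssh to every other host without any
# prompt. BatchMode=yes makes ssh fail instead of asking for a password or
# passphrase. check_all_pairs and the SSH override variable are hypothetical.
check_all_pairs() {
    ssh_cmd=${SSH:-ssh}   # command used for the first hop (default: ssh)
    failures=0
    for src in "$@"; do
        for dst in "$@"; do
            if [ "$src" = "$dst" ]; then
                continue
            fi
            # Log in to $src, then from there try a non-interactive login to
            # $dst, mimicking a daemon on $src launching one on $dst.
            if $ssh_cmd -o BatchMode=yes "$src" \
                   ssh -o BatchMode=yes "$dst" true >/dev/null 2>&1; then
                echo "ok:   $src -> $dst"
            else
                echo "FAIL: $src -> $dst"
                failures=$((failures + 1))
            fi
        done
    done
    # Exit status is 0 only if every hop succeeded.
    [ "$failures" -eq 0 ]
}
```

Usage: `check_all_pairs host1 host2 host3`. Any `FAIL:` line marks a hop that would need a passphraseless key (or another non-interactive setup) before a tree-based launch can rely on it.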

Make sense?


> On Jan 16, 2015, at 3:29 PM, Chan, Elbert  wrote:
> 
> Hi
> 
> I'm hoping that someone will be able to help me figure out a problem with 
> connecting to multiple nodes with v1.8.4. 
> 
> Currently, I'm running into this issue:
> $ mpirun --host host1 hostname
> host1
> 
> $ mpirun --host host2,host3 hostname
> host2
> host3
> 
> Running this command on 1 or 2 nodes generates the expected result. However:
> $ mpirun --host host1,host2,host3 hostname
> Permission denied, please try again.
> Permission denied, please try again.
> Permission denied (publickey,password,keyboard-interactive).
> --
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
> 
> * not finding the required libraries and/or binaries on
>  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>  settings, or configure OMPI with --enable-orterun-prefix-by-default
> 
> * lack of authority to execute on one or more specified nodes.
>  Please verify your allocation and authorities.
> 
> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>  Please check with your sys admin to determine the correct location to use.
> 
> *  compilation of the orted with dynamic libraries when static are required
>  (e.g., on Cray). Please check your configure cmd line and consider using
>  one of the contrib/platform definitions for your system type.
> 
> * an inability to create a connection back to mpirun due to a
>  lack of common network interfaces and/or no route found between
>  them. Please check network connectivity (including firewalls
>  and network routing requirements).
> --
> 
> This is set up with passwordless logins using passphrase-protected keys in 
> ssh-agent. When I use keys without a passphrase, I get the expected result. 
> 
> What am I doing wrong? What can I look at to see where my problem could be?
> 
> Elbert
> 
> --
> 
> Elbert Chan
> Operating Systems Analyst
> College of ECC
> CSU, Chico
> 530-898-6481
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/01/26207.php


-- 
Jeff Squyres
jsquy...@cisco.com



[OMPI users] Problem with connecting to 3 or more nodes

2015-01-16 Thread Chan, Elbert
Hi

I'm hoping that someone will be able to help me figure out a problem with 
connecting to multiple nodes with v1.8.4. 

Currently, I'm running into this issue:
$ mpirun --host host1 hostname
host1

$ mpirun --host host2,host3 hostname
host2
host3

Running this command on 1 or 2 nodes generates the expected result. However:
$ mpirun --host host1,host2,host3 hostname
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,password,keyboard-interactive).
--
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--

This is set up with passwordless logins using passphrase-protected keys in 
ssh-agent. When I use keys without a passphrase, I get the expected result. 

What am I doing wrong? What can I look at to see where my problem could be?

Elbert

--

Elbert Chan
Operating Systems Analyst
College of ECC
CSU, Chico
530-898-6481