Ahhhh... thanks, Gilles. That makes sense. I was stuck thinking there was
an ssh problem on rank 0; it never occurred to me that mpirun was doing
something clever there and that those ssh errors were coming from a different
node.
It's no problem to put my private key on all instances - I'll go that route.
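A minimal sketch of that route. It assumes the hostfile lists one host per line (optionally followed by `slots=N`), the key lives at `~/.ssh/id_rsa`, the same user account and an existing `~/.ssh` are on every instance, and the matching public key is already in each instance's `authorized_keys`. It also pushes a shared `known_hosts` built with `ssh-keyscan`, because a node that has never seen another node's host key is one way to get "Host key verification failed". The function name and the `SCP_CMD`/`KEYSCAN_CMD` overrides (which let it be dry-run with stubs) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Open MPI): copy one private key and a
# shared known_hosts file to every host listed in an Open MPI hostfile, so
# that any node can ssh to any other node without prompting.
# Assumes the matching public key is already in ~/.ssh/authorized_keys on
# every host, the same user name everywhere, and an existing ~/.ssh there.
SCP_CMD=${SCP_CMD:-scp}
KEYSCAN_CMD=${KEYSCAN_CMD:-ssh-keyscan}

distribute_cluster_key() {
  local hostfile=$1 key=${2:-$HOME/.ssh/id_rsa}
  local -a hosts
  # First field of each non-empty, non-comment line, e.g. "10.0.0.5 slots=1".
  mapfile -t hosts < <(awk 'NF && $1 !~ /^#/ {print $1}' "$hostfile")

  # Gather every host's public host key once. ssh-keyscan does not verify
  # the keys it receives, so this trusts whatever answers on first contact.
  local known
  known=$(mktemp)
  $KEYSCAN_CMD "${hosts[@]}" >"$known" 2>/dev/null

  local h failures=0
  for h in "${hosts[@]}"; do
    # scp -p keeps the key's 0600 mode (ssh ignores keys with looser modes).
    # Note: this overwrites any existing ~/.ssh/known_hosts on the target.
    if ! $SCP_CMD -p -o BatchMode=yes "$key" "$h:.ssh/id_rsa" ||
       ! $SCP_CMD -o BatchMode=yes "$known" "$h:.ssh/known_hosts"; then
      echo "FAIL: $h"
      failures=$((failures + 1))
    fi
  done
  rm -f "$known"
  [ "$failures" -eq 0 ]
}
```

After sourcing it on the mpirun host, `distribute_cluster_key hosts.txt` prints one `FAIL:` line per host it could not update and returns non-zero if there were any.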
On Mon, Feb 12, 2018 at 7:12 PM, Gilles Gouaillardet wrote:
> By default, when more than 64 hosts are involved, mpirun uses a tree
> spawn to remotely launch the orted daemons: the daemons started on some
> nodes in turn ssh to other nodes to start more daemons.
> That means you have two options here:
> - allow all compute nodes to ssh into each other (e.g. the ssh public key
> of *all* the nodes should be in *all* the authorized_keys files, and every
> node's host key in every known_hosts file)
> - do not use a tree spawn (e.g. mpirun --mca plm_rsh_no_tree_spawn true)
> I recommend the first option; otherwise, mpirun would fork and exec a large
> number of ssh processes and hence use quite a lot of resources on the
> node running mpirun.
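With the first option, node-to-node ssh can be checked before launching. This is a hypothetical pre-flight check, not something mpirun provides: it assumes the same one-host-per-line hostfile, logs into each node from the mpirun host, and from there tries `ssh -o BatchMode=yes` to every other node, so both "Permission denied" and "Host key verification failed" show up as failures instead of prompts. It tests all N(N-1) pairs (4,160 for 65 hosts), more than a tree spawn actually uses, so expect it to take a while; `check_pairwise_ssh` and `SSH_CMD` are illustrative names.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Open MPI): from the mpirun host,
# log into each node and, from there, try a non-interactive ssh to every
# other node listed in the hostfile.
SSH_CMD=${SSH_CMD:-ssh}

check_pairwise_ssh() {
  local hostfile=$1
  local -a hosts
  # First field of each non-empty, non-comment line, e.g. "10.0.0.5 slots=1".
  mapfile -t hosts < <(awk 'NF && $1 !~ /^#/ {print $1}' "$hostfile")
  local src dst failures=0
  for src in "${hosts[@]}"; do
    for dst in "${hosts[@]}"; do
      [ "$src" = "$dst" ] && continue
      # BatchMode=yes makes ssh fail instead of prompting for a password,
      # passphrase or unknown host key. A failure on either hop (local->src
      # or src->dst) is reported for this pair.
      if ! $SSH_CMD -n -o BatchMode=yes "$src" \
           "ssh -n -o BatchMode=yes $dst true" >/dev/null 2>&1; then
        echo "FAIL: $src -> $dst"
        failures=$((failures + 1))
      fi
    done
  done
  [ "$failures" -eq 0 ]
}
```

For the second option, the same MCA parameter can also be set through the environment instead of on the command line, e.g. `export OMPI_MCA_plm_rsh_no_tree_spawn=true` before running mpirun.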
> On Tue, Feb 13, 2018 at 8:23 AM, Adam Sylvester <op8...@gmail.com> wrote:
> > I'm running Open MPI 2.1.0, built from source, on RHEL 7. I'm using the
> > default ssh-based launcher, where I have my private ssh key on rank 0 and
> > the associated public key on all ranks. I create a hosts file with a list
> > of unique IPs, with the host that I'm running mpirun from on the first
> > line, and run this command:
> > mpirun -N 1 --bind-to none --hostfile hosts.txt hostname
> > This works fine up to 64 machines. At 65 or more, I get ssh errors. Most
> > frequently it's
> > Permission denied (publickey,gssapi-keyex,gssapi-with-mic)
> > though today another user got
> > Host key verification failed.
> > I have confirmed I can successfully ssh into these instances manually.
> > I've also written a bash loop that backgrounds an ssh sleep command to
> > more than 64 instances, and this succeeds.
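A loop like the one described might look as follows; this is a hypothetical reconstruction, not the original script, and assumes the same one-host-per-line hostfile. Every connection starts on the local machine, the same pattern as a flat (non-tree) launch, so a passing run does not show that the nodes can ssh to each other.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction: open one backgrounded ssh session per host
# (each running "sleep") concurrently, then report any that failed.
SSH_CMD=${SSH_CMD:-ssh}

parallel_ssh_sleep() {
  local hostfile=$1 secs=${2:-30}
  local -a hosts pids
  # First field of each non-empty, non-comment line, e.g. "10.0.0.5 slots=1".
  mapfile -t hosts < <(awk 'NF && $1 !~ /^#/ {print $1}' "$hostfile")
  local i failures=0
  for i in "${!hosts[@]}"; do
    # -n: don't read stdin, which matters for backgrounded ssh.
    $SSH_CMD -n -o BatchMode=yes "${hosts[$i]}" "sleep $secs" &
    pids[$i]=$!
  done
  # All sessions are open at the same time; every one starts on this machine.
  for i in "${!hosts[@]}"; do
    if ! wait "${pids[$i]}"; then
      echo "FAIL: ${hosts[$i]}"
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}
```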
> > From what I can tell, the /etc/ssh/ssh*config settings that limit ssh
> > connections have to do with inbound, not outbound, limits, and I can
> > prove by running plain ssh commands that I'm not hitting a limit.
> > Is there something wrong with my mpirun syntax (I've run it this way many
> > times without issues with fewer than 64 hosts, and I know MPI is
> > frequently used on orders of magnitude more hosts than this)? Or is this a
> > known bug that's addressed in a later Open MPI release?
> > Thanks for the help.
> > -Adam
> > _______________________________________________
> > users mailing list
> > email@example.com
> > https://lists.open-mpi.org/mailman/listinfo/users