Hi,

On 14.03.2013 at 09:20, yumenlj wrote:
> Hi, all
>
> I encountered a problem about mpirun and SSH when using Open MPI 1.7rc8.
>
> I have a 4-node cluster. This is the hostfile:
>
> [mpiuser@testnode11 openmpi-1.6.4]$ cat ~/work/hostfile
> testnode11
> testnode12
> testnode13
> testnode14
>
> I had configured SSH, copying ".ssh/id_rsa.pub" on testnode11 to
> ".ssh/authorized_keys" on all the 4 nodes.
> So that I can login all the 4 nodes from testnode11 without a password.
>
> The following test worked well with Open MPI 1.6.4.
>
> [mpiuser@testnode11 openmpi-1.6.4]$ mpirun -hostfile ~/work/hostfile -np 8 ~/src/openmpi-1.6.4/examples/ring_c
> Process 0 sending 10 to 1, tag 201 (8 processes in ring)
> Process 0 sent to 1
> Process 0 decremented value: 9
> Process 0 decremented value: 8
> Process 0 decremented value: 7
> Process 0 decremented value: 6
> Process 0 decremented value: 5
> Process 0 decremented value: 4
> Process 0 decremented value: 3
> Process 0 decremented value: 2
> Process 0 decremented value: 1
> Process 0 decremented value: 0
> Process 0 exiting
> Process 4 exiting
> Process 2 exiting
> Process 3 exiting
> Process 1 exiting
> Process 6 exiting
> Process 7 exiting
> Process 5 exiting
>
> However, when I switched to Open MPI 1.7rc8, the same test did not work.
>
> [mpiuser@testnode11 openmpi-1.7rc8]$ mpirun -hostfile ~/work/hostfile -np 8 ~/src/openmpi-1.7rc8/examples/ring_c
> Permission denied, please try again.
> Permission denied, please try again.
> Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
> [testnode12:06990] [[50636,0],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 362
> [testnode12:06990] [[50636,0],1] attempted to send to [[50636,0],3]: tag 15
> [testnode12:06990] [[50636,0],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file base/grpcomm_base_xcast.c at line 166
>
> I had checked the logs of SSH, and found the direct reason. A SSH request
> from testnode12 to testnode14 was denied.
>
> [mpiuser@testnode11 openmpi-1.7rc8]$ ssh root@testnode14 tail -f /var/log/secure
> ...
> Mar 14 15:39:01 testnode14 sshd[31610]: Connection closed by testnode12
> Mar 14 15:39:04 testnode14 sshd[31611]: Failed password for mpiuser from testnode12 port 55964 ssh2
> Mar 14 15:39:04 testnode14 sshd[31611]: Failed password for mpiuser from testnode12 port 55964 ssh2
> Mar 14 15:39:04 testnode14 sshd[31612]: Connection closed by testnode12
> ...
>
> So I am puzzled. I launched mpirun on testnode11, but I do not know why
> testnode12 would send a SSH request to testnode14.
> One solution is to copy ".ssh/id_rsa.pub" on all the nodes to
> ".ssh/authorized_keys"

If every node has its own private key without a passphrase set, this would work. On the other hand, copying the private key of testnode11 to all the other nodes should also do.

> on all the nodes, but that is not what I want.
> Is there any way to control that all the SSH requests are sent from the node
> where mpirun executed, to all the nodes?
> I had checked all the orte parameters, and no answer found. Please give some
> suggestions.

Depending on the number of nodes, and if, like me, you don't like passphrase-less SSH keys at all, setting up hostbased authentication could help:

http://arc.liv.ac.uk/SGE/howto/hostbased-ssh.html

-- Reuti

> Thanks!
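Since the sshd log above shows a login from testnode12 to testnode14, it may be worth checking which node-to-node logins actually work without a password before and after changing the key setup. The following is only a sketch under a few assumptions: `check_pairs` and `DRY_RUN` are names made up for this example (they are not part of Open MPI or OpenSSH), the hostfile is assumed to have one host name at the start of each line as in the quoted one, and the node running the script must itself be able to log in to every node.

```shell
#!/bin/sh
# Sketch: test passwordless SSH for every ordered pair of nodes in a hostfile.
# check_pairs and DRY_RUN are hypothetical names used for illustration only.

check_pairs() {
    hostfile=$1
    # Take the first field of each line, skipping blank lines and "#" comments.
    hosts=$(awk 'NF && $1 !~ /^#/ { print $1 }' "$hostfile")
    for src in $hosts; do
        for dst in $hosts; do
            # A node logging in to itself is not checked here.
            [ "$src" = "$dst" ] && continue
            if [ -n "${DRY_RUN:-}" ]; then
                # Dry run: only list the pairs that would be tested.
                echo "$src -> $dst"
                continue
            fi
            # First hop: this machine to $src. Second hop: $src to $dst.
            # BatchMode=yes makes ssh fail instead of asking for a password;
            # an unknown host key is therefore reported as FAIL as well.
            # -n keeps ssh from reading standard input.
            if ssh -n -o BatchMode=yes "$src" \
                   ssh -n -o BatchMode=yes "$dst" true 2>/dev/null; then
                echo "OK    $src -> $dst"
            else
                echo "FAIL  $src -> $dst"
            fi
        done
    done
}
```

Calling `check_pairs ~/work/hostfile` on testnode11 would then print one OK or FAIL line per pair; with only testnode11's public key distributed, a pair such as testnode12 -> testnode14 would be expected to show up as FAIL. Setting `DRY_RUN=1` just lists the pairs without opening any connection.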