Max,

The 'T' state of the ssh process is very puzzling.
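
For reference, a 'T' in the STAT column of ps means the process is stopped, typically by a job-control signal (SIGSTOP, SIGTSTP, SIGTTIN or SIGTTOU), which is not what one would expect from an ssh that is just waiting on the network. Below is a rough sketch of how the state could be re-checked while mpirun hangs. It assumes Linux with the procps version of ps; the function names is_stopped and check_pid are made up for illustration, and the pid in the example comment is simply the one from the listing quoted below, so it will differ on the next run.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (returns 0) if a ps STAT string denotes a
# stopped process. With procps, 'T' means stopped by a job-control signal
# and 't' means stopped by a debugger during tracing.
is_stopped() {
    case "$1" in
        T*|t*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Hypothetical helper: print the STAT field of one pid and report whether
# that process is currently stopped.
check_pid() {
    stat=$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')
    if [ -z "$stat" ]; then
        echo "pid $1: not found"
    elif is_stopped "$stat"; then
        echo "pid $1: stopped (STAT=$stat)"
    else
        echo "pid $1: not stopped (STAT=$stat)"
    fi
}

# Example (the pid is an assumption, taken from the earlier listing):
#   check_pid 361719
```

If the ssh process shows up as stopped again, that points at something on b09-30 suspending it rather than at the remote side.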

Can you try to run

    /usr/bin/ssh -x b09-32 orted

on b09-30 and see what happens? (It should fail with an error message instead of hanging.)

In order to check that there is no firewall, can you instead run

    iptables -L

Also, is SELinux enabled? There could be some rules that prevent ssh from working as expected.

Cheers,

Gilles

On Sat, May 12, 2018 at 7:38 AM, Max Mellette <wmell...@ucsd.edu> wrote:
> Hi Jeff,
>
> Thanks for the reply. FYI since I originally posted this, I uninstalled
> OpenMPI 3.0.1 and installed 3.1.0, but I'm still experiencing the same
> problem.
>
> When I run the command without the `--mca plm_base_verbose 100` flag, it
> hangs indefinitely with no output.
>
> As far as I can tell, these are the additional processes running on each
> machine while mpirun is hanging (printed using `ps -aux | less`):
>
> On executing host b09-30:
>
> user 361714 0.4 0.0 293016 8444 pts/0 Sl+ 15:10 0:00 mpirun
> --host b09-30,b09-32 hostname
> user 361719 0.0 0.0 37092 5112 pts/0 T 15:10 0:00
> /usr/bin/ssh -x b09-32 orted -mca ess "env" -mca ess_base_jobid "638517248"
> -mca ess_base_vpid 1 -mca ess_base_num_procs "2" -mca orte_node_regex
> "b[2:9]-30,b[2:9]-32@0(2)" -mca orte_hnp_uri
> "638517248.0;tcp://169.228.66.102,10.1.100.30:55090" -mca plm "rsh" -mca
> pmix "^s1,s2,cray,isolated"
>
> On remote host b09-32:
>
> root 175273 0.0 0.0 61752 5824 ? Ss 15:10 0:00 sshd:
> [accepted]
> sshd 175274 0.0 0.0 61752 708 ? S 15:10 0:00 sshd:
> [net]
>
> I only see orted showing up in the ssh flags on b09-30. Any ideas what I
> should try next?
>
> Thanks,
> Max
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users