Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-15 Thread Jeff Squyres (jsquyres)
On May 15, 2018, at 1:39 AM, Max Mellette wrote:
>
> Thanks everyone for all your assistance. The problem seems to be resolved
> now, although I'm not entirely sure why these changes made a difference.
> There were two things I changed:
>
> (1) I had some additional `export

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-15 Thread Gustavo Correa
Hi Max

Name resolution in /etc/hosts is a simple solution for (2).

I hope this helps,
Gus

> On May 15, 2018, at 01:39, Max Mellette wrote:
>
> Thanks everyone for all your assistance. The problem seems to be resolved
> now, although I'm not entirely sure why these changes

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-14 Thread Max Mellette
Thanks everyone for all your assistance. The problem seems to be resolved now, although I'm not entirely sure why these changes made a difference. There were two things I changed: (1) I had some additional `export ...` lines in .bashrc before the `export PATH=...` and `export LD_LIBRARY_PATH=...`

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-14 Thread Gilles Gouaillardet
In the initial report, the /usr/bin/ssh process was in the 'T' state (which generally hints that a debugger is attached to the process). `/usr/bin/ssh -x b09-32 orted` did behave as expected (i.e. orted was executed, exited with an error since the command line is invalid, and the error message was received)
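
A minimal sketch for reading the state column that the 'T' observation refers to, assuming a Linux host with procps `ps`. `describe_state` is a hypothetical helper name; the meanings are taken from the state codes listed in the ps(1) man page, which distinguishes an uppercase `T` (stopped by a job control signal) from a lowercase `t` (stopped by a debugger during tracing).

```shell
# describe_state STAT: map the first letter of a ps STAT field to the
# meaning given in the Linux ps(1) man page. Hypothetical helper name.
describe_state() {
    case "$1" in
        R*) echo "running or runnable" ;;
        S*) echo "interruptible sleep" ;;
        D*) echo "uninterruptible sleep" ;;
        T*) echo "stopped by job control signal" ;;
        t*) echo "stopped by debugger during tracing" ;;
        Z*) echo "zombie" ;;
        *)  echo "other" ;;
    esac
}

# To list ssh processes with their state on the launching host, run by hand:
#   ps -o pid,stat,args -C ssh
```

Feeding the STAT value of the stuck ssh process to `describe_state` is one way to tell a job-control stop from a tracing stop; it does not by itself explain what stopped the process.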

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-14 Thread Jeff Squyres (jsquyres)
Yes, that "T" state is quite puzzling. You didn't attach a debugger or hit the ssh with a signal, did you? (We had a similar situation on the devel list recently, but it only happened with a very old version of Slurm. We concluded that it was a Slurm bug that has since been fixed. And just

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-14 Thread r...@open-mpi.org
You got that error because the orted is looking for its rank on the cmd line and not finding it.

> On May 14, 2018, at 12:37 PM, Max Mellette wrote:
>
> Hi Gus,
>
> Thanks for the suggestions. The correct version of openmpi seems to be
> getting picked up; I also

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-14 Thread Max Mellette
Hi Gus, Thanks for the suggestions. The correct version of openmpi seems to be getting picked up; I also prepended .bashrc with the installation path like you suggested, but it didn't seem to help: user@b09-30:~$ cat .bashrc export

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-14 Thread Gus Correa
Hi Max, Just in case, as environment mix-ups often happen: could it be that you are inadvertently picking up another installation of OpenMPI, perhaps installed from packages in /usr or /usr/local? That's easy to check with 'which mpiexec' or 'which mpicc', for instance. Have you tried to prepend (as
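
A rough sketch of that check, assuming a POSIX shell. `which_in` is a hypothetical helper that resolves a command against an explicit PATH string, so the effect of prepending an installation directory can be seen without editing any startup file. The directory `/opt/openmpi/bin` in the comments is a placeholder, not a path taken from this thread.

```shell
# which_in NAME PATHSTRING: resolve NAME using only PATHSTRING.
# Hypothetical helper; it runs in a subshell so the caller's PATH is untouched.
which_in() {
    ( PATH=$2; command -v "$1" )
}

# By hand, on the launching host:
#   command -v mpirun
#   which_in mpirun "/opt/openmpi/bin:$PATH"   # placeholder install directory
# And for the non-interactive environment on the other host:
#   ssh b09-32 'command -v orted'
```

If the local and remote lookups point at different installations, or the remote one prints nothing, the environment seen by non-interactive ssh logins is the first thing to look at.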

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-14 Thread Max Mellette
John, Thanks for the suggestions. In this case there is no cluster manager / job scheduler; these are just a couple of individual hosts in a rack. The reason for the generic names is that I anonymized the full network address in the previous posts, truncating to just the host name. My home

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-14 Thread John Hearns via users
One very, very stupid question here. This arose over on the Slurm list actually. Those hostnames look like quite generic names, i.e. they are part of an HPC cluster? Do they happen to have independent home directories for your userid? Could that possibly make a difference to the MPI launcher? On

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-13 Thread Max Mellette
Hi Gilles,

Thanks for the suggestions; the results are below. Any ideas where to go from here?

- Seems that selinux is not installed:

user@b09-30:~$ sestatus
The program 'sestatus' is currently not installed. You can install it by typing:
sudo apt install policycoreutils

- Output from

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-12 Thread Gilles Gouaillardet
Max, the 'T' state of the ssh process is very puzzling. Can you try to run `/usr/bin/ssh -x b09-32 orted` on b09-30 and see what happens? (It should fail with an error message instead of hanging.) In order to check there is no firewall, can you instead run `iptables -L`? Also, is 'selinux' enabled
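
One way to tell "fails with an error" apart from "hangs" without waiting indefinitely is to wrap the command in coreutils `timeout`, which exits with status 124 when it has to kill the command. This is only a sketch: `run_with_limit` is a hypothetical helper name and the 10-second limit in the comment is an arbitrary choice.

```shell
# run_with_limit SECONDS CMD [ARGS...]: run CMD under timeout(1) and report
# whether it exited on its own or was killed at the limit.
# Hypothetical helper; exit status 124 is what timeout(1) returns on expiry.
run_with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "hung: no exit within ${limit}s"
    else
        echo "exited with status $status"
    fi
}

# By hand, on b09-30:
#   run_with_limit 10 /usr/bin/ssh -x b09-32 orted
```

A non-zero exit together with an orted error message would match the expected behaviour; the "hung" line would point back at ssh or the network.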

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-11 Thread Max Mellette
Hi Jeff, Thanks for the reply. FYI since I originally posted this, I uninstalled OpenMPI 3.0.1 and installed 3.1.0, but I'm still experiencing the same problem. When I run the command without the `--mca plm_base_verbose 100` flag, it hangs indefinitely with no output. As far as I can tell,

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-11 Thread Jeff Squyres (jsquyres)
On May 4, 2018, at 1:08 PM, Max Mellette wrote:
>
> I'm trying to set up OpenMPI 3.0.1 on a pair of linux machines, but I'm
> running into a problem where mpirun hangs when I try to execute a simple
> command across the two machines:
>
> $ mpirun --host b09-30,b09-32