On May 15, 2018, at 1:39 AM, Max Mellette wrote:
>
> Thanks everyone for all your assistance. The problem seems to be resolved
> now, although I'm not entirely sure why these changes made a difference.
> There were two things I changed:
>
> (1) I had some additional `export ...` lines in .bash
Hi Max
Name resolution in /etc/hosts is a simple solution for (2).
I hope this helps,
Gus
> On May 15, 2018, at 01:39, Max Mellette wrote:
>
> Thanks everyone for all your assistance. The problem seems to be resolved
> now, although I'm not entirely sure why these changes made a difference.
Thanks everyone for all your assistance. The problem seems to be resolved
now, although I'm not entirely sure why these changes made a difference.
There were two things I changed:
(1) I had some additional `export ...` lines in .bashrc before the
`export PATH=...` and `export LD_LIBRARY_PATH=...`
In the initial report, the /usr/bin/ssh process was in the 'T' state
(which generally hints that a debugger is attached to the process).
/usr/bin/ssh -x b09-32 orted
did behave as expected (i.e. orted was executed, exited with an error
since the command line is invalid, and the error message was received)
Yes, that "T" state is quite puzzling. You didn't attach a debugger or hit the
ssh with a signal, did you?
(we had a similar situation on the devel list recently, but it only happened
with a very old version of Slurm. We concluded that it was a Slurm bug that
has since been fixed. And just t
You got that error because the orted is looking for its rank on the cmd line
and not finding it.
> On May 14, 2018, at 12:37 PM, Max Mellette wrote:
>
> Hi Gus,
>
> Thanks for the suggestions. The correct version of openmpi seems to be
> getting picked up; I also prepended .bashrc with the i
Hi Gus,
Thanks for the suggestions. The correct version of openmpi seems to be
getting picked up; I also prepended .bashrc with the installation path like
you suggested, but it didn't seem to help:
user@b09-30:~$ cat .bashrc
export PATH=/home/user/openmpi_install/bin:/usr/local/sbin:/usr/local/
Hi Max
Just in case, since environment mix-ups happen often:
Could it be that you are inadvertently picking up another
installation of OpenMPI, perhaps installed from packages
in /usr or /usr/local?
That's easy to check with 'which mpiexec' or
'which mpicc', for instance.
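
A minimal sketch of that check, not part of the original message. The helper name `check_tool` is hypothetical, and the prefix `$HOME/openmpi_install` is an assumption based on the PATH shown elsewhere in this thread. It reports where each command resolves and flags anything that lives outside the expected prefix:

```shell
# check_tool is a hypothetical helper, not an OpenMPI command.
# Usage: check_tool <command> <expected-install-prefix>
# Returns 0 if the command resolves under the prefix, 1 if it is not
# found at all, and 2 if it resolves somewhere else.
check_tool() {
    tool=$1
    prefix=$2
    resolved=$(command -v "$tool" 2>/dev/null) || {
        echo "$tool: not found"
        return 1
    }
    case "$resolved" in
        "$prefix"/*) echo "$tool: $resolved (under $prefix)" ;;
        *)           echo "$tool: $resolved (outside $prefix)"; return 2 ;;
    esac
}

# Assumed install prefix; adjust to the actual installation directory.
for tool in mpiexec mpicc; do
    check_tool "$tool" "$HOME/openmpi_install" || true
done
```

Running the same check on each host can show whether one of them picks up a different installation.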
Have you tried to prepend (as
John,
Thanks for the suggestions. In this case there is no cluster manager / job
scheduler; these are just a couple of individual hosts in a rack. The
reason for the generic names is that I anonymized the full network address
in the previous posts, truncating to just the host name.
My home direct
One very, very stupid question here. This arose over on the Slurm list
actually.
Those hostnames look like quite generic names, i.e. are they part of an HPC
cluster?
Do they happen to have independent home directories for your userid?
Could that possibly make a difference to the MPI launcher?
On 14
Hi Gilles,
Thanks for the suggestions; the results are below. Any ideas where to go
from here?
- Seems that selinux is not installed:
user@b09-30:~$ sestatus
The program 'sestatus' is currently not installed. You can install it by
typing:
sudo apt install policycoreutils
- Output from o
Max,
the 'T' state of the ssh process is very puzzling.
can you try to run
/usr/bin/ssh -x b09-32 orted
on b09-30 and see what happens ?
(it should fail with an error message, instead of hanging)
In order to check there is no firewall, can you run instead
iptables -L
Also, is 'selinux' enabled ?
Hi Jeff,
Thanks for the reply. FYI since I originally posted this, I uninstalled
OpenMPI 3.0.1 and installed 3.1.0, but I'm still experiencing the same
problem.
When I run the command without the `--mca plm_base_verbose 100` flag, it
hangs indefinitely with no output.
As far as I can tell, these
On May 4, 2018, at 1:08 PM, Max Mellette wrote:
>
> I'm trying to set up OpenMPI 3.0.1 on a pair of linux machines, but I'm
> running into a problem where mpirun hangs when I try to execute a simple
> command across the two machines:
>
> $ mpirun --host b09-30,b09-32 hostname
Do you see the o