Hi Max

Just in case, since environment mix-ups often happen:
could it be that you are inadvertently picking up another
installation of OpenMPI, perhaps installed from packages
in /usr or /usr/local?
That's easy to check with 'which mpiexec' or
'which mpicc', for instance.
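
For instance (binary names and install path as in your
message below), on each host you could run:

which mpiexec mpicc orted
mpiexec --version

If all is well, each of those resolves under
/home/user/openmpi_install/bin.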

Have you tried prepending (as opposed to appending) the
OpenMPI directory to your PATH? Say:

export PATH='/home/user/openmpi_install/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin'
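
By the same token, you may want the library directory first
in LD_LIBRARY_PATH as well. A possible companion line (same
install prefix as above, with a guard so an unset variable
does not leave an empty entry):

export LD_LIBRARY_PATH=/home/user/openmpi_install/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}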

I hope this helps,
Gus Correa

On 05/14/2018 12:40 PM, Max Mellette wrote:
John,

Thanks for the suggestions. In this case there is no cluster manager / job scheduler; these are just a couple of individual hosts in a rack. The reason for the generic names is that I anonymized the full network addresses in my previous posts, truncating them to just the host names.

My home directory is network-mounted to both hosts. In fact, I uninstalled OpenMPI 3.0.1 from /usr/local on both hosts, and installed OpenMPI 3.1.0 into my home directory at `/home/user/openmpi_install`, also updating .bashrc appropriately:

user@b09-30:~$ cat .bashrc
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/user/openmpi_install/bin
export LD_LIBRARY_PATH=/home/user/openmpi_install/lib

So the environment should be the same on both hosts.
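
For what it's worth, one way to confirm what the non-interactive shell spawned by 'ssh <host> <command>' actually sees (it may not read .bashrc the same way an interactive login does) would be something like:

user@b09-30:~$ ssh b09-32 'echo $PATH; echo $LD_LIBRARY_PATH; which orted'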

Thanks,
Max

On Mon, May 14, 2018 at 12:29 AM, John Hearns via users <users@lists.open-mpi.org> wrote:

    One very, very stupid question here. This arose over on the Slurm
    list actually.
    Those hostnames look like quite generic names, i.e. they are part of
    an HPC cluster?
    Do they happen to have independent home directories for your userid?
    Could that possibly make a difference to the MPI launcher?

    On 14 May 2018 at 06:44, Max Mellette <wmell...@ucsd.edu> wrote:

        Hi Gilles,

        Thanks for the suggestions; the results are below. Any ideas
        where to go from here?

        ----- Seems that selinux is not installed:

        user@b09-30:~$ sestatus
        The program 'sestatus' is currently not installed. You can
        install it by typing:
        sudo apt install policycoreutils

        ----- Output from orted:

        user@b09-30:~$ /usr/bin/ssh -x b09-32 orted
        [b09-32:197698] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in
        file ess_env_module.c at line 147
        [b09-32:197698] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad
        parameter in file util/session_dir.c at line 106
        [b09-32:197698] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad
        parameter in file util/session_dir.c at line 345
        [b09-32:197698] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad
        parameter in file base/ess_base_std_orted.c at line 270
        
        --------------------------------------------------------------------------
        It looks like orte_init failed for some reason; your parallel
        process is
        likely to abort.  There are many reasons that a parallel process can
        fail during orte_init; some of which are due to configuration or
        environment problems.  This failure appears to be an internal
        failure;
        here's some additional information (which may only be relevant to an
        Open MPI developer):

           orte_session_dir failed
           --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
        
        --------------------------------------------------------------------------

        ----- iptables rules:

        user@b09-30:~$ sudo iptables -L
        Chain INPUT (policy ACCEPT)
        target     prot opt source               destination
        ufw-before-logging-input  all  --  anywhere             anywhere
        ufw-before-input  all  --  anywhere             anywhere
        ufw-after-input  all  --  anywhere             anywhere
        ufw-after-logging-input  all  --  anywhere             anywhere
        ufw-reject-input  all  --  anywhere             anywhere
        ufw-track-input  all  --  anywhere             anywhere

        Chain FORWARD (policy ACCEPT)
        target     prot opt source               destination
        ufw-before-logging-forward  all  --  anywhere             anywhere
        ufw-before-forward  all  --  anywhere             anywhere
        ufw-after-forward  all  --  anywhere             anywhere
        ufw-after-logging-forward  all  --  anywhere             anywhere
        ufw-reject-forward  all  --  anywhere             anywhere
        ufw-track-forward  all  --  anywhere             anywhere

        Chain OUTPUT (policy ACCEPT)
        target     prot opt source               destination
        ufw-before-logging-output  all  --  anywhere             anywhere
        ufw-before-output  all  --  anywhere             anywhere
        ufw-after-output  all  --  anywhere             anywhere
        ufw-after-logging-output  all  --  anywhere             anywhere
        ufw-reject-output  all  --  anywhere             anywhere
        ufw-track-output  all  --  anywhere             anywhere

        Chain ufw-after-forward (1 references)
        target     prot opt source               destination

        Chain ufw-after-input (1 references)
        target     prot opt source               destination

        Chain ufw-after-logging-forward (1 references)
        target     prot opt source               destination

        Chain ufw-after-logging-input (1 references)
        target     prot opt source               destination

        Chain ufw-after-logging-output (1 references)
        target     prot opt source               destination

        Chain ufw-after-output (1 references)
        target     prot opt source               destination

        Chain ufw-before-forward (1 references)
        target     prot opt source               destination

        Chain ufw-before-input (1 references)
        target     prot opt source               destination

        Chain ufw-before-logging-forward (1 references)
        target     prot opt source               destination

        Chain ufw-before-logging-input (1 references)
        target     prot opt source               destination

        Chain ufw-before-logging-output (1 references)
        target     prot opt source               destination

        Chain ufw-before-output (1 references)
        target     prot opt source               destination

        Chain ufw-reject-forward (1 references)
        target     prot opt source               destination

        Chain ufw-reject-input (1 references)
        target     prot opt source               destination

        Chain ufw-reject-output (1 references)
        target     prot opt source               destination

        Chain ufw-track-forward (1 references)
        target     prot opt source               destination

        Chain ufw-track-input (1 references)
        target     prot opt source               destination

        Chain ufw-track-output (1 references)
        target     prot opt source               destination


        Thanks,
        Max


_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
