Jody,

Jody wrote:
> Hi Tim,
> thanks for the suggestions.
>
> I now set both paths in .zshenv, but it seems that LD_LIBRARY_PATH
> still does not get set.
> The ldd experiment shows that none of the Open MPI libraries are found,
> and indeed printenv shows that PATH is set but LD_LIBRARY_PATH is not.
Are you setting LD_LIBRARY_PATH anywhere else in your scripts? I have, on more than one occasion, forgotten that I needed to do:
export LD_LIBRARY_PATH="/foo:$LD_LIBRARY_PATH"

Instead of just:
export LD_LIBRARY_PATH="/foo"
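A minimal POSIX sh sketch of the difference between the two lines above. The directories /foo and /usr/local/lib are placeholders for illustration only. The ${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} form is a common variant of the prepend that also avoids leaving a trailing colon when the variable starts out empty.

```shell
#!/bin/sh
# Placeholder starting value, for illustration only.
LD_LIBRARY_PATH=/usr/local/lib

# Result of overwriting: the earlier entry (/usr/local/lib) is lost.
overwritten="/foo"

# Result of prepending: /foo is searched first and earlier entries are kept.
# ${VAR:+:$VAR} expands to ":<old value>" only when VAR is non-empty,
# so no trailing colon is left behind if VAR was unset or empty.
prepended="/foo${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

echo "overwrite: $overwritten"   # overwrite: /foo
echo "prepend:   $prepended"     # prepend:   /foo:/usr/local/lib
```

If any script that runs later still uses the overwriting form, it silently discards whatever .zshenv put there.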


> It is rather unclear why this happens...
>
> As to the second problem:
> $ mpirun --debug-daemons -np 2 --prefix /opt/openmpi --host nano_02 ./MPI2Test2
> [aim-nano_02:05455] [0,0,1]-[0,0,0] mca_oob_tcp_peer_try_connect: connect to 130.60.49.134:40618 failed: (103)
> [aim-nano_02:05455] [0,0,1]-[0,0,0] mca_oob_tcp_peer_try_connect: connect to 130.60.49.134:40618 failed, connecting over all interfaces failed!
> [aim-nano_02:05455] OOB: Connection to HNP lost
> [aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
> [aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1164
> [aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
> [aim-plankton.unizh.ch:24222] ERROR: A daemon on node nano_02 failed to start as expected.
> [aim-plankton.unizh.ch:24222] ERROR: There may be more information available from
> [aim-plankton.unizh.ch:24222] ERROR: the remote shell (see above).
> [aim-plankton.unizh.ch:24222] ERROR: The daemon exited unexpectedly with status 1.
> [aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
> [aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1196
>
> The strange thing is that nano_02's address is 130.60.49.130 and plankton's (the caller) is 130.60.49.134. I also made sure that nano_02 can ssh to plankton without a password, but that didn't change the output.

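What is happening here is that the daemon launched on nano_02 is trying to connect back to mpirun on plankton, which is why plankton's address (130.60.49.134) appears in the log, and that connection is failing.

Do you have any firewalls or port filtering enabled on nano_02 or plankton? Open MPI generally cannot run when a firewall blocks TCP traffic between the machines being used, because the daemons and mpirun connect to each other on dynamically chosen ports such as 40618 above.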
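One way to test this by hand is to open a TCP connection from nano_02 to plankton while mpirun is still waiting. The sketch below is a hypothetical helper, not an Open MPI tool; it assumes bash (for its /dev/tcp redirection) and the coreutils timeout command are available. Port 40618 is only the port from this particular run and will differ each time, so take it from the latest --debug-daemons output.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a TCP connection to host $1, port $2
# can be opened within 5 seconds. Needs bash (/dev/tcp) and coreutils timeout.
can_connect() {
    # The inner bash receives host and port as $1 and $2 ("_" fills $0).
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example, run on nano_02 while mpirun is still waiting on plankton:
#   can_connect 130.60.49.134 40618 && echo reachable || echo "blocked or closed"
```

If the connection fails while mpirun is clearly still running, something between the two hosts is very likely filtering the port.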

Hope this helps,

Tim


> Does this message give any hints as to the problem?
>
> Jody


On 8/14/07, Tim Prins <tpr...@open-mpi.org> wrote:

    Hi Jody,

    jody wrote:
     > Hi
     > I installed Open MPI 1.2.2 on a quad-core Intel machine running
     > Fedora 6 (hostname plankton).
     > I set PATH and LD_LIBRARY_PATH in the .zshrc file:
    Note that .zshrc is only read by interactive shells. You need to set up
    your system so that LD_LIBRARY_PATH and PATH are also set for
    non-interactive shells. See this zsh FAQ entry for which files you need
    to modify:
    http://zsh.sourceforge.net/FAQ/zshfaq03.html#l19

    (BTW: I do not use zsh, but my assumption is that the file you want to
    set the PATH and LD_LIBRARY_PATH in is .zshenv)
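A sketch of what such a ~/.zshenv could contain, assuming Open MPI is installed under /opt/openmpi as in this thread. zsh reads ~/.zshenv on every startup, interactive or not, so the non-interactive shell that ssh starts should pick these settings up as well.

```shell
# ~/.zshenv (sketch): read by zsh for interactive and non-interactive shells.
# Assumes Open MPI lives under /opt/openmpi.
export PATH="/opt/openmpi/bin:$PATH"
# Prepend rather than overwrite, and avoid a trailing ':' if the variable was empty.
export LD_LIBRARY_PATH="/opt/openmpi/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

One way to check it from another host is ssh plankton printenv LD_LIBRARY_PATH, which runs printenv in exactly such a non-interactive shell.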
     > $ echo $PATH
     > /opt/openmpi/bin:/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/jody/bin
     >
     > $ echo $LD_LIBRARY_PATH
     > /opt/openmpi/lib:
     >
     > When I run
     > $ mpirun -np 2 ./MPI2Test2
     > I get the message
     > ./MPI2Test2: error while loading shared libraries: libmpi_cxx.so.0:
     > cannot open shared object file: No such file or directory
     > ./MPI2Test2: error while loading shared libraries: libmpi_cxx.so.0:
     > cannot open shared object file: No such file or directory
     >
     > However
     > $ mpirun -np 2 --prefix /opt/openmpi ./MPI2Test2
     > works.  Any explanation?
    Yes, the LD_LIBRARY_PATH is probably not set correctly. Try running:
    mpirun -np 2 ldd ./MPI2Test2

    This should show which libraries your executable is using. Make sure all
    of the libraries are resolved (none should be listed as "not found").
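A small sketch of that check: the hypothetical helper below prints only the dependencies that ldd reports as "not found", so empty output means everything resolved. The same filter can be applied directly to the mpirun run, e.g. mpirun -np 2 ldd ./MPI2Test2 | grep 'not found'.

```shell
#!/bin/sh
# Hypothetical helper: list shared libraries of "$1" that ldd reports
# as "not found". Prints nothing when every dependency resolves.
unresolved_libs() {
    # grep exits 1 when nothing matches; "|| true" keeps that from
    # looking like a failure to callers running under "set -e".
    ldd "$1" | grep 'not found' || true
}
```

For example, unresolved_libs ./MPI2Test2 would print the libmpi_cxx.so.0 line if the Open MPI library directory is missing from LD_LIBRARY_PATH.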

    Also, try running:
    mpirun -np 1 printenv |grep LD_LIBRARY_PATH
    to see what LD_LIBRARY_PATH is set to for your executables. Note that
    you can NOT simply run mpirun echo $LD_LIBRARY_PATH, as the variable
    would be expanded by the shell you type the command into, before mpirun
    even starts.
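The expansion-order problem can be seen without mpirun at all. In this sketch env stands in for mpirun: it starts a child process with a different LD_LIBRARY_PATH, and the values /local/lib and /remote/lib are placeholders. The unquoted form is expanded by the calling shell before env runs, while the single-quoted form is expanded later by the child sh.

```shell
#!/bin/sh
# The "local" value, as seen by the shell you type commands into.
LD_LIBRARY_PATH=/local/lib
export LD_LIBRARY_PATH

# Expanded here, before env runs: yields the local value.
early=$(env LD_LIBRARY_PATH=/remote/lib echo "$LD_LIBRARY_PATH")

# Single quotes delay expansion until the child sh runs: yields the child's value.
late=$(env LD_LIBRARY_PATH=/remote/lib sh -c 'echo "$LD_LIBRARY_PATH"')

echo "early: $early"   # early: /local/lib
echo "late:  $late"    # late:  /remote/lib
```

With mpirun, the same quoting gives mpirun -np 1 sh -c 'echo $LD_LIBRARY_PATH', which prints the value each launched process actually sees.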

     >
     > Second problem:
     > I have also installed Open MPI 1.2.2 on an AMD machine running Gentoo
     > Linux (hostname nano_02).
     > Here as well PATH and LD_LIBRARY_PATH are set correctly,
     > and
     > $ mpirun -np 2 ./MPI2Test2
     > works locally on nano_02.
     >
     > If, however, from plankton I call
     > $ mpirun -np 2 --prefix /opt/openmpi --host nano_02 ./MPI2Test2
     > the call hangs with no output whatsoever.
     > Any pointers on how to solve this problem?
    Try running:
    mpirun --debug-daemons -np 2 --prefix /opt/openmpi --host nano_02 ./MPI2Test2

    This should give some more output as to what is happening.

    Hope this helps,

    Tim

     >
     > Thank You
     >   Jody
    ------------------------------------------------------------------------
     >
     > _______________________________________________
     > users mailing list
     > us...@open-mpi.org <mailto:us...@open-mpi.org>
     > http://www.open-mpi.org/mailman/listinfo.cgi/users
