Tim, thanks for your suggestions.

There seems to be something wrong with the PATH:

jody@aim-nano_02 ~/progs $ ssh 130.60.49.128 printenv | grep PATH
PATH=/usr/bin:/bin:/usr/sbin:/sbin

which i don't understand. Logging in via ssh to 130.60.49.128 i get:

jody@aim-nano_02 ~/progs $ ssh 130.60.49.128
Last login: Mon Jul 9 18:26:11 2007 from 130.60.49.129
jody@aim-nano_00 ~ $ cat .bash_profile
# /etc/skel/.bash_profile

# This file is sourced by bash for login shells. The following line
# runs your .bashrc and is recommended by the bash info pages.
[[ -f ~/.bashrc ]] && . ~/.bashrc

PATH=/opt/openmpi/bin:$PATH
export PATH
LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH

jody@aim-nano_00 ~ $ echo $PATH
/opt/openmpi/bin:/opt/openmpi/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/3.4.5:/opt/sun-jdk-1.4.2.10/bin:/opt/sun-jdk-1.4.2.10/jre/bin:/opt/sun-jdk-1.4.2.10/jre/javaws:/usr/qt/3/bin

(aim-nano_00 is the name of 130.60.49.128)

So why is the path set when i ssh by hand, but not otherwise?

The suggestion with the --prefix option also didn't work:

jody@aim-nano_02 /home/aim-cari/jody $ mpirun -np 2 --prefix /opt/openmpi --hostfile hostfile ./a.out
[aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59
[aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59
[aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59
[aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59
[aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59
[aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59

(after which the thing seems to hang....)
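
A note on the PATH question above, offered as a likely explanation rather than a confirmed diagnosis: bash reads ~/.bash_profile only for login shells, while a command passed to ssh is run by a non-login, non-interactive shell. The sketch below reproduces that difference locally, without ssh, using a scratch HOME directory. The directory and the variable names (demo_home, login_path, nonlogin_path) are made up for the demo.

```shell
#!/bin/sh
# Demo: ~/.bash_profile is read by login shells only.
# A scratch HOME is used so real dotfiles are left untouched.
demo_home=$(mktemp -d)
cat > "$demo_home/.bash_profile" <<'EOF'
PATH=/opt/openmpi/bin:$PATH
export PATH
EOF

# Login shell (comparable to an interactive "ssh host"): reads .bash_profile.
login_path=$(env BASH_ENV= HOME="$demo_home" PATH=/usr/bin:/bin \
    bash -l -c 'echo "$PATH"' 2>/dev/null)

# Non-login shell (comparable to "ssh host command"): skips .bash_profile.
nonlogin_path=$(env BASH_ENV= HOME="$demo_home" PATH=/usr/bin:/bin \
    bash -c 'echo "$PATH"')

echo "login:     $login_path"
echo "non-login: $nonlogin_path"
rm -rf "$demo_home"
```

When started by sshd, bash may additionally read ~/.bashrc, depending on how it was built, and some distributions' default ~/.bashrc returns early for non-interactive shells. If that is the case here, a PATH assignment placed below that early return would never run for "ssh host command", so the position of the assignment in ~/.bashrc is worth checking.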
If i use aim-nano_02 (130.60.49.130) instead of a hostfile,

jody@aim-nano_02 /home/aim-cari/jody $ mpirun -np 2 --prefix /opt/openmpi --host 130.60.49.130 ./a.out

it works, as it does if i run it on the machine itself the standard way:

jody@aim-nano_02 /home/aim-cari/jody $ mpirun -np 2 --host 130.60.49.130 ./a.out

Is there anything else i could try?

Jody

On 7/9/07, Tim Prins <tpr...@open-mpi.org> wrote:
jody wrote:
> Hi Tim
> (I accidentally sent the previous message before it was ready - here's
> the complete one)
> Thank You for your reply.
> Unfortunately my workstation, on which i could successfully run openmpi
> applications, has died. But on my replacement machine (which
> i assume i have set up in an equivalent way) i now get errors even when i
> try to run an openmpi application in a simple way:
>
> jody@aim-nano_02 /home/aim-cari/jody $ mpirun -np 2 --hostfile hostfile ./a.out
> bash: orted: command not found
> [aim-nano_02:22145] ERROR: A daemon on node 130.60.49.129 failed to
> start as expected.
> [aim-nano_02:22145] ERROR: There may be more information available from
> [aim-nano_02:22145] ERROR: the remote shell (see above).
> [aim-nano_02:22145] ERROR: The daemon exited unexpectedly with status 127.
> [aim-nano_02:22145] ERROR: A daemon on node 130.60.49.128 failed to
> start as expected.
> [aim-nano_02:22145] ERROR: There may be more information available from
> [aim-nano_02:22145] ERROR: the remote shell (see above).
> [aim-nano_02:22145] ERROR: The daemon exited unexpectedly with status 127.
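
(Side note, not part of the original exchange: status 127 in the output above is the exit status a POSIX shell reports when it cannot find a command, which matches the "orted: command not found" line. A minimal local illustration follows; the command name is deliberately nonexistent and purely hypothetical.)

```shell
#!/bin/sh
# Ask a child shell to run a command name that should not exist on PATH.
sh -c 'no_such_command_for_demo' 2>/dev/null
status=$?

# The shell reports "command not found" through this exit status.
echo "exit status: $status"
```

So the daemon failure reported by mpirun points at the remote shell not finding orted, not at orted itself crashing.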
>
> However, i set PATH and LD_LIBRARY_PATH to the correct paths both in
> .bashrc AND .bash_profile.

I assume you are using bash. You might try changing your .profile as well.

>
> For example:
> jody@aim-nano_02 /home/aim-cari/jody $ ssh 130.60.49.128 echo $PATH
> /opt/openmpi/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/4.1.2:/opt/sun-jdk-1.4.2.10/bin:/opt/sun-jdk-1.4.2.10/jre/bin:/opt/sun-jdk-1.4.2.10/jre/javaws:/usr/qt/3/bin

When you do this, $PATH gets interpreted on the local host, not the
remote host. Try instead:

ssh 130.60.49.128 printenv |grep PATH

>
> But:
> jody@aim-nano_02 /home/aim-cari/jody $ ssh 130.60.49.128 orted
> bash: orted: command not found
>

You could also do:

ssh 130.60.49.128 which orted

This will show you the paths it looked in for the orted.

> Do You have any suggestions?

To avoid dealing with paths (assuming everything is installed in the
same directory on all nodes) you can also try the suggestion here
(although I think that once it is set up, modifying PATHs is the easier
way to go, less typing :):
http://www.open-mpi.org/faq/?category=running#mpirun-prefix

Hope this helps,

Tim

>
> Thank You
> Jody
>
> On 7/9/07, Tim Prins <tpr...@open-mpi.org> wrote:
>> Hi Jody,
>>
>> Sorry for the super long delay. I don't know how this one got lost...
>>
>> I run like this all the time. Unfortunately, it is not as simple as I
>> would like. Here is what I do:
>>
>> 1. Log into the machine using ssh -X
>> 2. Run mpirun with the following parameters:
>>    -mca pls rsh (This makes sure that Open MPI uses the rsh/ssh launcher.
>>    It may not be necessary depending on your setup)
>>    -mca pls_rsh_agent "ssh -X" (To make sure X information is forwarded.
>>    This might not be necessary if you have ssh set up to always forward X
>>    information)
>>    --debug-daemons (This ensures that the ssh connections to the backend
>>    nodes are kept open. Otherwise, they are closed and X information cannot
>>    be forwarded. Unfortunately, this will also cause some debugging output
>>    to be printed, but right now there is no other way :( )
>>
>> So, the complete command is:
>> mpirun -np 4 -mca pls rsh -mca pls_rsh_agent "ssh -X" --debug-daemons
>> xterm -e gdb my_prog
>>
>> I hope this helps. Let me know if you are still experiencing problems.
>>
>> Tim
>>
>>
>> jody wrote:
>>> Hi
>>> For debugging i usually run each process in a separate X-window.
>>> This works well if i set the DISPLAY variable to the computer
>>> from which i am starting my OpenMPI application.
>>>
>>> This method fails, however, if i log in (via ssh) to my workstation
>>> from a third computer and then start my OpenMPI application:
>>> only the processes running on the workstation i logged into can
>>> open their windows on the third computer. The processes on
>>> the other computers can't open their windows.
>>>
>>> This is how i start the processes
>>>
>>> mpirun -np 4 -x DISPLAY run_gdb.sh ./TestApp
>>>
>>> where run_gdb.sh looks like this
>>> -------------------------
>>> #!/bin/csh -f
>>>
>>> echo "Running GDB on node `hostname`"
>>> xterm -e gdb $*
>>> exit 0
>>> -------------------------
>>> The output from the processes on the other computer:
>>> xterm Xt error: Can't open display: localhost:12.0
>>>
>>> Is there a way to tell OpenMPI to forward the X windows
>>> over yet another ssh connection?
>>>
>>> Thanks
>>> Jody
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users