Hi again,

Yes, the error output is the same:

root@sun:~# mpirun --hostfile hostfile main
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1164
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[sun:23748] ERROR: A daemon on node saturn failed to start as expected.
[sun:23748] ERROR: There may be more information available from
[sun:23748] ERROR: the remote shell (see above).
[sun:23748] ERROR: The daemon exited unexpectedly with status 255.
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1196
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job.
Returned value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------

I wrote the following to my .ssh/environment (on all machines):

LD_LIBRARY_PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/c3-4/:/usr/lib:/usr/local/lib;
PATH=$PATH:/usr/local/lib;
export LD_LIBRARY_PATH;
export PATH;

and added the statement you told me to the sshd_config (on all machines):

PermitUserEnvironment yes

It seems to me that the paths are correct now.

My shell is bash (/bin/bash).

When running locate orted (to find out where exactly my Open MPI
installation is, since I used the compilation defaults), I saw that on sun
there was a /usr/bin/orted while there wasn't one on saturn. I deleted
/usr/bin/orted on sun and tried again with the option --prefix /usr/local/
(which seems to be my installation directory), but it didn't work (same
error).

Is there a script or anything like that with which I can uninstall Open MPI?
I might try a new compilation to /opt/openmpi, since it doesn't look like I
will be able to solve the problem.

jody wrote:
> Now that the PATHs seem to be set correctly for
> ssh I don't know what the problem could be.
>
> Is the error message still the same as in the first mail?
> Did you do the environment/sshd_config on both machines?
> What shell are you using?
>
> One other test you could make is to start your application
> with the --prefix option:
>
> $ mpirun -np 2 --prefix /opt/openmpi -H sun,saturn ./main
>
> (assuming your Open MPI installation lies in /opt/openmpi
> on both machines)
>
>
> Jody
>
> On 10/1/07, Dino Rossegger <dino.rosseg...@gmx.at> wrote:
>> Hi Jody,
>> did the steps as you said, but it didn't work for me.
>> I set LD_LIBRARY_PATH in /etc/environment and ~/.ssh/environment and
>> made the changes to sshd_config.
>>
>> But this all didn't solve my problem, although the paths seemed to be
>> set correctly (judging what ssh saturn `printenv >> test` says). I also
>> restarted the ssh server, the error is the same.
>>
>> Hope you can help me out here and thanks for your help so far
>> dino
>>
>> jody wrote:
>>> Dino -
>>> I had a similar problem.
>>> I was only able to solve it by setting PATH and LD_LIBRARY_PATH
>>> in the file ~/.ssh/environment on the client and setting
>>> PermitUserEnvironment yes
>>> in /etc/ssh/sshd_config on the server (for this you need root
>>> privilege though)
>>>
>>> To be on the safe side, I did both on all my nodes
>>>
>>> Jody
>>>
>>> On 9/27/07, Dino Rossegger <dino.rosseg...@gmx.at> wrote:
>>>> Hi Jody,
>>>>
>>>> Thanks for your help, it really is the case that neither in PATH nor in
>>>> LD_LIBRARY_PATH the path to the libs is set correctly. I'll try it out,
>>>> hope it works.
>>>>
>>>> jody wrote:
>>>>> Hi Dino
>>>>>
>>>>> Try
>>>>> ssh saturn printenv | grep PATH
>>>>> from your host sun to see what your environment variables are when
>>>>> ssh is run without a shell.
>>>>>
>>>>>
>>>>> On 9/27/07, Dino Rossegger <dino.rosseg...@gmx.at> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have a problem running a simple program, mpihello.cpp.
>>>>>>
>>>>>> Here is an excerpt of the error and the command
>>>>>> root@sun:~# mpirun -H sun,saturn main
>>>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>>>>> base/pls_base_orted_cmds.c at line 275
>>>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>>>>>> line 1164
>>>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line
>>>>>> 90
>>>>>> [sun:25213] ERROR: A daemon on node saturn failed to start as expected.
>>>>>> [sun:25213] ERROR: There may be more information available from
>>>>>> [sun:25213] ERROR: the remote shell (see above).
>>>>>> [sun:25213] ERROR: The daemon exited unexpectedly with status 255.
>>>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>>>>> base/pls_base_orted_cmds.c at line 188
>>>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>>>>>> line 1196
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun was unable to cleanly terminate the daemons for this job.
>>>>>> Returned value Timeout instead of ORTE_SUCCESS.
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> The program is runnable from each node alone (mpirun -np 2 main)
>>>>>>
>>>>>> My PathVariables:
>>>>>> $PATH
>>>>>> /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/c3-4/:/usr/lib:/usr/local/libecho
>>>>>> $LD_LIBRARY_PATH
>>>>>> /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/c3-4/:/usr/lib:/usr/local/lib
>>>>>>
>>>>>> Passwordless ssh is up 'n running
>>>>>>
>>>>>> I walked through the FAQ and Mailing Lists but couldn't find any
>>>>>> solution for my problem.
>>>>>>
>>>>>> Thanks
>>>>>> Dino R.
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
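
A note on the .ssh/environment contents quoted at the top of this message:
according to the sshd(8) manual page, that file may contain only blank
lines, comment lines starting with '#', and assignment lines of the form
name=value. sshd does not run the file through a shell, so "export"
statements, trailing semicolons and $PATH references are not interpreted.
The sketch below is a hypothetical helper (the name check_ssh_environment
is made up for illustration and is not part of OpenSSH or Open MPI) that
flags such lines. It is a rough check under these assumptions, not a full
parser, and it may not be the cause of the timeout above.

```shell
#!/bin/sh
# check_ssh_environment FILE   (hypothetical helper, for illustration only)
# Prints every line of FILE that is not empty, not a '#' comment, and not a
# plain NAME=value assignment. Returns 1 if any such line was found, else 0.
check_ssh_environment() {
    env_file=$1
    rc=0
    while IFS= read -r line || [ -n "$line" ]; do
        # Empty lines and comments are allowed in the file.
        case $line in
            ''|'#'*) continue ;;
        esac
        # Shell syntax is not interpreted by sshd: flag it.
        case $line in
            export\ *|*';'*|*'$'*)
                printf 'suspect: %s\n' "$line"
                rc=1
                continue
                ;;
        esac
        # Anything left should look like NAME=value.
        case $line in
            [A-Za-z_]*=*) ;;
            *)
                printf 'suspect: %s\n' "$line"
                rc=1
                ;;
        esac
    done < "$env_file"
    return $rc
}
```

Running check_ssh_environment ~/.ssh/environment on each node prints the
lines worth a second look. Each of the four lines quoted above would be
flagged, since each ends in a semicolon. A file with absolute paths only,
for example LD_LIBRARY_PATH=/usr/local/lib on one line and
PATH=/usr/local/bin:/usr/bin:/bin on another, passes the check.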