Re: [OMPI users] orted: command not found
Hi Jose,

Sorry for entering the discussion late. From tracing the email thread, I gather the following:

1. You have installed Open MPI 1.1.2 on two 686 boxes.
2. You created a hostfile on one of the nodes and executed mpirun from that node. You gave us a prefix indicating where we should find the Open MPI executables on each node.
3. You were getting an error message indicating that mpirun was unable to find your executable.
4. You didn't encounter this problem when running on a cluster.

If I have those facts correct, then the problem is simple. My guess is that the cluster you were using has a shared file system; hence, the remote nodes "see" your executable in the same relative location across the cluster.

In your simple setup with the two boxes, it sounds like you don't have a shared file system. When mpirun attempts to locate the executable on bernie-3, it won't find the file since it doesn't exist on that node's file system. Once you copied the file over to bernie-3, mpirun could find it, so everything works fine.

We hope to add file pre-positioning at some point in the future for systems such as yours. However, that is some time away due to priorities. For now, Open MPI requires that your executable (and the Open MPI executables and libraries) be available on each node you are trying to use.

Hope that helps to explain the problem.

Ralph

On 1/2/07 2:03 PM, "jcolmena...@ula.ve" wrote:

> I had configured the hostfile located at
> ~prefix/etc/openmpi-default-hostfile.
>
> I copied the file to bernie-3, and it worked...
>
> Now, at the cluster I was working at the Universidad de Los Andes
> (Venezuela) -I decided to install mpi on three machines I was able to put
> together as a personal project- all I had to do was to compile and run my
> applications, that is, I never copied any file to any other machine...
> now, I had to. I'm sorry if it was obvious and made you guys lose some
> time, but why on a cluster I didn't have to copy any files, and now I must
> do so?
>
> Thanks for your patience!
>
> Jose
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
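
Ralph's point that the executable must exist on every node when there is no shared file system can be sketched as a small staging helper. This is an illustration, not part of Open MPI: the function names `list_hosts` and `stage_commands` are made up here, and the hostfile is assumed to list one host per line (optionally followed by fields such as `slots=N`), with `#` starting a comment. The script only prints the `ssh` and `scp` commands so they can be reviewed before anything is copied.

```sh
#!/bin/sh
# Sketch only: print, rather than run, the commands that would copy an
# executable to the same absolute directory on every node in a hostfile.
# Assumes host names and paths contain no whitespace.

# list_hosts HOSTFILE
# Print the first field of every line that is neither blank nor a comment.
list_hosts() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# stage_commands HOSTFILE EXECUTABLE LOCAL_HOST
# Emit one "ssh mkdir" and one "scp" per remote host; LOCAL_HOST is skipped
# because the file is already there.
stage_commands() (
    hostfile=$1
    exe=$2
    local_host=$3
    dir=$(cd "$(dirname "$exe")" && pwd) || exit 1
    base=$(basename "$exe")
    for host in $(list_hosts "$hostfile"); do
        [ "$host" = "$local_host" ] && continue
        printf 'ssh %s mkdir -p %s\n' "$host" "$dir"
        printf 'scp -p %s/%s %s:%s/\n' "$dir" "$base" "$host" "$dir"
    done
)
```

With the names used in this thread, `stage_commands <hostfile> ./prueba.bin bernie-1` run from ~/proyecto would print the commands for bernie-3; piping that output to `sh` would execute them. Copying to the same absolute path matters because mpirun looks for the executable at the path it was given on each node.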
Re: [OMPI users] orted: command not found
I had configured the hostfile located at ~prefix/etc/openmpi-default-hostfile.

I copied the file to bernie-3, and it worked...

Now, at the cluster I was working at the Universidad de Los Andes (Venezuela) -I decided to install mpi on three machines I was able to put together as a personal project- all I had to do was to compile and run my applications, that is, I never copied any file to any other machine... now, I had to. I'm sorry if it was obvious and made you guys lose some time, but why on a cluster I didn't have to copy any files, and now I must do so?

Thanks for your patience!

Jose
Re: [OMPI users] orted: command not found
On 1/2/07, Gurhan Ozen wrote:
On 1/2/07, jcolmena...@ula.ve wrote:
> > First you should make sure that PATH and LD_LIBRARY_PATH are defined
> > in the section of your .bashrc file that gets parsed for non
> > interactive sessions. Run "mpirun -np 1 printenv" and check if PATH
> > and LD_LIBRARY_PATH have the values you expect.
>
> in fact they do:
>
> bernie@bernie-1:~/proyecto$ mpirun -np 1 printenv
> SHELL=/bin/bash
> SSH_CLIENT=192.168.1.142 4109 22
> USER=bernie
> LD_LIBRARY_PATH=/usr/local/openmpi/lib:/usr/local/openmpi/lib:
> MAIL=/var/mail/bernie
> PATH=/usr/local/openmpi/bin:/usr/local/openmpi/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11:/usr/games
> PWD=/home/bernie
> LANG=en_US.UTF-8
> HISTCONTROL=ignoredups
> SHLVL=1
> HOME=/home/bernie
> MPI_DIR=/usr/local/openmpi
> LOGNAME=bernie
> SSH_CONNECTION=192.168.1.142 4109 192.168.1.113 22
> LESSOPEN=| /usr/bin/lesspipe %s
> LESSCLOSE=/usr/bin/lesspipe %s %s
> _=/usr/local/openmpi/bin/orted
> OMPI_MCA_universe=bernie@bernie-1:default-universe
> OMPI_MCA_ns_nds=env
> OMPI_MCA_ns_nds_vpid_start=0
> OMPI_MCA_ns_nds_num_procs=1
> OMPI_MCA_mpi_paffinity_processor=0
> OMPI_MCA_ns_replica_uri=0.0.0;tcp://192.168.1.142:4775
> OMPI_MCA_gpr_replica_uri=0.0.0;tcp://192.168.1.142:4775
> OMPI_MCA_orte_base_nodename=192.168.1.113
> OMPI_MCA_ns_nds_cellid=0
> OMPI_MCA_ns_nds_jobid=1
> OMPI_MCA_ns_nds_vpid=0
>
> > For your second question you should give the path to your prueba.bin
> > executable. I'll do something like
> > "mpirun --prefix /usr/local/openmpi -np 2 ./prueba.bin".
> > The reason is that usually "." is not in the PATH.
>
> bernie@bernie-1:~/proyecto$ mpirun --prefix /usr/local/openmpi -np 2 ./prueba.bin
> --
> Failed to find or execute the following executable:
>
> Host: bernie-3
> Executable: ./prueba.bin
>
> Cannot continue.
> --
>
> and the file IS there:
>
> bernie@bernie-1:~/proyecto$ ls prueba*
> prueba.bin prueba.f90 prueba.f90~

Wait a minute... you are running mpirun from bernie-1 without providing any hostfile or hostnames, so both processes should be running on the bernie-1 host, yet the error says it can't find the executable on bernie-3. Why is this? Make sure that the file exists on bernie-3 and is executable.

gurhan

> I must be missing something pretty silly, but have been looking around for
> days to no avail!

What are the permissions on the file? Is it an executable file?

gurhan

> Jose
>
> thanks
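
Gurhan's suggestion to confirm that the file exists on bernie-3 and is executable could be checked with something like the sketch below. The helper name `check_exec` is hypothetical; it relies only on the standard `-f` and `-x` file tests of the shell.

```sh
#!/bin/sh
# check_exec FILE
# Report whether FILE is a regular file with execute permission for the
# current user. Returns 0 if so, 1 if the file is missing, 2 if it exists
# but cannot be executed.
check_exec() {
    if [ ! -f "$1" ]; then
        echo "missing: $1"
        return 1
    elif [ ! -x "$1" ]; then
        echo "not executable: $1"
        return 2
    fi
    echo "ok: $1"
}
```

The same test can be run on the remote node in one line, for example `ssh bernie-3 'test -x /home/bernie/proyecto/prueba.bin && echo ok'`, where the path is assumed from the `HOME` value and working directory shown in the thread. Note that `ls` on bernie-1 only proves the file is present on bernie-1.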
Re: [OMPI users] orted: command not found
It is executable:

bernie@bernie-1:~/proyecto$ ls -l prueba.bin
-rwxr-xr-x 1 bernie bernie 9619 2007-01-02 12:18 prueba.bin
Re: [OMPI users] orted: command not found
On 1/2/07, jcolmena...@ula.ve wrote:
> First you should make sure that PATH and LD_LIBRARY_PATH are defined
> in the section of your .bashrc file that gets parsed for non
> interactive sessions. Run "mpirun -np 1 printenv" and check if PATH
> and LD_LIBRARY_PATH have the values you expect.

in fact they do:

bernie@bernie-1:~/proyecto$ mpirun -np 1 printenv
SHELL=/bin/bash
SSH_CLIENT=192.168.1.142 4109 22
USER=bernie
LD_LIBRARY_PATH=/usr/local/openmpi/lib:/usr/local/openmpi/lib:
MAIL=/var/mail/bernie
PATH=/usr/local/openmpi/bin:/usr/local/openmpi/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11:/usr/games
PWD=/home/bernie
LANG=en_US.UTF-8
HISTCONTROL=ignoredups
SHLVL=1
HOME=/home/bernie
MPI_DIR=/usr/local/openmpi
LOGNAME=bernie
SSH_CONNECTION=192.168.1.142 4109 192.168.1.113 22
LESSOPEN=| /usr/bin/lesspipe %s
LESSCLOSE=/usr/bin/lesspipe %s %s
_=/usr/local/openmpi/bin/orted
OMPI_MCA_universe=bernie@bernie-1:default-universe
OMPI_MCA_ns_nds=env
OMPI_MCA_ns_nds_vpid_start=0
OMPI_MCA_ns_nds_num_procs=1
OMPI_MCA_mpi_paffinity_processor=0
OMPI_MCA_ns_replica_uri=0.0.0;tcp://192.168.1.142:4775
OMPI_MCA_gpr_replica_uri=0.0.0;tcp://192.168.1.142:4775
OMPI_MCA_orte_base_nodename=192.168.1.113
OMPI_MCA_ns_nds_cellid=0
OMPI_MCA_ns_nds_jobid=1
OMPI_MCA_ns_nds_vpid=0

> For your second question you should give the path to your prueba.bin
> executable. I'll do something like
> "mpirun --prefix /usr/local/openmpi -np 2 ./prueba.bin".
> The reason is that usually "." is not in the PATH.

bernie@bernie-1:~/proyecto$ mpirun --prefix /usr/local/openmpi -np 2 ./prueba.bin
--
Failed to find or execute the following executable:

Host: bernie-3
Executable: ./prueba.bin

Cannot continue.
--

and the file IS there:

bernie@bernie-1:~/proyecto$ ls prueba*
prueba.bin prueba.f90 prueba.f90~

I must be missing something pretty silly, but have been looking around for days to no avail!

What are the permissions on the file? Is it an executable file?

gurhan

Jose

thanks
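
The advice quoted above, that PATH and LD_LIBRARY_PATH must be set for non-interactive sessions, can be verified without mpirun. A minimal sketch, assuming a hypothetical helper `path_contains` that tests whether a directory is an entry of a colon-separated list:

```sh
#!/bin/sh
# path_contains LIST DIR
# Succeed if DIR is exactly one of the entries in the colon-separated LIST.
# Wrapping both sides in colons avoids matching a mere prefix of an entry.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Because `ssh host command` runs a non-interactive shell, a check along the lines of `path_contains "$(ssh bernie-3 'echo $PATH')" /usr/local/openmpi/bin && echo found` looks at what a remotely launched process would get, which is not necessarily what an interactive login sees. If the check fails while an interactive login works, one possible cause is a .bashrc that returns early for non-interactive shells before reaching the export lines.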
[OMPI users] orted: command not found
I installed Open MPI 1.1.2 on two 686 boxes running Ubuntu 6.10, following the instructions given in the FAQ. Nevertheless, I get the following message:

[bernie-1:05053] ERROR: A daemon on node 192.168.1.113 failed to start as expected.
[bernie-1:05053] ERROR: There may be more information available from
[bernie-1:05053] ERROR: the remote shell (see above).
[bernie-1:05053] ERROR: The daemon exited unexpectedly with status 127.

Now, I've been browsing the web, including the mailing lists, and it appears that the error should be that I have not declared the variables

export PATH="/usr/local/openmpi/bin:${PATH}"
export LD_LIBRARY_PATH="/usr/local/openmpi/lib:${LD_LIBRARY_PATH}"

at the node, which I have. I have even created all the possible folders proposed in the FAQ for remote logins, although I'm using bash.

If I do an ssh user@remote_node, I can connect without being asked for a password, and if I type mpif90, I get: "gfortran: no input files", which should mean that the PATH and LD_LIBRARY_PATH are indeed being updated on the remote login. But, if I do:

bash$ mpirun --prefix /usr/local/openmpi -np 2 prueba.bin

the result is:

--
Failed to find the following executable:

Host: bernie-3
Executable: prueba.bin

Cannot continue.
--
mpirun noticed that job rank 0 with PID 0 on node "192.168.1.113" exited on signal 4.

I've been looking around, but have not been able to find what signal 4 means.

Just in case: I was running an example program which runs fine at my university cluster. Nevertheless, I decided to run an even simpler one, which I include, for it may be that the error is there (I definitely hope not!...)
program test
  use mpi
  implicit none
  integer :: myid, sizze, ierr

  ! Start MPI, then ask for the number of processes and this process's rank.
  call MPI_INIT(ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, sizze, ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)

  print *, "I'm using ", sizze, " processors"
  print *, "of which I'm the number ", myid

  ! Shut MPI down cleanly before exiting.
  call MPI_FINALIZE(ierr)
end program test

This is the first time I have installed, and used, any parallel programming program or library, and I'm doing it as a personal project for a graduate course, so any help will be greatly appreciated!

Best regards,
Jose Colmenares
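
The two numbers in the error output above can be decoded with standard shell tools. The sketch below shows only the general conventions: a shell reports exit status 127 when the command it was asked to run cannot be found, which is consistent with the "orted: command not found" subject of this thread, and `kill -l` translates a signal number into its name. The command name `no_such_command_for_demo` is deliberately made up. Why the process received that signal is not something this snippet can determine.

```sh
#!/bin/sh
# Exit status 127: the shell could not find the requested command.
status=0
sh -c 'no_such_command_for_demo' 2>/dev/null || status=$?
echo "exit status: $status"

# Signal 4: kill -l prints the signal name without the SIG prefix.
echo "signal 4: SIG$(kill -l 4)"
```

On Linux this prints an exit status of 127 and identifies signal 4 as SIGILL (illegal instruction).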