What happens if you try to mpirun a non-MPI program like "date" or "hostname"?
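
For example, something like this (using the machinefile from your mails):

    mpirun -np 3 -machinefile /home/radic/mfile hostname

If even "hostname" fails to launch cleanly across both nodes, the problem is in the runtime/launch environment rather than in your MPI code.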


On Feb 11, 2011, at 6:14 AM, Marcela Castro León wrote:

> Excuse me. I forgot the attachment.
> 
> 2011/2/11 Marcela Castro León <mcast...@gmail.com>
> Hello:
> 
> I have the same version of Ubuntu, 10.04, on both. The original version was Ubuntu Server 
> 9.1 (64-bit), and I upgraded both of them to 10.04. 
> Yesterday I updated and upgraded them to the same level again, but I got the 
> same error after that.
> The machines are exactly the same: HP Compaq with an Intel Core i5.
> 
> Anyway, I've compared the versions of Open MPI and gcc, and they are the same too: 
> 1.4.1-2 and 4.4.4.3 respectively. I'm attaching the output of dpkg -l on the 
> two systems.
> 
> I would appreciate a lot any help to solve it.
> Thank you.
> 
> Marcela.
> 2011/2/10 Jeff Squyres <jsquy...@cisco.com>
> 
> I typically see these kinds of errors when there's an Open MPI version 
> mismatch between the nodes, and/or if there are slightly different flavors of 
> Linux installed on each node (i.e., you're technically in a heterogeneous 
> situation, but you're trying to run a single application binary).  Can you 
> verify:
> 
> 1. that you have exactly the same version of Open MPI installed on all nodes? 
>  (and that your application was compiled against that exact version)
> 
> 2. that you have exactly the same OS/update level installed on all nodes 
> (e.g., same versions of glibc, etc.)
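> 
> For example (a sketch; adjust the binary path to your setup), run the same 
> checks on every node and compare the output:
> 
>     mpirun -version          # Open MPI version
>     gcc --version            # compiler version
>     dpkg -l | grep openmpi   # installed Open MPI packages
>     ldd /path/to/your/app    # which libmpi.so the binary actually resolves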
> 
> 
> On Feb 10, 2011, at 3:13 AM, Marcela Castro León wrote:
> 
> > Hello
> > I have a program that has always worked fine, but I'm trying it on a new cluster 
> > and it fails when I execute it on more than one machine.
> > I mean, if I execute it alone on each host, everything works fine:
> > radic@santacruz:~/gaps/caso3-i1$ mpirun -np 3 ../test parcorto.txt
> >
> > But when I execute
> > radic@santacruz:~/gaps/caso3-i1$ mpirun -np 3 -machinefile 
> > /home/radic/mfile ../test parcorto.txt
> >
> > I get this error:
> >
> > --------------------------------------------------------------------------
> > mpirun has exited due to process rank 0 with PID 2132 on
> > node santacruz exiting without calling "finalize". This may
> > have caused other processes in the application to be
> > terminated by signals sent by mpirun (as reported here).
> > --------------------------------------------------------------------------
> >
> > Even when the machinefile (mfile) had only one machine in it, the program failed.
> > This is the current content:
> >
> > radic@santacruz:~/gaps/caso3-i1$ cat /home/radic/mfile
> > santacruz
> > chubut
> >
> > I've debugged the program, and the error occurs after proc 0 does an
> > MPI_Recv(&nomproc,lennomproc,MPI_CHAR,i,tag,MPI_COMM_WORLD,&Stat);
> > from the remote process.
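> >
> > (Editorial sketch, not the actual program: if lennomproc is derived from
> > the length of the local hostname, it will differ between santacruz and
> > chubut, and a receive posted with a count smaller than the incoming
> > message fails. One robust variant is a fixed-size buffer on both the
> > send and receive side; the names below are illustrative.)
> >
> > #include <mpi.h>
> >
> > /* every rank sends its processor name to rank 0 with the same fixed
> >    count, so the posted receive can never be smaller than the message */
> > void send_name_to_root(int tag)
> > {
> >     char name[MPI_MAX_PROCESSOR_NAME];
> >     int len;
> >     MPI_Get_processor_name(name, &len);
> >     MPI_Send(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, 0, tag, MPI_COMM_WORLD);
> > }
> >
> > void recv_name_from(int src, int tag, char out[MPI_MAX_PROCESSOR_NAME])
> > {
> >     MPI_Status Stat;
> >     MPI_Recv(out, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, src, tag,
> >              MPI_COMM_WORLD, &Stat);
> > }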
> >
> > I've done several tests, which I'll mention here:
> >
> > 1) Change the order in the machinefile:
> > radic@santacruz:~/gaps/caso3-i1$ cat /home/radic/mfile
> > chubut
> > santacruz
> >
> > In that case, I get this error (see the sketch after these tests):
> > [chubut:2194] *** An error occurred in MPI_Recv
> > [chubut:2194] *** on communicator MPI_COMM_WORLD
> > [chubut:2194] *** MPI_ERR_TRUNCATE: message truncated
> > [chubut:2194] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> > and then
> > --------------------------------------------------------------------------
> > mpirun has exited due to process rank 0 with PID 2194 on
> > node chubut exiting without calling "finalize". This may
> > have caused other processes in the application to be
> > terminated by signals sent by mpirun (as reported here).
> > --------------------------------------------------------------------------
> >
> > 2) I get the same error when executing from host chubut instead of santacruz.
> > 3) Simple MPI programs like MPI_Hello (hello world) work fine, but I 
> > suppose those are very simple programs.
> >
> > radic@santacruz:~/gaps$ mpirun -np 3 -machinefile /home/radic/mfile 
> > MPI_Hello
> > Hola Mundo Hola Marce 1
> > Hola Mundo Hola Marce 0
> > Hola Mundo Hola Marce 2
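> >
> > (Editorial note on test 1: MPI_ERR_TRUNCATE means the arriving message
> > was longer than the count passed to MPI_Recv, which would also explain
> > why single-host runs pass: all ranks then share the same hostname
> > length. A minimal sketch of receiving a message of unknown length by
> > probing first; identifiers are illustrative.)
> >
> > #include <mpi.h>
> > #include <stdlib.h>
> >
> > /* block until a message from `src` arrives, then size the buffer
> >    to its actual length before posting the receive */
> > char *recv_unknown_length(int src, int tag)
> > {
> >     MPI_Status st;
> >     int len;
> >     MPI_Probe(src, tag, MPI_COMM_WORLD, &st);
> >     MPI_Get_count(&st, MPI_CHAR, &len);
> >     char *buf = malloc(len);
> >     MPI_Recv(buf, len, MPI_CHAR, src, tag, MPI_COMM_WORLD, &st);
> >     return buf;
> > }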
> >
> >
> > This is the information you asked for about the runtime problem:
> > a) radic@santacruz:~$ mpirun -version
> > mpirun (Open MPI) 1.4.1
> > b) I'm using Ubuntu 10.04. I installed the packages using apt-get 
> > install, so I don't have a config.log.
> > c) The output of ompi_info --all is in the file ompi_info.zip.
> > d) These are my PATH and LD_LIBRARY_PATH (the latter is empty):
> > radic@santacruz:~$ echo $PATH
> > /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
> > radic@santacruz:~$ echo $LD_LIBRARY_PATH
> >
> >
> > Thank you very much.
> >
> > Marcela.
> >
> 
> [Attachments: scgcc, scompi, chgcc, chompi]


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

