Hello Terry,
Yes, "edu" is the user and 10.4.5.126 is the IP address. Because the two
computers have different usernames, I think I have to specify the username;
otherwise it does not work. In fact, on the computer 10.4.5.123 I write:
mpirun -np 2 --host 10.4.5.123,edu@10.4.5.126 --prefix /usr/local ./PruebaSumaParalela.out
and on the computer 10.4.5.126 I write:
mpirun -np 2 --host sofia@10.4.5.123,10.4.5.126 --prefix /usr/local ./PruebaSumaParalela.out
If I use only the IP addresses and write on the computer 10.4.5.123:
mpirun -np 2 --host 10.4.5.123,10.4.5.126 --prefix /usr/local ./PruebaSumaParalela.out
then the computer asks me for the password of sofia@10.4.5.126. I type the
password, but it does not work because the user there is "edu", not "sofia".
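One way to avoid writing the username in --host might be to tell ssh which
login to use for each host. The following is only a sketch: it assumes that
mpirun starts the remote processes through ssh (the password prompt above
suggests it does), and ssh_host_stanza is a hypothetical helper name, not part
of Open MPI or OpenSSH.

```shell
# Hypothetical helper: print an ssh_config stanza that maps a host to a login.
#   $1 = host name or IP address, $2 = user name on that host
ssh_host_stanza() {
    printf 'Host %s\n    User %s\n' "$1" "$2"
}

# On 10.4.5.123 (logged in as sofia): use "edu" when connecting to 10.4.5.126.
ssh_host_stanza 10.4.5.126 edu

# To make it permanent, append the output to the per-user ssh configuration:
#   ssh_host_stanza 10.4.5.126 edu >> ~/.ssh/config
```

With that stanza in ~/.ssh/config for sofia on 10.4.5.123, "ssh 10.4.5.126"
should log in as edu, so the plain-IP form of --host can be tried. The mirror
stanza (host 10.4.5.123, user sofia) would go on 10.4.5.126.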
I am trying to install dbx; if I manage to attach a debugger, I will let you know.
Thank you very much.
Sofia
Hello Terry,
Thank you very much for your help.
> Sofia,
>
> I took your program and actually ran it successfully on my systems
> using Open MPI r19400. A couple questions:
>
> 1. Have you tried to run the program on a single node?
> mpirun -np 2 --host 10.4.5.123 --prefix /usr/local
> ./PruebaSumaParalela.out
>
Yes. In this case, the program works perfectly.
> 2. Can you try and run the code the following way and is the output
> different?
> mpirun -np 2 --host 10.4.5.123,edu@10.4.5.126 --mca
> mpi_preconnect_all 1 --prefix /usr/local ./PruebaSumaParalela.out
>
The program also hangs, but the output is different. On both computers I
get the following:
Inicio
Inicio
totalnodes:2
mynode:0
Inicio Recv
OK, so it looks like rank 1 is not getting out of MPI_Init.
> 3. When the program hangs can you attach a debugger to one of the
> processes and print out a stack?
>
I do not know how to do that.
With Solaris, I usually do the following:
% dbx - <pid of process>
dbx> where
<stack prints out>
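On Linux, where dbx is usually not available, gdb can do the same job. This is
a minimal sketch, assuming gdb is installed on both machines and is run by the
user who owns the hung process; dump_stacks and DRY_RUN are hypothetical names
introduced here for illustration.

```shell
# Print a backtrace of every thread for each process ID given as an argument.
# With DRY_RUN=1 the gdb commands are only printed, not executed.
dump_stacks() {
    for pid in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "gdb -p $pid -batch -ex 'thread apply all bt'"
        else
            # -p attaches to the running process, -batch exits when done,
            # -ex runs the given gdb command after attaching.
            gdb -p "$pid" -batch -ex 'thread apply all bt'
        fi
    done
}

# Usage on either node while the job hangs:
#   dump_stacks $(pgrep -f PruebaSumaParalela.out)
```

The backtrace of the rank that never prints "totalnodes" is the interesting
one, since it should show where inside MPI_Init the process is waiting.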
> 4. What version of Open MPI are you using, on what type of machine,
> using which OS?
>
Open MPI 1.2.2 on both computers.
On 10.4.5.123 I have:
Ubuntu Linux pichurra 2.6.22-15-generic #1 SMP Tue Jun 10 09:21:34 UTC
2008 i686 GNU/Linux
On edu@10.4.5.126 I have:
K-Ubuntu Linux hp1-Linux 2.6.20-16-generic #2 SMP Sun Sep 23 19:50:39 UTC
2007 i686 GNU/Linux
Sorry for the bonehead question, but is edu@10.4.5.126 the actual machine
name? Is its IP address really 10.4.5.126? Can you try using just the IP
address instead? I would guess the issue is that the TCP BTL is somehow not
matching the two nodes as being connected to each other.
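If the TCP BTL is picking the wrong network interface, restricting it to one
interface may be worth a try. The sketch below only builds the command line.
It assumes eth0 is the interface on the 10.4.5.x network on both machines
(check with ifconfig and substitute the real name); mpirun_cmd and IFACE are
hypothetical names, while btl and btl_tcp_if_include are Open MPI MCA
parameters.

```shell
# Assumption: eth0 carries the 10.4.5.x network; override with IFACE=<name>.
IFACE="${IFACE:-eth0}"

# Build the mpirun command line that limits the TCP BTL to one interface.
#   $1 = network interface name
mpirun_cmd() {
    echo "mpirun -np 2 --host 10.4.5.123,edu@10.4.5.126" \
         "--mca btl tcp,self --mca btl_tcp_if_include $1" \
         "--prefix /usr/local ./PruebaSumaParalela.out"
}

# Print the command so it can be reviewed before running it by hand.
mpirun_cmd "$IFACE"
```

If the job gets past MPI_Init with the interface pinned, that would point at
interface selection rather than at the program itself.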