Hello Terry,

Yes, "edu" is the user and 10.4.5.126 is the IP address. Since the two computers have different usernames, I think I have to give the username; otherwise it does not work. Specifically, on 10.4.5.123 I run:

mpirun -np 2 --host 10.4.5.123,edu@10.4.5.126 --prefix /usr/local ./PruebaSumaParalela.out

and on 10.4.5.126 I run:

mpirun -np 2 --host sofia@10.4.5.123,10.4.5.126 --prefix /usr/local ./PruebaSumaParalela.out

If I use only the IP addresses, running this on 10.4.5.123:

mpirun -np 2 --host 10.4.5.123,10.4.5.126 --prefix /usr/local ./PruebaSumaParalela.out

then the computer asks me for the password of sofia@10.4.5.126. I type it, but it does not work because the user on that machine is "edu", not "sofia".
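The password prompt suggests Open MPI is starting the remote process over ssh, so one possible way to avoid writing edu@ in --host is a per-host User entry in the OpenSSH client configuration. A minimal sketch under that assumption; the helper name add_ssh_user and the SSH_CONFIG override variable are hypothetical, not part of Open MPI or OpenSSH:

```shell
# Hypothetical helper: append a per-host "User" entry to the OpenSSH client
# config (default: ~/.ssh/config, overridable through SSH_CONFIG).
add_ssh_user() {
    host="$1"
    user="$2"
    config="${SSH_CONFIG:-$HOME/.ssh/config}"
    mkdir -p "$(dirname "$config")"
    # ssh keeps the first value it finds for each option, so this block only
    # takes effect if no earlier matching entry (e.g. "Host *") sets User.
    printf '\nHost %s\n    User %s\n' "$host" "$user" >> "$config"
    # ssh rejects a config file that other users can write to
    chmod 600 "$config"
}
```

For example, running add_ssh_user 10.4.5.126 edu on 10.4.5.123 (and add_ssh_user 10.4.5.123 sofia on 10.4.5.126) should make a plain "ssh 10.4.5.126" log in as edu, after which the IP-only mpirun command can be retried.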

I am trying to install dbx; if I manage to attach a debugger, I will let you know.

Thank you very much.

Sofia


Hello Terry,

Thank you very much for your help.


> Sofia,
>
> I took your program and actually ran it successfully on my systems
> using Open MPI r19400. A couple questions:
>
> 1.  Have you tried to run the program on a single node?
> mpirun -np 2 --host 10.4.5.123 --prefix /usr/local ./PruebaSumaParalela.out
>


Yes. In this case, the program works perfectly.


> 2. Can you try and run the code the following way and is the output
> different?
> mpirun -np 2 --host 10.4.5.123,edu@10.4.5.126 --mca mpi_preconnect_all 1 --prefix /usr/local ./PruebaSumaParalela.out
>


The program still hangs, but the output is different. On both computers I get the following:

Inicio
Inicio
totalnodes:2
mynode:0
Inicio Recv


OK, so it looks like rank 1 is not getting out of MPI_Init.

> 3. When the program hangs can you attach a debugger to one of the
> processes and print out a stack?
>


I do not know how to do that.


With Solaris, I usually do the following:
% dbx - <pid of process>
dbx>  where
<stack prints out>
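Both machines run Ubuntu, where gdb is usually easier to get than dbx. A rough Linux counterpart of the dbx recipe above, assuming gdb and pgrep are installed; the helper name dump_stacks is hypothetical:

```shell
# Hypothetical helper: print a backtrace of every thread for each local
# process whose name matches the pattern (default "PruebaSuma").
# Assumes gdb and pgrep are installed. On newer Ubuntu releases, attaching
# to a process not started by the debugger may need sudo
# (kernel.yama.ptrace_scope).
dump_stacks() {
    # pgrep matches the process name, which Linux truncates to 15 characters
    # ("PruebaSumaParal"), so a short prefix is used; this also skips mpirun.
    pattern="${1:-PruebaSuma}"
    pids=$(pgrep "$pattern" 2>/dev/null || true)
    if [ -z "$pids" ]; then
        echo "no process named like '$pattern' is running" >&2
        return 1
    fi
    for pid in $pids; do
        echo "=== PID $pid ==="
        # -batch runs the -ex command and exits; gdb detaches when it quits
        gdb -batch -p "$pid" -ex "thread apply all bt"
    done
}
```

Run dump_stacks on each machine while the program is hung. Given the output above, the process on 10.4.5.126 (probably rank 1, apparently still inside MPI_Init) is likely the more interesting one.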

> 4. What version of Open MPI are you using, on what type of machine,
> using which OS?
>


openmpi-1.2.2 on both computers.

On 10.4.5.123 I have:
Ubuntu Linux pichurra 2.6.22-15-generic #1 SMP Tue Jun 10 09:21:34 UTC 2008 i686 GNU/Linux

On edu@10.4.5.126 I have:
K-Ubuntu Linux hp1-Linux 2.6.20-16-generic #2 SMP Sun Sep 23 19:50:39 UTC 2007 i686 GNU/Linux


Sorry for the bonehead question, but is edu@10.4.5.126 the actual machine name? Is its IP address really 10.4.5.126? Can you try that instead? My guess is that the TCP BTL is somehow not matching the two nodes as being connected to each other.



