Hello Terry,

Yes, "edu" is the user and 10.4.5.126 is the IP address. Because the two computers have different usernames, I think I have to write the username; otherwise it does not work. In fact, on the computer 10.4.5.123 I write:

mpirun -np 2 --host 10.4.5.123,edu@10.4.5.126 --prefix /usr/local ./PruebaSumaParalela.out

and on the computer 10.4.5.126 I write:

mpirun -np 2 --host sofia@10.4.5.123,10.4.5.126 --prefix /usr/local ./PruebaSumaParalela.out

If I try with only the IP addresses and write this on the computer 10.4.5.123:

mpirun -np 2 --host 10.4.5.123,10.4.5.126 --prefix /usr/local ./PruebaSumaParalela.out

then the computer asks me for the password of sofia@10.4.5.126. I type the password and it does not work, because the user is "edu", not "sofia".
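One possible way to let the plain-IP command work without a user prefix is an OpenSSH per-host `User` setting, since mpirun starts the remote processes through ssh. The sketch below only prints such a stanza; the function name `ssh_user_stanza` is hypothetical, and it assumes the default ssh launcher and the standard `~/.ssh/config` location.

```shell
#!/bin/sh
# Print an OpenSSH client config stanza telling ssh which login name to
# use for one host. Nothing is written to disk here; the caller redirects
# the output. The function name is a hypothetical helper, not an Open MPI tool.
ssh_user_stanza() {
  host="$1"   # remote machine, e.g. 10.4.5.126
  user="$2"   # account on that machine, e.g. edu
  printf 'Host %s\n    User %s\n' "$host" "$user"
}

ssh_user_stanza 10.4.5.126 edu
```

On 10.4.5.123, `ssh_user_stanza 10.4.5.126 edu >> ~/.ssh/config` would append the stanza; after that, `ssh 10.4.5.126` (and therefore mpirun's ssh launch) should log in as edu, so `--host 10.4.5.123,10.4.5.126` should no longer need the username. On 10.4.5.126 the mirror-image stanza would name 10.4.5.123 and the account used there.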

I am trying to install dbx; if I can attach a debugger, I will let you know.

Thank you very much.

Sofia


Hello Terry,

Thank you very much for your help.


> Sofia,
>
> I took your program and actually ran it successfully on my systems
> using Open MPI r19400. A couple questions:
>
> 1.  Have you tried to run the program on a single node?
> mpirun -np 2 --host 10.4.5.123 --prefix /usr/local ./PruebaSumaParalela.out
>


Yes. In this case, the program works perfectly.


> 2. Can you try and run the code the following way and is the output
> different?
> mpirun -np 2 --host 10.4.5.123,edu@10.4.5.126 --mca mpi_preconnect_all 1 --prefix /usr/local ./PruebaSumaParalela.out
>


The program also hangs, but the output is different. On both computers I get the following:

Inicio
Inicio
totalnodes:2
mynode:0
Inicio Recv


OK, so it looks like rank 1 is not getting out of MPI_Init.

> 3. When the program hangs can you attach a debugger to one of the
> processes and print out a stack?
>


I do not know how to do that.


With Solaris, I usually do the following:
% dbx - <pid of process>
dbx>  where
<stack prints out>
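On Linux, as on both Ubuntu machines here, gdb can play the role of dbx. The sketch below is one way to dump the stacks of the hung processes; it assumes gdb is installed, that attaching to your own processes is permitted (some systems require root), and that `pidof` finds the processes under the name shown. The function name `print_stacks` is hypothetical.

```shell
#!/bin/sh
# Linux counterpart of the dbx "where" recipe: attach gdb to each hung
# process in batch mode, print the stack of every thread, then detach.
# Assumptions: gdb is installed; PIDs are passed as arguments, or else
# pidof can find the processes by the program name below.
print_stacks() {
  prog="PruebaSumaParalela.out"
  if [ "$#" -gt 0 ]; then
    pids="$*"                             # explicit PIDs, e.g. taken from ps
  else
    pids=$(pidof "$prog" 2>/dev/null)     # empty if nothing matches
  fi
  for pid in $pids; do
    echo "=== stack of PID $pid ==="
    # -batch: run the -ex command, then exit (gdb detaches on exit)
    gdb -batch -p "$pid" -ex "thread apply all bt"
  done
}

print_stacks "$@"
```

Run it on each node while the program hangs; if `pidof` does not find the processes, pass the PIDs that `ps` shows as arguments instead.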

> 4. What version of Open MPI are you using, on what type of machine,
> using which OS?
>


openmpi-1.2.2 on both computers.

On 10.4.5.123 I have:
Ubuntu Linux pichurra 2.6.22-15-generic #1 SMP Tue Jun 10 09:21:34 UTC 2008 i686 GNU/Linux

On edu@10.4.5.126 I have:
K-Ubuntu Linux hp1-Linux 2.6.20-16-generic #2 SMP Sun Sep 23 19:50:39 UTC 2007 i686 GNU/Linux


Sorry for the bonehead question, but is edu@10.4.5.126 the actual machine name? Is its IP address really 10.4.5.126? Can you try that instead? My guess is that the TCP BTL is somehow not matching up the two nodes as being connected to each other.
