Date: Wed, 17 Sep 2008 16:23:59 +0200
From: "Sofia Aparicio Secanellas" <sapari...@grpss.ssr.upm.es>
Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv
To: "Open MPI Users" <us...@open-mpi.org>
Message-ID: <0625EEFB84E04647A1930A963A8DF7E3@aparicio1>
Hello Terry,
Thank you very much for your help.
> Sofia,
>
> I took your program and actually ran it successfully on my systems using
> Open MPI r19400. A couple questions:
>
> 1. Have you tried to run the program on a single node?
> mpirun -np 2 --host 10.4.5.123 --prefix /usr/local ./PruebaSumaParalela.out
>
Yes. In this case, the program works perfectly.
> 2. Can you try running the code the following way, and is the output
> different?
> mpirun -np 2 --host 10.4.5.123,edu@10.4.5.126 --mca mpi_preconnect_all 1 --prefix /usr/local ./PruebaSumaParalela.out
>
The program also hangs, but the output is different. On both computers I get
the following:
Inicio
Inicio
totalnodes:2
mynode:0
Inicio Recv
OK, so it looks like rank 1 is not getting out of MPI_Init.
> 3. When the program hangs can you attach a debugger to one of the
> processes and print out a stack?
>
I do not know how to do that.
With Solaris, I usually do the following:
% dbx - <pid of process>
dbx> where
<stack prints out>
> 4. What version of Open MPI are you using, on what type of machine, using
> which OS?
>
Open MPI 1.2.2 on both computers.
On 10.4.5.123 I have:
Ubuntu Linux pichurra 2.6.22-15-generic #1 SMP Tue Jun 10 09:21:34 UTC 2008
i686 GNU/Linux
On edu@10.4.5.126 I have:
K-Ubuntu Linux hp1-Linux 2.6.20-16-generic #2 SMP Sun Sep 23 19:50:39 UTC
2007 i686 GNU/Linux
Sorry for the bonehead question, but is edu@10.4.5.126 the actual machine
name? Is its IP address really 10.4.5.126? Can you try the bare IP address
instead?
I would guess the issue is that the tcp btl is somehow not matching the
two nodes as being connected to each other.
--td
Sofia