Date: Wed, 17 Sep 2008 16:23:59 +0200
From: "Sofia Aparicio Secanellas" <sapari...@grpss.ssr.upm.es>
Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv
To: "Open MPI Users" <us...@open-mpi.org>
Message-ID: <0625EEFB84E04647A1930A963A8DF7E3@aparicio1>
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
        reply-type=response

Hello Terry,

Thank you very much for your help.

> Sofia,
>
> I took your program and actually ran it successfully on my systems using Open MPI r19400. A couple of questions:
>
> 1.  Have you tried to run the program on a single node?
> mpirun -np 2 --host 10.4.5.123 --prefix /usr/local ./PruebaSumaParalela.out
>

Yes. In this case, the program works perfectly.

> 2. Can you try running the code the following way, and is the output different?
> mpirun -np 2 --host 10.4.5.123,edu@10.4.5.126 --mca mpi_preconnect_all 1 --prefix /usr/local ./PruebaSumaParalela.out
>

The program also hangs, but the output is different. On both computers I get the following:

Inicio
Inicio
totalnodes:2
mynode:0
Inicio Recv

OK, so it looks like rank 1 is not getting out of MPI_Init.

> 3. When the program hangs, can you attach a debugger to one of the processes and print out a stack?
>

I do not know how to do that.

With Solaris, I usually do the following:
% dbx - <pid of process>
dbx>  where
<stack prints out>
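Both machines here run Linux, where gdb is the usual counterpart of dbx. A minimal sketch of the equivalent, assuming gdb is installed on each node and that it is run as the user who owns the MPI processes (or as root); the function name dump_stacks is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print every thread's stack for each given PID,
# the gdb counterpart of "dbx - <pid>" followed by "where" on Solaris.
# Assumptions: gdb is installed, and the caller owns the processes (or is root).
dump_stacks() {
  status=0
  for pid in "$@"; do
    # kill -0 sends no signal; it only checks that the PID exists and is ours
    if ! kill -0 "$pid" 2>/dev/null; then
      echo "cannot signal PID $pid (not running, or owned by another user)" >&2
      status=1
      continue
    fi
    echo "=== stack of PID $pid ==="
    # -batch runs the -ex command, then gdb detaches from the process and exits
    gdb -batch -p "$pid" -ex "thread apply all bt"
  done
  return $status
}
```

For example, run `dump_stacks $(pgrep -f PruebaSumaParalela.out)` on each node while the program is hung. Dumping all threads rather than just the current one matters here, because Open MPI may run helper threads alongside the thread stuck in MPI_Init.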

> 4. What version of Open MPI are you using, on what type of machine, and with which OS?
>

openmpi-1.2.2 on both computers.

On 10.4.5.123 I have:
Ubuntu Linux pichurra 2.6.22-15-generic #1 SMP Tue Jun 10 09:21:34 UTC 2008 i686 GNU/Linux

On edu@10.4.5.126 I have:
K-Ubuntu Linux hp1-Linux 2.6.20-16-generic #2 SMP Sun Sep 23 19:50:39 UTC 2007 i686 GNU/Linux

Sorry for the bonehead question, but is edu@10.4.5.126 the actual machine name? Is its IP address really 10.4.5.126? Can you try using the plain IP address instead? I would guess the issue is that the TCP BTL is somehow not recognizing the two nodes as being connected to each other.

--td
Sofia
