Hi

Thanks for your answer.

Jeff Squyres wrote:
On May 31, 2011, at 10:55 AM, francoise.r...@obs.ujf-grenoble.fr wrote:

I reproduced the problem with the following code:

I'm not sure I can reconcile this statement with your later statements...?
When I run the program on 2 nodes of 12 cores each (24 processes in total), it does not stop.

Your first statement seems to imply that you got the sample program to hang, but this statement says that it worked fine.
I am able to run this sample program fine, too.  :-\

Sorry for the misunderstanding. When I say that the program is frozen and does not stop, I mean that it hangs at the MPI_COMM_DUP call.

After adding the following 2 lines to the code, just before the MPI_COMM_DUP call, I
notice that several processes have the same rank in the COMM_NODES communicator:
CALL MPI_COMM_RANK(COMM_NODES, MYID2, IERR)
WRITE(*,*) 'before DUP call myid is ', MYID, 'myid2 is ', MYID2

That definitely should not be.  Can you show the output for this?
Here's the output (rank 17 is missing and rank 22 appears twice):

before DUP myid is 1 myid2 is 0
before DUP myid is 2 myid2 is 1
before DUP myid is 3 myid2 is 2
before DUP myid is 4 myid2 is 3
before DUP myid is 5 myid2 is 4
before DUP myid is 6 myid2 is 5
before DUP myid is 7 myid2 is 6
before DUP myid is 8 myid2 is 7
before DUP myid is 9 myid2 is 8
before DUP myid is 10 myid2 is 9
before DUP myid is 11 myid2 is 10
before DUP myid is 12 myid2 is 11
before DUP myid is 13 myid2 is 12
before DUP myid is 14 myid2 is 13
before DUP myid is 15 myid2 is 14
before DUP myid is 16 myid2 is 15
before DUP myid is 17 myid2 is 16
before DUP myid is 18 myid2 is 18
before DUP myid is 19 myid2 is 19
before DUP myid is 20 myid2 is 20
before DUP myid is 21 myid2 is 21
before DUP myid is 22 myid2 is 22
before DUP myid is 23 myid2 is 22
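
For longer runs, the same check can be done mechanically instead of by eye. The snippet below is only a rough sketch, not part of the test program: the function name check_ranks and the regular expression are my own assumptions about the log format shown above, and the sample text is just the last eight lines of that output. It reports which COMM_NODES ranks appear more than once and which are missing between the lowest and highest rank seen.

```python
import re
from collections import Counter

# Assumed log format: "... myid is <world rank> myid2 is <COMM_NODES rank>"
LINE_RE = re.compile(r"myid is\s+(\d+)\s+myid2 is\s+(\d+)")

def check_ranks(log_text):
    """Return (duplicated, missing) COMM_NODES ranks found in the log text."""
    ranks = [int(myid2) for _myid, myid2 in LINE_RE.findall(log_text)]
    if not ranks:
        return [], []
    counts = Counter(ranks)
    duplicated = sorted(r for r, n in counts.items() if n > 1)
    # Only look for gaps between the lowest and highest rank actually seen.
    missing = [r for r in range(min(ranks), max(ranks) + 1) if r not in counts]
    return duplicated, missing

# Last eight lines of the output above.
sample = """\
before DUP myid is 16 myid2 is 15
before DUP myid is 17 myid2 is 16
before DUP myid is 18 myid2 is 18
before DUP myid is 19 myid2 is 19
before DUP myid is 20 myid2 is 20
before DUP myid is 21 myid2 is 21
before DUP myid is 22 myid2 is 22
before DUP myid is 23 myid2 is 22
"""

if __name__ == "__main__":
    duplicated, missing = check_ranks(sample)
    print("duplicated:", duplicated, "missing:", missing)
    # prints: duplicated: [22] missing: [17]
```

Saving the mpirun output to a file and passing its contents to check_ranks would give the same report for the 12-task and 18-task runs.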


I put those lines in and I see unique rank values for all processes.

Are you using the wrong mpif.h,
I have verified the include path and it is ok.
Moreover, I am able to run the program on 2 nodes with a total of 12 tasks (mpirun -np 12), or on 2 nodes with a total of 18 tasks, and the rank values are correct. But the program hangs beyond 18 tasks, and in those cases the rank values are not unique. The behaviour is the same on 4 nodes, for example.

Best regards
F. Roch
