I reproduced the problem with the following code:
PROGRAM testDUP
  INCLUDE "mpif.h"
  INTEGER, PARAMETER :: MASTER = 0
  INTEGER color
  INTEGER COMM, COMM_NODES, COMM_LOAD, MYID, IERR

  COMM = MPI_COMM_WORLD
  CALL MPI_INIT(IERR)
  CALL MPI_COMM_RANK(COMM, MYID, IERR)

  ! The master excludes itself from the split communicator
  IF (MYID .EQ. MASTER) THEN
     color = MPI_UNDEFINED
  ELSE
     color = 0
  END IF
  CALL MPI_COMM_SPLIT(COMM, color, 0, COMM_NODES, IERR)

  ! Every process except the master duplicates the split communicator
  IF (MYID .NE. MASTER) THEN
     CALL MPI_COMM_DUP(COMM_NODES, COMM_LOAD, IERR)
  END IF
  CALL MPI_FINALIZE(IERR)
END PROGRAM testDUP
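For reference, I build and run the reproducer in the usual way (the exact wrapper and launcher names depend on the MPI installation):

  mpif90 testdup.f90 -o testdup
  mpirun -np 24 ./testdup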
When I execute the program on 2 nodes of 12 cores each (a total of 24
processes), it hangs and never completes.
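Note that, per the MPI standard, the master passes color = MPI_UNDEFINED to MPI_COMM_SPLIT and therefore receives COMM_NODES = MPI_COMM_NULL, so only the 23 non-master processes reach the MPI_COMM_DUP call; the hang must occur in that call or in the final MPI_FINALIZE. An equivalent guard, testing the communicator value rather than the rank, would be:

  ! Only processes that actually received a communicator from the split
  ! (i.e. those that did not pass MPI_UNDEFINED) perform the duplication.
  IF (COMM_NODES .NE. MPI_COMM_NULL) THEN
     CALL MPI_COMM_DUP(COMM_NODES, COMM_LOAD, IERR)
  END IF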
Adding the two lines below to the code, just before the MPI_COMM_DUP call,
I observe that several processes report the same rank in the COMM_NODES
communicator:
  CALL MPI_COMM_RANK(COMM_NODES, MYID2, IERR)
  WRITE(*,*) 'before DUP call myid is ', MYID, ' myid2 is ', MYID2
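With a correct MPI_COMM_SPLIT, the 23 non-master processes should receive distinct MYID2 values running from 0 to 22. Here is a minimal sketch of a check that makes the symptom explicit (hypothetical diagnostic code; the declarations would go with the others at the top of the program, and if the communicator is already corrupted the gather itself may of course misbehave):

  INTEGER NP2, I, J
  INTEGER RANKS(24)    ! large enough for the 24-process run
  CALL MPI_COMM_SIZE(COMM_NODES, NP2, IERR)
  ! Collect every process's COMM_NODES rank on rank 0 of COMM_NODES
  CALL MPI_GATHER(MYID2, 1, MPI_INTEGER, RANKS, 1, MPI_INTEGER, 0, COMM_NODES, IERR)
  ! A correct split never produces two identical ranks
  IF (MYID2 .EQ. 0) THEN
     DO I = 1, NP2
        DO J = I + 1, NP2
           IF (RANKS(I) .EQ. RANKS(J)) THEN
              WRITE(*,*) 'duplicate rank in COMM_NODES: ', RANKS(I)
           END IF
        END DO
     END DO
  END IF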
Jeff Squyres wrote:
On May 26, 2011, at 4:43 AM, francoise.r...@obs.ujf-grenoble.fr wrote:
CALL MPI_COMM_SIZE(id%COMM, id%NPROCS, IERR)
IF ( id%PAR .eq. 0 ) THEN
   IF ( id%MYID .eq. MASTER ) THEN
      color = MPI_UNDEFINED
   ELSE
      color = 0
   END IF
   CALL MPI_COMM_SPLIT( id%COMM, color, 0, id%COMM_NODES, IERR )
   id%NSLAVES = id%NPROCS - 1
ELSE
   CALL MPI_COMM_DUP( id%COMM, id%COMM_NODES, IERR )
   id%NSLAVES = id%NPROCS
END IF
IF (id%PAR .ne. 0 .or. id%MYID .NE. MASTER) THEN
   CALL MPI_COMM_DUP( id%COMM_NODES, id%COMM_LOAD, IERR )
ENDIF
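(For context, if I understand the MUMPS convention correctly: PAR=0 means the host process does not take part in the computations, hence the split that excludes MASTER, while PAR=1 means the host works as well, so the whole communicator is duplicated.)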
Yes, id%MYID is relative to id%COMM. It is assigned just before this point
in the code, by all processes, with the following call:
CALL MPI_COMM_RANK(id%COMM, id%MYID, IERR)
I'm out of ideas. :-(
Can you create a short reproducer?