I reproduced the problem with the following code:


PROGRAM testDUP
  INCLUDE "mpif.h"

  INTEGER, PARAMETER :: MASTER = 0
  INTEGER color
  INTEGER COMM, COMM_NODES, COMM_LOAD, MYID, IERR

  COMM = MPI_COMM_WORLD

  CALL MPI_INIT(IERR)
  CALL MPI_COMM_RANK(COMM, MYID, IERR)

  ! The master does not participate in the split and should
  ! receive MPI_COMM_NULL in COMM_NODES.
  IF (MYID .EQ. MASTER) THEN
     color = MPI_UNDEFINED
  ELSE
     color = 0
  END IF

  CALL MPI_COMM_SPLIT(COMM, color, 0, COMM_NODES, IERR)

  ! Only the processes that joined COMM_NODES duplicate it.
  IF (MYID .NE. MASTER) THEN
     CALL MPI_COMM_DUP(COMM_NODES, COMM_LOAD, IERR)
  END IF

  CALL MPI_FINALIZE(IERR)
END PROGRAM testDUP
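
In a correct MPI implementation this program terminates: the master passes MPI_UNDEFINED to MPI_COMM_SPLIT, receives MPI_COMM_NULL, and skips the DUP, while the other 23 processes form COMM_NODES and duplicate it. The same guard can also be written against the returned handle instead of the rank (a minimal sketch, behaviorally equivalent to the rank test above):

  ! MPI_COMM_SPLIT returns MPI_COMM_NULL to ranks that passed
  ! MPI_UNDEFINED, so the handle itself can be tested.
  IF (COMM_NODES .NE. MPI_COMM_NULL) THEN
     CALL MPI_COMM_DUP(COMM_NODES, COMM_LOAD, IERR)
  END IF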


When I execute the program on 2 nodes of 12 cores each (24 processes in total), it does not stop. Adding the two lines below to the code, just before the MPI_COMM_DUP call, I observe that several processes have the same rank in the COMM_NODES communicator:
CALL MPI_COMM_RANK(COMM_NODES, MYID2, IERR)
WRITE(*,*) 'before DUP call myid is ', MYID, 'myid2 is ', MYID2
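
To make the collision explicit, one could also gather every COMM_NODES rank onto all non-master processes; in a correct split the gathered values must be exactly 0 .. NP2-1, each appearing once. A minimal sketch (NP2 and RANKS are illustrative names, not part of the original code; the declarations belong in the specification part of the program):

  INTEGER NP2
  INTEGER, ALLOCATABLE :: RANKS(:)

  CALL MPI_COMM_SIZE(COMM_NODES, NP2, IERR)
  ALLOCATE(RANKS(NP2))
  ! Collect the rank reported by every member of COMM_NODES.
  CALL MPI_ALLGATHER(MYID2, 1, MPI_INTEGER, RANKS, 1, MPI_INTEGER, &
                     COMM_NODES, IERR)
  ! Each value 0..NP2-1 should appear exactly once; any duplicate
  ! confirms the broken split.
  WRITE(*,*) 'ranks seen in COMM_NODES: ', RANKS
  DEALLOCATE(RANKS)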



Jeff Squyres wrote:
On May 26, 2011, at 4:43 AM, francoise.r...@obs.ujf-grenoble.fr wrote:

    CALL MPI_COMM_SIZE(id%COMM, id%NPROCS, IERR )
    IF ( id%PAR .eq. 0 ) THEN
       IF ( id%MYID .eq. MASTER ) THEN
          color = MPI_UNDEFINED
       ELSE
          color = 0
       END IF
       CALL MPI_COMM_SPLIT( id%COMM, color, 0, id%COMM_NODES, IERR )
       id%NSLAVES = id%NPROCS - 1
    ELSE
       CALL MPI_COMM_DUP( id%COMM, id%COMM_NODES, IERR )
       id%NSLAVES = id%NPROCS
    END IF

    IF (id%PAR .ne. 0 .or. id%MYID .NE. MASTER) THEN
       CALL MPI_COMM_DUP( id%COMM_NODES, id%COMM_LOAD, IERR )
    ENDIF

Yes, id%MYID is relative to id%COMM. It is assigned just before this point
in the code, by all processes, with the following call:
CALL MPI_COMM_RANK(id%COMM, id%MYID, IERR)

I'm out of ideas.  :-(

Can you create a short reproducer code?

