Hi! The following code misbehaves when run over openib.
Open MPI: 1.3.3

With openib it dies with "error polling HP CQ with status WORK REQUEST
FLUSHED ERROR status number 5"; with tcp or shmem it works as expected.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>   /* sleep() */
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int rank;
        int n;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        fprintf(stderr, "I am %d at %ld\n", rank, (long)time(NULL));
        fflush(stderr);

        n = 4;
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

        fprintf(stderr, "I am %d at %ld\n", rank, (long)time(NULL));
        fflush(stderr);

        /* Rank 0 makes no MPI calls for a minute while the
           other ranks are already waiting in the barrier. */
        if (rank == 0) {
            sleep(60);
        }

        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
        exit(0);
    }

I know the internal Open MPI reason for this behaviour, but I think the
program should be allowed to behave as it does.

This example is a bit contrived, but there are codes where a similar
situation can occur, i.e. the Bcast sender does a lot of other work after
the Bcast before its next MPI call. VASP is a candidate for this.

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134   Fax: +46 90 7866126
Mobile: +46 70 7716134   WWW: http://www.hpc2n.umu.se