I have a problem where MPI_Bcast hangs when called repeatedly in rapid succession. The problem shows up on our new cluster, but not on our older one. The new cluster has Cascade Lake processors; each node contains 2 sockets with 18 cores per socket, and the cluster has 128 nodes connected by an EDR InfiniBand network.

Below is a reproducer for the issue. The same hang occurs in both the Zoltan partitioning software from DOE and FUN3D from NASA.
The reproducer fails with openmpi-3.1.4 and gcc-9.2. The software mentioned above has failed with both gcc-9.2 and intel-19.2 under several versions of Open MPI (from the 2.x and 3.x series). One thing that seems to fix the problem is setting both:

setenv OMPI_MCA_coll_sync_barrier_before 1
setenv OMPI_MCA_coll_sync_barrier_after 1

Setting these to 10 still showed hangs in the software.

The reproducer hangs on our system when run with 30 nodes and 36 processes per node. The compile command is:

mpicxx main.cpp -o test.x

After salloc -n 1080, I run with mpirun ./test.x. When the program hangs, I log in to one of the allocated nodes and run gstack on one of the running PIDs to get a stack trace; it always contains PMPI_Bcast.

Thanks in advance for any advice on the problem.

Kris Garrett

#include <stdio.h>
#include <mpi.h>

const int BUFFER_LEN = 2000;
const int OUTER_MAX = 200;

int main(int argc, char **argv)
{
    int mpi_size, my_rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    for (int outer = 0; outer < OUTER_MAX; outer++)
    {
        for (int root = 0; root < mpi_size; root++)
        {
            int *buf;
            int len;

            if (my_rank == 0)
                printf("%d\n", root);

            // The root broadcasts the buffer length first, then the buffer itself.
            if (my_rank == root)
                len = BUFFER_LEN;
            MPI_Bcast(&len, 1, MPI_INT, root, MPI_COMM_WORLD);

            buf = new int[len];
            if (my_rank == root)
            {
                for (int i = 0; i < len; i++)
                    buf[i] = 1;
            }
            MPI_Bcast(buf, len, MPI_INT, root, MPI_COMM_WORLD);

            delete[] buf;
        }
    }

    MPI_Finalize();
    return 0;
}
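
P.S. My understanding is that those MCA parameters make Open MPI's coll/sync component insert an MPI_Barrier around every Nth collective. Assuming the hang comes from fast ranks racing ahead through successive broadcasts and flooding slower ranks with unexpected messages (my guess, not confirmed), the same throttling can be sketched by hand at the end of the inner loop in the reproducer:

            MPI_Bcast(buf, len, MPI_INT, root, MPI_COMM_WORLD);
            delete[] buf;

            // Hand-rolled equivalent of coll_sync_barrier_after 1: no rank
            // may start the next root's broadcasts until every rank has
            // finished this iteration.
            MPI_Barrier(MPI_COMM_WORLD);

The MCA parameters can also be passed on the mpirun command line instead of via the environment, e.g. mpirun --mca coll_sync_barrier_before 1 --mca coll_sync_barrier_after 1 ./test.x.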