Patrick,

Thanks for the report and the reproducer.
I was able to confirm the issue with Python and Fortran, but:
 - I can only reproduce it with pml/ucx (that is, --mca pml ob1 --mca btl tcp,self works fine)
 - I can only reproduce it with bcast algorithms 8 and 9

As a workaround, you can keep using UCX but manually change the bcast algorithm:

mpirun --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_bcast_algorithm 1 ...
/* you can replace the bcast algorithm with any value between 1 and 7 inclusive */

Cheers,

Gilles

On Mon, Aug 24, 2020 at 10:58 PM Patrick McNally via users
<users@lists.open-mpi.org> wrote:
>
> I apologize in advance for the size of the example source and probably the
> length of the email, but this has been a pain to track down.
>
> Our application uses System V style shared memory pretty extensively, and we
> have recently found that in certain circumstances OpenMPI appears to provide
> ranks with stale data. The attached archive contains sample code that
> demonstrates the issue. There is a subroutine that uses a shared memory
> array to broadcast from a single rank on one compute node to a single rank on
> all other compute nodes. The first call sends all 1s, then all 2s, and so
> on. The receiving rank(s) get all 1s on the first execution, but on
> subsequent executions they receive some 2s and some 1s; then some 3s, some
> 2s, and some 1s. The code contains a version of this routine in both C and
> Fortran, but only the Fortran version appears to exhibit the problem.
>
> I've tried this with OpenMPI 3.1.5, 4.0.2, and 4.0.4 on two different
> systems with very different configurations, and both show the problem. On one
> of the machines, it only appears to happen when MPI is initialized with
> mpi4py, so I've included that in the test as well. Other than that, the
> behavior is very consistent across machines. When run with the same number
> of ranks and the same size array, the two machines even show the invalid
> values at the same indices.
>
> Please let me know if you need any additional information.
>
> Thanks,
> Patrick
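
For readers who don't have the attached archive handy, here is a minimal C sketch of the kind of pattern the report describes: a System V shared memory segment attached with shmat() and used directly as the MPI_Bcast buffer, refilled and re-broadcast on each iteration, with receivers checking for stale values. It is not the reporter's reproducer; the buffer size, iteration count, and use of IPC_PRIVATE are illustrative assumptions.

/*
 * Sketch only: SysV shared memory segment reused as an MPI_Bcast buffer.
 * Compile with mpicc; sizes and loop counts are arbitrary choices.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define NELEMS (1 << 20)   /* illustrative buffer size */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank attaches its own private segment; this is enough to
     * exercise the "broadcast out of shared memory" buffer reuse. */
    int shmid = shmget(IPC_PRIVATE, NELEMS * sizeof(int), IPC_CREAT | 0600);
    if (shmid < 0) { perror("shmget"); MPI_Abort(MPI_COMM_WORLD, 1); }
    int *buf = (int *)shmat(shmid, NULL, 0);
    if (buf == (void *)-1) { perror("shmat"); MPI_Abort(MPI_COMM_WORLD, 1); }
    shmctl(shmid, IPC_RMID, NULL);  /* segment is removed on last detach */

    for (int iter = 1; iter <= 3; iter++) {
        /* Root fills the shared buffer with all 1s, then all 2s, ... */
        if (rank == 0)
            for (int i = 0; i < NELEMS; i++) buf[i] = iter;

        MPI_Bcast(buf, NELEMS, MPI_INT, 0, MPI_COMM_WORLD);

        /* Receivers count elements still holding a previous iteration's value. */
        if (rank != 0) {
            long stale = 0;
            for (int i = 0; i < NELEMS; i++)
                if (buf[i] != iter) stale++;
            if (stale)
                printf("rank %d iter %d: %ld stale elements\n", rank, iter, stale);
        }
    }

    shmdt(buf);
    MPI_Finalize();
    return 0;
}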