Hi Joseph,

In your code you are updating the local buffer, which is also exposed via the window, right after the lock_all call. These stores (baseptr[i] = 1000 + loffs++, let's call them the buffer initialization) may overwrite the outcome of other concurrent operations, i.e. the accumulate calls in your case.
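To make the ordering concrete, this is roughly the critical part as I read your MWE (a sketch only, not your exact code; NUM_ELEMENTS and win stand in for whatever names your program actually uses, baseptr and loffs are taken from your description):

```
MPI_Win_lock_all(0, win);            /* starts a passive-target epoch, no collective sync */

/* a faster process may already be issuing MPI_Accumulate calls that
 * target this window while we are still in this loop ...            */
for (int i = 0; i < NUM_ELEMENTS; i++) {
    baseptr[i] = 1000 + loffs++;     /* ... so these stores can overwrite
                                        the accumulated results       */
}
```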
Another process that has already advanced to the accumulate loop may change data in the local window while your local process has not yet completed its initialization. Thus, in case of process skew, you lose the outcome of those accumulates to the initialization. I provoked process skew by adding an if (comm_rank == 0) { sleep(1); } before the initialization loop, which lets me reproduce the wrong results with GCC 6.3 and OpenMPI 2.0.2 when executing the program with two MPI processes.

The lock_all call issued before the buffer initialization gives you no collective synchronization across the window's communicator (as hinted at on p. 446 of the MPI 3.1 standard). That is, other processes may already have performed their accumulate phase while the local one is still (or not yet) in the initialization, which then overwrites their data (see above). You might consider an exclusive lock around your initialization, but that won't solve the issue, because any other process may do its accumulate phase after the window creation but before you enter the buffer initialization loop.

As far as I understand your MWE, the initialization must complete on all processes before the accumulate loop starts in order to get correct results. I suppose an MPI_Barrier is missing before the accumulate loop; a minimal sketch of that ordering follows below the quoted message. Since you are using the unified model, you can omit the exclusive lock discussed above as well.

Hope this helps.

Regards,
Steffen

On 03/01/2017 04:03 PM, Joseph Schuchart wrote:
> Hi all,
>
> We are seeing issues in one of our applications, in which processes in a
> shared communicator allocate a shared MPI window and execute
> MPI_Accumulate simultaneously on it to iteratively update each process'
> values. The test boils down to the sample code attached. Sample output
> is as follows:
>
> ```
> $ mpirun -n 4 ./mpi_shared_accumulate
> [1] baseptr[0]: 1010 (expected 1010)
> [1] baseptr[1]: 1011 (expected 1011)
> [1] baseptr[2]: 1012 (expected 1012)
> [1] baseptr[3]: 1013 (expected 1013)
> [1] baseptr[4]: 1014 (expected 1014)
> [2] baseptr[0]: 1005 (expected 1010) [!!!]
> [2] baseptr[1]: 1006 (expected 1011) [!!!]
> [2] baseptr[2]: 1007 (expected 1012) [!!!]
> [2] baseptr[3]: 1008 (expected 1013) [!!!]
> [2] baseptr[4]: 1009 (expected 1014) [!!!]
> [3] baseptr[0]: 1010 (expected 1010)
> [0] baseptr[0]: 1010 (expected 1010)
> [0] baseptr[1]: 1011 (expected 1011)
> [0] baseptr[2]: 1012 (expected 1012)
> [0] baseptr[3]: 1013 (expected 1013)
> [0] baseptr[4]: 1014 (expected 1014)
> [3] baseptr[1]: 1011 (expected 1011)
> [3] baseptr[2]: 1012 (expected 1012)
> [3] baseptr[3]: 1013 (expected 1013)
> [3] baseptr[4]: 1014 (expected 1014)
> ```
>
> Each process should hold the same values, but sometimes (not on all
> executions) random processes diverge (marked with [!!!]).
>
> I made the following observations:
>
> 1) The issue occurs with both OpenMPI 1.10.6 and 2.0.2 but not with
> MPICH 3.2.
> 2) The issue occurs only if the window is allocated through
> MPI_Win_allocate_shared; using MPI_Win_allocate works fine.
> 3) The code assumes that MPI_Accumulate atomically updates individual
> elements (please correct me if that is not covered by the MPI standard).
>
> Both OpenMPI and the example code were compiled using GCC 5.4.1 and run
> on a Linux system (single node). OpenMPI was configured with
> --enable-mpi-thread-multiple and --with-threads but the application is
> not multi-threaded. Please let me know if you need any other information.
>
> Cheers
> Joseph
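For reference, here is the minimal, self-contained sketch of the ordering I have in mind. It is not Joseph's original MWE: the element count, the initialized and accumulated values, and all names except baseptr are my own assumptions; only the placement of the MPI_Barrier between the initialization and the accumulate loop is the point.

```
#include <mpi.h>
#include <stdio.h>

#define NELEM 5

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* shared-memory communicator, as in the description above */
    MPI_Comm shmcomm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &shmcomm);

    int rank, size;
    MPI_Comm_rank(shmcomm, &rank);
    MPI_Comm_size(shmcomm, &size);

    int *baseptr;
    MPI_Win win;
    MPI_Win_allocate_shared(NELEM * sizeof(int), sizeof(int),
                            MPI_INFO_NULL, shmcomm, &baseptr, &win);

    MPI_Win_lock_all(0, win);

    /* buffer initialization of the locally exposed window memory */
    for (int i = 0; i < NELEM; i++) {
        baseptr[i] = 1000 + 10 * rank + i;
    }

    /* the suggested barrier: no process starts accumulating before
     * every process has finished initializing its window memory */
    MPI_Barrier(shmcomm);

    /* every process adds 1 to every element on every process */
    const int one = 1;
    for (int target = 0; target < size; target++) {
        for (int i = 0; i < NELEM; i++) {
            MPI_Accumulate(&one, 1, MPI_INT, target, i, 1, MPI_INT,
                           MPI_SUM, win);
        }
    }
    MPI_Win_flush_all(win);   /* complete my accumulates at all targets */

    /* wait until everyone has flushed, then make the updates visible
     * to local loads (memory synchronization in the unified model) */
    MPI_Barrier(shmcomm);
    MPI_Win_sync(win);

    for (int i = 0; i < NELEM; i++) {
        printf("[%d] baseptr[%d]: %d (expected %d)\n",
               rank, i, baseptr[i], 1000 + 10 * rank + i + size);
    }

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Comm_free(&shmcomm);
    MPI_Finalize();
    return 0;
}
```

With the barrier in place, the initialization stores are guaranteed to have completed on every process before any accumulate can target that memory, regardless of process skew.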