Hi Joseph,

In your code you are updating the local buffer, which is also exposed via the window, right after the lock_all call. These stores (baseptr[i] = 1000 + loffs++, let's call them the buffer initialization) may overwrite the outcome of other concurrent operations, i.e. the accumulate calls in your case.
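To make the ordering concrete, this is roughly the critical part as I read your MWE (a sketch only, not your exact code; NUM_ELEMENTS and win stand in for whatever names your program actually uses, baseptr and loffs are taken from your description):

```
MPI_Win_lock_all(0, win);            /* starts a passive-target epoch, no collective sync */

/* a faster process may already be issuing MPI_Accumulate calls that
 * target this window while we are still in this loop ...            */
for (int i = 0; i < NUM_ELEMENTS; i++) {
    baseptr[i] = 1000 + loffs++;     /* ... so these stores can overwrite
                                        the accumulated results       */
}
```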
Another process that has already advanced to the accumulate loop may change data in the local window while your local process has not yet completed its initialization. Thus, in case of process skew, you lose the outcome of those accumulates to the initialization. I provoked process skew by adding an if (comm_rank == 0) { sleep(1); } before the initialization loop, which lets me reproduce the wrong results with GCC 6.3 and OpenMPI 2.0.2 when executing the program with two MPI processes.

The lock_all call issued before the buffer initialization gives you no collective synchronization across the window's communicator (as hinted at on p. 446 of the MPI 3.1 standard). That is, other processes may already have performed their accumulate phase while the local one is still (or not yet) in the initialization, which then overwrites their data (see above). You might consider an exclusive lock around your initialization, but that won't solve the issue, because any other process may do its accumulate phase after the window creation but before you enter the buffer initialization loop.

As far as I understand your MWE, the initialization must complete on all processes before the accumulate loop starts in order to get correct results. I suppose an MPI_Barrier is missing before the accumulate loop; a minimal sketch of that ordering follows below the quoted message. Since you are using the unified model, you can omit the exclusive lock discussed above as well.

Hope this helps.

Regards,
Steffen

On 03/01/2017 04:03 PM, Joseph Schuchart wrote:
> Hi all,
>
> We are seeing issues in one of our applications, in which processes in a
> shared communicator allocate a shared MPI window and execute
> MPI_Accumulate simultaneously on it to iteratively update each process'
> values. The test boils down to the sample code attached. Sample output
> is as follows:
>
> ```
> $ mpirun -n 4 ./mpi_shared_accumulate
> [1] baseptr[0]: 1010 (expected 1010)
> [1] baseptr[1]: 1011 (expected 1011)
> [1] baseptr[2]: 1012 (expected 1012)
> [1] baseptr[3]: 1013 (expected 1013)
> [1] baseptr[4]: 1014 (expected 1014)
> [2] baseptr[0]: 1005 (expected 1010) [!!!]
> [2] baseptr[1]: 1006 (expected 1011) [!!!]
> [2] baseptr[2]: 1007 (expected 1012) [!!!]
> [2] baseptr[3]: 1008 (expected 1013) [!!!]
> [2] baseptr[4]: 1009 (expected 1014) [!!!]
> [3] baseptr[0]: 1010 (expected 1010)
> [0] baseptr[0]: 1010 (expected 1010)
> [0] baseptr[1]: 1011 (expected 1011)
> [0] baseptr[2]: 1012 (expected 1012)
> [0] baseptr[3]: 1013 (expected 1013)
> [0] baseptr[4]: 1014 (expected 1014)
> [3] baseptr[1]: 1011 (expected 1011)
> [3] baseptr[2]: 1012 (expected 1012)
> [3] baseptr[3]: 1013 (expected 1013)
> [3] baseptr[4]: 1014 (expected 1014)
> ```
>
> Each process should hold the same values, but sometimes (not on all
> executions) random processes diverge (marked with [!!!]).
>
> I made the following observations:
>
> 1) The issue occurs with both OpenMPI 1.10.6 and 2.0.2 but not with
> MPICH 3.2.
> 2) The issue occurs only if the window is allocated through
> MPI_Win_allocate_shared; using MPI_Win_allocate works fine.
> 3) The code assumes that MPI_Accumulate atomically updates individual
> elements (please correct me if that is not covered by the MPI standard).
>
> Both OpenMPI and the example code were compiled using GCC 5.4.1 and run
> on a Linux system (single node). OpenMPI was configured with
> --enable-mpi-thread-multiple and --with-threads but the application is
> not multi-threaded. Please let me know if you need any other information.
>
> Cheers
> Joseph
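For reference, here is the minimal, self-contained sketch of the ordering I have in mind. It is not Joseph's original MWE: the element count, the initialized and accumulated values, and all names except baseptr are my own assumptions; only the placement of the MPI_Barrier between the initialization and the accumulate loop is the point.

```
#include <mpi.h>
#include <stdio.h>

#define NELEM 5

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* shared-memory communicator, as in the description above */
    MPI_Comm shmcomm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &shmcomm);

    int rank, size;
    MPI_Comm_rank(shmcomm, &rank);
    MPI_Comm_size(shmcomm, &size);

    int *baseptr;
    MPI_Win win;
    MPI_Win_allocate_shared(NELEM * sizeof(int), sizeof(int),
                            MPI_INFO_NULL, shmcomm, &baseptr, &win);

    MPI_Win_lock_all(0, win);

    /* buffer initialization of the locally exposed window memory */
    for (int i = 0; i < NELEM; i++) {
        baseptr[i] = 1000 + 10 * rank + i;
    }

    /* the suggested barrier: no process starts accumulating before
     * every process has finished initializing its window memory */
    MPI_Barrier(shmcomm);

    /* every process adds 1 to every element on every process */
    const int one = 1;
    for (int target = 0; target < size; target++) {
        for (int i = 0; i < NELEM; i++) {
            MPI_Accumulate(&one, 1, MPI_INT, target, i, 1, MPI_INT,
                           MPI_SUM, win);
        }
    }
    MPI_Win_flush_all(win);   /* complete my accumulates at all targets */

    /* wait until everyone has flushed, then make the updates visible
     * to local loads (memory synchronization in the unified model) */
    MPI_Barrier(shmcomm);
    MPI_Win_sync(win);

    for (int i = 0; i < NELEM; i++) {
        printf("[%d] baseptr[%d]: %d (expected %d)\n",
               rank, i, baseptr[i], 1000 + 10 * rank + i + size);
    }

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Comm_free(&shmcomm);
    MPI_Finalize();
    return 0;
}
```

With the barrier in place, the initialization stores are guaranteed to have completed on every process before any accumulate can target that memory, regardless of process skew.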