Hi all,

We are seeing issues in one of our applications in which processes on a shared-memory communicator allocate a shared MPI window and simultaneously execute MPI_Accumulate on it to iteratively update each process's values. The test case boils down to the attached sample code. Sample output is as follows:

```
$ mpirun -n 4 ./mpi_shared_accumulate
[1] baseptr[0]: 1010 (expected 1010)
[1] baseptr[1]: 1011 (expected 1011)
[1] baseptr[2]: 1012 (expected 1012)
[1] baseptr[3]: 1013 (expected 1013)
[1] baseptr[4]: 1014 (expected 1014)
[2] baseptr[0]: 1005 (expected 1010) [!!!]
[2] baseptr[1]: 1006 (expected 1011) [!!!]
[2] baseptr[2]: 1007 (expected 1012) [!!!]
[2] baseptr[3]: 1008 (expected 1013) [!!!]
[2] baseptr[4]: 1009 (expected 1014) [!!!]
[3] baseptr[0]: 1010 (expected 1010)
[0] baseptr[0]: 1010 (expected 1010)
[0] baseptr[1]: 1011 (expected 1011)
[0] baseptr[2]: 1012 (expected 1012)
[0] baseptr[3]: 1013 (expected 1013)
[0] baseptr[4]: 1014 (expected 1014)
[3] baseptr[1]: 1011 (expected 1011)
[3] baseptr[2]: 1012 (expected 1012)
[3] baseptr[3]: 1013 (expected 1013)
[3] baseptr[4]: 1014 (expected 1014)
```

Each process should end up with the same values, but sometimes (not on every execution) individual processes diverge (marked with [!!!]). With four ranks each contributing rank + 1, every element should be incremented by 1 + 2 + 3 + 4 = 10, so e.g. baseptr[0] is expected to be 1010.

I made the following observations:

1) The issue occurs with both Open MPI 1.10.6 and 2.0.2 but not with MPICH 3.2.
2) The issue occurs only if the window is allocated through MPI_Win_allocate_shared; using MPI_Win_allocate instead works fine (see the sketch below).
3) The code assumes that MPI_Accumulate atomically updates individual elements (please correct me if that is not covered by the MPI standard).
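
For reference, the working variant from 2) is a minimal sketch that only swaps the allocation call; it assumes the same variables (local_elem_count, win_info, sharedmem_comm, baseptr, win) as in the attached test and leaves everything else unchanged:

```c
    // Drop-in replacement for the MPI_Win_allocate_shared call in the
    // attached test; with this variant the results come out as expected.
    MPI_Win_allocate(
        local_elem_count * sizeof(int),  // window size in bytes
        sizeof(int),                     // displacement unit
        win_info,
        sharedmem_comm,
        &baseptr,                        // window memory allocated by MPI
        &win);
```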

Both Open MPI and the example code were compiled with GCC 5.4.1 and run on a single-node Linux system. Open MPI was configured with --enable-mpi-thread-multiple and --with-threads, but the application is not multi-threaded. Please let me know if you need any other information.
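
If the thread configuration is relevant: the thread support level the library actually provides at runtime can be checked with a small standalone program like the sketch below (this is not part of the attached reproducer, just an illustration):

```c
#include <mpi.h>
#include <stdio.h>

// Standalone sketch: print the thread support level the MPI library
// provides when the application initializes via plain MPI_Init.
int main(int argc, char **argv)
{
    int provided;
    MPI_Init(&argc, &argv);
    MPI_Query_thread(&provided);
    printf("provided thread level: %d (MPI_THREAD_SINGLE=%d, MPI_THREAD_MULTIPLE=%d)\n",
           provided, MPI_THREAD_SINGLE, MPI_THREAD_MULTIPLE);
    MPI_Finalize();
    return 0;
}
```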

Cheers
Joseph

--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: schuch...@hlrs.de

```c
#include <mpi.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <assert.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int *baseptr;
    MPI_Win win;
    int local_elem_count = 5;
    int comm_size;
    int comm_rank;

    int *local_elem_buf = malloc(sizeof(int) * local_elem_count);

    MPI_Comm sharedmem_comm;
    MPI_Comm_split_type(
        MPI_COMM_WORLD,
        MPI_COMM_TYPE_SHARED,
        1,
        MPI_INFO_NULL,
        &sharedmem_comm);

    MPI_Comm_size(sharedmem_comm, &comm_size);
    MPI_Comm_rank(sharedmem_comm, &comm_rank);

    for (int i = 0; i < local_elem_count; ++i) {
        local_elem_buf[i] = comm_rank + 1;
    }

    MPI_Info win_info;
    MPI_Info_create(&win_info);
    MPI_Info_set(win_info, "alloc_shared_noncontig", "true");

    //NOTE: Using MPI_Win_allocate here works as expected
    MPI_Win_allocate_shared(
        local_elem_count*sizeof(int),
        sizeof(int),
        win_info, 
        sharedmem_comm, 
        &baseptr, 
        &win);
    MPI_Info_free(&win_info);

    MPI_Win_lock_all(0, win);

    // initialize the local portion of the window memory
    for (int i = 0; i < local_elem_count; ++i) {
        baseptr[i] = 1000 + i;
    }

    // accumulate into each process' local memory
    for (int i = 0; i < comm_size; i++) {
        MPI_Accumulate(
            local_elem_buf,
            local_elem_count,
            MPI_INT,
            i, 0,
            local_elem_count,
            MPI_INT,
            MPI_SUM,
            win);
        MPI_Win_flush(i, win);
    }

    // wait for completion of all processes
    MPI_Barrier(sharedmem_comm);
    // print local elements
    for (int i = 0; i < local_elem_count; ++i) {
        int expected = (1000 + i) +
                       ((comm_size * (comm_size + 1)) / 2);
        printf("[%i] baseptr[%i]: %i (expected %i)%s\n",
               comm_rank, i, baseptr[i], expected,
               (baseptr[i] != expected) ? " [!!!]" : "");
    }

    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);

    free(local_elem_buf);

    MPI_Finalize();

    return 0;
}
```