Hi all,

We are seeing issues in one of our applications in which processes on a shared-memory communicator allocate a shared MPI window and simultaneously execute MPI_Accumulate on it to iteratively update each process's values. The test case boils down to the attached sample code. Sample output is as follows:

```
$ mpirun -n 4 ./mpi_shared_accumulate
[1] baseptr[0]: 1010 (expected 1010)
[1] baseptr[1]: 1011 (expected 1011)
[1] baseptr[2]: 1012 (expected 1012)
[1] baseptr[3]: 1013 (expected 1013)
[1] baseptr[4]: 1014 (expected 1014)
[2] baseptr[0]: 1005 (expected 1010) [!!!]
[2] baseptr[1]: 1006 (expected 1011) [!!!]
[2] baseptr[2]: 1007 (expected 1012) [!!!]
[2] baseptr[3]: 1008 (expected 1013) [!!!]
[2] baseptr[4]: 1009 (expected 1014) [!!!]
[3] baseptr[0]: 1010 (expected 1010)
[0] baseptr[0]: 1010 (expected 1010)
[0] baseptr[1]: 1011 (expected 1011)
[0] baseptr[2]: 1012 (expected 1012)
[0] baseptr[3]: 1013 (expected 1013)
[0] baseptr[4]: 1014 (expected 1014)
[3] baseptr[1]: 1011 (expected 1011)
[3] baseptr[2]: 1012 (expected 1012)
[3] baseptr[3]: 1013 (expected 1013)
[3] baseptr[4]: 1014 (expected 1014)
```

Each process should end up with the same values, but sometimes (not on every execution) individual processes diverge (marked with [!!!]). With four ranks each contributing rank + 1, every element should be incremented by 1 + 2 + 3 + 4 = 10, so e.g. baseptr[0] is expected to be 1010.

I made the following observations:

1) The issue occurs with both Open MPI 1.10.6 and 2.0.2 but not with MPICH 3.2.
2) The issue occurs only if the window is allocated through MPI_Win_allocate_shared; using MPI_Win_allocate instead works fine (see the sketch below).
3) The code assumes that MPI_Accumulate atomically updates individual elements (please correct me if that is not covered by the MPI standard).
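
For reference, the working variant from 2) is a minimal sketch that only swaps the allocation call; it assumes the same variables (local_elem_count, win_info, sharedmem_comm, baseptr, win) as in the attached test and leaves everything else unchanged:

```c
    // Drop-in replacement for the MPI_Win_allocate_shared call in the
    // attached test; with this variant the results come out as expected.
    MPI_Win_allocate(
        local_elem_count * sizeof(int),  // window size in bytes
        sizeof(int),                     // displacement unit
        win_info,
        sharedmem_comm,
        &baseptr,                        // window memory allocated by MPI
        &win);
```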

Both Open MPI and the example code were compiled with GCC 5.4.1 and run on a single-node Linux system. Open MPI was configured with --enable-mpi-thread-multiple and --with-threads, but the application is not multi-threaded. Please let me know if you need any other information.
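
If the thread configuration is relevant: the thread support level the library actually provides at runtime can be checked with a small standalone program like the sketch below (this is not part of the attached reproducer, just an illustration):

```c
#include <mpi.h>
#include <stdio.h>

// Standalone sketch: print the thread support level the MPI library
// provides when the application initializes via plain MPI_Init.
int main(int argc, char **argv)
{
    int provided;
    MPI_Init(&argc, &argv);
    MPI_Query_thread(&provided);
    printf("provided thread level: %d (MPI_THREAD_SINGLE=%d, MPI_THREAD_MULTIPLE=%d)\n",
           provided, MPI_THREAD_SINGLE, MPI_THREAD_MULTIPLE);
    MPI_Finalize();
    return 0;
}
```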

Cheers
Joseph

--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: schuch...@hlrs.de

```c
#include <mpi.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <assert.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int *baseptr;
    MPI_Win win;
    int local_elem_count = 5;
    int comm_size;
    int comm_rank;

    int *local_elem_buf = malloc(sizeof(int) * local_elem_count);

    MPI_Comm sharedmem_comm;
    MPI_Comm_split_type(
        MPI_COMM_WORLD,
        MPI_COMM_TYPE_SHARED,
        1,
        MPI_INFO_NULL,
        &sharedmem_comm);

    MPI_Comm_size(sharedmem_comm, &comm_size);
    MPI_Comm_rank(sharedmem_comm, &comm_rank);

    for (int i = 0; i < local_elem_count; ++i) {
        local_elem_buf[i] = comm_rank + 1;
    }

    MPI_Info win_info;
    MPI_Info_create(&win_info);
    MPI_Info_set(win_info, "alloc_shared_noncontig", "true");

    //NOTE: Using MPI_Win_allocate here works as expected
    MPI_Win_allocate_shared(
        local_elem_count*sizeof(int),
        sizeof(int),
        win_info, 
        sharedmem_comm, 
        &baseptr, 
        &win);
    MPI_Info_free(&win_info);

    MPI_Win_lock_all(0, win);

    // initialize the local portion of the window memory
    for (int i = 0; i < local_elem_count; ++i) {
        baseptr[i] = 1000 + i;
    }

    // accumulate into each process' local memory
    for (int i = 0; i < comm_size; i++) {
        MPI_Accumulate(
            local_elem_buf,
            local_elem_count,
            MPI_INT,
            i, 0,
            local_elem_count,
            MPI_INT,
            MPI_SUM,
            win);
        MPI_Win_flush(i, win);
    }

    // wait for completion of all processes
    MPI_Barrier(sharedmem_comm);
    // print local elements
    for (int i = 0; i < local_elem_count; ++i) {
        int expected = (1000 + i) +
                       ((comm_size * (comm_size + 1)) / 2);
        printf("[%i] baseptr[%i]: %i (expected %i)%s\n",
               comm_rank, i, baseptr[i], expected,
               (baseptr[i] != expected) ? " [!!!]" : "");
    }

    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);

    free(local_elem_buf);

    MPI_Finalize();

    return 0;
}
```