Hello Joseph,

I'm still unable to reproduce this issue on my SLES12 x86_64 node.

Are you building with CFLAGS=-O3?

If so, could you build without CFLAGS set and see if you still see the
failure?

Howard


2017-03-02 2:34 GMT-07:00 Joseph Schuchart <schuch...@hlrs.de>:

> Hi Howard,
>
> Thanks for trying to reproduce this. It seems that on master the issue
> occurs less frequently but is still there. I used the following bash
> one-liner on my laptop and on our Linux Cluster (single node, 4 processes):
>
> ```
> $ for i in $(seq 1 100) ; do echo $i && mpirun -n 4 ./mpi_shared_accumulate | grep \! && break ; done
> 1
> 2
> [0] baseptr[0]: 1004 (expected 1010) [!!!]
> [0] baseptr[1]: 1005 (expected 1011) [!!!]
> [0] baseptr[2]: 1006 (expected 1012) [!!!]
> [0] baseptr[3]: 1007 (expected 1013) [!!!]
> [0] baseptr[4]: 1008 (expected 1014) [!!!]
> ```
>
> Sometimes the error occurs after one or two iterations (like above),
> sometimes only at iteration 20 or later. However, I can reproduce it within
> the 100 runs every time I run the statement above. I am attaching the
> config.log and the ompi_info output for the master build on my laptop.
> Please let me know if I can help with anything else.
>
> Thanks,
> Joseph
>
> On 03/01/2017 11:24 PM, Howard Pritchard wrote:
>
> Hi Joseph,
>
> I built this test with Cray MPICH and it passed.  I also tried
> with Open MPI master and the test passed.  With 2.0.2 I likewise can't
> seem to reproduce the failure on my system.
>
> Could you post the output of config.log?
>
> Also, how intermittent is the problem?
>
>
> Thanks,
>
> Howard
>
>
>
>
> 2017-03-01 8:03 GMT-07:00 Joseph Schuchart <schuch...@hlrs.de>:
>
>> Hi all,
>>
>> We are seeing issues in one of our applications, in which processes in a
>> shared-memory communicator allocate a shared MPI window and execute
>> MPI_Accumulate on it simultaneously to iteratively update each process's
>> values. The test boils down to the sample code attached. Sample output is as follows:
>>
>> ```
>> $ mpirun -n 4 ./mpi_shared_accumulate
>> [1] baseptr[0]: 1010 (expected 1010)
>> [1] baseptr[1]: 1011 (expected 1011)
>> [1] baseptr[2]: 1012 (expected 1012)
>> [1] baseptr[3]: 1013 (expected 1013)
>> [1] baseptr[4]: 1014 (expected 1014)
>> [2] baseptr[0]: 1005 (expected 1010) [!!!]
>> [2] baseptr[1]: 1006 (expected 1011) [!!!]
>> [2] baseptr[2]: 1007 (expected 1012) [!!!]
>> [2] baseptr[3]: 1008 (expected 1013) [!!!]
>> [2] baseptr[4]: 1009 (expected 1014) [!!!]
>> [3] baseptr[0]: 1010 (expected 1010)
>> [0] baseptr[0]: 1010 (expected 1010)
>> [0] baseptr[1]: 1011 (expected 1011)
>> [0] baseptr[2]: 1012 (expected 1012)
>> [0] baseptr[3]: 1013 (expected 1013)
>> [0] baseptr[4]: 1014 (expected 1014)
>> [3] baseptr[1]: 1011 (expected 1011)
>> [3] baseptr[2]: 1012 (expected 1012)
>> [3] baseptr[3]: 1013 (expected 1013)
>> [3] baseptr[4]: 1014 (expected 1014)
>> ```
>>
>> Each process should end up holding the same values, but sometimes (not on
>> every execution) individual processes diverge (marked with [!!!]).
>>
>> I made the following observations:
>>
>> 1) The issue occurs with both Open MPI 1.10.6 and 2.0.2 but not with
>> MPICH 3.2.
>> 2) The issue occurs only if the window is allocated through
>> MPI_Win_allocate_shared; using MPI_Win_allocate works fine.
>> 3) The code assumes that MPI_Accumulate atomically updates individual
>> elements (please correct me if that is not covered by the MPI standard);
>> a rough sketch of this pattern follows below.
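>>
>> For reference, a minimal sketch of that pattern (this is a reconstruction,
>> not the attached reproducer: the element count, the 1000+i initial values,
>> and accumulating rank+1 with MPI_SUM are assumptions chosen to match the
>> expected values shown in the sample output above):
>>
>> ```
>> /* Sketch: every rank accumulates (rank+1) into each element of every
>>  * rank's segment of a shared window, then checks its own segment.
>>  * With 4 ranks, element i should end up as 1000 + i + (1+2+3+4) = 1010 + i. */
>> #include <mpi.h>
>> #include <stdio.h>
>>
>> #define NELEM 5
>>
>> int main(int argc, char **argv)
>> {
>>     MPI_Init(&argc, &argv);
>>
>>     int rank, size;
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>
>>     int *baseptr;
>>     MPI_Win win;
>>     /* shared window; assumes all ranks run on a single node */
>>     MPI_Win_allocate_shared(NELEM * sizeof(int), sizeof(int),
>>                             MPI_INFO_NULL, MPI_COMM_WORLD, &baseptr, &win);
>>
>>     MPI_Win_lock_all(0, win);
>>
>>     /* initialize the local segment */
>>     for (int i = 0; i < NELEM; ++i)
>>         baseptr[i] = 1000 + i;
>>     MPI_Win_sync(win);
>>     MPI_Barrier(MPI_COMM_WORLD);
>>
>>     /* element-wise updates of every rank's segment; MPI_Accumulate is
>>      * expected to be atomic per element */
>>     int val = rank + 1;
>>     for (int target = 0; target < size; ++target) {
>>         for (int i = 0; i < NELEM; ++i)
>>             MPI_Accumulate(&val, 1, MPI_INT, target, i, 1, MPI_INT,
>>                            MPI_SUM, win);
>>         MPI_Win_flush(target, win);
>>     }
>>     MPI_Barrier(MPI_COMM_WORLD);
>>     MPI_Win_sync(win);
>>
>>     /* every rank should now see identical values in its own segment */
>>     int sum = size * (size + 1) / 2;
>>     for (int i = 0; i < NELEM; ++i) {
>>         int expected = 1000 + i + sum;
>>         printf("[%d] baseptr[%d]: %d (expected %d)%s\n", rank, i,
>>                baseptr[i], expected,
>>                baseptr[i] == expected ? "" : " [!!!]");
>>     }
>>
>>     MPI_Win_unlock_all(win);
>>     MPI_Win_free(&win);
>>     MPI_Finalize();
>>     return 0;
>> }
>> ```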
>>
>> Both Open MPI and the example code were compiled using GCC 5.4.1 and run
>> on a Linux system (single node). Open MPI was configured with
>> --enable-mpi-thread-multiple and --with-threads, but the application is
>> not multi-threaded. Please let me know if you need any other information.
>>
>> Cheers
>> Joseph
>>
>> --
>> Dipl.-Inf. Joseph Schuchart
>> High Performance Computing Center Stuttgart (HLRS)
>> Nobelstr. 19
>> D-70569 Stuttgart
>>
>> Tel.: +49(0)711-68565890
>> Fax: +49(0)711-6856832
>> E-Mail: schuch...@hlrs.de
>>
>
>
> --
> Dipl.-Inf. Joseph Schuchart
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstr. 19
> D-70569 Stuttgart
>
> Tel.: +49(0)711-68565890
> Fax: +49(0)711-6856832
> E-Mail: schuch...@hlrs.de
>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
