Hello Joseph,

I'm still unable to reproduce this issue on my SLES12 x86_64 node.
Are you building with CFLAGS=-O3? If so, could you build without CFLAGS set and see if you still see the failure?

Howard

2017-03-02 2:34 GMT-07:00 Joseph Schuchart <schuch...@hlrs.de>:

> Hi Howard,
>
> Thanks for trying to reproduce this. It seems that on master the issue
> occurs less frequently, but it is still there. I used the following bash
> one-liner on my laptop and on our Linux cluster (single node, 4
> processes):
>
> ```
> $ for i in $(seq 1 100) ; do echo $i && mpirun -n 4 ./mpi_shared_accumulate | grep \! && break ; done
> 1
> 2
> [0] baseptr[0]: 1004 (expected 1010) [!!!]
> [0] baseptr[1]: 1005 (expected 1011) [!!!]
> [0] baseptr[2]: 1006 (expected 1012) [!!!]
> [0] baseptr[3]: 1007 (expected 1013) [!!!]
> [0] baseptr[4]: 1008 (expected 1014) [!!!]
> ```
>
> Sometimes the error occurs after one or two iterations (like above),
> sometimes only at iteration 20 or later. However, I can reproduce it
> within the 100 runs every time I run the statement above. I am attaching
> the config.log and the output of ompi_info for master on my laptop.
> Please let me know if I can help with anything else.
>
> Thanks,
> Joseph
>
> On 03/01/2017 11:24 PM, Howard Pritchard wrote:
> Hi Joseph,
>
> I built this test with Cray MPICH (Cray MPI) and it passed. I also tried
> with Open MPI master and the test passed. I also tried with 2.0.2 and
> can't seem to reproduce the problem on my system.
>
> Could you post the output of config.log?
>
> Also, how intermittent is the problem?
>
> Thanks,
> Howard
>
> 2017-03-01 8:03 GMT-07:00 Joseph Schuchart <schuch...@hlrs.de>:
>
>> Hi all,
>>
>> We are seeing issues in one of our applications, in which processes in a
>> shared communicator allocate a shared MPI window and execute
>> MPI_Accumulate on it simultaneously to iteratively update each process's
>> values. The test boils down to the sample code attached. Sample output
>> is as follows:
>>
>> ```
>> $ mpirun -n 4 ./mpi_shared_accumulate
>> [1] baseptr[0]: 1010 (expected 1010)
>> [1] baseptr[1]: 1011 (expected 1011)
>> [1] baseptr[2]: 1012 (expected 1012)
>> [1] baseptr[3]: 1013 (expected 1013)
>> [1] baseptr[4]: 1014 (expected 1014)
>> [2] baseptr[0]: 1005 (expected 1010) [!!!]
>> [2] baseptr[1]: 1006 (expected 1011) [!!!]
>> [2] baseptr[2]: 1007 (expected 1012) [!!!]
>> [2] baseptr[3]: 1008 (expected 1013) [!!!]
>> [2] baseptr[4]: 1009 (expected 1014) [!!!]
>> [3] baseptr[0]: 1010 (expected 1010)
>> [0] baseptr[0]: 1010 (expected 1010)
>> [0] baseptr[1]: 1011 (expected 1011)
>> [0] baseptr[2]: 1012 (expected 1012)
>> [0] baseptr[3]: 1013 (expected 1013)
>> [0] baseptr[4]: 1014 (expected 1014)
>> [3] baseptr[1]: 1011 (expected 1011)
>> [3] baseptr[2]: 1012 (expected 1012)
>> [3] baseptr[3]: 1013 (expected 1013)
>> [3] baseptr[4]: 1014 (expected 1014)
>> ```
>>
>> Each process should hold the same values, but sometimes (not on all
>> executions) random processes diverge (marked with [!!!]).
>>
>> I made the following observations:
>>
>> 1) The issue occurs with both Open MPI 1.10.6 and 2.0.2, but not with
>> MPICH 3.2.
>> 2) The issue occurs only if the window is allocated through
>> MPI_Win_allocate_shared; using MPI_Win_allocate works fine.
>> 3) The code assumes that MPI_Accumulate atomically updates individual
>> elements (please correct me if that is not covered by the MPI standard).
>>
>> Both Open MPI and the example code were compiled using GCC 5.4.1 and run
>> on a Linux system (single node). Open MPI was configured with
>> --enable-mpi-thread-multiple and --with-threads, but the application is
>> not multi-threaded. Please let me know if you need any other information.
>>
>> Cheers
>> Joseph
>>
>> --
>> Dipl.-Inf. Joseph Schuchart
>> High Performance Computing Center Stuttgart (HLRS)
>> Nobelstr. 19
>> D-70569 Stuttgart
>>
>> Tel.: +49(0)711-68565890
>> Fax: +49(0)711-6856832
>> E-Mail: schuch...@hlrs.de
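Joseph's sample code is attached to the original message and is not included in the archive. Below is a minimal sketch of the pattern he describes: each process in a node-local shared communicator allocates a shared window with MPI_Win_allocate_shared, every rank concurrently accumulates +1 into every element of every rank's segment, and each rank then checks its local values. The element count, iteration count, initial values, and the exact synchronization calls are assumptions; the actual attachment may differ.

```
#include <mpi.h>
#include <stdio.h>

#define NELEM 5   /* elements per process (assumption) */
#define NITER 10  /* accumulate iterations (assumption) */

int main(int argc, char **argv)
{
    int rank, size, i, it, target;
    int *baseptr;
    MPI_Comm shmcomm;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* All ranks run on a single node, so this communicator spans them all. */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &shmcomm);

    MPI_Win_allocate_shared(NELEM * sizeof(int), sizeof(int),
                            MPI_INFO_NULL, shmcomm, &baseptr, &win);

    /* Arbitrary per-element base value (assumption). */
    for (i = 0; i < NELEM; i++)
        baseptr[i] = 1000 + i;

    MPI_Win_lock_all(0, win);
    MPI_Barrier(shmcomm);   /* ensure everyone has initialized */

    /* Every rank accumulates +1 into every element of every rank's
       segment, NITER times; per observation 3), each element-wise
       MPI_SUM update should be atomic. */
    const int one = 1;
    for (it = 0; it < NITER; it++)
        for (target = 0; target < size; target++)
            for (i = 0; i < NELEM; i++)
                MPI_Accumulate(&one, 1, MPI_INT, target,
                               i, 1, MPI_INT, MPI_SUM, win);

    MPI_Win_flush_all(win);
    MPI_Barrier(shmcomm);   /* all updates delivered everywhere */
    MPI_Win_unlock_all(win);

    /* Each element should have received size * NITER increments. */
    for (i = 0; i < NELEM; i++) {
        int expected = 1000 + i + size * NITER;
        printf("[%d] baseptr[%d]: %d (expected %d)%s\n",
               rank, i, baseptr[i], expected,
               baseptr[i] == expected ? "" : " [!!!]");
    }

    MPI_Win_free(&win);
    MPI_Comm_free(&shmcomm);
    MPI_Finalize();
    return 0;
}
```

If the element-wise atomicity assumption holds, every element ends up at its initial value plus size * NITER; a lower value flagged [!!!] indicates lost updates, matching the divergent output reported in the thread.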