Re: [OMPI users] Shared Windows and MPI_Accumulate

2017-03-09 Thread Joseph Schuchart
Well, that is embarrassing! Thank you so much for figuring this out and 
providing a detailed answer (thanks also to everyone else who tried to 
reproduce it). I guess I assumed some synchronization in lock_all even 
though I know it is not collective. With an additional barrier between 
initialization and accumulate, things now work smoothly in our original 
application.


Best
Joseph


On 03/09/2017 03:10 PM, Steffen Christgau wrote:

Hi Joseph,

in your code, you are updating the local buffer, which is also exposed
via the window, right after the lock_all call. Those stores
(baseptr[i] = 1000 + loffs++; let's call them the buffer
initialization) may overwrite the outcome of other concurrent
operations, i.e. the accumulate calls in your case.

Another process that has already advanced to the accumulate loop may
change data in the local window while your local process has not yet
completed the initialization. Thus, in case of process skew, you lose
the outcome of accumulates to the initialization.

I provoked process skew by adding a

```
if (comm_rank == 0) {
    sleep(1);
}
```

before the initialization loop, which enabled me to reproduce the wrong
results with GCC 6.3 and Open MPI 2.0.2 when executing the program with
two MPI processes.

The lock_all call after the buffer initialization gives you no
collective synchronization in the window's communicator (as hinted at
on p. 446 of the MPI 3.1 standard). That is, other processes may have
already performed their accumulate phase while the local one is still
(or not yet) in the initialization and overwrites the data (see above).

You might consider an EXCLUSIVE lock around your initialization, but
this won't solve the issue, because any other process may perform its
accumulate phase after the window creation but before you enter the
buffer initialization loop.

As far as I understand your MWE code, the initialization must complete
on all processes before the accumulate loop starts in order to get
correct results. I suspect a missing MPI_Barrier before the accumulate
loop; see the sketch below. Since you are using the unified model, you
can omit the proposed exclusive lock (see above) as well.
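
For illustration, a minimal sketch of the corrected ordering; it slots
into the MWE described below, and NUM_ELEMS plus the per-rank
contribution are placeholders assumed from the description, not the
actual code:

```
MPI_Win_lock_all(0, win);

/* buffer initialization: plain stores into the exposed window */
for (int i = 0; i < NUM_ELEMS; i++)
    baseptr[i] = 1000 + i;

/* ensure every process has finished initializing before anyone
 * starts to accumulate */
MPI_Barrier(MPI_COMM_WORLD);

uint64_t val = comm_rank + 1;   /* placeholder contribution */
for (int target = 0; target < comm_size; target++)
    for (int i = 0; i < NUM_ELEMS; i++)
        MPI_Accumulate(&val, 1, MPI_UINT64_T, target, i,
                       1, MPI_UINT64_T, MPI_SUM, win);

MPI_Win_flush_all(win);
/* ensure all accumulates are complete before anyone reads results */
MPI_Barrier(MPI_COMM_WORLD);
MPI_Win_unlock_all(win);
```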

Hope this helps.

Regards, Steffen

On 03/01/2017 04:03 PM, Joseph Schuchart wrote:

Hi all,

We are seeing issues in one of our applications, in which processes in a
shared communicator allocate a shared MPI window and execute
MPI_Accumulate simultaneously on it to iteratively update each process'
values. The test boils down to the sample code attached. Sample output
is as follows:

```
$ mpirun -n 4 ./mpi_shared_accumulate
[1] baseptr[0]: 1010 (expected 1010)
[1] baseptr[1]: 1011 (expected 1011)
[1] baseptr[2]: 1012 (expected 1012)
[1] baseptr[3]: 1013 (expected 1013)
[1] baseptr[4]: 1014 (expected 1014)
[2] baseptr[0]: 1005 (expected 1010) [!!!]
[2] baseptr[1]: 1006 (expected 1011) [!!!]
[2] baseptr[2]: 1007 (expected 1012) [!!!]
[2] baseptr[3]: 1008 (expected 1013) [!!!]
[2] baseptr[4]: 1009 (expected 1014) [!!!]
[3] baseptr[0]: 1010 (expected 1010)
[0] baseptr[0]: 1010 (expected 1010)
[0] baseptr[1]: 1011 (expected 1011)
[0] baseptr[2]: 1012 (expected 1012)
[0] baseptr[3]: 1013 (expected 1013)
[0] baseptr[4]: 1014 (expected 1014)
[3] baseptr[1]: 1011 (expected 1011)
[3] baseptr[2]: 1012 (expected 1012)
[3] baseptr[3]: 1013 (expected 1013)
[3] baseptr[4]: 1014 (expected 1014)
```

Each process should hold the same values, but sometimes (not on all
executions) random processes diverge (marked with [!!!]).

I made the following observations:

1) The issue occurs with both Open MPI 1.10.6 and 2.0.2 but not with
MPICH 3.2.
2) The issue occurs only if the window is allocated through
MPI_Win_allocate_shared; using MPI_Win_allocate works fine.
3) The code assumes that MPI_Accumulate atomically updates individual
elements (please correct me if that is not covered by the MPI standard).

Both Open MPI and the example code were compiled using GCC 5.4.1 and run
on a Linux system (single node). Open MPI was configured with
--enable-mpi-thread-multiple and --with-threads, but the application is
not multi-threaded. Please let me know if you need any other information.
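
(The attachment is not reproduced in this archive. The following is a
hedged reconstruction of what the MWE plausibly looks like, based on the
description and output above; NUM_ELEMS and the per-rank contribution of
rank + 1, chosen so that four ranks add 10 to each element, are
assumptions, not the original code.)

```
#include <mpi.h>
#include <stdio.h>
#include <inttypes.h>

#define NUM_ELEMS 5   /* assumed from the five-element output above */

int main(int argc, char **argv)
{
    int comm_rank, comm_size;
    uint64_t *baseptr;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);

    /* observation 2): the failure only shows up with the shared variant */
    MPI_Win_allocate_shared(NUM_ELEMS * sizeof(uint64_t), sizeof(uint64_t),
                            MPI_INFO_NULL, MPI_COMM_WORLD, &baseptr, &win);

    MPI_Win_lock_all(0, win);

    /* buffer initialization: plain stores into the exposed window */
    uint64_t loffs = 0;
    for (int i = 0; i < NUM_ELEMS; i++)
        baseptr[i] = 1000 + loffs++;

    /* NOTE: no synchronization here; this is the race identified later
     * in the thread: another process may already be accumulating */

    uint64_t val = comm_rank + 1;   /* assumed contribution */
    for (int target = 0; target < comm_size; target++)
        for (int i = 0; i < NUM_ELEMS; i++)
            MPI_Accumulate(&val, 1, MPI_UINT64_T, target, i,
                           1, MPI_UINT64_T, MPI_SUM, win);

    MPI_Win_flush_all(win);
    MPI_Barrier(MPI_COMM_WORLD);   /* wait until everyone has accumulated */

    for (int i = 0; i < NUM_ELEMS; i++)
        printf("[%d] baseptr[%d]: %" PRIu64 " (expected %d)\n",
               comm_rank, i, baseptr[i], 1010 + i);

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```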

Cheers
Joseph







--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: schuch...@hlrs.de



Re: [OMPI users] Shared Windows and MPI_Accumulate

2017-03-09 Thread Steffen Christgau
On 03/09/2017 03:10 PM, Steffen Christgau wrote:
> 
> Since you are using
> the unified model, you can omit the proposed exclusive lock (see above)
> as well.

To be fair, you have to be cautious when doing that - even in the
unified model. See example 11.7 in the MPI-3.1 standard. In that
context, you might also consider example 11.9.
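
(A hedged sketch of the kind of caution those examples call for; the
names are illustrative, flag is assumed to point into window memory,
and a passive-target epoch on win is assumed to be open. Even under
MPI_WIN_UNIFIED, plain loads and stores on window memory should be
bracketed with MPI_Win_sync around the synchronizing barrier:)

```
if (comm_rank == 0) {
    flag[0] = 1;                  /* plain store into window memory */
    MPI_Win_sync(win);            /* complete the store in public memory */
    MPI_Barrier(MPI_COMM_WORLD);  /* order the store before the load */
} else {
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Win_sync(win);            /* refresh this process' view */
    printf("flag = %d\n", (int)flag[0]);
}
```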

Regards, Steffen



Re: [OMPI users] Shared Windows and MPI_Accumulate

2017-03-09 Thread Steffen Christgau
Hi Joseph,

in your code, you are updating the local buffer, which is also exposed
via the window, right after the lock_all call. Those stores
(baseptr[i] = 1000 + loffs++; let's call them the buffer
initialization) may overwrite the outcome of other concurrent
operations, i.e. the accumulate calls in your case.

Another process that has already advanced to the accumulate loop may
change data in the local window while your local process has not yet
completed the initialization. Thus, in case of process skew, you lose
the outcome of accumulates to the initialization.

I provoked process skew by adding a

```
if (comm_rank == 0) {
    sleep(1);
}
```

before the initialization loop, which enabled me to reproduce the wrong
results with GCC 6.3 and Open MPI 2.0.2 when executing the program with
two MPI processes.

The lock_all call after the buffer initialization gives you no
collective synchronization in the window's communicator (as hinted at
on p. 446 of the MPI 3.1 standard). That is, other processes may have
already performed their accumulate phase while the local one is still
(or not yet) in the initialization and overwrites the data (see above).

You might consider an EXCLUSIVE lock around your initialization, but
this won't solve the issue, because any other process may perform its
accumulate phase after the window creation but before you enter the
buffer initialization loop.

As far as I understand your MWE code, the initialization must complete
on all processes before the accumulate loop starts in order to get
correct results. I suspect a missing MPI_Barrier before the accumulate
loop. Since you are using the unified model, you can omit the proposed
exclusive lock (see above) as well.

Hope this helps.

Regards, Steffen

On 03/01/2017 04:03 PM, Joseph Schuchart wrote:
> Hi all,
> 
> We are seeing issues in one of our applications, in which processes in a
> shared communicator allocate a shared MPI window and execute
> MPI_Accumulate simultaneously on it to iteratively update each process'
> values. The test boils down to the sample code attached. Sample output
> is as follows:
> 
> ```
> $ mpirun -n 4 ./mpi_shared_accumulate
> [1] baseptr[0]: 1010 (expected 1010)
> [1] baseptr[1]: 1011 (expected 1011)
> [1] baseptr[2]: 1012 (expected 1012)
> [1] baseptr[3]: 1013 (expected 1013)
> [1] baseptr[4]: 1014 (expected 1014)
> [2] baseptr[0]: 1005 (expected 1010) [!!!]
> [2] baseptr[1]: 1006 (expected 1011) [!!!]
> [2] baseptr[2]: 1007 (expected 1012) [!!!]
> [2] baseptr[3]: 1008 (expected 1013) [!!!]
> [2] baseptr[4]: 1009 (expected 1014) [!!!]
> [3] baseptr[0]: 1010 (expected 1010)
> [0] baseptr[0]: 1010 (expected 1010)
> [0] baseptr[1]: 1011 (expected 1011)
> [0] baseptr[2]: 1012 (expected 1012)
> [0] baseptr[3]: 1013 (expected 1013)
> [0] baseptr[4]: 1014 (expected 1014)
> [3] baseptr[1]: 1011 (expected 1011)
> [3] baseptr[2]: 1012 (expected 1012)
> [3] baseptr[3]: 1013 (expected 1013)
> [3] baseptr[4]: 1014 (expected 1014)
> ```
> 
> Each process should hold the same values, but sometimes (not on all
> executions) random processes diverge (marked with [!!!]).
> 
> I made the following observations:
> 
> 1) The issue occurs with both Open MPI 1.10.6 and 2.0.2 but not with
> MPICH 3.2.
> 2) The issue occurs only if the window is allocated through
> MPI_Win_allocate_shared; using MPI_Win_allocate works fine.
> 3) The code assumes that MPI_Accumulate atomically updates individual
> elements (please correct me if that is not covered by the MPI standard).
> 
> Both Open MPI and the example code were compiled using GCC 5.4.1 and run
> on a Linux system (single node). Open MPI was configured with
> --enable-mpi-thread-multiple and --with-threads, but the application is
> not multi-threaded. Please let me know if you need any other information.
> 
> Cheers
> Joseph
> 
> 
> 



Re: [OMPI users] Shared Windows and MPI_Accumulate

2017-03-06 Thread Christoph Niethammer
Hi,

The behaviour is reproducible on our systems:
* Linux cluster (Intel Xeon E5-2660 v3, Scientific Linux release 6.8
(Carbon), kernel 2.6.32, nightly 2.x branch). The error is independent
of the btl combination used on the cluster (tested 'sm,self,vader',
'sm,self,openib', 'sm,self', 'vader,self', 'openib,self').
* Cray XC40 (using GNU 6.3 and Open MPI 2.0.1, kernel 3.0.101). The
error always manifests within 50 iterations of the command line quoted
below.

The behaviour is not reproducible with either Open MPI 2.0.1 or 2.1.0rc2
on my notebook (Arch Linux, GCC 6.3.1, kernel 4.9.11).
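
(For reference, btl combinations of that kind are selected via the btl
MCA parameter; a hypothetical invocation for one of the tested
combinations:)

```
$ mpirun --mca btl sm,self,vader -n 4 ./mpi_shared_accumulate
```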

Best
Christoph


- Original Message -
From: "Howard Pritchard" <hpprit...@gmail.com>
To: "Open MPI Users" <users@lists.open-mpi.org>
Sent: Friday, March 3, 2017 9:02:22 PM
Subject: Re: [OMPI users] Shared Windows and MPI_Accumulate

Hello Joseph, 

I'm still unable to reproduce this issue on my SLES12 x86_64 node.

Are you building with CFLAGS=-O3? 

If so, could you build without CFLAGS set and see if you still see the failure? 

Howard 


2017-03-02 2:34 GMT-07:00 Joseph Schuchart <schuch...@hlrs.de>:

Hi Howard, 

Thanks for trying to reproduce this. It seems that on master the issue occurs 
less frequently but is still there. I used the following bash one-liner on my 
laptop and on our Linux Cluster (single node, 4 processes): 

``` 
$ for i in $(seq 1 100) ; do echo $i && mpirun -n 4 ./mpi_shared_accumulate | grep \! && break ; done
1 
2 
[0] baseptr[0]: 1004 (expected 1010) [!!!] 
[0] baseptr[1]: 1005 (expected 1011) [!!!] 
[0] baseptr[2]: 1006 (expected 1012) [!!!] 
[0] baseptr[3]: 1007 (expected 1013) [!!!] 
[0] baseptr[4]: 1008 (expected 1014) [!!!] 
``` 

Sometimes the error occurs after one or two iterations (like above), sometimes 
only at iteration 20 or later. However, I can reproduce it within the 100 runs 
every time I run the statement above. I am attaching the config.log and output 
of ompi_info of master on my laptop. Please let me know if I can help with 
anything else. 


Thanks, 
Joseph 

On 03/01/2017 11:24 PM, Howard Pritchard wrote: 



Hi Joseph, 

I built this test with cray-mpich (Cray MPI) and it passed. I also tried
with Open MPI master and the test passed. I also tried with 2.0.2
and can't seem to reproduce it on my system.

Could you post the output of config.log? 

Also, how intermittent is the problem? 


Thanks, 

Howard 




2017-03-01 8:03 GMT-07:00 Joseph Schuchart <schuch...@hlrs.de>:


Hi all, 

We are seeing issues in one of our applications, in which processes in a shared 
communicator allocate a shared MPI window and execute MPI_Accumulate 
simultaneously on it to iteratively update each process' values. The test boils 
down to the sample code attached. Sample output is as follows: 

``` 
$ mpirun -n 4 ./mpi_shared_accumulate 
[1] baseptr[0]: 1010 (expected 1010) 
[1] baseptr[1]: 1011 (expected 1011) 
[1] baseptr[2]: 1012 (expected 1012) 
[1] baseptr[3]: 1013 (expected 1013) 
[1] baseptr[4]: 1014 (expected 1014) 
[2] baseptr[0]: 1005 (expected 1010) [!!!] 
[2] baseptr[1]: 1006 (expected 1011) [!!!] 
[2] baseptr[2]: 1007 (expected 1012) [!!!] 
[2] baseptr[3]: 1008 (expected 1013) [!!!] 
[2] baseptr[4]: 1009 (expected 1014) [!!!] 
[3] baseptr[0]: 1010 (expected 1010) 
[0] baseptr[0]: 1010 (expected 1010) 
[0] baseptr[1]: 1011 (expected 1011) 
[0] baseptr[2]: 1012 (expected 1012) 
[0] baseptr[3]: 1013 (expected 1013) 
[0] baseptr[4]: 1014 (expected 1014) 
[3] baseptr[1]: 1011 (expected 1011) 
[3] baseptr[2]: 1012 (expected 1012) 
[3] baseptr[3]: 1013 (expected 1013) 
[3] baseptr[4]: 1014 (expected 1014) 
``` 

Each process should hold the same values, but sometimes (not on all
executions) random processes diverge (marked with [!!!]).

I made the following observations: 

1) The issue occurs with both Open MPI 1.10.6 and 2.0.2 but not with MPICH 3.2.
2) The issue occurs only if the window is allocated through
MPI_Win_allocate_shared; using MPI_Win_allocate works fine.
3) The code assumes that MPI_Accumulate atomically updates individual
elements (please correct me if that is not covered by the MPI standard).

Both Open MPI and the example code were compiled using GCC 5.4.1 and run
on a Linux system (single node). Open MPI was configured with
--enable-mpi-thread-multiple and --with-threads, but the application is
not multi-threaded. Please let me know if you need any other information.

Cheers 
Joseph 

-- 
Dipl.-Inf. Joseph Schuchart 
High Performance Computing Center Stuttgart (HLRS) 
Nobelstr. 19 
D-70569 Stuttgart 

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: schuch...@hlrs.de



Re: [OMPI users] Shared Windows and MPI_Accumulate

2017-03-03 Thread Howard Pritchard
Hello Joseph,

I'm still unable to reproduce this issue on my SLES12 x86_64 node.

Are you building with CFLAGS=-O3?

If so, could you build without CFLAGS set and see if you still see the
failure?
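
(For illustration, one way to compare the two builds; the install
prefixes here are hypothetical:)

```
$ ./configure CFLAGS=-O3 --prefix=$HOME/ompi-O3 && make -j && make install
$ ./configure --prefix=$HOME/ompi-plain && make -j && make install
```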

Howard


2017-03-02 2:34 GMT-07:00 Joseph Schuchart :

> Hi Howard,
>
> Thanks for trying to reproduce this. It seems that on master the issue
> occurs less frequently but is still there. I used the following bash
> one-liner on my laptop and on our Linux Cluster (single node, 4 processes):
>
> ```
> $ for i in $(seq 1 100) ; do echo $i && mpirun -n 4 ./mpi_shared_accumulate | grep \! && break ; done
> 1
> 2
> [0] baseptr[0]: 1004 (expected 1010) [!!!]
> [0] baseptr[1]: 1005 (expected 1011) [!!!]
> [0] baseptr[2]: 1006 (expected 1012) [!!!]
> [0] baseptr[3]: 1007 (expected 1013) [!!!]
> [0] baseptr[4]: 1008 (expected 1014) [!!!]
> ```
>
> Sometimes the error occurs after one or two iterations (like above),
> sometimes only at iteration 20 or later. However, I can reproduce it within
> the 100 runs every time I run the statement above. I am attaching the
> config.log and output of ompi_info of master on my laptop. Please let me
> know if I can help with anything else.
>
> Thanks,
> Joseph
>
> On 03/01/2017 11:24 PM, Howard Pritchard wrote:
>
> Hi Joseph,
>
> I built this test with cray-mpich (Cray MPI) and it passed.  I also tried
> with Open MPI master and the test passed.  I also tried with 2.0.2
> and can't seem to reproduce it on my system.
>
> Could you post the output of config.log?
>
> Also, how intermittent is the problem?
>
>
> Thanks,
>
> Howard
>
>
>
>
> 2017-03-01 8:03 GMT-07:00 Joseph Schuchart :
>
>> Hi all,
>>
>> We are seeing issues in one of our applications, in which processes in a
>> shared communicator allocate a shared MPI window and execute MPI_Accumulate
>> simultaneously on it to iteratively update each process' values. The test
>> boils down to the sample code attached. Sample output is as follows:
>>
>> ```
>> $ mpirun -n 4 ./mpi_shared_accumulate
>> [1] baseptr[0]: 1010 (expected 1010)
>> [1] baseptr[1]: 1011 (expected 1011)
>> [1] baseptr[2]: 1012 (expected 1012)
>> [1] baseptr[3]: 1013 (expected 1013)
>> [1] baseptr[4]: 1014 (expected 1014)
>> [2] baseptr[0]: 1005 (expected 1010) [!!!]
>> [2] baseptr[1]: 1006 (expected 1011) [!!!]
>> [2] baseptr[2]: 1007 (expected 1012) [!!!]
>> [2] baseptr[3]: 1008 (expected 1013) [!!!]
>> [2] baseptr[4]: 1009 (expected 1014) [!!!]
>> [3] baseptr[0]: 1010 (expected 1010)
>> [0] baseptr[0]: 1010 (expected 1010)
>> [0] baseptr[1]: 1011 (expected 1011)
>> [0] baseptr[2]: 1012 (expected 1012)
>> [0] baseptr[3]: 1013 (expected 1013)
>> [0] baseptr[4]: 1014 (expected 1014)
>> [3] baseptr[1]: 1011 (expected 1011)
>> [3] baseptr[2]: 1012 (expected 1012)
>> [3] baseptr[3]: 1013 (expected 1013)
>> [3] baseptr[4]: 1014 (expected 1014)
>> ```
>>
>> Each process should hold the same values, but sometimes (not on all
>> executions) random processes diverge (marked with [!!!]).
>>
>> I made the following observations:
>>
>> 1) The issue occurs with both Open MPI 1.10.6 and 2.0.2 but not with
>> MPICH 3.2.
>> 2) The issue occurs only if the window is allocated through
>> MPI_Win_allocate_shared; using MPI_Win_allocate works fine.
>> 3) The code assumes that MPI_Accumulate atomically updates individual
>> elements (please correct me if that is not covered by the MPI standard).
>>
>> Both Open MPI and the example code were compiled using GCC 5.4.1 and run
>> on a Linux system (single node). Open MPI was configured with
>> --enable-mpi-thread-multiple and --with-threads, but the application is
>> not multi-threaded. Please let me know if you need any other information.
>>
>> Cheers
>> Joseph
>>
>> --
>> Dipl.-Inf. Joseph Schuchart
>> High Performance Computing Center Stuttgart (HLRS)
>> Nobelstr. 19
>> D-70569 Stuttgart
>>
>> Tel.: +49(0)711-68565890
>> Fax: +49(0)711-6856832
>> E-Mail: schuch...@hlrs.de
>>
>>
>
> --
> Dipl.-Inf. Joseph Schuchart
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstr. 19
> D-70569 Stuttgart
>
> Tel.: +49(0)711-68565890
> Fax: +49(0)711-6856832
> E-Mail: schuch...@hlrs.de
>
>

Re: [OMPI users] Shared Windows and MPI_Accumulate

2017-03-01 Thread Howard Pritchard
Hi Joseph,

I built this test with cray-mpich (Cray MPI) and it passed.  I also tried
with Open MPI master and the test passed.  I also tried with 2.0.2
and can't seem to reproduce it on my system.

Could you post the output of config.log?

Also, how intermittent is the problem?


Thanks,

Howard




2017-03-01 8:03 GMT-07:00 Joseph Schuchart :

> Hi all,
>
> We are seeing issues in one of our applications, in which processes in a
> shared communicator allocate a shared MPI window and execute MPI_Accumulate
> simultaneously on it to iteratively update each process' values. The test
> boils down to the sample code attached. Sample output is as follows:
>
> ```
> $ mpirun -n 4 ./mpi_shared_accumulate
> [1] baseptr[0]: 1010 (expected 1010)
> [1] baseptr[1]: 1011 (expected 1011)
> [1] baseptr[2]: 1012 (expected 1012)
> [1] baseptr[3]: 1013 (expected 1013)
> [1] baseptr[4]: 1014 (expected 1014)
> [2] baseptr[0]: 1005 (expected 1010) [!!!]
> [2] baseptr[1]: 1006 (expected 1011) [!!!]
> [2] baseptr[2]: 1007 (expected 1012) [!!!]
> [2] baseptr[3]: 1008 (expected 1013) [!!!]
> [2] baseptr[4]: 1009 (expected 1014) [!!!]
> [3] baseptr[0]: 1010 (expected 1010)
> [0] baseptr[0]: 1010 (expected 1010)
> [0] baseptr[1]: 1011 (expected 1011)
> [0] baseptr[2]: 1012 (expected 1012)
> [0] baseptr[3]: 1013 (expected 1013)
> [0] baseptr[4]: 1014 (expected 1014)
> [3] baseptr[1]: 1011 (expected 1011)
> [3] baseptr[2]: 1012 (expected 1012)
> [3] baseptr[3]: 1013 (expected 1013)
> [3] baseptr[4]: 1014 (expected 1014)
> ```
>
> Each process should hold the same values, but sometimes (not on all
> executions) random processes diverge (marked with [!!!]).
>
> I made the following observations:
>
> 1) The issue occurs with both Open MPI 1.10.6 and 2.0.2 but not with MPICH
> 3.2.
> 2) The issue occurs only if the window is allocated through
> MPI_Win_allocate_shared; using MPI_Win_allocate works fine.
> 3) The code assumes that MPI_Accumulate atomically updates individual
> elements (please correct me if that is not covered by the MPI standard).
>
> Both Open MPI and the example code were compiled using GCC 5.4.1 and run on
> a Linux system (single node). Open MPI was configured with
> --enable-mpi-thread-multiple and --with-threads, but the application is not
> multi-threaded. Please let me know if you need any other information.
>
> Cheers
> Joseph
>
> --
> Dipl.-Inf. Joseph Schuchart
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstr. 19
> D-70569 Stuttgart
>
> Tel.: +49(0)711-68565890
> Fax: +49(0)711-6856832
> E-Mail: schuch...@hlrs.de
>
>