Re: [lng-odp] [PATCH] linux-gen: barrier: Use correct memory ordering

2017-09-21 Thread Maxim Uvarov
Merged. Sorry for delay.

Maxim.




Re: [lng-odp] [PATCH] linux-gen: barrier: Use correct memory ordering

2017-09-06 Thread Brian Brooks
ping



Re: [lng-odp] [PATCH] linux-gen: barrier: Use correct memory ordering

2017-08-26 Thread Bill Fischofer
On Sat, Aug 26, 2017 at 12:40 AM, Brian Brooks wrote:

> Memory accesses that happen-before, in program order, a call to
> odp_barrier_wait() cannot be reordered to after the call. Similarly,
> memory accesses that happen-after, in program order, a call to
> odp_barrier_wait() cannot be reordered to before the call.
>
> The current implementation of barriers uses sequentially consistent
> fences on either side of odp_barrier_wait().
>
> The correct memory ordering for barriers is release upon entering
> odp_barrier_wait(), to prevent reordering to after the barrier, and
> acquire upon exiting odp_barrier_wait(), to prevent reordering to
> before the barrier.
>
> The measurable performance difference is negligible on weakly ordered
> architectures such as ARM, so the highlight of this change is correctness.
>
> Signed-off-by: Brian Brooks 
>

Reviewed-by: Bill Fischofer 




[lng-odp] [PATCH] linux-gen: barrier: Use correct memory ordering

2017-08-25 Thread Brian Brooks
Memory accesses that happen-before, in program order, a call to
odp_barrier_wait() cannot be reordered to after the call. Similarly,
memory accesses that happen-after, in program order, a call to
odp_barrier_wait() cannot be reordered to before the call.

The current implementation of barriers uses sequentially consistent
fences on either side of odp_barrier_wait().

The correct memory ordering for barriers is release upon entering
odp_barrier_wait(), to prevent reordering to after the barrier, and
acquire upon exiting odp_barrier_wait(), to prevent reordering to
before the barrier.

The measurable performance difference is negligible on weakly ordered
architectures such as ARM, so the highlight of this change is correctness.

Signed-off-by: Brian Brooks 
---
 platform/linux-generic/odp_barrier.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/platform/linux-generic/odp_barrier.c b/platform/linux-generic/odp_barrier.c
index 5eb354de..f70bdbf8 100644
--- a/platform/linux-generic/odp_barrier.c
+++ b/platform/linux-generic/odp_barrier.c
@@ -34,7 +34,7 @@ void odp_barrier_wait(odp_barrier_t *barrier)
     uint32_t count;
     int wasless;
 
-   odp_mb_full();
+   odp_mb_release();
 
     count   = odp_atomic_fetch_inc_u32(&barrier->bar);
     wasless = count < barrier->count;
@@ -48,5 +48,5 @@ void odp_barrier_wait(odp_barrier_t *barrier)
             odp_cpu_pause();
     }
 
-   odp_mb_full();
+   odp_mb_acquire();
 }
-- 
2.14.1
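
For readers without an ODP tree at hand, the ordering described in the
commit message can be sketched with plain C11 atomics. This is only an
illustration under stated assumptions, not the ODP source:
sketch_barrier_t, sketch_barrier_init(), sketch_barrier_wait() and
barrier_demo() are hypothetical names, atomic_thread_fence() stands in for
odp_mb_release()/odp_mb_acquire(), and sched_yield() stands in for
odp_cpu_pause(). The counter logic follows the lines visible in the diff;
the wrap-around branch is an assumption about the part the diff does not
show.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NTHREADS 4
#define ROUNDS   200

/* Hypothetical stand-in for odp_barrier_t; not the ODP type. */
typedef struct {
	uint32_t count;        /* number of participating threads */
	_Atomic uint32_t bar;  /* arrival counter, cycles through 0..2*count-1 */
} sketch_barrier_t;

static void sketch_barrier_init(sketch_barrier_t *b, uint32_t count)
{
	assert(count > 0);
	b->count = count;
	atomic_init(&b->bar, 0);
}

static void sketch_barrier_wait(sketch_barrier_t *b)
{
	/* Release on entry: earlier accesses stay before the arrival increment. */
	atomic_thread_fence(memory_order_release);

	uint32_t count = atomic_fetch_add_explicit(&b->bar, 1,
						   memory_order_relaxed);
	int wasless = count < b->count;

	if (count == 2 * b->count - 1) {
		/* Last arrival of the second phase: reset the counter (assumed). */
		atomic_fetch_sub_explicit(&b->bar, 2 * b->count,
					  memory_order_relaxed);
	} else {
		/* Spin until the counter crosses the phase boundary. */
		while ((atomic_load_explicit(&b->bar, memory_order_relaxed)
			< b->count) == wasless)
			sched_yield(); /* stand-in for odp_cpu_pause() */
	}

	/* Acquire on exit: later accesses stay after the final counter read. */
	atomic_thread_fence(memory_order_acquire);
}

static sketch_barrier_t demo_barrier;
static uint32_t slots[NTHREADS]; /* plain, non-atomic shared data */
static atomic_int errors;

static void *worker(void *arg)
{
	uint32_t id = (uint32_t)(uintptr_t)arg;

	for (uint32_t r = 1; r <= ROUNDS; r++) {
		slots[id] = r;                      /* write before the barrier */
		sketch_barrier_wait(&demo_barrier);
		for (uint32_t i = 0; i < NTHREADS; i++)
			if (slots[i] != r)          /* read after the barrier */
				atomic_fetch_add(&errors, 1);
		/* Second barrier keeps the next round's writes after the reads. */
		sketch_barrier_wait(&demo_barrier);
	}
	return NULL;
}

/* Returns the number of stale reads observed; 0 is the expected result. */
int barrier_demo(void)
{
	pthread_t t[NTHREADS];

	sketch_barrier_init(&demo_barrier, NTHREADS);
	atomic_store(&errors, 0);
	for (uintptr_t i = 0; i < NTHREADS; i++) {
		int rc = pthread_create(&t[i], NULL, worker, (void *)i);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(t[i], NULL);
	return atomic_load(&errors);
}
```

Calling barrier_demo() from a main() or a test harness should return 0 when
every thread sees the writes that the other threads made before the barrier.
The fences pair up through the relaxed read-modify-write chain on the
counter, which is why neither fence needs to be sequentially consistent.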