Re: [lock-free] One reason why I like atomic_thread_fence...

2018-04-23 Thread Chris M. Thomasson
On Friday, April 20, 2018 at 2:06:22 AM UTC-7, Dmitry Vyukov wrote:
>
> On Mon, Apr 16, 2018 at 12:32 AM, Chris M. Thomasson 
>  wrote: 
> > 
> > 
> > On Friday, April 13, 2018 at 11:45:51 PM UTC-7, Dmitry Vyukov wrote: 
> >> 
> >> On Mon, Apr 9, 2018 at 3:38 AM, Chris M. Thomasson  
>
> >> wrote: 
> >> > On Saturday, April 7, 2018 at 1:46:20 AM UTC-7, Dmitry Vyukov wrote: 
> >> >> 
> >> >> On Thu, Apr 5, 2018 at 10:03 PM, Chris M. Thomasson 
> >> >>  
> >> >> wrote: 
> >> >> > On Tuesday, April 3, 2018 at 5:44:38 AM UTC-7, Dmitry Vyukov 
> wrote: 
> >> >> >> 
> >> >> >> On Sat, Mar 31, 2018 at 10:41 PM, Chris M. Thomasson 
> >> >> >>  wrote: 
> [...]
> > D should be a pure relaxed store, and C should not be covered by the 
> > RELEASE. Iirc, it works this way on SPARC RMO mode. However, on x86, C 
> will 
> > be covered because each store has implied release characteristics, wb 
> memory 
> > aside for a moment. 
>
> C and D are completely symmetric wrt the RELEASE. Later you can 
> discover that there is also a thread that does: 
>
> // consumer 2 
> while (C != 3) backoff; 
> ACQUIRE 
> assert(A == 1 && B == 2); 
>
> And now suddenly C is release operation exactly the same way D is. 
>

Argh! D is the "control" signal. C should not be used as a signal to 
acquire. Damn. C should not have to be involved because the RELEASE is 
_before_ C was assigned.

Argh.


> >> At least this is how this is defined in C/C++ standards. 
> >> ACQUIRE/RELEASE fences do not establish any happens-before relations 
> >> themselves. You still need a load in one thread to observe a value 
> >> stored in another thread. And only that "materializes" standalone 
> >> fence synchronization. So a store that materializes RELEASE fence will 
> >> always be a subsequent store. 
> > 
> > 
> > Humm... That is too strict, and has to be there whether we use 
> standalone 
> > fences or not. 
>
> No, in C/C++ memory ordering constraints tied to memory operations act 
> only on that memory operation. 


> Consider: 
>
> DATA = 1; 
> C.store(1, memory_order_release); 
> D.store(1, memory_order_relaxed); 
>
> vs: 
>
> DATA = 1; 
> atomic_thread_fence(memory_order_release); 
> C.store(1, memory_order_relaxed); 
> D.store(1, memory_order_relaxed); 
>
>
> And 2 consumers: 
>
> // consumer 1 
> while (C.load(memory_order_acquire) == 0) backoff(); 
> assert(DATA == 1); 
>
> // consumer 2 
> while (D.load(memory_order_acquire) == 0) backoff(); 
> assert(DATA == 1); 
>
> Both consumers are correct wrt the atomic_thread_fence version of 
> producer. But only the first one is correct wrt the store(1, 
> memory_order_release) version of producer. 
>
> And this can actually break on x86 because: 
>

Iirc, x86 has implied release semantics on stores, and acquire on loads. If 
we load C and observe 1, we will see DATA as 1. However, D is not in sync 
at all. If we load D and see it as being 1, then C = 1 and DATA = 1.

Consumer 1 will see C = 1 and DATA = 1: D will be out of sync
Consumer 2 will see D = 1, C = 1 and DATA = 1: They will all be in sync


 [...]

> > Yes! The stand alone fence can say, we want to perform an acquire 
> barrier 
> > wrt m_head. Something like that should be able to create more fine grain 
> > setups. Perhaps even something like the following pseudo-code: 
> > __ 
> > // setup 
> > int a = 0; 
> > int b = 0; 
> > int c = 0; 
> > signal = false; 
> > 
> > // producer 
> > a = 1; 
> > b = 2; 
> > RELEASE(&a, &b); 
> > c = 3; 
> > STORE_RELAXED(&signal, true); 
> > 
> > // consumers 
> > while (LOAD_RELAXED(&signal) != true) backoff; 
> > ACQUIRE(&a, &b); 
> > assert(a == 1 && b == 2); 
> > __ 
> > 
> > The consumers would always see a and b as 1 and 2, however, c was not 
> > covered, so it is an incoherent state wrt said consumers. 
> > 
> > The acquire would only target a and b, as would the release. 
> > 
> > Hummm... Just thinking out loud here. :^) 
>
> This would help verification tools tremendously... but unfortunately 
> it's not the reality we are living in :) 
>

Shi% happens! However, it would allow one to combine the flexibility of 
standalone fences and targeted memory addresses all in one.

Need to think some more on this.

Thank you! :^D

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"Scalable Synchronization Algorithms" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to lock-free+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/lock-free/24cd3cc6-49bc-4e9c-b045-4eb7650ddf7b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [lock-free] One reason why I like atomic_thread_fence...

2018-04-20 Thread Dmitry Vyukov
On Mon, Apr 16, 2018 at 12:32 AM, Chris M. Thomasson
 wrote:
>
>
> On Friday, April 13, 2018 at 11:45:51 PM UTC-7, Dmitry Vyukov wrote:
>>
>> On Mon, Apr 9, 2018 at 3:38 AM, Chris M. Thomasson 
>> wrote:
>> > On Saturday, April 7, 2018 at 1:46:20 AM UTC-7, Dmitry Vyukov wrote:
>> >>
>> >> On Thu, Apr 5, 2018 at 10:03 PM, Chris M. Thomasson
>> >> 
>> >> wrote:
>> >> > On Tuesday, April 3, 2018 at 5:44:38 AM UTC-7, Dmitry Vyukov wrote:
>> >> >>
>> >> >> On Sat, Mar 31, 2018 at 10:41 PM, Chris M. Thomasson
>> >> >>  wrote:
>> >> >> > Notice how there is an acquire barrier inside of the CAS loop
>> >> >> > within
>> >> >> > the
>> >> >> > enqueue and dequeue functions of:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue
>> >> [...]
>> >> > Executing an acquire barrier on every iteration of the CAS loop is
>> >> > not
>> >> > necessary. The actual version count keeps everything in order.
>> >> >
>> >> > However, you do need a single acquire fence _after_ the CAS loop
>> >> > succeeds in
>> >> > order to get a clear view of the element.
>> >>
>> >> This is true.
>> >
>> >
>> > Agreed. I personally like the ability to see the membars being separated
>> > out
>> > and
>> >
>> > standing alone. It is a habit of mine from SPARC. Now, tagged standalone
>> > membars
>> >
>> > aside for a moment, perhaps ones that can include memory locations they
>> > are
>> >
>> > interested in... ;^)
>> >
>> >
>> >>
>> >>
> >> I don't like standalone fences because they are a plague for
> >> verification. Consider: a single release fence turns _all_ subsequent
>> >> relaxed atomic stores ever executed by the thread into release
>> >> operations (they release memory state up to the fence point) and
>> >> handling of acquire/release operations is an O(N) operation (and
>> >> generally done under a mutex).
>> >
>> >
>> > A release operation should make sure all _prior_ operations are visible
>> > _before_
>> >
>> > they are visible to another thread. They have no effect on subsequent
>> > relaxed
>> >
>> > operations. For instance:
>> >
>> >
>> > // producer
>> >
>> > A = 1
>> >
>> > B = 2
>> >
>> > RELEASE
>> >
>> > C = 3
>> >
>> > D = 4
>> >
>> >
>> > // consumer
>> >
>> > while (D != 4) backoff;
>> >
>> > ACQUIRE
>> >
>> > assert(A == 1 && B == 2);
>> >
>> >
> >> > Well, A and B are going to be in sync with an acquire such that the assert
>> > will
>> > never
>> > fail, however C can be hoisted up and not be in sync at all! C is
>> > incoherent
>> > wrt the
>> > consumer because it was not covered by the standalone release barrier.
>>
>>
>> In this case the RELEASE turned store to D into a release-store (a
>> subsequent store).
>> And ACQUIRE turned load of D into an acquire-load (a preceding load).
>
>
> D should be a pure relaxed store, and C should not be covered by the
> RELEASE. Iirc, it works this way on SPARC RMO mode. However, on x86, C will
> be covered because each store has implied release characteristics, wb memory
> aside for a moment.

C and D are completely symmetric wrt the RELEASE. Later you can
discover that there is also a thread that does:

// consumer 2
while (C != 3) backoff;
ACQUIRE
assert(A == 1 && B == 2);

And now suddenly C is release operation exactly the same way D is.


>> At least this is how this is defined in C/C++ standards.
>> ACQUIRE/RELEASE fences do not establish any happens-before relations
>> themselves. You still need a load in one thread to observe a value
>> stored in another thread. And only that "materializes" standalone
>> fence synchronization. So a store that materializes RELEASE fence will
>> always be a subsequent store.
>
>
> Humm... That is too strict, and has to be there whether we use standalone
> fences or not.

No, in C/C++ memory ordering constraints tied to memory operations act
only on that memory operation.

Consider:

DATA = 1;
C.store(1, memory_order_release);
D.store(1, memory_order_relaxed);

vs:

DATA = 1;
atomic_thread_fence(memory_order_release);
C.store(1, memory_order_relaxed);
D.store(1, memory_order_relaxed);


And 2 consumers:

// consumer 1
while (C.load(memory_order_acquire) == 0) backoff();
assert(DATA == 1);

// consumer 2
while (D.load(memory_order_acquire) == 0) backoff();
assert(DATA == 1);

Both consumers are correct wrt the atomic_thread_fence version of
producer. But only the first one is correct wrt the store(1,
memory_order_release) version of producer.

And this can actually break on x86 because:

DATA = 1;
C.store(1, memory_order_release);
D.store(1, memory_order_relaxed);

can be compiled to machine code as:

D.store(1, memory_order_relaxed);
DATA = 1;
C.store(1, memory_order_release);

But:

DATA = 1;
atomic_thread_fence(memory_order_release);
C.store(1, memory_order_relaxed);
D.store(1, memory_order_relaxed);

cannot be compiled to machine code as (the store to D cannot hoist above
the release fence):

D.store(1, memory_order_relaxed);
DATA = 1;
atomic_thread_fence(memory_order_release);
C.store(1, memory_order_relaxed);


Re: [lock-free] One reason why I like atomic_thread_fence...

2018-04-15 Thread Chris M. Thomasson


On Friday, April 13, 2018 at 11:45:51 PM UTC-7, Dmitry Vyukov wrote:
>
> On Mon, Apr 9, 2018 at 3:38 AM, Chris M. Thomasson  > wrote: 
> > On Saturday, April 7, 2018 at 1:46:20 AM UTC-7, Dmitry Vyukov wrote: 
> >> 
> >> On Thu, Apr 5, 2018 at 10:03 PM, Chris M. Thomasson  
>
> >> wrote: 
> >> > On Tuesday, April 3, 2018 at 5:44:38 AM UTC-7, Dmitry Vyukov wrote: 
> >> >> 
> >> >> On Sat, Mar 31, 2018 at 10:41 PM, Chris M. Thomasson 
> >> >>  wrote: 
> >> >> > Notice how there is an acquire barrier inside of the CAS loop 
> within 
> >> >> > the 
> >> >> > enqueue and dequeue functions of: 
> >> >> > 
> >> >> > 
> >> >> > 
> http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue 
> >> [...] 
> >> > Executing an acquire barrier on every iteration of the CAS loop is 
> not 
> >> > necessary. The actual version count keeps everything in order. 
> >> > 
> >> > However, you do need a single acquire fence _after_ the CAS loop 
> >> > succeeds in 
> >> > order to get a clear view of the element. 
> >> 
> >> This is true. 
> > 
> > 
> > Agreed. I personally like the ability to see the membars being separated 
> out 
> > and 
> > 
> > standing alone. It is a habit of mine from SPARC. Now, tagged standalone 
> > membars 
> > 
> > aside for a moment, perhaps ones that can include memory locations they 
> are 
> > 
> > interested in... ;^) 
> > 
> > 
> >> 
> >> 
> >> I don't like standalone fences because they are a plague for 
> >> verification. Consider: a single release fence turns _all_ subsequent 
> >> relaxed atomic stores ever executed by the thread into release 
> >> operations (they release memory state up to the fence point) and 
> >> handling of acquire/release operations is an O(N) operation (and 
> >> generally done under a mutex). 
> > 
> > 
> > A release operation should make sure all _prior_ operations are visible 
> > _before_ 
> > 
> > they are visible to another thread. They have no effect on subsequent 
> > relaxed 
> > 
> > operations. For instance: 
> > 
> > 
> > // producer 
> > 
> > A = 1 
> > 
> > B = 2 
> > 
> > RELEASE 
> > 
> > C = 3 
> > 
> > D = 4 
> > 
> > 
> > // consumer 
> > 
> > while (D != 4) backoff; 
> > 
> > ACQUIRE 
> > 
> > assert(A == 1 && B == 2); 
> > 
> > 
> > Well, A and B are going to be in sync with an acquire such that the assert 
> will 
> > never 
> > fail, however C can be hoisted up and not be in sync at all! C is 
> incoherent 
> > wrt the 
> > consumer because it was not covered by the standalone release barrier. 
>
>
> In this case the RELEASE turned store to D into a release-store (a 
> subsequent store). 
> And ACQUIRE turned load of D into an acquire-load (a preceding load). 
>

D should be a pure relaxed store, and C should not be covered by the 
RELEASE. Iirc, it works this way on SPARC RMO mode. However, on x86, C will 
be covered because each store has implied release characteristics, wb 
memory aside for a moment.

 

>
> At least this is how this is defined in C/C++ standards. 
> ACQUIRE/RELEASE fences do not establish any happens-before relations 
> themselves. You still need a load in one thread to observe a value 
> stored in another thread. And only that "materializes" standalone 
> fence synchronization. So a store that materializes RELEASE fence will 
> always be a subsequent store. 
>

Humm... That is too strict, and has to be there whether we use standalone 
fences or not. The store to D = 4 makes A and B wrt the RELEASE visible to 
the consumer threads that look for D = 4 and execute the ACQUIRE barrier 
after that fact has been observed. Afaict, C should NOT be covered.

 

>
>
> >> The same for acquire fences: a single 
> >> acquire fence turns _all_ loads ever executed by the thread into 
> >> acquire operations on the corresponding memory locations, which means 
> >> that you need to handle all relaxed loads as "shadow" acquire loads 
> >> for the case they will be materialized by a subsequent acquire fence. 
>

That sounds too coarse.

 

> > 
> > 
> > An acquire operation should make sure all operations wrt the release are 
> > visible 
> > 
> > _before_ any subsequent operations can be performed _after_ that fact is 
> > 
> > accomplished. 
> > 
> > 
> > Well, fwiw, the membars that can be embedded into the CAS wrt acquire 
> and 
> > 
> > release do affect prior and subsequent activity anyway, standalone or 
> not. A 
> > release will 
> > 
> > dump prior stores such that an acquire barrier will see them all. Now, 
> when 
> > we 
> > 
> > are dealing with a consume barrier, well that is targeting the release 
> > dependency 
> > 
> > chain wrt the pointer. A consume barrier is more precisely targeted when 
> > compared 
> > 
> > to the wider spectrum of an acquire. Btw, iirc consume is emulated in 
> Relacy 
> > as 
> > 
> > acquire right? 
> > 
> > 
> > Also, think of popping all nodes at once from an atomic LIFO: 
> > 
> > 
> > 

Re: [lock-free] One reason why I like atomic_thread_fence...

2018-04-14 Thread Dmitry Vyukov
On Mon, Apr 9, 2018 at 3:38 AM, Chris M. Thomasson  wrote:
> On Saturday, April 7, 2018 at 1:46:20 AM UTC-7, Dmitry Vyukov wrote:
>>
>> On Thu, Apr 5, 2018 at 10:03 PM, Chris M. Thomasson 
>> wrote:
>> > On Tuesday, April 3, 2018 at 5:44:38 AM UTC-7, Dmitry Vyukov wrote:
>> >>
>> >> On Sat, Mar 31, 2018 at 10:41 PM, Chris M. Thomasson
>> >>  wrote:
>> >> > Notice how there is an acquire barrier inside of the CAS loop within
>> >> > the
>> >> > enqueue and dequeue functions of:
>> >> >
>> >> >
>> >> > http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue
>> [...]
>> > Executing an acquire barrier on every iteration of the CAS loop is not
>> > necessary. The actual version count keeps everything in order.
>> >
>> > However, you do need a single acquire fence _after_ the CAS loop
>> > succeeds in
>> > order to get a clear view of the element.
>>
>> This is true.
>
>
> Agreed. I personally like the ability to see the membars being separated out
> and
>
> standing alone. It is a habit of mine from SPARC. Now, tagged standalone
> membars
>
> aside for a moment, perhaps ones that can include memory locations they are
>
> interested in... ;^)
>
>
>>
>>
>> I don't like standalone fences because they are a plague for
>> verification. Consider: a single release fence turns _all_ subsequent
>> relaxed atomic stores ever executed by the thread into release
>> operations (they release memory state up to the fence point) and
>> handling of acquire/release operations is an O(N) operation (and
>> generally done under a mutex).
>
>
> A release operation should make sure all _prior_ operations are visible
> _before_
>
> they are visible to another thread. They have no effect on subsequent
> relaxed
>
> operations. For instance:
>
>
> // producer
>
> A = 1
>
> B = 2
>
> RELEASE
>
> C = 3
>
> D = 4
>
>
> // consumer
>
> while (D != 4) backoff;
>
> ACQUIRE
>
> assert(A == 1 && B == 2);
>
>
> Well, A and B are going to be in sync with an acquire such that the assert will
> never
> fail, however C can be hoisted up and not be in sync at all! C is incoherent
> wrt the
> consumer because it was not covered by the standalone release barrier.


In this case the RELEASE turned store to D into a release-store (a
subsequent store).
And ACQUIRE turned load of D into an acquire-load (a preceding load).

At least this is how this is defined in C/C++ standards.
ACQUIRE/RELEASE fences do not establish any happens-before relations
themselves. You still need a load in one thread to observe a value
stored in another thread. And only that "materializes" standalone
fence synchronization. So a store that materializes RELEASE fence will
always be a subsequent store.


>> The same for acquire fences: a single
>> acquire fence turns _all_ loads ever executed by the thread into
>> acquire operations on the corresponding memory locations, which means
>> that you need to handle all relaxed loads as "shadow" acquire loads
>> for the case they will be materialized by a subsequent acquire fence.
>
>
> An acquire operation should make sure all operations wrt the release are
> visible
>
> _before_ any subsequent operations can be performed _after_ that fact is
>
> accomplished.
>
>
> Well, fwiw, the membars that can be embedded into the CAS wrt acquire and
>
> release do affect prior and subsequent activity anyway, standalone or not. A
> release will
>
> dump prior stores such that an acquire barrier will see them all. Now, when
> we
>
> are dealing with a consume barrier, well that is targeting the release
> dependency
>
> chain wrt the pointer. A consume barrier is more precisely targeted when
> compared
>
> to the wider spectrum of an acquire. Btw, iirc consume is emulated in Relacy
> as
>
> acquire right?
>
>
> Also, think of popping all nodes at once from an atomic LIFO:
>
>
> https://groups.google.com/d/topic/comp.lang.c++/V0s__czQwa0/discussion
>
>
> Well, how can we accomplish the following without using standalone fences?:
>
>
>   // try to flush all of our nodes
>   node* flush()
>   {
>   node* n = m_head.exchange(nullptr, mb_relaxed);
>
>   if (n)
>   {
>   mb_fence(mb_acquire);
>   }
>
>   return n;
>   }

I can't disagree. They are definitely more flexible.


>> The same is actually true for human reasoning. Say, I am reading your
>> code. We have 3 load operations in the loop and an acquire fences
>> after the loop. Now the question is: which of the loads we wanted to
>> turn into acquire by adding the fence? Or is it maybe 2 of them?
>> Which? Or maybe 1 in the loop and 1 somewhere before the loop, in a
>> different function?
>> One can, of course, comment that, but Relacy won't check comments, so
>> I won't trust them ;)
>
>
>
> Interesting. Still makes me think of tagged membars. I will get back to you
> with
>
> a more detailed response.


You mean something like:

   // try to flush 

Re: [lock-free] One reason why I like atomic_thread_fence...

2018-04-08 Thread Chris M. Thomasson
On Saturday, April 7, 2018 at 1:46:20 AM UTC-7, Dmitry Vyukov wrote:
>
> On Thu, Apr 5, 2018 at 10:03 PM, Chris M. Thomasson  > wrote: 
> > On Tuesday, April 3, 2018 at 5:44:38 AM UTC-7, Dmitry Vyukov wrote: 
> >> 
> >> On Sat, Mar 31, 2018 at 10:41 PM, Chris M. Thomasson 
> >>  wrote: 
> >> > Notice how there is an acquire barrier inside of the CAS loop within 
> the 
> >> > enqueue and dequeue functions of: 
> >> > 
> >> > 
> http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue 
> [...]
> > Executing an acquire barrier on every iteration of the CAS loop is not 
> > necessary. The actual version count keeps everything in order. 
> > 
> > However, you do need a single acquire fence _after_ the CAS loop 
> succeeds in 
> > order to get a clear view of the element. 
>
> This is true. 
>

Agreed. I personally like the ability to see the membars being separated
out and standing alone. It is a habit of mine from SPARC. Now, tagged
standalone membars aside for a moment, perhaps ones that can include memory
locations they are interested in... ;^)
 

>
> I don't like standalone fences because they are a plague for 
> verification. Consider: a single release fence turns _all_ subsequent 
> relaxed atomic stores ever executed by the thread into release 
> operations (they release memory state up to the fence point) and 
> handling of acquire/release operations is an O(N) operation (and 
> generally done under a mutex).


A release operation should make sure all _prior_ operations are visible
_before_ they are visible to another thread. They have no effect on
subsequent relaxed operations. For instance:


// producer

A = 1

B = 2

RELEASE

C = 3

D = 4


// consumer

while (D != 4) backoff;

ACQUIRE

assert(A == 1 && B == 2);


Well, A and B are going to be in sync with an acquire such that the assert
will never fail; however, C can be hoisted up and not be in sync at all! C
is incoherent wrt the consumer because it was not covered by the standalone
release barrier.


> The same for acquire fences: a single 
> acquire fence turns _all_ loads ever executed by the thread into 
> acquire operations on the corresponding memory locations, which means 
> that you need to handle all relaxed loads as "shadow" acquire loads 
> for the case they will be materialized by a subsequent acquire fence. 
>

An acquire operation should make sure all operations wrt the release are
visible _before_ any subsequent operations can be performed _after_ that
fact is accomplished.


Well, fwiw, the membars that can be embedded into the CAS wrt acquire and
release do affect prior and subsequent activity anyway, standalone or not.
A release will dump prior stores such that an acquire barrier will see them
all. Now, when we are dealing with a consume barrier, well, that is
targeting the release dependency chain wrt the pointer. A consume barrier
is more precisely targeted when compared to the wider spectrum of an
acquire. Btw, iirc consume is emulated in Relacy as acquire, right?


Also, think of popping all nodes at once from an atomic LIFO:


https://groups.google.com/d/topic/comp.lang.c++/V0s__czQwa0/discussion


Well, how can we accomplish the following without using standalone fences?:


  // try to flush all of our nodes 
  node* flush() 
  { 
  node* n = m_head.exchange(nullptr, mb_relaxed); 

  if (n) 
  { 
  mb_fence(mb_acquire); 
  } 

  return n; 
  }   

 

>
> The same is actually true for human reasoning. Say, I am reading your 
> code. We have 3 load operations in the loop and an acquire fences 
> after the loop. Now the question is: which of the loads we wanted to 
> turn into acquire by adding the fence? Or is it maybe 2 of them? 
> Which? Or maybe 1 in the loop and 1 somewhere before the loop, in a 
> different function? 
> One can, of course, comment that, but Relacy won't check comments, so 
> I won't trust them ;) 
>
 

Interesting. Still makes me think of tagged membars. I will get back to you
with a more detailed response.



Re: [lock-free] One reason why I like atomic_thread_fence...

2018-04-03 Thread Dmitry Vyukov
On Sat, Mar 31, 2018 at 10:41 PM, Chris M. Thomasson
 wrote:
> Notice how there is an acquire barrier inside of the CAS loop within the
> enqueue and dequeue functions of:
>
>
> http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue
>
>
> ?
>
>
> Well, fwiw we can get rid of them by using stand alone fences:
>
>
> Btw, sorry for the membar abstraction. I can quickly get sick and tired of
> having to type out (std::memory_order_relaxed) all the damn time. ;^)
>
>
> Anyway, here is the simple code using Relacy:
>
> ___
>
> /* Membar Abstraction
> ___*/
> #define mbrlx std::memory_order_relaxed
> #define mbacq std::memory_order_acquire
> #define mbrel std::memory_order_release
> #define mb(mb_membar) std::atomic_thread_fence(mb_membar)
>
>
> template<typename T, unsigned int T_N>
> struct fifo
> {
>     struct cell
>     {
>         VAR_T(T) m_data;
>         std::atomic<unsigned int> m_ver;
>     };
>
>     std::atomic<unsigned int> m_head;
>     std::atomic<unsigned int> m_tail;
>     cell m_cells[T_N];
>
>     fifo() : m_head(0), m_tail(0)
>     {
>         for (unsigned int i = 0; i < T_N; ++i)
>         {
>             m_cells[i].m_ver.store(i, mbrlx);
>         }
>     }
>
>     bool push_try(T const& data)
>     {
>         cell* c = nullptr;
>         unsigned int ver = m_head.load(mbrlx);
>
>         do
>         {
>             c = &m_cells[ver & (T_N - 1)];
>             if (c->m_ver.load(mbrlx) != ver) return false;
>
>         } while (! m_head.compare_exchange_weak(ver, ver + 1, mbrlx));
>
>         mb(mbacq);
>         VAR(c->m_data) = data;
>         mb(mbrel);
>
>         c->m_ver.store(ver + 1, mbrlx);
>
>         return true;
>     }
>
>     bool pop_try(T& data)
>     {
>         cell* c = nullptr;
>         unsigned int ver = m_tail.load(mbrlx);
>
>         do
>         {
>             c = &m_cells[ver & (T_N - 1)];
>             if (c->m_ver.load(mbrlx) != ver + 1) return false;
>
>         } while (! m_tail.compare_exchange_weak(ver, ver + 1, mbrlx));
>
>         mb(mbacq);
>         data = VAR(c->m_data);
>         mb(mbrel);
>
>         c->m_ver.store(ver + T_N, mbrlx);
>
>         return true;
>     }
> };
>
> ___
>
> There is no need for any membars within the loops. The version count keeps
> everything in order.

compare_exchange in C/C++ has optional failure memory ordering, so the
following looks equivalent to me ;)

m_tail.compare_exchange_weak(ver, ver + 1, acquire, relaxed)
