Re: [lttng-dev] [rp] [RFC] Readiness for URCU release with RCU lock-free hash table

2012-05-01 Thread Paul E. McKenney
On Tue, May 01, 2012 at 11:12:15AM -0400, Mathieu Desnoyers wrote:
 * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
  On Tue, May 01, 2012 at 10:16:09AM -0400, Mathieu Desnoyers wrote:
   Hi!
   
   After 1 year of development, with the last 6-7 months spent polishing
   the API and testing the implementation, I think it is getting about time
   to release the RCU lock-free hash table in a new Userspace RCU version
   (0.7).
   
   I recently described the guarantees provided by the hash table in more
   detail, and created tests for the uniqueness guarantee for traversals
   performed concurrently with add_unique and add_replace operations. I
   also added test modes that create long hash chains, to test corner-cases
   of the hash table.
   
   One small thing I wonder is whether we should document that the hash
   table update operations imply full memory barriers?
  
  The Linux kernel's rule seems good here -- if a hash-table operation is
  atomic and returns a value, it should imply a full barrier.  So:
  
  cds_lfht_new(): No point in claiming barriers -- publishing the
  pointer to the hash table is where the barriers are important.
  
  cds_lfht_destroy(): Ditto.
  
  cds_lfht_lookup(): Not an update (let alone an atomic update), no barriers.
  
  cds_lfht_next_duplicate(): Ditto.
  
  cds_lfht_first(): Ditto.
  
  cds_lfht_next(): Ditto.
  
  cds_lfht_add(): Atomic update, but no return value, so no barrier
  implied.
 
 Yep, makes sense. We use cmpxchg internally to perform the update, but
 it could make sense to eventually use a cmpxchg that has no memory
 barriers to perform this update. So I agree on not providing a memory
 barrier guarantee on the add operation, since it does not return any
 value.
 
  cds_lfht_add_unique(): Atomic update that returns a value, so should
  imply a full memory barrier.
 
 add_unique is a bit special:
 
 - if it returns the node received as a parameter, the add succeeded,
   which implies an update, and thus a memory barrier.
 
 - if it returns a different node than the one received as a parameter,
   it failed and only performed a lookup, so there is no guarantee that
   a memory barrier has been executed.

Good point!  But I would suggest handling this like some architectures
handle a failing cmpxchg() -- there is a memory barrier beforehand,
but not afterwards.  This works well because although the caller can
easily supply a memory barrier after a failing cmpxchg(), there is no
way to supply one beforehand except by supplying a redundant barrier in
the case of a successful cmpxchg().

So I suggest that a failing add_unique() guarantee a full memory barrier
beforehand, but not afterwards.
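The proposed convention can be sketched with C11 atomics (an illustration
only, not URCU's actual implementation; `cmpxchg_sketch` is a hypothetical
name):

```c
#include <stdatomic.h>
#include <assert.h>

/* Sketch of the proposed convention: a full barrier is implied before
 * the operation unconditionally, but after it only on success.  This
 * mirrors how some architectures implement cmpxchg() from weaker
 * primitives.  Returns the value observed in *p (== old on success). */
static int cmpxchg_sketch(_Atomic int *p, int old, int new_val)
{
    int expected = old;

    atomic_thread_fence(memory_order_seq_cst);      /* barrier before, always */
    if (atomic_compare_exchange_strong_explicit(p, &expected, new_val,
            memory_order_relaxed, memory_order_relaxed)) {
        atomic_thread_fence(memory_order_seq_cst);  /* barrier after, success only */
        return old;
    }
    return expected;  /* failure: caller supplies a barrier if it needs one */
}
```

A caller that needs ordering after a failed operation can simply issue its
own `atomic_thread_fence()` afterwards, which is impossible to do for the
"before" side without paying for a redundant barrier on success.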

Thanx, Paul

  cds_lfht_add_replace(): Ditto.
  
  cds_lfht_replace(): Ditto.
  
  cds_lfht_del(): Ditto.
 
 Yep, makes sense.
 
 I'll add this documentation in the API.
 
 Thanks!
 
 Mathieu
 
  
  cds_lfht_is_node_deleted(): Not an update (let alone an atomic update),
  no barriers.
  
  cds_lfht_resize(): Atomic update, but no return value, so no barrier
  implied.
  
  cds_lfht_for_each(): Not an update (let alone an atomic update),
  no barriers.
  
  cds_lfht_for_each_duplicate(): Ditto.
  
  cds_lfht_for_each_entry(): Ditto.
  
  cds_lfht_for_each_entry_duplicate(): Ditto.
  
  Thanx, Paul
  
   I'm personally getting confident that the hash table API is clean
   enough, and the implementation well tested, but I'd like to have your
   thoughts on the readiness of the hash table for production use.
   
   Thanks,
   
   Mathieu
   
   -- 
   Mathieu Desnoyers
    Operating System Efficiency R&D Consultant
   EfficiOS Inc.
   http://www.efficios.com
   
   ___
   rp mailing list
   r...@svcs.cs.pdx.edu
   http://svcs.cs.pdx.edu/mailman/listinfo/rp
   
  
  
  ___
  lttng-dev mailing list
  lttng-dev@lists.lttng.org
  http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
 
 -- 
 Mathieu Desnoyers
 Operating System Efficiency R&D Consultant
 EfficiOS Inc.
 http://www.efficios.com
 




Re: [lttng-dev] [rp] [RFC] Readiness for URCU release with RCU lock-free hash table

2012-05-01 Thread Paul E. McKenney
On Tue, May 01, 2012 at 11:21:44AM -0400, Mathieu Desnoyers wrote:
 * Mathieu Desnoyers (mathieu.desnoy...@efficios.com) wrote:
  * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
   On Tue, May 01, 2012 at 10:16:09AM -0400, Mathieu Desnoyers wrote:
Hi!

After 1 year of development, with the last 6-7 months spent polishing
the API and testing the implementation, I think it is getting about time
to release the RCU lock-free hash table in a new Userspace RCU version
(0.7).

I recently described the guarantees provided by the hash table in more
detail, and created tests for the uniqueness guarantee for traversals
performed concurrently with add_unique and add_replace operations. I
also added test modes that create long hash chains, to test corner-cases
of the hash table.

One small thing I wonder is whether we should document that the hash
table update operations imply full memory barriers ?
   
   The Linux kernel's rule seems good here -- if a hash-table operation is
   atomic and returns a value, it should imply a full barrier.  So:
   
   cds_lfht_new(): No point in claiming barriers -- publishing the
 pointer to the hash table is where the barriers are important.
   
   cds_lfht_destroy(): Ditto.
   
   cds_lfht_lookup(): Not an update (let alone an atomic update), no 
   barriers.
   
   cds_lfht_next_duplicate(): Ditto.
   
   cds_lfht_first(): Ditto.
   
   cds_lfht_next(): Ditto.
   
   cds_lfht_add(): Atomic update, but no return value, so no barrier
 implied.
  
  Yep, makes sense. We use cmpxchg internally to perform the update, but
  it could make sense to eventually use a cmpxchg that has no memory
  barriers to perform this update. So I agree on not providing a memory
  barrier guarantee on the add operation, since it does not return any
  value.
  
   
   cds_lfht_add_unique(): Atomic update that returns a value, so should
 imply a full memory barrier.
  
  add_unique is a bit special:
  
  - if it returns the node received as parameter, it means the add
succeeded, which imply an update, and thus a memory barrier.
 
 Hrm, thinking further: if we make the add operation not act as a
 full memory barrier, then a successful add_unique should not act as a
 full memory barrier either.

Think of it as being similar to the Linux kernel's atomic_inc() and
atomic_add_return() primitives.  The latter guarantees memory barriers
and the former does not.
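The distinction can be sketched with C11 atomics (hypothetical helper
names; the kernel primitives themselves are architecture-specific):

```c
#include <stdatomic.h>
#include <assert.h>

/* atomic_inc()-style: no return value, so no ordering guarantee is
 * required; a relaxed increment suffices. */
static void inc_sketch(_Atomic int *p)
{
    atomic_fetch_add_explicit(p, 1, memory_order_relaxed);
}

/* atomic_add_return()-style: returns a value, so by the kernel's rule
 * it implies full (seq_cst) ordering around the update. */
static int add_return_sketch(_Atomic int *p, int i)
{
    return atomic_fetch_add_explicit(p, i, memory_order_seq_cst) + i;
}
```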

  - if it returns a different node than the one received as parameter, it
failed, and thus means that it only performed a lookup, so there is no
guarantee that a memory barrier has been executed.
  
  
   
   cds_lfht_add_replace(): Ditto.
   
   cds_lfht_replace(): Ditto.
   
   cds_lfht_del(): Ditto.
 
 One more point: del is similar to add_unique: if it succeeds, we
 execute a full memory barrier; if it fails, no memory barrier is
 guaranteed. But if we want to relax the guarantees, we might not want
 to guarantee a memory barrier in any of the del cases.
 
 In the end, the only primitives for which I think it really makes
 sense to provide memory-barrier semantics are add_replace and replace:
 they actually act as a higher-level cmpxchg over the hash table nodes.

I believe that this should do the same -- memory barrier before, but no
memory barrier after on failure.

Another approach is C++11, in which there are a couple of arguments to
the compare-and-swap primitive specifying the memory ordering constraints
for the success and failure cases, respectively.
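For example, the C11 equivalent, `atomic_compare_exchange_strong_explicit`,
takes separate memory-order arguments for the two cases; a minimal sketch
(not URCU code):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <assert.h>

/* Compare-and-swap with full ordering when the update happens and
 * relaxed (no barrier) ordering when it fails -- exactly the
 * success/failure distinction being discussed for add_unique()/del(). */
static bool cas_full_on_success(_Atomic int *p, int *expected, int desired)
{
    return atomic_compare_exchange_strong_explicit(p, expected, desired,
            memory_order_seq_cst,    /* ordering if the CAS succeeds */
            memory_order_relaxed);   /* ordering if it fails */
}
```

On failure, `*expected` is updated to the value actually observed, so the
caller can decide whether to retry and which ordering to pay for.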

Unless you expect use cases with lots of failing cds_lfht_del() and
add_unique() calls, the performance difference should not be significant.

Thanx, Paul

 Thoughts ?
 
 Thanks,
 
 Mathieu
 
  
  Yep, makes sense.
  
  I'll add this documentation in the API.
  
  Thanks!
  
  Mathieu
  
   
   cds_lfht_is_node_deleted(): Not an update (let alone an atomic update),
 no barriers.
   
   cds_lfht_resize(): Atomic update, but no return value, so no barrier
 implied.
   
   cds_lfht_for_each(): Not an update (let alone an atomic update),
 no barriers.
   
   cds_lfht_for_each_duplicate(): Ditto.
   
   cds_lfht_for_each_entry(): Ditto.
   
   cds_lfht_for_each_entry_duplicate(): Ditto.
   
 Thanx, Paul
   
I'm personally getting confident that the hash table API is clean
enough, and the implementation well tested, but I'd like to have your
thoughts on the readiness of the hash table for production use.

Thanks,

Mathieu


Re: [lttng-dev] [rp] [RFC] Readiness for URCU release with RCU lock-free hash table

2012-05-01 Thread Paul E. McKenney
On Tue, May 01, 2012 at 01:41:36PM -0400, Mathieu Desnoyers wrote:
 * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
  On Tue, May 01, 2012 at 11:21:44AM -0400, Mathieu Desnoyers wrote:
   * Mathieu Desnoyers (mathieu.desnoy...@efficios.com) wrote:
* Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
 On Tue, May 01, 2012 at 10:16:09AM -0400, Mathieu Desnoyers wrote:
  Hi!
  
  After 1 year of development, with the last 6-7 months spent 
  polishing
  the API and testing the implementation, I think it is getting about 
  time
  to release the RCU lock-free hash table in a new Userspace RCU 
  version
  (0.7).
  
  I recently described the guarantees provided by the hash table in 
  more
  detail, and created tests for the uniqueness guarantee for 
  traversals
  performed concurrently with add_unique and add_replace operations. I
  also added test modes that create long hash chains, to test 
  corner-cases
  of the hash table.
  
  One small thing I wonder is whether we should document that the hash
  table update operations imply full memory barriers ?
 
 The Linux kernel's rule seems good here -- if a hash-table operation 
 is
 atomic and returns a value, it should imply a full barrier.  So:
 
 cds_lfht_new(): No point in claiming barriers -- publishing the
   pointer to the hash table is where the barriers are important.
 
 cds_lfht_destroy(): Ditto.
 
 cds_lfht_lookup(): Not an update (let alone an atomic update), no 
 barriers.
 
 cds_lfht_next_duplicate(): Ditto.
 
 cds_lfht_first(): Ditto.
 
 cds_lfht_next(): Ditto.
 
 cds_lfht_add(): Atomic update, but no return value, so no barrier
   implied.

Yep, makes sense. We use cmpxchg internally to perform the update, but
it could make sense to eventually use a cmpxchg that has no memory
barriers to perform this update. So I agree on not providing a memory
barrier guarantee on the add operation, since it does not return any
value.

 
 cds_lfht_add_unique(): Atomic update that returns a value, so should
   imply a full memory barrier.

add_unique is a bit special:

- if it returns the node received as parameter, it means the add
  succeeded, which imply an update, and thus a memory barrier.
   
   Hrm, thinking further: if we make the add operation not act as a
   full memory barrier, then the add_unique success should not act as a
   full mb neither.
  
  Think of it as being similar to the Linux kernel's atomic_inc() and
  atomic_add_return() primitives.  The latter guarantees memory barriers
  and the former does not.
 
 I think add/add_unique vs atomic_inc/atomic_add_return are fundamentally
 different.
 
 atomic_inc:
  - input: increment
  - output: none
  - effect: increment target address value
 
 atomic_add_return:
  - input: increment
  - output: incremented value
  - effect: increment target address value, atomically reading the
resulting value.
 
 hash table add:
  - input: new node to add
  - output: none
  - effect: atomically add the new node into the table
 
 hash table add_unique (success):
  - input: new node to add
  - output: (we just return whether the operation has succeeded)
  - effect: atomically add the new node into the table
 
 hash table add_unique (failure):
  - input: new node to try adding
  - output: (we just return whether the operation has succeeded)
  - effect: simple lookup (read)
 
 So as we see, the add_unique failure only performs a read. Adding a
 memory barrier before this read would require us to add a memory barrier
 also on the success path, which would degrade performance. The success
 path does: lookup failure, cmpxchg to add the node, retry if changed.
 Adding a memory barrier before the lookup would add an extra memory
 barrier in addition to the one located in the cmpxchg, and I don't think
 we want that overhead.
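The success path described above can be sketched, much simplified, as a
single-bucket list (hypothetical types and names, not the cds_lfht
implementation; the real code also needs atomic next-pointer accesses and
RCU protection for concurrent readers):

```c
#include <stdatomic.h>
#include <stddef.h>
#include <assert.h>

struct snode { int key; struct snode *next; };

/* add_unique sketch: lookup, cmpxchg the new node onto the chain head,
 * retry if the chain changed underneath us.  Returns 1 if n was
 * inserted, 0 if a node with the same key already exists (in which
 * case only a lookup -- a read -- was performed). */
static int add_unique_sketch(_Atomic(struct snode *) *head, struct snode *n)
{
    for (;;) {
        struct snode *first = atomic_load(head);

        for (struct snode *p = first; p != NULL; p = p->next)
            if (p->key == n->key)
                return 0;               /* failure path: lookup only */

        n->next = first;
        struct snode *expected = first;
        if (atomic_compare_exchange_strong(head, &expected, n))
            return 1;                   /* success path: cmpxchg added n */
        /* chain changed under us: retry */
    }
}
```

The only barrier on the success path is the one implied by the cmpxchg;
the failure path executes none, which is the asymmetry under discussion.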

Perhaps a better analogy is cmpxchg() in the Linux kernel.  Some
architectures place the memory barrier before unconditionally, but place
a memory barrier after only if the cmpxchg() succeeds.  Of course,
architectures for which the cmpxchg instruction implies a barrier
don't need any explicit memory-barrier instructions.

I do see your point about the memory barrier in the failure case requiring
an additional memory barrier in the success case, however.

- if it returns a different node than the one received as parameter, it
  failed, and thus means that it only performed a lookup, so there is no
  guarantee that a memory barrier has been executed.


 
 cds_lfht_add_replace(): Ditto.
 
 cds_lfht_replace(): Ditto.
 
 cds_lfht_del(): Ditto.
   
   One more point: del is similar to add_unique: if it succeeds, we
   execute a full memory barrier. If it fails, no memory barrier is
   guaranteed. But if 

Re: [lttng-dev] [rp] [RFC] Readiness for URCU release with RCU lock-free hash table

2012-05-01 Thread Mathieu Desnoyers
* Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
 On Tue, May 01, 2012 at 01:41:36PM -0400, Mathieu Desnoyers wrote:
  * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
   On Tue, May 01, 2012 at 11:21:44AM -0400, Mathieu Desnoyers wrote:
* Mathieu Desnoyers (mathieu.desnoy...@efficios.com) wrote:
 * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
  On Tue, May 01, 2012 at 10:16:09AM -0400, Mathieu Desnoyers wrote:
   Hi!
   
   After 1 year of development, with the last 6-7 months spent 
   polishing
   the API and testing the implementation, I think it is getting 
   about time
   to release the RCU lock-free hash table in a new Userspace RCU 
   version
   (0.7).
   
   I recently described the guarantees provided by the hash table in 
   more
   detail, and created tests for the uniqueness guarantee for 
   traversals
   performed concurrently with add_unique and add_replace 
   operations. I
   also added test modes that create long hash chains, to test 
   corner-cases
   of the hash table.
   
   One small thing I wonder is whether we should document that the 
   hash
   table update operations imply full memory barriers ?
  
  The Linux kernel's rule seems good here -- if a hash-table 
  operation is
  atomic and returns a value, it should imply a full barrier.  So:
  
  cds_lfht_new(): No point in claiming barriers -- publishing the
  pointer to the hash table is where the barriers are important.
  
  cds_lfht_destroy(): Ditto.
  
  cds_lfht_lookup(): Not an update (let alone an atomic update), no 
  barriers.
  
  cds_lfht_next_duplicate(): Ditto.
  
  cds_lfht_first(): Ditto.
  
  cds_lfht_next(): Ditto.
  
  cds_lfht_add(): Atomic update, but no return value, so no barrier
  implied.
 
 Yep, makes sense. We use cmpxchg internally to perform the update, but
 it could make sense to eventually use a cmpxchg that has no memory
 barriers to perform this update. So I agree on not providing a memory
 barrier guarantee on the add operation, since it does not return any
 value.
 
  
  cds_lfht_add_unique(): Atomic update that returns a value, so should
  imply a full memory barrier.
 
 add_unique is a bit special:
 
 - if it returns the node received as parameter, it means the add
   succeeded, which imply an update, and thus a memory barrier.

Hrm, thinking further: if we make the add operation not act as a
full memory barrier, then the add_unique success should not act as a
full mb neither.
   
   Think of it as being similar to the Linux kernel's atomic_inc() and
   atomic_add_return() primitives.  The latter guarantees memory barriers
   and the former does not.
  
  I think add/add_unique vs atomic_inc/atomic_add_return are fundamentally
  different.
  
  atomic_inc:
   - input: increment
   - output: none
   - effect: increment target address value
  
  atomic_add_return:
   - input: increment
   - output: incremented value
   - effect: increment target address value, atomically reading the
 resulting value.
  
  hash table add:
   - input: new node to add
   - output: none
   - effect: atomically add the new node into the table
  
  hash table add_unique (success):
   - input: new node to add
   - output: (we just return whether the operation has succeeded)
   - effect: atomically add the new node into the table
  
  hash table add_unique (failure):
   - input: new node to try adding
   - output: (we just return whether the operation has succeeded)
   - effect: simple lookup (read)
  
  So as we see, the add_unique failure only performs a read. Adding a
  memory barrier before this read would require us to add a memory barrier
  also on the success path, which would degrade performance. The success
  path does: lookup failure, cmpxchg to add the node, retry if changed.
  Adding a memory barrier before the lookup would add an extra memory
  barrier in addition to the one located in the cmpxchg, and I don't think
  we want that overhead.
 
 Perhaps a better analogy is cmpxchg() in the Linux kernel.  Some
 architectures place the memory barrier before unconditionally, but place
 a memory barrier after only if the cmpxchg() succeeds.  Of course,
 architectures for which the cmpxchg instruction implies a barrier
 don't need any explicit memory-barrier instructions.
 
 I do see your point about the memory barrier in the failure case requiring
 an additional memory barrier in the success case, however.
 
 - if it returns a different node than the one received as parameter, it
   failed, and thus means that it only performed a lookup, so there is no
   guarantee that a memory barrier has been executed.
 
 
  
  cds_lfht_add_replace(): Ditto.
  
  cds_lfht_replace(): 

Re: [lttng-dev] [rp] [RFC] Readiness for URCU release with RCU lock-free hash table

2012-05-01 Thread Paul E. McKenney
On Tue, May 01, 2012 at 05:02:06PM -0400, Mathieu Desnoyers wrote:
 * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
  On Tue, May 01, 2012 at 01:41:36PM -0400, Mathieu Desnoyers wrote:
   * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
On Tue, May 01, 2012 at 11:21:44AM -0400, Mathieu Desnoyers wrote:
 * Mathieu Desnoyers (mathieu.desnoy...@efficios.com) wrote:
  * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
   On Tue, May 01, 2012 at 10:16:09AM -0400, Mathieu Desnoyers wrote:
Hi!

After 1 year of development, with the last 6-7 months spent 
polishing
the API and testing the implementation, I think it is getting 
about time
to release the RCU lock-free hash table in a new Userspace RCU 
version
(0.7).

I recently described the guarantees provided by the hash table 
in more
detail, and created tests for the uniqueness guarantee for 
traversals
performed concurrently with add_unique and add_replace 
operations. I
also added test modes that create long hash chains, to test 
corner-cases
of the hash table.

One small thing I wonder is whether we should document that the 
hash
table update operations imply full memory barriers ?
   
   The Linux kernel's rule seems good here -- if a hash-table 
   operation is
   atomic and returns a value, it should imply a full barrier.  So:
   
   cds_lfht_new(): No point in claiming barriers -- publishing the
 pointer to the hash table is where the barriers are important.
   
   cds_lfht_destroy(): Ditto.
   
   cds_lfht_lookup(): Not an update (let alone an atomic update), no 
   barriers.
   
   cds_lfht_next_duplicate(): Ditto.
   
   cds_lfht_first(): Ditto.
   
   cds_lfht_next(): Ditto.
   
   cds_lfht_add(): Atomic update, but no return value, so no barrier
 implied.
  
  Yep, makes sense. We use cmpxchg internally to perform the update, 
  but
  it could make sense to eventually use a cmpxchg that has no memory
  barriers to perform this update. So I agree on not providing a 
  memory
  barrier guarantee on the add operation, since it does not return 
  any
  value.
  
   
   cds_lfht_add_unique(): Atomic update that returns a value, so 
   should
 imply a full memory barrier.
  
  add_unique is a bit special:
  
  - if it returns the node received as parameter, it means the add
succeeded, which imply an update, and thus a memory barrier.
 
 Hrm, thinking further: if we make the add operation not act as a
 full memory barrier, then the add_unique success should not act as a
 full mb neither.

Think of it as being similar to the Linux kernel's atomic_inc() and
atomic_add_return() primitives.  The latter guarantees memory barriers
and the former does not.
   
   I think add/add_unique vs atomic_inc/atomic_add_return are fundamentally
   different.
   
   atomic_inc:
- input: increment
- output: none
- effect: increment target address value
   
   atomic_add_return:
- input: increment
- output: incremented value
- effect: increment target address value, atomically reading the
  resulting value.
   
   hash table add:
- input: new node to add
- output: none
- effect: atomically add the new node into the table
   
   hash table add_unique (success):
- input: new node to add
- output: (we just return whether the operation has succeeded)
- effect: atomically add the new node into the table
   
   hash table add_unique (failure):
- input: new node to try adding
- output: (we just return whether the operation has succeeded)
- effect: simple lookup (read)
   
   So as we see, the add_unique failure only performs a read. Adding a
   memory barrier before this read would require us to add a memory barrier
   also on the success path, which would degrade performance. The success
   path does: lookup failure, cmpxchg to add the node, retry if changed.
   Adding a memory barrier before the lookup would add an extra memory
   barrier in addition to the one located in the cmpxchg, and I don't think
   we want that overhead.
  
  Perhaps a better analogy is cmpxchg() in the Linux kernel.  Some
  architectures place the memory barrier before unconditionally, but place
  a memory barrier after only if the cmpxchg() succeeds.  Of course,
  architectures for which the cmpxchg instruction implies a barrier
  don't need any explicit memory-barrier instructions.
  
  I do see your point about the memory barrier in the failure case requiring
  an additional memory barrier in the success case, however.
  
  - if it returns a different node than the one received as 
  parameter, it
failed, 

Re: [lttng-dev] [rp] [RFC] Readiness for URCU release with RCU lock-free hash table

2012-05-01 Thread Mathieu Desnoyers
* Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
 On Tue, May 01, 2012 at 05:02:06PM -0400, Mathieu Desnoyers wrote:
  * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
   On Tue, May 01, 2012 at 01:41:36PM -0400, Mathieu Desnoyers wrote:
* Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
 On Tue, May 01, 2012 at 11:21:44AM -0400, Mathieu Desnoyers wrote:
  * Mathieu Desnoyers (mathieu.desnoy...@efficios.com) wrote:
   * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
On Tue, May 01, 2012 at 10:16:09AM -0400, Mathieu Desnoyers 
wrote:
 Hi!
 
 After 1 year of development, with the last 6-7 months spent 
 polishing
 the API and testing the implementation, I think it is getting 
 about time
 to release the RCU lock-free hash table in a new Userspace 
 RCU version
 (0.7).
 
 I recently described the guarantees provided by the hash 
 table in more
 detail, and created tests for the uniqueness guarantee for 
 traversals
 performed concurrently with add_unique and add_replace 
 operations. I
 also added test modes that create long hash chains, to test 
 corner-cases
 of the hash table.
 
 One small thing I wonder is whether we should document that 
 the hash
 table update operations imply full memory barriers ?

The Linux kernel's rule seems good here -- if a hash-table 
operation is
atomic and returns a value, it should imply a full barrier.  So:

cds_lfht_new(): No point in claiming barriers -- publishing the
pointer to the hash table is where the barriers are 
important.

cds_lfht_destroy(): Ditto.

cds_lfht_lookup(): Not an update (let alone an atomic update), 
no barriers.

cds_lfht_next_duplicate(): Ditto.

cds_lfht_first(): Ditto.

cds_lfht_next(): Ditto.

cds_lfht_add(): Atomic update, but no return value, so no 
barrier
implied.
   
   Yep, makes sense. We use cmpxchg internally to perform the 
   update, but
   it could make sense to eventually use a cmpxchg that has no memory
   barriers to perform this update. So I agree on not providing a 
   memory
   barrier guarantee on the add operation, since it does not 
   return any
   value.
   

cds_lfht_add_unique(): Atomic update that returns a value, so 
should
imply a full memory barrier.
   
   add_unique is a bit special:
   
   - if it returns the node received as parameter, it means the add
 succeeded, which imply an update, and thus a memory barrier.
  
  Hrm, thinking further: if we make the add operation not act as a
  full memory barrier, then the add_unique success should not act as a
  full mb neither.
 
 Think of it as being similar to the Linux kernel's atomic_inc() and
 atomic_add_return() primitives.  The latter guarantees memory barriers
 and the former does not.

I think add/add_unique vs atomic_inc/atomic_add_return are fundamentally
different.

atomic_inc:
 - input: increment
 - output: none
 - effect: increment target address value

atomic_add_return:
 - input: increment
 - output: incremented value
 - effect: increment target address value, atomically reading the
   resulting value.

hash table add:
 - input: new node to add
 - output: none
 - effect: atomically add the new node into the table

hash table add_unique (success):
 - input: new node to add
 - output: (we just return whether the operation has succeeded)
 - effect: atomically add the new node into the table

hash table add_unique (failure):
 - input: new node to try adding
 - output: (we just return whether the operation has succeeded)
 - effect: simple lookup (read)

So as we see, the add_unique failure only performs a read. Adding a
memory barrier before this read would require us to add a memory barrier
also on the success path, which would degrade performance. The success
path does: lookup failure, cmpxchg to add the node, retry if changed.
Adding a memory barrier before the lookup would add an extra memory
barrier in addition to the one located in the cmpxchg, and I don't think
we want that overhead.
   
   Perhaps a better analogy is cmpxchg() in the Linux kernel.  Some
   architectures place the memory barrier before unconditionally, but place
   a memory barrier after only if the cmpxchg() succeeds.  Of course,
   architectures for which the cmpxchg instruction implies a barrier
   don't need any explicit memory-barrier instructions.
   
   I do see your