Re: [lttng-dev] [rp] [RFC] Readiness for URCU release with RCU lock-free hash table

2012-05-08 Thread Paul E. McKenney
On Mon, May 07, 2012 at 12:10:55PM -0400, Mathieu Desnoyers wrote:
 
 * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
  On Fri, May 04, 2012 at 12:53:12PM -0400, Mathieu Desnoyers wrote:
 [...]
  Just to make sure I understand -- the reason that the del functions
  say no memory barrier instead of acts like rcu_dereference() is
  that the del functions don't return anything.
 
 [...]
   @@ -391,6 +413,7 @@ int cds_lfht_del(struct cds_lfht *ht, struct 
   cds_lfht_node *node);
 * function.
 * Call with rcu_read_lock held.
 * Threads calling this API need to be registered RCU read-side threads.
   + * This function does not issue any memory barrier.
 */
 
 One more question about the del memory ordering semantic. Following
 commit
 
 commit db00ccc36e7fb04ce8044fb1be7964acd1de6ae0
 Author: Mathieu Desnoyers mathieu.desnoy...@efficios.com
 Date:   Mon Dec 19 16:45:51 2011 -0500
 
 rculfhash: Relax atomicity guarantees required by removal operation
 
 The atomicity guarantees for the removal operation do not need to be as
 strict as a cmpxchg. Use a uatomic_set followed by a xchg on a newly
 introduced REMOVAL_OWNER_FLAG to perform removal.
 
 Signed-off-by: Mathieu Desnoyers mathieu.desnoy...@efficios.com
 
 
 The del operation is performed in two steps:
 
 1 - uatomic_or(), which sets the REMOVED flag (the actual tombstone)
 unconditionally into the node's next pointer.
 2 - uatomic_xchg(), which atomically exchanges the old pointer with
 its current value (read) or'd with the REMOVAL_OWNER flag. The trick
 is that if the xchg returns a pointer with the REMOVAL_OWNER flag
 set, it means we are not the first thread to set this flag, so we
 should not free the node. However, if xchg returns a node without
 the REMOVAL_OWNER flag set, we are indeed the first to set it, so
 we should call free.
 
 Now regarding memory ordering semantics, should we consider the atomic
 action of del to apply when the or is called, or when the xchg is
 called ? Or should we simply document that the del effect on the node
 happens in two separate steps ?
 
 The way I see it, the actual effect of removal, as seen from RCU read
 traversal and lookup point of view, is observed as soon as the REMOVED
 tombstone is set, so I would think that the atomic publication of the
 removal is performed by the or.
 
 However, we ensure full memory barriers around xchg, but not usually 
 around or. Therefore, the current implementation does not issue a 
 memory barrier before the or, so we should either change our planned
 memory barrier documentation, or the implementation, to match. This
 would probably require creation of a cmm_smp_mb__before_uatomic_or(), so
 x86 does not end up issuing a useless memory barrer.

My kneejerk reaction is that the or is really doing the deletion.
Readers and other updaters care about the deletion, not about which CPU
is going to do the free.

Or did I misunderstand how this works?

Thanx, Paul

 Thoughts ?
 
 Thanks,
 
 Mathieu
 
 -- 
 Mathieu Desnoyers
 Operating System Efficiency RD Consultant
 EfficiOS Inc.
 http://www.efficios.com
 
 ___
 rp mailing list
 r...@svcs.cs.pdx.edu
 http://svcs.cs.pdx.edu/mailman/listinfo/rp
 


___
lttng-dev mailing list
lttng-dev@lists.lttng.org
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] [rp] [RFC] Readiness for URCU release with RCU lock-free hash table

2012-05-08 Thread Mathieu Desnoyers
* Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
 On Mon, May 07, 2012 at 12:10:55PM -0400, Mathieu Desnoyers wrote:
  
  * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
   On Fri, May 04, 2012 at 12:53:12PM -0400, Mathieu Desnoyers wrote:
  [...]
   Just to make sure I understand -- the reason that the del functions
   say no memory barrier instead of acts like rcu_dereference() is
   that the del functions don't return anything.
  
  [...]
@@ -391,6 +413,7 @@ int cds_lfht_del(struct cds_lfht *ht, struct 
cds_lfht_node *node);
  * function.
  * Call with rcu_read_lock held.
  * Threads calling this API need to be registered RCU read-side 
threads.
+ * This function does not issue any memory barrier.
  */
  
  One more question about the del memory ordering semantic. Following
  commit
  
  commit db00ccc36e7fb04ce8044fb1be7964acd1de6ae0
  Author: Mathieu Desnoyers mathieu.desnoy...@efficios.com
  Date:   Mon Dec 19 16:45:51 2011 -0500
  
  rculfhash: Relax atomicity guarantees required by removal operation
  
  The atomicity guarantees for the removal operation do not need to be as
  strict as a cmpxchg. Use a uatomic_set followed by a xchg on a newly
  introduced REMOVAL_OWNER_FLAG to perform removal.
  
  Signed-off-by: Mathieu Desnoyers mathieu.desnoy...@efficios.com
  
  
  The del operation is performed in two steps:
  
  1 - uatomic_or(), which sets the REMOVED flag (the actual tombstone)
  unconditionally into the node's next pointer.
  2 - uatomic_xchg(), which atomically exchanges the old pointer with
  its current value (read) or'd with the REMOVAL_OWNER flag. The trick
  is that if the xchg returns a pointer with the REMOVAL_OWNER flag
  set, it means we are not the first thread to set this flag, so we
  should not free the node. However, if xchg returns a node without
  the REMOVAL_OWNER flag set, we are indeed the first to set it, so
  we should call free.
  
  Now regarding memory ordering semantics, should we consider the atomic
  action of del to apply when the or is called, or when the xchg is
  called ? Or should we simply document that the del effect on the node
  happens in two separate steps ?
  
  The way I see it, the actual effect of removal, as seen from RCU read
  traversal and lookup point of view, is observed as soon as the REMOVED
  tombstone is set, so I would think that the atomic publication of the
  removal is performed by the or.
  
  However, we ensure full memory barriers around xchg, but not usually 
  around or. Therefore, the current implementation does not issue a 
  memory barrier before the or, so we should either change our planned
  memory barrier documentation, or the implementation, to match. This
  would probably require creation of a cmm_smp_mb__before_uatomic_or(), so
  x86 does not end up issuing a useless memory barrer.
 
 My kneejerk reaction is that the or is really doing the deletion.
 Readers and other updaters care about the deletion, not about which CPU
 is going to do the free.
 
 Or did I misunderstand how this works?

You got it right, this is how I see it too.

However, in order to provide a full memory barrier before the or, we'd
need to add a cmm_smp_mb() before the or (I don't think we want to
presume that our or operation issues full memory barriers on all
architectures).

However, on x86, the lock; or does issue a full memory barrier. So I
think we should introduce a macro that can translate into a memory
barrier on architectures that need it, and to nothing on x86.

Thoughts ?

Thanks,

Mathieu

 
   Thanx, Paul
 
  Thoughts ?
  
  Thanks,
  
  Mathieu
  
  -- 
  Mathieu Desnoyers
  Operating System Efficiency RD Consultant
  EfficiOS Inc.
  http://www.efficios.com
  
  ___
  rp mailing list
  r...@svcs.cs.pdx.edu
  http://svcs.cs.pdx.edu/mailman/listinfo/rp
  
 
 
 ___
 lttng-dev mailing list
 lttng-dev@lists.lttng.org
 http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers
Operating System Efficiency RD Consultant
EfficiOS Inc.
http://www.efficios.com

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] [rp] [RFC] Readiness for URCU release with RCU lock-free hash table

2012-05-08 Thread Paul E. McKenney
On Tue, May 08, 2012 at 02:48:27PM -0400, Mathieu Desnoyers wrote:
 * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
  On Mon, May 07, 2012 at 12:10:55PM -0400, Mathieu Desnoyers wrote:
   
   * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
On Fri, May 04, 2012 at 12:53:12PM -0400, Mathieu Desnoyers wrote:
   [...]
Just to make sure I understand -- the reason that the del functions
say no memory barrier instead of acts like rcu_dereference() is
that the del functions don't return anything.
   
   [...]
 @@ -391,6 +413,7 @@ int cds_lfht_del(struct cds_lfht *ht, struct 
 cds_lfht_node *node);
   * function.
   * Call with rcu_read_lock held.
   * Threads calling this API need to be registered RCU read-side 
 threads.
 + * This function does not issue any memory barrier.
   */
   
   One more question about the del memory ordering semantic. Following
   commit
   
   commit db00ccc36e7fb04ce8044fb1be7964acd1de6ae0
   Author: Mathieu Desnoyers mathieu.desnoy...@efficios.com
   Date:   Mon Dec 19 16:45:51 2011 -0500
   
   rculfhash: Relax atomicity guarantees required by removal operation
   
   The atomicity guarantees for the removal operation do not need to be 
   as
   strict as a cmpxchg. Use a uatomic_set followed by a xchg on a newly
   introduced REMOVAL_OWNER_FLAG to perform removal.
   
   Signed-off-by: Mathieu Desnoyers mathieu.desnoy...@efficios.com
   
   
   The del operation is performed in two steps:
   
   1 - uatomic_or(), which sets the REMOVED flag (the actual tombstone)
   unconditionally into the node's next pointer.
   2 - uatomic_xchg(), which atomically exchanges the old pointer with
   its current value (read) or'd with the REMOVAL_OWNER flag. The trick
   is that if the xchg returns a pointer with the REMOVAL_OWNER flag
   set, it means we are not the first thread to set this flag, so we
   should not free the node. However, if xchg returns a node without
   the REMOVAL_OWNER flag set, we are indeed the first to set it, so
   we should call free.
   
   Now regarding memory ordering semantics, should we consider the atomic
   action of del to apply when the or is called, or when the xchg is
   called ? Or should we simply document that the del effect on the node
   happens in two separate steps ?
   
   The way I see it, the actual effect of removal, as seen from RCU read
   traversal and lookup point of view, is observed as soon as the REMOVED
   tombstone is set, so I would think that the atomic publication of the
   removal is performed by the or.
   
   However, we ensure full memory barriers around xchg, but not usually 
   around or. Therefore, the current implementation does not issue a 
   memory barrier before the or, so we should either change our planned
   memory barrier documentation, or the implementation, to match. This
   would probably require creation of a cmm_smp_mb__before_uatomic_or(), so
   x86 does not end up issuing a useless memory barrer.
  
  My kneejerk reaction is that the or is really doing the deletion.
  Readers and other updaters care about the deletion, not about which CPU
  is going to do the free.
  
  Or did I misunderstand how this works?
 
 You got it right, this is how I see it too.
 
 However, in order to provide a full memory barrier before the or, we'd
 need to add a cmm_smp_mb() before the or (I don't think we want to
 presume that our or operation issues full memory barriers on all
 architectures).
 
 However, on x86, the lock; or does issue a full memory barrier. So I
 think we should introduce a macro that can translate into a memory
 barrier on architectures that need it, and to nothing on x86.
 
 Thoughts ?

Makes sense to me!

Thanx, Paul

 Thanks,
 
 Mathieu
 
  
  Thanx, Paul
  
   Thoughts ?
   
   Thanks,
   
   Mathieu
   
   -- 
   Mathieu Desnoyers
   Operating System Efficiency RD Consultant
   EfficiOS Inc.
   http://www.efficios.com
   
   ___
   rp mailing list
   r...@svcs.cs.pdx.edu
   http://svcs.cs.pdx.edu/mailman/listinfo/rp
   
  
  
  ___
  lttng-dev mailing list
  lttng-dev@lists.lttng.org
  http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
 
 -- 
 Mathieu Desnoyers
 Operating System Efficiency RD Consultant
 EfficiOS Inc.
 http://www.efficios.com
 


___
lttng-dev mailing list
lttng-dev@lists.lttng.org
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev