Re: [lttng-dev] [rp] [RFC] Readiness for URCU release with RCU lock-free hash table
On Mon, May 07, 2012 at 12:10:55PM -0400, Mathieu Desnoyers wrote: * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote: On Fri, May 04, 2012 at 12:53:12PM -0400, Mathieu Desnoyers wrote: [...] Just to make sure I understand -- the reason that the del functions say no memory barrier instead of acts like rcu_dereference() is that the del functions don't return anything. [...] @@ -391,6 +413,7 @@ int cds_lfht_del(struct cds_lfht *ht, struct cds_lfht_node *node); * function. * Call with rcu_read_lock held. * Threads calling this API need to be registered RCU read-side threads. + * This function does not issue any memory barrier. */ One more question about the del memory ordering semantic. Following commit commit db00ccc36e7fb04ce8044fb1be7964acd1de6ae0 Author: Mathieu Desnoyers mathieu.desnoy...@efficios.com Date: Mon Dec 19 16:45:51 2011 -0500 rculfhash: Relax atomicity guarantees required by removal operation The atomicity guarantees for the removal operation do not need to be as strict as a cmpxchg. Use a uatomic_set followed by a xchg on a newly introduced REMOVAL_OWNER_FLAG to perform removal. Signed-off-by: Mathieu Desnoyers mathieu.desnoy...@efficios.com The del operation is performed in two steps: 1 - uatomic_or(), which sets the REMOVED flag (the actual tombstone) unconditionally into the node's next pointer. 2 - uatomic_xchg(), which atomically exchanges the old pointer with its current value (read) or'd with the REMOVAL_OWNER flag. The trick is that if the xchg returns a pointer with the REMOVAL_OWNER flag set, it means we are not the first thread to set this flag, so we should not free the node. However, if xchg returns a node without the REMOVAL_OWNER flag set, we are indeed the first to set it, so we should call free. Now regarding memory ordering semantics, should we consider the atomic action of del to apply when the or is called, or when the xchg is called ? Or should we simply document that the del effect on the node happens in two separate steps ? The way I see it, the actual effect of removal, as seen from RCU read traversal and lookup point of view, is observed as soon as the REMOVED tombstone is set, so I would think that the atomic publication of the removal is performed by the or. However, we ensure full memory barriers around xchg, but not usually around or. Therefore, the current implementation does not issue a memory barrier before the or, so we should either change our planned memory barrier documentation, or the implementation, to match. This would probably require creation of a cmm_smp_mb__before_uatomic_or(), so x86 does not end up issuing a useless memory barrer. My kneejerk reaction is that the or is really doing the deletion. Readers and other updaters care about the deletion, not about which CPU is going to do the free. Or did I misunderstand how this works? Thanx, Paul Thoughts ? Thanks, Mathieu -- Mathieu Desnoyers Operating System Efficiency RD Consultant EfficiOS Inc. http://www.efficios.com ___ rp mailing list r...@svcs.cs.pdx.edu http://svcs.cs.pdx.edu/mailman/listinfo/rp ___ lttng-dev mailing list lttng-dev@lists.lttng.org http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] [rp] [RFC] Readiness for URCU release with RCU lock-free hash table
* Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote: On Mon, May 07, 2012 at 12:10:55PM -0400, Mathieu Desnoyers wrote: * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote: On Fri, May 04, 2012 at 12:53:12PM -0400, Mathieu Desnoyers wrote: [...] Just to make sure I understand -- the reason that the del functions say no memory barrier instead of acts like rcu_dereference() is that the del functions don't return anything. [...] @@ -391,6 +413,7 @@ int cds_lfht_del(struct cds_lfht *ht, struct cds_lfht_node *node); * function. * Call with rcu_read_lock held. * Threads calling this API need to be registered RCU read-side threads. + * This function does not issue any memory barrier. */ One more question about the del memory ordering semantic. Following commit commit db00ccc36e7fb04ce8044fb1be7964acd1de6ae0 Author: Mathieu Desnoyers mathieu.desnoy...@efficios.com Date: Mon Dec 19 16:45:51 2011 -0500 rculfhash: Relax atomicity guarantees required by removal operation The atomicity guarantees for the removal operation do not need to be as strict as a cmpxchg. Use a uatomic_set followed by a xchg on a newly introduced REMOVAL_OWNER_FLAG to perform removal. Signed-off-by: Mathieu Desnoyers mathieu.desnoy...@efficios.com The del operation is performed in two steps: 1 - uatomic_or(), which sets the REMOVED flag (the actual tombstone) unconditionally into the node's next pointer. 2 - uatomic_xchg(), which atomically exchanges the old pointer with its current value (read) or'd with the REMOVAL_OWNER flag. The trick is that if the xchg returns a pointer with the REMOVAL_OWNER flag set, it means we are not the first thread to set this flag, so we should not free the node. However, if xchg returns a node without the REMOVAL_OWNER flag set, we are indeed the first to set it, so we should call free. Now regarding memory ordering semantics, should we consider the atomic action of del to apply when the or is called, or when the xchg is called ? Or should we simply document that the del effect on the node happens in two separate steps ? The way I see it, the actual effect of removal, as seen from RCU read traversal and lookup point of view, is observed as soon as the REMOVED tombstone is set, so I would think that the atomic publication of the removal is performed by the or. However, we ensure full memory barriers around xchg, but not usually around or. Therefore, the current implementation does not issue a memory barrier before the or, so we should either change our planned memory barrier documentation, or the implementation, to match. This would probably require creation of a cmm_smp_mb__before_uatomic_or(), so x86 does not end up issuing a useless memory barrer. My kneejerk reaction is that the or is really doing the deletion. Readers and other updaters care about the deletion, not about which CPU is going to do the free. Or did I misunderstand how this works? You got it right, this is how I see it too. However, in order to provide a full memory barrier before the or, we'd need to add a cmm_smp_mb() before the or (I don't think we want to presume that our or operation issues full memory barriers on all architectures). However, on x86, the lock; or does issue a full memory barrier. So I think we should introduce a macro that can translate into a memory barrier on architectures that need it, and to nothing on x86. Thoughts ? Thanks, Mathieu Thanx, Paul Thoughts ? Thanks, Mathieu -- Mathieu Desnoyers Operating System Efficiency RD Consultant EfficiOS Inc. http://www.efficios.com ___ rp mailing list r...@svcs.cs.pdx.edu http://svcs.cs.pdx.edu/mailman/listinfo/rp ___ lttng-dev mailing list lttng-dev@lists.lttng.org http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency RD Consultant EfficiOS Inc. http://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] [rp] [RFC] Readiness for URCU release with RCU lock-free hash table
On Tue, May 08, 2012 at 02:48:27PM -0400, Mathieu Desnoyers wrote: * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote: On Mon, May 07, 2012 at 12:10:55PM -0400, Mathieu Desnoyers wrote: * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote: On Fri, May 04, 2012 at 12:53:12PM -0400, Mathieu Desnoyers wrote: [...] Just to make sure I understand -- the reason that the del functions say no memory barrier instead of acts like rcu_dereference() is that the del functions don't return anything. [...] @@ -391,6 +413,7 @@ int cds_lfht_del(struct cds_lfht *ht, struct cds_lfht_node *node); * function. * Call with rcu_read_lock held. * Threads calling this API need to be registered RCU read-side threads. + * This function does not issue any memory barrier. */ One more question about the del memory ordering semantic. Following commit commit db00ccc36e7fb04ce8044fb1be7964acd1de6ae0 Author: Mathieu Desnoyers mathieu.desnoy...@efficios.com Date: Mon Dec 19 16:45:51 2011 -0500 rculfhash: Relax atomicity guarantees required by removal operation The atomicity guarantees for the removal operation do not need to be as strict as a cmpxchg. Use a uatomic_set followed by a xchg on a newly introduced REMOVAL_OWNER_FLAG to perform removal. Signed-off-by: Mathieu Desnoyers mathieu.desnoy...@efficios.com The del operation is performed in two steps: 1 - uatomic_or(), which sets the REMOVED flag (the actual tombstone) unconditionally into the node's next pointer. 2 - uatomic_xchg(), which atomically exchanges the old pointer with its current value (read) or'd with the REMOVAL_OWNER flag. The trick is that if the xchg returns a pointer with the REMOVAL_OWNER flag set, it means we are not the first thread to set this flag, so we should not free the node. However, if xchg returns a node without the REMOVAL_OWNER flag set, we are indeed the first to set it, so we should call free. Now regarding memory ordering semantics, should we consider the atomic action of del to apply when the or is called, or when the xchg is called ? Or should we simply document that the del effect on the node happens in two separate steps ? The way I see it, the actual effect of removal, as seen from RCU read traversal and lookup point of view, is observed as soon as the REMOVED tombstone is set, so I would think that the atomic publication of the removal is performed by the or. However, we ensure full memory barriers around xchg, but not usually around or. Therefore, the current implementation does not issue a memory barrier before the or, so we should either change our planned memory barrier documentation, or the implementation, to match. This would probably require creation of a cmm_smp_mb__before_uatomic_or(), so x86 does not end up issuing a useless memory barrer. My kneejerk reaction is that the or is really doing the deletion. Readers and other updaters care about the deletion, not about which CPU is going to do the free. Or did I misunderstand how this works? You got it right, this is how I see it too. However, in order to provide a full memory barrier before the or, we'd need to add a cmm_smp_mb() before the or (I don't think we want to presume that our or operation issues full memory barriers on all architectures). However, on x86, the lock; or does issue a full memory barrier. So I think we should introduce a macro that can translate into a memory barrier on architectures that need it, and to nothing on x86. Thoughts ? Makes sense to me! Thanx, Paul Thanks, Mathieu Thanx, Paul Thoughts ? Thanks, Mathieu -- Mathieu Desnoyers Operating System Efficiency RD Consultant EfficiOS Inc. http://www.efficios.com ___ rp mailing list r...@svcs.cs.pdx.edu http://svcs.cs.pdx.edu/mailman/listinfo/rp ___ lttng-dev mailing list lttng-dev@lists.lttng.org http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency RD Consultant EfficiOS Inc. http://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev