* Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
On Fri, May 04, 2012 at 12:53:12PM -0400, Mathieu Desnoyers wrote:
* Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
On Thu, May 03, 2012 at 01:13:30PM -0400, Mathieu Desnoyers wrote:
* Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
[...]
A write barrier would be sufficient in the case where there were only
two threads observing each other. A full memory barrier would be needed
to prevent the assertion from firing in this sort of case (not sure that
this is exactly right, but something like this):
Initial contents: B, C
T0: add A; del B
T1: if (!lookup B) { add B; del C }
T2: r1 = lookup C; smp_mb(); r2 = lookup A
assert(lookup C || lookup A);
What you are bringing here as counter-example is, I think, transitivity.
Yep!
I'm trying to figure out how your example could fail, and I cannot see
how. Below is my detailed attempt at the closest-to-failing scenario I
can construct, but it does not fail. After that, I propose a different
scenario, which I think is more appropriate for the current topic.
Attempted detail of your scenario:
        T0                      T1                      T2

add A
wmb
(add A globally
 observable)

Almost. All that this really guarantees is that if someone sees
the add B, they will also see the add A.

del B
(del B globally
 observable)
                        (add A NOT brought
                         into cache)
                        (del B brought into
                         cache)
                        read B
                        (test false)
                        add B
                        wmb
                        (add B globally
                         observable)
                        del C
                        (del C globally
                         observable)

Here, if someone sees the del C, they will see the add B, and
they also will have lost the opportunity to modify B before T1
reads from it and modifies it.

                                                (add A NOT brought
                                                 into cache)
                                                (del C brought into
                                                 cache)
                                                read C - not there.
                                                mb
                                                (add A brought
                                                 into cache)
                                                read A - there - success.
So we see that C is not there. We know that B would be there if
we looked at it. But we don't look at B, we look at A. But the
ordering back to T0's add A requires transitivity, which wmb
does not guarantee.
OK, got it!
If I look at the transitivity section in Linux's memory-barriers.txt, I
notice that the example there mainly uses read barriers around loads
rather than general barriers. Let's see if I can modify that example to
come up with an example error case:
Initial content: empty

T0: add X
T1: r1 = lookup X; smp_mb(); r2 = lookup Y
T2: add Y; smp_mb(); r3 = lookup X

assert(!(r1 && !r2 && !r3));
The key thing here is that if the barrier in T2 after add Y is a
smp_wmb rather than a smp_mb, this could allow the r3 = lookup X to be
reordered before add Y, thus allowing the assertion to fail.
Your example is simpler, and demonstrates the need just as well, so
let's go with your example.
I think it would be more intuitive for users if lookups vs updates
performed on the same thread are ordered with full memory barriers.
Given that we don't want to add extra barriers in read operations, it
would make sense to guarantee full memory barriers before and after
updates.
So how about we use full memory barriers before and after each of: add,
del (success), add_unique (success), replace, and add_replace? If we
ever want to relax those ordering guarantees, we can always add new
update APIs with weaker ordering.
Thoughts?
That line of reasoning makes a lot of sense to me!
Sounds good.
Here is what I propose; thoughts?
Just to make sure I understand -- the reason that the del functions
say "no memory barrier" instead of "acts like rcu_dereference()" is
that the del functions don't return anything.
Hrm, you bring up an interesting point. I think we should change the two
rcu_dereference() calls in _cds_lfht_del() to CMM_LOAD_SHARED(). The
difference between the two is that CMM_LOAD_SHARED() does not imply a
read barrier between the pointer load and subsequent uses of the data
it points to.
Same thing for cds_lfht_is_node_deleted(): the rcu_dereference() should
be changed to a CMM_LOAD_SHARED(), because we never use