Re: [lttng-dev] [rp] [RFC] Readiness for URCU release with RCU lock-free hash table

2012-05-05 Thread Mathieu Desnoyers
* Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
 On Fri, May 04, 2012 at 12:53:12PM -0400, Mathieu Desnoyers wrote:
  * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
   On Thu, May 03, 2012 at 01:13:30PM -0400, Mathieu Desnoyers wrote:
* Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
  [...]
 
 A write barrier would be sufficient in the case where there were only
 two threads observing each other.  A full memory barrier would be needed
 to prevent the assertion from firing in this sort of case (not sure that
 this is exactly right, but something like this):
 
 Initial contents: B, C
 
 T0: add A; del B
 T1: if (!lookup B) { add B; del C }
 T2: r1 = lookup C; smp_mb(); r2 = lookup A
 
 assert(lookup C || lookup A);
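As an illustration (mine, not from the thread), the scenario can be modeled with C11 atomics: the table becomes three presence flags, and seq_cst operations stand in for the full-barrier semantics under discussion. Under sequential consistency T2's assertion always holds; the failure being discussed needs the weaker write-barrier-only ordering.

```c
/* Sketch of the T0/T1/T2 scenario above (illustration only, not code
 * from the thread).  The hash table is reduced to three atomic
 * presence flags; C11 seq_cst operations provide full-barrier
 * semantics, so T2's assertion always holds here.  With only write
 * barriers in T0/T1, transitivity would be lost. */
#include <pthread.h>
#include <stdatomic.h>

static atomic_int A, B, C;
static atomic_int r1, r2;            /* T2's observations */

static void *t0(void *unused) {      /* add A; del B */
    (void)unused;
    atomic_store(&A, 1);
    atomic_store(&B, 0);
    return NULL;
}

static void *t1(void *unused) {      /* if (!lookup B) { add B; del C } */
    (void)unused;
    if (!atomic_load(&B)) {
        atomic_store(&B, 1);
        atomic_store(&C, 0);
    }
    return NULL;
}

static void *t2(void *unused) {      /* r1 = lookup C; smp_mb(); r2 = lookup A */
    (void)unused;
    atomic_store(&r1, atomic_load(&C));
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store(&r2, atomic_load(&A));
    return NULL;
}

/* Returns nonzero iff T2 observed C or A, i.e. the assertion held. */
int run_scenario(void)
{
    pthread_t th[3];
    atomic_store(&A, 0);             /* initial contents: B, C */
    atomic_store(&B, 1);
    atomic_store(&C, 1);
    pthread_create(&th[0], NULL, t0, NULL);
    pthread_create(&th[1], NULL, t1, NULL);
    pthread_create(&th[2], NULL, t2, NULL);
    for (int i = 0; i < 3; i++)
        pthread_join(th[i], NULL);
    return atomic_load(&r1) || atomic_load(&r2);
}
```

Compile with -pthread; on a seq_cst implementation run_scenario() can never return 0, however the threads interleave.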

What you are bringing here as counter-example is, I think, transitivity.
   
   Yep!
   
I'm trying to figure out how your example could fail, and I cannot see
how. Below is a detailed trace of the closest scenario I can get to
failing (it does not fail). After that, I propose a different scenario,
which I think is more appropriate for the current topic.

Attempted detail of your scenario:

T0                      T1                      T2

add A
wmb
(add A globally
 observable)
   
   Almost.  All that this really guarantees is that if someone sees
   the add B, they will also see the add A.
   
del B
(del B globally
 observable)
                        (add A NOT brought
                         into cache)
                        (del B brought into
                         cache)
                        read B
                        (test false)
                        add B
                        wmb
                        (add B globally
                         observable)
                        del C
                        (del C globally
                         observable)
   
   Here, if someone sees the del C, they will see the add B, and
   they also will have lost the opportunity to modify B before T1
   reads from it and modifies it.
   
                                                (add A NOT brought
                                                 into cache)
                                                (del C brought into
                                                 cache)
                                                read C - not there.
                                                mb
                                                (add A brought
                                                 into cache)
                                                read A - there - success.
   
   So we see that C is not there.  We know that B would be there if
   we looked at it.  But we don't look at B, we look at A.  But the
   ordering back to T0's add A requires transitivity, which wmb
   does not guarantee.
  
  OK, got it!
  
   
If I look at the transitivity section in Linux memory-barriers.txt, I
notice that the examples mainly use read barriers around loads rather
than general barriers. Let's see if I can modify that example to come up
with an error case:

Initial content: empty

T0: add X
T1: r1 = lookup X; smp_mb(); r2 = lookup Y
T2: add Y; smp_mb(); r3 = lookup X

assert( !(r1 && !r2 && !r3) )

The key thing here is that if the barrier in T2 after add Y is a
smp_wmb rather than a smp_mb, this could allow the r3 = lookup X to be
reordered before add Y, thus allowing the assertion to fail.
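This error case can be sketched with C11 atomics (a model of mine, not from the thread): X and Y become presence flags, and atomic_thread_fence(memory_order_seq_cst) plays the role of smp_mb(). With the full fence in T2 the assertion holds on every run; the comment marks the spot where a write-only barrier would reopen the reordering window.

```c
/* Sketch (not from the thread) of the simpler T0/T1/T2 example, using
 * C11 atomics.  X and Y are presence flags; seq_cst fences stand in
 * for smp_mb().  If T2's fence were only a write barrier, the load of
 * X could be reordered before the store to Y, and the assertion could
 * fail. */
#include <pthread.h>
#include <stdatomic.h>

static atomic_int X, Y;
static atomic_int r1, r2, r3;

static void *t0(void *unused) {   /* add X */
    (void)unused;
    atomic_store(&X, 1);
    return NULL;
}

static void *t1(void *unused) {   /* r1 = lookup X; smp_mb(); r2 = lookup Y */
    (void)unused;
    atomic_store(&r1, atomic_load(&X));
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store(&r2, atomic_load(&Y));
    return NULL;
}

static void *t2(void *unused) {   /* add Y; smp_mb(); r3 = lookup X */
    (void)unused;
    atomic_store(&Y, 1);
    atomic_thread_fence(memory_order_seq_cst);  /* smp_wmb() here would be too weak */
    atomic_store(&r3, atomic_load(&X));
    return NULL;
}

/* Returns nonzero iff the assertion !(r1 && !r2 && !r3) held. */
int transitivity_holds(void)
{
    pthread_t th[3];
    atomic_store(&X, 0);
    atomic_store(&Y, 0);
    pthread_create(&th[0], NULL, t0, NULL);
    pthread_create(&th[1], NULL, t1, NULL);
    pthread_create(&th[2], NULL, t2, NULL);
    for (int i = 0; i < 3; i++)
        pthread_join(th[i], NULL);
    return !(atomic_load(&r1) && !atomic_load(&r2) && !atomic_load(&r3));
}
```

Compile with -pthread; under sequential consistency the chain store X < load X < load Y < store Y < load X is a contradiction, so the function always returns 1.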
   
   Your example is simpler, and demonstrates the need just as well, so
   let's go with your example.
   
I think it would be more intuitive for users if lookups vs updates
performed on the same thread are ordered with full memory barriers.
Given that we don't want to add extra barriers in read operations, it
would make sense to guarantee full memory barriers before and after
updates.

So how about we use full memory barriers before and after each of: add,
del (success), add_unique (success), replace, and add_replace ? If we
ever want to relax those ordering guarantees, then we can always add new
update APIs with a weaker ordering.
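The proposed guarantee could be sketched like this. The names toy_table_add() and ordered_add() are made up for illustration and are not urcu API; inside liburcu the barrier primitive would be cmm_smp_mb() rather than a C11 fence.

```c
/* Illustrative sketch of the ordering guarantee proposed above: a full
 * memory barrier before and after each successful update.
 * toy_table_add() and ordered_add() are invented names, not urcu API. */
#include <stdatomic.h>

static atomic_int bucket;                      /* stand-in for a hash bucket */

static void toy_table_add(int key)             /* the raw update itself */
{
    atomic_store_explicit(&bucket, key, memory_order_relaxed);
}

void ordered_add(int key)
{
    atomic_thread_fence(memory_order_seq_cst); /* full barrier before the update */
    toy_table_add(key);
    atomic_thread_fence(memory_order_seq_cst); /* full barrier after the update */
}

int toy_table_lookup(void)
{
    return atomic_load_explicit(&bucket, memory_order_relaxed);
}
```

With this shape, any lookup program-ordered before or after an update on the same thread is separated from it by a full barrier, which matches the intuition argued for above.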

Thoughts ?
   
   That line of reasoning makes a lot of sense to me!
  
  Sounds good.
  
  Here is what I propose, thoughts ?
 
 Just to make sure I understand -- the reason that the del functions
 say "no memory barrier" instead of "acts like rcu_dereference()" is
 that the del functions don't return anything.

Hrm, you bring up an interesting point. I think we should change the two
rcu_dereference() calls in _cds_lfht_del to CMM_LOAD_SHARED(). The
difference between the two is that CMM_LOAD_SHARED() does not imply a
read barrier between the read and subsequent uses of the data pointed to
by the pointer read.
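The distinction can be sketched roughly as follows. This is a simplification of mine, not the real urcu definitions: LOAD_SHARED_SKETCH and RCU_DEREF_SKETCH are invented names, and read_barrier_depends_sketch() is a stand-in that is a no-op except on architectures such as Alpha. It uses GCC-style typeof and statement expressions, as liburcu itself does.

```c
/* Rough sketch of the difference discussed above (simplified, not the
 * real urcu macros).  LOAD_SHARED_SKETCH() behaves like a volatile
 * load: the compiler must actually read memory once, but no ordering
 * is implied between the load and later dereferences of the result.
 * RCU_DEREF_SKETCH() additionally orders dependent reads. */

/* no-op on almost every CPU; only Alpha needs a real barrier here */
#define read_barrier_depends_sketch()  do { } while (0)

#define LOAD_SHARED_SKETCH(p)  (*(volatile __typeof__(p) *)&(p))

#define RCU_DEREF_SKETCH(p)                               \
    ({                                                    \
        __typeof__(p) _v = LOAD_SHARED_SKETCH(p);         \
        read_barrier_depends_sketch(); /* dep. ordering */ \
        _v;                                               \
    })

static int value = 42;
static int *shared_ptr = &value;

/* Reading through RCU_DEREF_SKETCH makes the later dereference safe
 * even on Alpha; LOAD_SHARED_SKETCH alone gives no such guarantee. */
int demo(void)
{
    int *p = RCU_DEREF_SKETCH(shared_ptr);
    return *p;
}
```

So where the loaded pointer is never dereferenced, the cheaper volatile load is enough, which is the point of the change suggested above.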

Same thing for cds_lfht_is_node_deleted: the rcu_dereference() should
be changed to a CMM_LOAD_SHARED(), because we never use 

[lttng-dev] tracing program flow

2012-05-05 Thread Chuck Tuffli
Hi -

I'm investigating the feasibility of replacing some home grown tracing
code with LTTng and am looking for advice from others who have used
LTTng to trace program flow (i.e. trace function entry and exit
points). One of the strengths of LTTng seems to be the ability to
selectively enable tracepoints of interest (much more useful than our
all-or-nothing tracing). To get this type of capability, would I need
to implement an entry and exit tracepoint for each function or is
there a better approach? Any other best practices people would be
willing to share? TIA.

---chuck

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev