Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal

2015-03-02 Thread Peter Zijlstra
On Mon, Mar 02, 2015 at 09:23:45AM +0100, Peter Zijlstra wrote:
> It changes the rb-tree from using internal storage like we do now, to
> requiring external storage.

Note that one consequence of using external storage is that tree
iteration will always touch two cachelines per level: one to
load the node and one to load the search key.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal

2015-03-02 Thread Peter Zijlstra
On Sun, Mar 01, 2015 at 01:52:09PM +, Mathieu Desnoyers wrote:
> >  2) there must not be (temporary) loops in the tree structure in the
> > modifier's program order, this would cause a lookup which
> > interrupts the modifier to get stuck indefinitely.
> 
> For (2), I don't think this is how the situation should be described.
> 
> Let's consider a scenario where an interrupt nests over the modification.
> First, the modification will switch the latch to the other version of the
> tree. Therefore, the interrupt will see a fully consistent tree, not the
> tree being modified. Therefore, a temporary loop in the tree should not
> be an issue for that peculiar situation.
> 
> However, if we have another thread traversing the tree while we
> concurrently switch the latch and modify one version of the tree,
> creating a temporary loop in the tree, this thread could possibly:
> 
> A) deadlock with the modification if there is a locking dependency
>between tree modification, tree read, and another lock (transitive
>dependency).
> B) have the modifier starved by the other thread, if that thread has
>a higher scheduling priority (e.g. RT) than the modifier. The high
>priority thread would then use all its CPU time to perform the
>temporary loop.
> 
> So I agree that loops are unwanted there: it allows us to never have
> to care about situations A and B. However, the explanation about why
> should not involve, AFAIU, an interrupt handler nesting over the tree
> modification, because this is precisely one scenario that should not
> care about loops.
> 
> Thoughts?

This is true for the (later) proposed latched RB-tree; the description
is, however, true in general. If you somehow did a lookup while doing
the modification and you have loops in program order, you're stuck.

So in the interest of robustness I think we want this property
nonetheless. And it's 'free'; I didn't have to change any code for this.

I shall however clarify this point in the latched RB-tree patch.


Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal

2015-03-02 Thread Peter Zijlstra
On Sun, Mar 01, 2015 at 01:11:23PM -0800, Michel Lespinasse wrote:
> On Sat, Feb 28, 2015 at 1:24 PM, Peter Zijlstra  wrote:

> > It generates slightly worse code, probably because gcc stinks at
> > volatile. But in pointer chasing heavy code a few instructions more
> > should not matter.
> 
> So, I was worried that this would penalize all rbtree users, for the
> benefit of just the one you're adding later in this series. We have
> several rbtrees where we care about performance a lot, such as the
> ones used in the scheduler or for indexing vmas.
> 
> That said, I checked with the compiler we are using here (gcc 4.7
> variant) and I didn't see any change in the generated code. So, no
> issue here for me.
> 
> If the object code really is different in your setup, please use the
> lib/rbtree_test module to check the performance impact of the change.

I can do that; I had similar results to what Ingo posted. I meant to go
build a 4.9 or 5.0 compiler to see what current GCC makes of it, but
I've not yet gotten around to doing so.

My results were with 4.8.3, IIRC.

> > For 2) I have carefully audited the code and drawn every intermediate
> > link state and not found a loop.
> 
> As Mathieu remarked, we are never modifying the currently active tree,
> so the interrupt case is not the reason for avoiding loops.

Correct, for the proposed use we do not. I did however double (actually
triple) check this property, because I feel it's a good property to have:
no matter what you do to the tree, a (simple) lookup will be non-fatal.

But yes, I'll clarify things.

> I think your proposal will work well for the use case you have in mind
> (looking up modules based on address). However, I was wondering how
> you would compare your proposal against an alternative I heard Josh
> Triplett formulate before, where there would be one unique rbtree but
> rotations would allocate new nodes rather than modify the existing
> ones. I think this would be workable as well; I'm just not sure
> whether this would be more or less generally applicable than your
> proposal. Copying Josh in case he wants to chime in.

So I was not aware of that particular solution.

It changes the rb-tree from using internal storage like we do now, to
requiring external storage.

I do have experience with making an RCU safe (in memory) B+tree, and
there the allocations were absolutely killing performance.



Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal

2015-03-01 Thread Ingo Molnar

* Michel Lespinasse  wrote:

> On Sat, Feb 28, 2015 at 1:24 PM, Peter Zijlstra  wrote:
> > Change the insert and erase code such that lockless searches are
> > non-fatal.
> >
> > In and of itself an rbtree cannot be correctly searched while
> > in-modification, we can however provide weaker guarantees that will
> > allow the rbtree to be used in conjunction with other techniques, such
> > as latches; see 9b0fd802e8c0 ("seqcount: Add raw_write_seqcount_latch()").
> >
> > For this to work we need the following guarantees from the rbtree
> > code:
> >
> >  1) a lockless reader must not see partial stores, this would allow it
> > to observe nodes that are invalid memory.
> >
> >  2) there must not be (temporary) loops in the tree structure in the
> > modifier's program order, this would cause a lookup which
> > interrupts the modifier to get stuck indefinitely.
> >
> > For 1) we must use WRITE_ONCE() for all updates to the tree structure;
> > in particular this patch only does rb_{left,right} as those are the
> > only element required for simple searches.
> >
> > It generates slightly worse code, probably because gcc stinks at
> > volatile. But in pointer chasing heavy code a few instructions more
> > should not matter.
> 
> So, I was worried that this would penalize all rbtree users, for the 
> benefit of just the one you're adding later in this series. We have 
> several rbtrees where we care about performance a lot, such as the 
> ones used in the scheduler or for indexing vmas.
> 
> That said, I checked with the compiler we are using here (gcc 4.7 
> variant) and I didn't see any change in the generated code. So, no 
> issue here for me.

Yeah, I had that worry too. With gcc 4.9.1 on x86-64 defconfig I get 
this comparison:

lib/rbtree.o:

   text    data     bss     dec     hex filename
   3299       0       0    3299     ce3 rbtree.o.before
   3308       0       0    3308     cec rbtree.o.after

+9 bytes.

Most of the difference seems minimal, involves an extra register move 
saving addresses in registers and using them there:

  449:  4c 8b 60 10             mov    0x10(%rax),%r12
- 44d:  49 39 fc                cmp    %rdi,%r12
- 450:  0f 84 aa 00 00 00       je     500 <__rb_insert_augmented+0x1a0>
- 456:  49 89 c6                mov    %rax,%r14
- 459:  4d 85 e4                test   %r12,%r12
- 45c:  4c 89 63 08             mov    %r12,0x8(%rbx)
- 460:  48 89 58 10             mov    %rbx,0x10(%rax)
- 464:  74 0b                   je     471 <__rb_insert_augmented+0x111>


  449:  4c 8b 60 10             mov    0x10(%rax),%r12
+ 44d:  48 89 c6                mov    %rax,%rsi
+ 450:  49 39 fc                cmp    %rdi,%r12
+ 453:  0f 84 97 00 00 00       je     4f0 <__rb_insert_augmented+0x190>
+ 459:  49 89 c6                mov    %rax,%r14
+ 45c:  4d 85 e4                test   %r12,%r12
+ 45f:  4c 89 63 08             mov    %r12,0x8(%rbx)
+ 463:  48 89 5e 10             mov    %rbx,0x10(%rsi)
+ 467:  74 0b                   je     474 <__rb_insert_augmented+0x114>

So here instead of using 0x10(%rax) again, GCC moved %rax into %rsi 
and used 0x10(%rsi). That seems to be plain compiler stupidity (that 
move to %rsi is senseless with or without volatile), and gcc might 
improve in the future.

In my (admittedly quick) comparison I saw no serious changes like 
extra memory loads or stack spills.

Thanks,

Ingo


Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal

2015-03-01 Thread Michel Lespinasse
On Sat, Feb 28, 2015 at 1:24 PM, Peter Zijlstra  wrote:
> Change the insert and erase code such that lockless searches are
> non-fatal.
>
> In and of itself an rbtree cannot be correctly searched while
> in-modification, we can however provide weaker guarantees that will
> allow the rbtree to be used in conjunction with other techniques, such
> as latches; see 9b0fd802e8c0 ("seqcount: Add raw_write_seqcount_latch()").
>
> For this to work we need the following guarantees from the rbtree
> code:
>
>  1) a lockless reader must not see partial stores, this would allow it
> to observe nodes that are invalid memory.
>
>  2) there must not be (temporary) loops in the tree structure in the
> modifier's program order, this would cause a lookup which
> interrupts the modifier to get stuck indefinitely.
>
> For 1) we must use WRITE_ONCE() for all updates to the tree structure;
> in particular this patch only does rb_{left,right} as those are the
> only element required for simple searches.
>
> It generates slightly worse code, probably because gcc stinks at
> volatile. But in pointer chasing heavy code a few instructions more
> should not matter.

So, I was worried that this would penalize all rbtree users, for the
benefit of just the one you're adding later in this series. We have
several rbtrees where we care about performance a lot, such as the
ones used in the scheduler or for indexing vmas.

That said, I checked with the compiler we are using here (gcc 4.7
variant) and I didn't see any change in the generated code. So, no
issue here for me.

If the object code really is different in your setup, please use the
lib/rbtree_test module to check the performance impact of the change.

> For 2) I have carefully audited the code and drawn every intermediate
> link state and not found a loop.

As Mathieu remarked, we are never modifying the currently active tree,
so the interrupt case is not the reason for avoiding loops.


I think your proposal will work well for the use case you have in mind
(looking up modules based on address). However, I was wondering how
you would compare your proposal against an alternative I heard Josh
Triplett formulate before, where there would be one unique rbtree but
rotations would allocate new nodes rather than modify the existing
ones. I think this would be workable as well; I'm just not sure
whether this would be more or less generally applicable than your
proposal. Copying Josh in case he wants to chime in.

Reviewed-by: Michel Lespinasse 

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal

2015-03-01 Thread Mathieu Desnoyers
- Original Message -
> From: "Peter Zijlstra" 
> To: mi...@kernel.org, ru...@rustcorp.com.au, "mathieu desnoyers" 
> , o...@redhat.com,
> paul...@linux.vnet.ibm.com
> Cc: linux-kernel@vger.kernel.org, a...@firstfloor.org, rost...@goodmis.org, 
> t...@linutronix.de, pet...@infradead.org,
> "Michel Lespinasse" , "Andrea Arcangeli" 
> , "David Woodhouse"
> , "Rik van Riel" 
> Sent: Saturday, February 28, 2015 4:24:52 PM
> Subject: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
> 
> Change the insert and erase code such that lockless searches are
> non-fatal.
> 
> In and of itself an rbtree cannot be correctly searched while
> in-modification, we can however provide weaker guarantees that will
> allow the rbtree to be used in conjunction with other techniques, such
> as latches; see 9b0fd802e8c0 ("seqcount: Add raw_write_seqcount_latch()").
> 
> For this to work we need the following guarantees from the rbtree
> code:
> 
>  1) a lockless reader must not see partial stores, this would allow it
> to observe nodes that are invalid memory.
> 
>  2) there must not be (temporary) loops in the tree structure in the
> modifier's program order, this would cause a lookup which
> interrupts the modifier to get stuck indefinitely.

For (2), I don't think this is how the situation should be described.

Let's consider a scenario where an interrupt nests over the modification.
First, the modification will switch the latch to the other version of the
tree. Therefore, the interrupt will see a fully consistent tree, not the
tree being modified. Therefore, a temporary loop in the tree should not
be an issue for that peculiar situation.

However, if we have another thread traversing the tree while we
concurrently switch the latch and modify one version of the tree,
creating a temporary loop in the tree, this thread could possibly:

A) deadlock with the modification if there is a locking dependency
   between tree modification, tree read, and another lock (transitive
   dependency).
B) have the modifier starved by the other thread, if that thread has
   a higher scheduling priority (e.g. RT) than the modifier. The high
   priority thread would then use all its CPU time to perform the
   temporary loop.

So I agree that loops are unwanted there: it allows us to never have
to care about situations A and B. However, the explanation about why
should not involve, AFAIU, an interrupt handler nesting over the tree
modification, because this is precisely one scenario that should not
care about loops.

Thoughts?

Thanks!

Mathieu


> 
> For 1) we must use WRITE_ONCE() for all updates to the tree structure;
> in particular this patch only does rb_{left,right} as those are the
> only element required for simple searches.
> 
> It generates slightly worse code, probably because gcc stinks at
> volatile. But in pointer chasing heavy code a few instructions more
> should not matter.
> 
> For 2) I have carefully audited the code and drawn every intermediate
> link state and not found a loop.
> 
> Cc: Oleg Nesterov 
> Cc: Michel Lespinasse 
> Cc: Andrea Arcangeli 
> Cc: David Woodhouse 
> Cc: Rik van Riel 
> Cc: Mathieu Desnoyers 
> Cc: "Paul E. McKenney" 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  include/linux/rbtree.h   |   10 +
>  include/linux/rbtree_augmented.h |   21 +++
>  lib/rbtree.c |   71
>  ++-
>  3 files changed, 73 insertions(+), 29 deletions(-)
> 
> --- a/include/linux/rbtree.h
> +++ b/include/linux/rbtree.h
> @@ -31,6 +31,7 @@
>  
>  #include <linux/kernel.h>
>  #include <linux/stddef.h>
> +#include <linux/rcupdate.h>
>  
>  struct rb_node {
>   unsigned long  __rb_parent_color;
> @@ -85,6 +86,15 @@ static inline void rb_link_node(struct r
>   *rb_link = node;
>  }
>  
> +static inline void rb_link_node_rcu(struct rb_node *node, struct rb_node *parent,
> +                                    struct rb_node **rb_link)
> +{
> + node->__rb_parent_color = (unsigned long)parent;
> + node->rb_left = node->rb_right = NULL;
> +
> + rcu_assign_pointer(*rb_link, node);
> +}
> +
>  #define rb_entry_safe(ptr, type, member) \
>   ({ typeof(ptr) ____ptr = (ptr); \
>      ____ptr ? rb_entry(____ptr, type, member) : NULL; \
> --- a/include/linux/rbtree_augmented.h
> +++ b/include/linux/rbtree_augmented.h
> @@ -123,11 +123,11 @@ __rb_change_child(struct rb_node *old, s
>  {
>   if (parent) {
>   if (parent->rb_left == old)
> - parent->rb_left = new;
> + WRITE_ONCE(parent->rb_left, new);
>   else
> - parent->rb_right = new;
> + WRITE_ONCE(parent->rb_right, new);
>   } else
> - root->rb_node = new;
> + WRITE_ONCE(root->rb_node, new);

Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal

2015-03-01 Thread Michel Lespinasse
On Sat, Feb 28, 2015 at 1:24 PM, Peter Zijlstra pet...@infradead.org wrote:
 Change the insert and erase code such that lockless searches are
 non-fatal.

 In and of itself an rbtree cannot be correctly searched while
 in-modification, we can however provide weaker guarantees that will
 allow the rbtree to be used in conjunction with other techniques, such
 as latches; see 9b0fd802e8c0 (seqcount: Add raw_write_seqcount_latch()).

 For this to work we need the following guarantees from the rbtree
 code:

  1) a lockless reader must not see partial stores, this would allow it
 to observe nodes that are invalid memory.

  2) there must not be (temporary) loops in the tree structure in the
 modifier's program order, this would cause a lookup which
 interrupts the modifier to get stuck indefinitely.

 For 1) we must use WRITE_ONCE() for all updates to the tree structure;
 in particular this patch only does rb_{left,right} as those are the
 only element required for simple searches.

 It generates slightly worse code, probably because gcc stinks at
 volatile. But in pointer chasing heavy code a few instructions more
 should not matter.

So, I was worried that this would penalize all rbtree users, for the
benefit of just the one you're adding later in this series. We have
several rbtrees where we care about performance a lot, such as the
ones used in the scheduler or for indexing vmas.

That said, I checked with the compiler we are using here (gcc 4.7
variant) and I didn't see any change in the generated code. So, no
issue here for me.

If the object code really is different in your setup, please use the
lib/rbtree_test module to check the performance impact of the change.

 For 2) I have carefully audited the code and drawn every intermediate
 link state and not found a loop.

As Mathieu remarked, we are never modifying the currently active tree,
so the interrupt case is not the reason for avoiding loops.


I think your proposal will work well for the use case you have in mind
(looking up modules based on address). However, I was wondering how
you would compare your proposal against an alternative I hard Josh
Triplett formulate before, where there would be one unique rbtree but
rotations would allocate new nodes rather than modify the existing
ones. I think this would be workable as well; I'm just not sure
whether this would be more or less generally applicable than your
proposal. Copying Josh in case he wants to chime in.

Reviewed-by: Michel Lespinasse wal...@google.com

-- 
Michel Walken Lespinasse
A program is never fully debugged until the last user dies.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal

2015-03-01 Thread Mathieu Desnoyers
- Original Message -
 From: Peter Zijlstra pet...@infradead.org
 To: mi...@kernel.org, ru...@rustcorp.com.au, mathieu desnoyers 
 mathieu.desnoy...@efficios.com, o...@redhat.com,
 paul...@linux.vnet.ibm.com
 Cc: linux-kernel@vger.kernel.org, a...@firstfloor.org, rost...@goodmis.org, 
 t...@linutronix.de, pet...@infradead.org,
 Michel Lespinasse wal...@google.com, Andrea Arcangeli 
 aarca...@redhat.com, David Woodhouse
 david.woodho...@intel.com, Rik van Riel r...@redhat.com
 Sent: Saturday, February 28, 2015 4:24:52 PM
 Subject: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
 
 Change the insert and erase code such that lockless searches are
 non-fatal.
 
 In and of itself an rbtree cannot be correctly searched while
 in-modification, we can however provide weaker guarantees that will
 allow the rbtree to be used in conjunction with other techniques, such
 as latches; see 9b0fd802e8c0 (seqcount: Add raw_write_seqcount_latch()).
 
 For this to work we need the following guarantees from the rbtree
 code:
 
  1) a lockless reader must not see partial stores, this would allow it
 to observe nodes that are invalid memory.
 
  2) there must not be (temporary) loops in the tree structure in the
 modifier's program order, this would cause a lookup which
 interrupts the modifier to get stuck indefinitely.

For (2), I don't think this is how the situation should be described.

Let's consider a scenario where an interrupt nests over the modification.
First, the modification will switch the latch to the other version of the
tree. Therefore, the interrupt will see a fully consistent tree, not the
tree being modified. Therefore, a temporary loop in the tree should not
be an issue for that peculiar situation.

However, if we have another thread traversing the tree while we
concurrently switch the latch and modify one version of the tree,
creating a temporary loop in the tree, this thread could possibly:

A) deadlock with the modification if there is a locking dependency
   between tree modification, tree read, and another lock (transitive
   dependency).
B) have the modifier starved by the other thread, if that thread has
   a higher scheduling priority (e.g. RT) than the modifier. The high
   priority thread would then use all its CPU time to perform the
   temporary loop.

So I agree that loops are unwanted there: it allows us to never have
to care about situations A and B. However, the explanation about why
should not involve, AFAIU, an interrupt handler nesting over the tree
modification, because this is precisely one scenario that should not
care about loops.

Thoughs ?

Thanks!

Mathieu


 
 For 1) we must use WRITE_ONCE() for all updates to the tree structure;
 in particular this patch only does rb_{left,right} as those are the
 only element required for simple searches.
 
 It generates slightly worse code, probably because gcc stinks at
 volatile. But in pointer chasing heavy code a few instructions more
 should not matter.
 
 For 2) I have carefully audited the code and drawn every intermediate
 link state and not found a loop.
 
 Cc: Oleg Nesterov o...@redhat.com
 Cc: Michel Lespinasse wal...@google.com
 Cc: Andrea Arcangeli aarca...@redhat.com
 Cc: David Woodhouse david.woodho...@intel.com
 Cc: Rik van Riel r...@redhat.com
 Cc: Mathieu Desnoyers mathieu.desnoy...@efficios.com
 Cc: Paul E. McKenney paul...@linux.vnet.ibm.com
 Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
 ---
  include/linux/rbtree.h   |   10 +
  include/linux/rbtree_augmented.h |   21 +++
  lib/rbtree.c |   71
  ++-
  3 files changed, 73 insertions(+), 29 deletions(-)
 
 --- a/include/linux/rbtree.h
 +++ b/include/linux/rbtree.h
 @@ -31,6 +31,7 @@
  
  #include linux/kernel.h
  #include linux/stddef.h
 +#include linux/rcupdate.h
  
  struct rb_node {
   unsigned long  __rb_parent_color;
 @@ -85,6 +86,15 @@ static inline void rb_link_node(struct r
   *rb_link = node;
  }
  
 +static inline void rb_link_node_rcu(struct rb_node * node, struct rb_node *
 parent,
 + struct rb_node ** rb_link)
 +{
 + node-__rb_parent_color = (unsigned long)parent;
 + node-rb_left = node-rb_right = NULL;
 +
 + rcu_assign_pointer(*rb_link, node);
 +}
 +
  #define rb_entry_safe(ptr, type, member) \
   ({ typeof(ptr) ptr = (ptr); \
  ptr ? rb_entry(ptr, type, member) : NULL; \
 --- a/include/linux/rbtree_augmented.h
 +++ b/include/linux/rbtree_augmented.h
 @@ -123,11 +123,11 @@ __rb_change_child(struct rb_node *old, s
  {
   if (parent) {
   if (parent-rb_left == old)
 - parent-rb_left = new;
 + WRITE_ONCE(parent-rb_left, new);
   else
 - parent-rb_right = new;
 + WRITE_ONCE(parent-rb_right, new);
   } else
 - root-rb_node = new;
 + WRITE_ONCE

Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal

2015-03-01 Thread Ingo Molnar

* Michel Lespinasse wal...@google.com wrote:

 On Sat, Feb 28, 2015 at 1:24 PM, Peter Zijlstra pet...@infradead.org wrote:
  Change the insert and erase code such that lockless searches are
  non-fatal.
 
  In and of itself an rbtree cannot be correctly searched while
  in-modification, we can however provide weaker guarantees that will
  allow the rbtree to be used in conjunction with other techniques, such
  as latches; see 9b0fd802e8c0 (seqcount: Add raw_write_seqcount_latch()).
 
  For this to work we need the following guarantees from the rbtree
  code:
 
   1) a lockless reader must not see partial stores, this would allow it
  to observe nodes that are invalid memory.
 
   2) there must not be (temporary) loops in the tree structure in the
  modifier's program order, this would cause a lookup which
  interrupts the modifier to get stuck indefinitely.
 
  For 1) we must use WRITE_ONCE() for all updates to the tree structure;
  in particular this patch only does rb_{left,right} as those are the
  only element required for simple searches.
 
  It generates slightly worse code, probably because gcc stinks at
  volatile. But in pointer chasing heavy code a few instructions more
  should not matter.
 
 So, I was worried that this would penalize all rbtree users, for the 
 benefit of just the one you're adding later in this series. We have 
 several rbtrees where we care about performance a lot, such as the 
 ones used in the scheduler or for indexing vmas.
 
 That said, I checked with the compiler we are using here (gcc 4.7 
 variant) and I didn't see any change in the generated code. So, no 
 issue here for me.

Yeah, I had that worry too. With gcc 4.9.1 on x86-64 defconfig I get 
this comparison:

lib/rbtree.o:

   textdata bss dec hex filename
   3299   0   03299 ce3 rbtree.o.before
   3308   0   03308 cec rbtree.o.after

+9 bytes.

Most of the difference is minimal; it involves an extra register move,
saving addresses in registers and using them from there:

  449:  4c 8b 60 10             mov    0x10(%rax),%r12
- 44d:  49 39 fc                cmp    %rdi,%r12
- 450:  0f 84 aa 00 00 00       je     500 <__rb_insert_augmented+0x1a0>
- 456:  49 89 c6                mov    %rax,%r14
- 459:  4d 85 e4                test   %r12,%r12
- 45c:  4c 89 63 08             mov    %r12,0x8(%rbx)
- 460:  48 89 58 10             mov    %rbx,0x10(%rax)
- 464:  74 0b                   je     471 <__rb_insert_augmented+0x111>


  449:  4c 8b 60 10             mov    0x10(%rax),%r12
+ 44d:  48 89 c6                mov    %rax,%rsi
+ 450:  49 39 fc                cmp    %rdi,%r12
+ 453:  0f 84 97 00 00 00       je     4f0 <__rb_insert_augmented+0x190>
+ 459:  49 89 c6                mov    %rax,%r14
+ 45c:  4d 85 e4                test   %r12,%r12
+ 45f:  4c 89 63 08             mov    %r12,0x8(%rbx)
+ 463:  48 89 5e 10             mov    %rbx,0x10(%rsi)
+ 467:  74 0b                   je     474 <__rb_insert_augmented+0x114>

So here instead of using 0x10(%rax) again, GCC moved %rax into %rsi 
and used 0x10(%rsi). That seems to be plain compiler stupidity (that 
move to %rsi is senseless with or without volatile), and gcc might 
improve in the future.

In my (admittedly quick) comparison I saw no serious changes like 
extra memory loads or stack spills.

Thanks,

Ingo


[RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal

2015-02-28 Thread Peter Zijlstra
Change the insert and erase code such that lockless searches are
non-fatal.

In and of itself an rbtree cannot be correctly searched while
in-modification; we can, however, provide weaker guarantees that will
allow the rbtree to be used in conjunction with other techniques, such
as latches; see 9b0fd802e8c0 ("seqcount: Add raw_write_seqcount_latch()").

For this to work we need the following guarantees from the rbtree
code:

 1) a lockless reader must not see partial stores; this would allow it
to observe nodes that are invalid memory.

 2) there must not be (temporary) loops in the tree structure in the
modifier's program order; this would cause a lookup which
interrupts the modifier to get stuck indefinitely.

For 1) we must use WRITE_ONCE() for all updates to the tree structure;
in particular this patch only does rb_{left,right} as those are the
only elements required for simple searches.

It generates slightly worse code, probably because gcc stinks at
volatile. But in pointer chasing heavy code a few instructions more
should not matter.

For 2) I have carefully audited the code and drawn every intermediate
link state and not found a loop.

Cc: Oleg Nesterov 
Cc: Michel Lespinasse 
Cc: Andrea Arcangeli 
Cc: David Woodhouse 
Cc: Rik van Riel 
Cc: Mathieu Desnoyers 
Cc: "Paul E. McKenney" 
Signed-off-by: Peter Zijlstra (Intel) 
---
 include/linux/rbtree.h   |   10 +
 include/linux/rbtree_augmented.h |   21 +++
 lib/rbtree.c |   71 ++-
 3 files changed, 73 insertions(+), 29 deletions(-)

--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -31,6 +31,7 @@
 
 #include <linux/kernel.h>
 #include <linux/stddef.h>
+#include <linux/rcupdate.h>
 
 struct rb_node {
unsigned long  __rb_parent_color;
@@ -85,6 +86,15 @@ static inline void rb_link_node(struct r
*rb_link = node;
 }
 
+static inline void rb_link_node_rcu(struct rb_node *node, struct rb_node *parent,
+				    struct rb_node **rb_link)
+{
+   node->__rb_parent_color = (unsigned long)parent;
+   node->rb_left = node->rb_right = NULL;
+
+   rcu_assign_pointer(*rb_link, node);
+}
+
 #define rb_entry_safe(ptr, type, member) \
	({ typeof(ptr) ____ptr = (ptr); \
	   ____ptr ? rb_entry(____ptr, type, member) : NULL; \
--- a/include/linux/rbtree_augmented.h
+++ b/include/linux/rbtree_augmented.h
@@ -123,11 +123,11 @@ __rb_change_child(struct rb_node *old, s
 {
if (parent) {
if (parent->rb_left == old)
-   parent->rb_left = new;
+   WRITE_ONCE(parent->rb_left, new);
else
-   parent->rb_right = new;
+   WRITE_ONCE(parent->rb_right, new);
} else
-   root->rb_node = new;
+   WRITE_ONCE(root->rb_node, new);
 }
 
 extern void __rb_erase_color(struct rb_node *parent, struct rb_root *root,
@@ -137,7 +137,8 @@ static __always_inline struct rb_node *
 __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
 const struct rb_augment_callbacks *augment)
 {
-   struct rb_node *child = node->rb_right, *tmp = node->rb_left;
+   struct rb_node *child = node->rb_right;
+   struct rb_node *tmp = node->rb_left;
struct rb_node *parent, *rebalance;
unsigned long pc;
 
@@ -167,6 +168,7 @@ __rb_erase_augmented(struct rb_node *nod
tmp = parent;
} else {
struct rb_node *successor = child, *child2;
+
tmp = child->rb_left;
if (!tmp) {
/*
@@ -180,6 +182,7 @@ __rb_erase_augmented(struct rb_node *nod
 */
parent = successor;
child2 = successor->rb_right;
+
augment->copy(node, successor);
} else {
/*
@@ -201,19 +204,23 @@ __rb_erase_augmented(struct rb_node *nod
successor = tmp;
tmp = tmp->rb_left;
} while (tmp);
-   parent->rb_left = child2 = successor->rb_right;
-   successor->rb_right = child;
+   child2 = successor->rb_right;
+   WRITE_ONCE(parent->rb_left, child2);
+   WRITE_ONCE(successor->rb_right, child);
rb_set_parent(child, successor);
+
augment->copy(node, successor);
augment->propagate(parent, successor);
}
 
-   successor->rb_left = tmp = node->rb_left;
+   tmp = node->rb_left;
+   WRITE_ONCE(successor->rb_left, tmp);
rb_set_parent(tmp, successor);
 
pc = node->__rb_parent_color;
tmp = __rb_parent(pc);
__rb_change_child(node, successor, tmp, root);
+
