Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
On Mon, Mar 02, 2015 at 09:23:45AM +0100, Peter Zijlstra wrote:
> It changes the rb-tree from using internal storage like we do now, to
> requiring external storage.

Note that one consequence of using external storage is that tree
iteration will always require two cachelines per level. One to load the
node and one to load the search key.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
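The cost difference can be made concrete with a small user-space sketch
(hypothetical type names; this is not the kernel's rbtree layout, just an
illustration of internal vs. external storage). With internal storage the
key sits in the same allocation as the link pointers; with external
storage every comparison needs a second pointer chase:

```c
#include <assert.h>
#include <stddef.h>

/* Internal storage (current Linux rbtree style): the links are embedded
 * in the object itself, so the key is adjacent to the node and a search
 * touches one allocation per level. */
struct rb_node_i {
	struct rb_node_i *left, *right;
};

struct item_internal {
	struct rb_node_i node;	/* links */
	unsigned long key;	/* key lives next to the links */
};

struct item_external {
	unsigned long key;
};

/* External storage (what copy-on-rotation schemes need): the node is a
 * separate allocation that points at the object, so comparing a key at
 * each level loads the node *and* dereferences data -- typically two
 * cachelines instead of one. */
struct rb_node_e {
	struct rb_node_e *left, *right;
	struct item_external *data;	/* key lives elsewhere */
};

/* One comparison step per scheme, to show the extra indirection. */
static unsigned long key_of_internal(const struct rb_node_i *n)
{
	/* container_of-style: node is at offset 0, no extra pointer chase */
	return ((const struct item_internal *)n)->key;
}

static unsigned long key_of_external(const struct rb_node_e *n)
{
	return n->data->key;	/* second load, likely a second cacheline */
}
```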
Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
On Sun, Mar 01, 2015 at 01:52:09PM +0000, Mathieu Desnoyers wrote:
> > 2) there must not be (temporary) loops in the tree structure in the
> >    modifier's program order, this would cause a lookup which
> >    interrupts the modifier to get stuck indefinitely.
>
> For (2), I don't think this is how the situation should be described.
>
> Let's consider a scenario where an interrupt nests over the modification.
> First, the modification will switch the latch to the other version of the
> tree. Therefore, the interrupt will see a fully consistent tree, not the
> tree being modified. Therefore, a temporary loop in the tree should not
> be an issue in that particular situation.
>
> However, if we have another thread traversing the tree while we
> concurrently switch the latch and modify one version of the tree,
> creating a temporary loop in the tree, this thread could possibly:
>
> A) deadlock with the modification if there is a locking dependency
>    between tree modification, tree read, and another lock (transitive
>    dependency).
>
> B) have the modifier starved by the other thread, if that thread has
>    a higher scheduling priority (e.g. RT) than the modifier. The high
>    priority thread would then use all its CPU time to traverse the
>    temporary loop.
>
> So I agree that loops are unwanted there: it allows us to never have
> to care about situations A and B. However, the explanation of why
> should not involve, AFAIU, an interrupt handler nesting over the tree
> modification, because that is precisely one scenario that does not
> need to care about loops.
>
> Thoughts?

This is true for the (later) proposed latched RB-tree; the description is,
however, true in general. If you somehow did a lookup while doing the
modification and you have loops in program order, you're stuck.

So in the interest of robustness I think we want this property
nonetheless. And it's 'free': I didn't have to change any code for this.

I shall however clarify this point in the latched RB-tree patch.
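To make the latch scheme under discussion concrete, here is a minimal
single-writer sketch of the two-copy pattern in user-space C. This is a
hypothetical stand-in, not kernel code: the real raw_write_seqcount_latch()
additionally issues memory barriers, and the accesses would need
READ_ONCE()/WRITE_ONCE(); both are omitted here so the switching logic is
easy to follow.

```c
#include <assert.h>

/* Two copies of the data exist. Readers pick copy (seq & 1) and retry
 * if seq moved under them; the writer flips seq first, so it only ever
 * modifies the copy readers have already left. */
struct latch {
	unsigned int seq;
	int data[2];		/* data[0] and data[1] are the two versions */
};

static void latch_modify(struct latch *l, int val)
{
	l->seq++;			/* switch readers to the other copy */
	l->data[(l->seq & 1) ^ 1] = val; /* safe: readers left this copy */
	l->seq++;			/* switch readers back */
	l->data[(l->seq & 1) ^ 1] = val; /* bring the second copy up to date */
}

static int latch_read(const struct latch *l)
{
	unsigned int seq;
	int val;

	do {
		seq = l->seq;
		val = l->data[seq & 1];
	} while (seq != l->seq);	/* retry if a writer flipped the latch */
	return val;
}
```

This is why an interrupt nesting over the modifier always sees a
consistent tree: the first seq increment happens before any node of the
idle copy is touched.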
Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
On Sun, Mar 01, 2015 at 01:11:23PM -0800, Michel Lespinasse wrote:
> On Sat, Feb 28, 2015 at 1:24 PM, Peter Zijlstra wrote:
> > It generates slightly worse code, probably because gcc stinks at
> > volatile. But in pointer chasing heavy code a few instructions more
> > should not matter.
>
> So, I was worried that this would penalize all rbtree users, for the
> benefit of just the one you're adding later in this series. We have
> several rbtrees where we care about performance a lot, such as the
> ones used in the scheduler or for indexing vmas.
>
> That said, I checked with the compiler we are using here (gcc 4.7
> variant) and I didn't see any change in the generated code. So, no
> issue here for me.
>
> If the object code really is different in your setup, please use the
> lib/rbtree_test module to check the performance impact of the change.

I can do that; I had similar results to what Ingo posted. I meant to go
build a 4.9 or 5.0 compiler to see what current GCC makes of it, but
I've not yet gotten around to doing so. My results were with 4.8.3, IIRC.

> > For 2) I have carefully audited the code and drawn every intermediate
> > link state and not found a loop.
>
> As Mathieu remarked, we are never modifying the currently active tree,
> so the interrupt case is not the reason for avoiding loops.

Correct, for the proposed use we do not. I did however double (actually
triple) check this property because I feel it's a good property to have:
no matter what you do to the tree, a (simple) lookup will be non-fatal.

But yes, I'll clarify things.

> I think your proposal will work well for the use case you have in mind
> (looking up modules based on address). However, I was wondering how
> you would compare your proposal against an alternative I heard Josh
> Triplett formulate before, where there would be one unique rbtree but
> rotations would allocate new nodes rather than modify the existing
> ones. I think this would be workable as well; I'm just not sure
> whether this would be more or less generally applicable than your
> proposal. Copying Josh in case he wants to chime in.

So I was not aware of that particular solution. It changes the rb-tree
from using internal storage like we do now, to requiring external
storage.

I do have experience with making an RCU-safe (in-memory) B+tree, and
there the allocations were absolutely killing performance.
Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
* Michel Lespinasse wrote:

> On Sat, Feb 28, 2015 at 1:24 PM, Peter Zijlstra wrote:
> > Change the insert and erase code such that lockless searches are
> > non-fatal.
> >
> > In and of itself an rbtree cannot be correctly searched while
> > in-modification, we can however provide weaker guarantees that will
> > allow the rbtree to be used in conjunction with other techniques, such
> > as latches; see 9b0fd802e8c0 ("seqcount: Add raw_write_seqcount_latch()").
> >
> > For this to work we need the following guarantees from the rbtree
> > code:
> >
> > 1) a lockless reader must not see partial stores, this would allow it
> >    to observe nodes that are invalid memory.
> >
> > 2) there must not be (temporary) loops in the tree structure in the
> >    modifier's program order, this would cause a lookup which
> >    interrupts the modifier to get stuck indefinitely.
> >
> > For 1) we must use WRITE_ONCE() for all updates to the tree structure;
> > in particular this patch only does rb_{left,right} as those are the
> > only element required for simple searches.
> >
> > It generates slightly worse code, probably because gcc stinks at
> > volatile. But in pointer chasing heavy code a few instructions more
> > should not matter.
>
> So, I was worried that this would penalize all rbtree users, for the
> benefit of just the one you're adding later in this series. We have
> several rbtrees where we care about performance a lot, such as the
> ones used in the scheduler or for indexing vmas.
>
> That said, I checked with the compiler we are using here (gcc 4.7
> variant) and I didn't see any change in the generated code. So, no
> issue here for me.

Yeah, I had that worry too. With gcc 4.9.1 on x86-64 defconfig I get
this comparison for lib/rbtree.o:

   text    data     bss     dec     hex  filename
   3299       0       0    3299     ce3  rbtree.o.before
   3308       0       0    3308     cec  rbtree.o.after

+9 bytes.

Most of the difference seems minimal; it involves an extra register
move, saving addresses in registers and using them there:

   449:  4c 8b 60 10             mov    0x10(%rax),%r12
 - 44d:  49 39 fc                cmp    %rdi,%r12
 - 450:  0f 84 aa 00 00 00       je     500 <__rb_insert_augmented+0x1a0>
 - 456:  49 89 c6                mov    %rax,%r14
 - 459:  4d 85 e4                test   %r12,%r12
 - 45c:  4c 89 63 08             mov    %r12,0x8(%rbx)
 - 460:  48 89 58 10             mov    %rbx,0x10(%rax)
 - 464:  74 0b                   je     471 <__rb_insert_augmented+0x111>

   449:  4c 8b 60 10             mov    0x10(%rax),%r12
 + 44d:  48 89 c6                mov    %rax,%rsi
 + 450:  49 39 fc                cmp    %rdi,%r12
 + 453:  0f 84 97 00 00 00       je     4f0 <__rb_insert_augmented+0x190>
 + 459:  49 89 c6                mov    %rax,%r14
 + 45c:  4d 85 e4                test   %r12,%r12
 + 45f:  4c 89 63 08             mov    %r12,0x8(%rbx)
 + 463:  48 89 5e 10             mov    %rbx,0x10(%rsi)
 + 467:  74 0b                   je     474 <__rb_insert_augmented+0x114>

So here instead of using 0x10(%rax) again, GCC moved %rax into %rsi and
used 0x10(%rsi). That seems to be plain compiler stupidity (that move to
%rsi is senseless with or without volatile), and gcc might improve in
the future.

In my (admittedly quick) comparison I saw no serious changes like extra
memory loads or stack spills.

Thanks,

	Ingo
Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
On Sat, Feb 28, 2015 at 1:24 PM, Peter Zijlstra wrote:
> Change the insert and erase code such that lockless searches are
> non-fatal.
>
> In and of itself an rbtree cannot be correctly searched while
> in-modification, we can however provide weaker guarantees that will
> allow the rbtree to be used in conjunction with other techniques, such
> as latches; see 9b0fd802e8c0 ("seqcount: Add raw_write_seqcount_latch()").
>
> For this to work we need the following guarantees from the rbtree
> code:
>
> 1) a lockless reader must not see partial stores, this would allow it
>    to observe nodes that are invalid memory.
>
> 2) there must not be (temporary) loops in the tree structure in the
>    modifier's program order, this would cause a lookup which
>    interrupts the modifier to get stuck indefinitely.
>
> For 1) we must use WRITE_ONCE() for all updates to the tree structure;
> in particular this patch only does rb_{left,right} as those are the
> only element required for simple searches.
>
> It generates slightly worse code, probably because gcc stinks at
> volatile. But in pointer chasing heavy code a few instructions more
> should not matter.

So, I was worried that this would penalize all rbtree users, for the
benefit of just the one you're adding later in this series. We have
several rbtrees where we care about performance a lot, such as the
ones used in the scheduler or for indexing vmas.

That said, I checked with the compiler we are using here (gcc 4.7
variant) and I didn't see any change in the generated code. So, no
issue here for me.

If the object code really is different in your setup, please use the
lib/rbtree_test module to check the performance impact of the change.

> For 2) I have carefully audited the code and drawn every intermediate
> link state and not found a loop.

As Mathieu remarked, we are never modifying the currently active tree,
so the interrupt case is not the reason for avoiding loops.

I think your proposal will work well for the use case you have in mind
(looking up modules based on address). However, I was wondering how you
would compare your proposal against an alternative I heard Josh Triplett
formulate before, where there would be one unique rbtree but rotations
would allocate new nodes rather than modify the existing ones. I think
this would be workable as well; I'm just not sure whether this would be
more or less generally applicable than your proposal. Copying Josh in
case he wants to chime in.

Reviewed-by: Michel Lespinasse

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
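The copy-on-rotation alternative mentioned above might look like the
following user-space sketch (hypothetical: type and function names are
invented, the real rbtree rotation also maintains colors and parent
pointers, and the discarded nodes would be reclaimed via RCU rather than
leaked). The point is that readers keep traversing the old, fully
consistent nodes until the new subtree root is published with a single
pointer store:

```c
#include <assert.h>
#include <stdlib.h>

struct cow_node {
	int key;
	struct cow_node *left, *right;
};

/* Allocate a fresh copy of a node; error handling omitted for brevity. */
static struct cow_node *dup(const struct cow_node *n)
{
	struct cow_node *c = malloc(sizeof(*c));

	*c = *n;
	return c;
}

/* Left-rotate around x: build the rotated subtree out of fresh copies
 * and return its new root, leaving the originals untouched so that
 * concurrent readers never observe a half-rotated state. */
static struct cow_node *rotate_left_cow(const struct cow_node *x)
{
	struct cow_node *nx = dup(x);		/* copy of x */
	struct cow_node *ny = dup(x->right);	/* copy of y = x->right */

	nx->right = ny->left;	/* y's old left child becomes x's right */
	ny->left = nx;		/* x becomes y's left child */
	return ny;		/* caller publishes ny with one store */
}
```

The trade-off Peter raises below applies directly: every rotation costs
allocations, and the nodes must hold a pointer to externally stored data.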
Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
----- Original Message -----
> From: "Peter Zijlstra"
> To: mi...@kernel.org, ru...@rustcorp.com.au, "mathieu desnoyers",
> 	o...@redhat.com, paul...@linux.vnet.ibm.com
> Cc: linux-kernel@vger.kernel.org, a...@firstfloor.org, rost...@goodmis.org,
> 	t...@linutronix.de, pet...@infradead.org, "Michel Lespinasse",
> 	"Andrea Arcangeli", "David Woodhouse", "Rik van Riel"
> Sent: Saturday, February 28, 2015 4:24:52 PM
> Subject: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
>
> Change the insert and erase code such that lockless searches are
> non-fatal.
>
> In and of itself an rbtree cannot be correctly searched while
> in-modification, we can however provide weaker guarantees that will
> allow the rbtree to be used in conjunction with other techniques, such
> as latches; see 9b0fd802e8c0 ("seqcount: Add raw_write_seqcount_latch()").
>
> For this to work we need the following guarantees from the rbtree
> code:
>
> 1) a lockless reader must not see partial stores, this would allow it
>    to observe nodes that are invalid memory.
>
> 2) there must not be (temporary) loops in the tree structure in the
>    modifier's program order, this would cause a lookup which
>    interrupts the modifier to get stuck indefinitely.

For (2), I don't think this is how the situation should be described.

Let's consider a scenario where an interrupt nests over the modification.
First, the modification will switch the latch to the other version of the
tree. Therefore, the interrupt will see a fully consistent tree, not the
tree being modified. Therefore, a temporary loop in the tree should not
be an issue in that particular situation.

However, if we have another thread traversing the tree while we
concurrently switch the latch and modify one version of the tree,
creating a temporary loop in the tree, this thread could possibly:

A) deadlock with the modification if there is a locking dependency
   between tree modification, tree read, and another lock (transitive
   dependency).

B) have the modifier starved by the other thread, if that thread has
   a higher scheduling priority (e.g. RT) than the modifier. The high
   priority thread would then use all its CPU time to traverse the
   temporary loop.

So I agree that loops are unwanted there: it allows us to never have
to care about situations A and B. However, the explanation of why
should not involve, AFAIU, an interrupt handler nesting over the tree
modification, because that is precisely one scenario that does not
need to care about loops.

Thoughts?

Thanks!

Mathieu

> For 1) we must use WRITE_ONCE() for all updates to the tree structure;
> in particular this patch only does rb_{left,right} as those are the
> only element required for simple searches.
>
> It generates slightly worse code, probably because gcc stinks at
> volatile. But in pointer chasing heavy code a few instructions more
> should not matter.
>
> For 2) I have carefully audited the code and drawn every intermediate
> link state and not found a loop.
>
> Cc: Oleg Nesterov
> Cc: Michel Lespinasse
> Cc: Andrea Arcangeli
> Cc: David Woodhouse
> Cc: Rik van Riel
> Cc: Mathieu Desnoyers
> Cc: "Paul E. McKenney"
> Signed-off-by: Peter Zijlstra (Intel)
> ---
>  include/linux/rbtree.h           |   10 +++++
>  include/linux/rbtree_augmented.h |   21 +++++---
>  lib/rbtree.c                     |   71 ++++++++++++++++++++-------------
>  3 files changed, 73 insertions(+), 29 deletions(-)
>
> --- a/include/linux/rbtree.h
> +++ b/include/linux/rbtree.h
> @@ -31,6 +31,7 @@
>
>  #include <linux/kernel.h>
>  #include <linux/stddef.h>
> +#include <linux/rcupdate.h>
>
>  struct rb_node {
>  	unsigned long __rb_parent_color;
> @@ -85,6 +86,15 @@ static inline void rb_link_node(struct r
>  	*rb_link = node;
>  }
>
> +static inline void rb_link_node_rcu(struct rb_node *node, struct rb_node *parent,
> +				    struct rb_node **rb_link)
> +{
> +	node->__rb_parent_color = (unsigned long)parent;
> +	node->rb_left = node->rb_right = NULL;
> +
> +	rcu_assign_pointer(*rb_link, node);
> +}
> +
>  #define rb_entry_safe(ptr, type, member) \
>  	({ typeof(ptr) ____ptr = (ptr); \
>  	   ____ptr ? rb_entry(____ptr, type, member) : NULL; \
>
> --- a/include/linux/rbtree_augmented.h
> +++ b/include/linux/rbtree_augmented.h
> @@ -123,11 +123,11 @@ __rb_change_child(struct rb_node *old, s
>  {
>  	if (parent) {
>  		if (parent->rb_left == old)
> -			parent->rb_left = new;
> +			WRITE_ONCE(parent->rb_left, new);
>  		else
> -			parent->rb_right = new;
> +			WRITE_ONCE(parent->rb_right, new);
>  	} else
> -		root->rb_node = new;
> +		WRITE_ONCE(root->rb_node, new);
>  }
[RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal
Change the insert and erase code such that lockless searches are
non-fatal.

In and of itself an rbtree cannot be correctly searched while
in-modification, we can however provide weaker guarantees that will
allow the rbtree to be used in conjunction with other techniques, such
as latches; see 9b0fd802e8c0 ("seqcount: Add raw_write_seqcount_latch()").

For this to work we need the following guarantees from the rbtree
code:

 1) a lockless reader must not see partial stores, this would allow it
    to observe nodes that are invalid memory.

 2) there must not be (temporary) loops in the tree structure in the
    modifier's program order, this would cause a lookup which
    interrupts the modifier to get stuck indefinitely.

For 1) we must use WRITE_ONCE() for all updates to the tree structure;
in particular this patch only does rb_{left,right} as those are the
only elements required for simple searches.

It generates slightly worse code, probably because gcc stinks at
volatile. But in pointer chasing heavy code a few instructions more
should not matter.

For 2) I have carefully audited the code and drawn every intermediate
link state and not found a loop.

Cc: Oleg Nesterov
Cc: Michel Lespinasse
Cc: Andrea Arcangeli
Cc: David Woodhouse
Cc: Rik van Riel
Cc: Mathieu Desnoyers
Cc: "Paul E. McKenney"
Signed-off-by: Peter Zijlstra (Intel)
---
 include/linux/rbtree.h           | 10 +
 include/linux/rbtree_augmented.h | 21 +++
 lib/rbtree.c                     | 71 ++-
 3 files changed, 73 insertions(+), 29 deletions(-)

--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -31,6 +31,7 @@

 #include <linux/kernel.h>
 #include <linux/stddef.h>
+#include <linux/rcupdate.h>

 struct rb_node {
 	unsigned long __rb_parent_color;
@@ -85,6 +86,15 @@ static inline void rb_link_node(struct r
 	*rb_link = node;
 }

+static inline void rb_link_node_rcu(struct rb_node *node, struct rb_node *parent,
+				    struct rb_node **rb_link)
+{
+	node->__rb_parent_color = (unsigned long)parent;
+	node->rb_left = node->rb_right = NULL;
+
+	rcu_assign_pointer(*rb_link, node);
+}
+
 #define rb_entry_safe(ptr, type, member) \
 	({ typeof(ptr) ____ptr = (ptr); \
 	   ____ptr ? rb_entry(____ptr, type, member) : NULL; \
--- a/include/linux/rbtree_augmented.h
+++ b/include/linux/rbtree_augmented.h
@@ -123,11 +123,11 @@ __rb_change_child(struct rb_node *old, s
 {
 	if (parent) {
 		if (parent->rb_left == old)
-			parent->rb_left = new;
+			WRITE_ONCE(parent->rb_left, new);
 		else
-			parent->rb_right = new;
+			WRITE_ONCE(parent->rb_right, new);
 	} else
-		root->rb_node = new;
+		WRITE_ONCE(root->rb_node, new);
 }

 extern void __rb_erase_color(struct rb_node *parent, struct rb_root *root,
@@ -137,7 +137,8 @@ static __always_inline struct rb_node *
 __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
 		     const struct rb_augment_callbacks *augment)
 {
-	struct rb_node *child = node->rb_right, *tmp = node->rb_left;
+	struct rb_node *child = node->rb_right;
+	struct rb_node *tmp = node->rb_left;
 	struct rb_node *parent, *rebalance;
 	unsigned long pc;

@@ -167,6 +168,7 @@ __rb_erase_augmented(struct rb_node *nod
 		tmp = parent;
 	} else {
 		struct rb_node *successor = child, *child2;
+
 		tmp = child->rb_left;
 		if (!tmp) {
 			/*
@@ -180,6 +182,7 @@ __rb_erase_augmented(struct rb_node *nod
 			 */
 			parent = successor;
 			child2 = successor->rb_right;
+
 			augment->copy(node, successor);
 		} else {
 			/*
@@ -201,19 +204,23 @@ __rb_erase_augmented(struct rb_node *nod
 				successor = tmp;
 				tmp = tmp->rb_left;
 			} while (tmp);
-			parent->rb_left = child2 = successor->rb_right;
-			successor->rb_right = child;
+			child2 = successor->rb_right;
+			WRITE_ONCE(parent->rb_left, child2);
+			WRITE_ONCE(successor->rb_right, child);
 			rb_set_parent(child, successor);
+
 			augment->copy(node, successor);
 			augment->propagate(parent, successor);
 		}

-		successor->rb_left = tmp = node->rb_left;
+		tmp = node->rb_left;
+		WRITE_ONCE(successor->rb_left, tmp);
 		rb_set_parent(tmp, successor);

 		pc = node->__rb_parent_color;
 		tmp = __rb_parent(pc);
 		__rb_change_child(node, successor, tmp, root);
+