Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
On 12/08/2015 04:46 PM, Paul E. McKenney wrote: On Tue, Dec 08, 2015 at 04:20:03PM +1100, Paul Mackerras wrote: On Sat, Dec 05, 2015 at 06:19:46PM -0800, Paul E. McKenney wrote: As in the following? (And yes, I was confused by the fact that the docbook header was in front of a differently-named macro!) That looks great! Have you committed to one of your branches? Indeed I have, though quite recently. Please see branch rcu/dev of: git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git Or, more specifically, this commit: 69b907297f4e If your testing goes well, I will push this into the upcoming merge window. Sorry, it took a little longer than I expected. It works just fine, thanks! -- Alexey -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
On 12/08/2015 04:46 PM, Paul E. McKenney wrote: On Tue, Dec 08, 2015 at 04:20:03PM +1100, Paul Mackerras wrote: On Sat, Dec 05, 2015 at 06:19:46PM -0800, Paul E. McKenney wrote: As in the following? (And yes, I was confused by the fact that the docbook header was in front of a differently-named macro!) That looks great! Have you committed to one of your branches? Indeed I have, though quite recently. Please see branch rcu/dev of: git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git Or, more specifically, this commit: 69b907297f4e If your testing goes well, I will push this into the upcoming merge window. Sorry, it took a little longer than I expected. It works just fine, thanks! -- Alexey -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
On Tue, Dec 08, 2015 at 04:20:03PM +1100, Paul Mackerras wrote: > On Sat, Dec 05, 2015 at 06:19:46PM -0800, Paul E. McKenney wrote: > > > > As in the following? (And yes, I was confused by the fact that the > > docbook header was in front of a differently-named macro!) > > That looks great! Have you committed to one of your branches? Indeed I have, though quite recently. Please see branch rcu/dev of: git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git Or, more specifically, this commit: 69b907297f4e If your testing goes well, I will push this into the upcoming merge window. Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
On Sat, Dec 05, 2015 at 06:19:46PM -0800, Paul E. McKenney wrote: > > As in the following? (And yes, I was confused by the fact that the > docbook header was in front of a differently-named macro!) > > Thanx, Paul That looks great! Have you committed to one of your branches? Thanks, Paul. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
On Sat, Dec 05, 2015 at 06:19:46PM -0800, Paul E. McKenney wrote: > > As in the following? (And yes, I was confused by the fact that the > docbook header was in front of a differently-named macro!) > > Thanx, Paul That looks great! Have you committed to one of your branches? Thanks, Paul. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
On Tue, Dec 08, 2015 at 04:20:03PM +1100, Paul Mackerras wrote: > On Sat, Dec 05, 2015 at 06:19:46PM -0800, Paul E. McKenney wrote: > > > > As in the following? (And yes, I was confused by the fact that the > > docbook header was in front of a differently-named macro!) > > That looks great! Have you committed to one of your branches? Indeed I have, though quite recently. Please see branch rcu/dev of: git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git Or, more specifically, this commit: 69b907297f4e If your testing goes well, I will push this into the upcoming merge window. Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
On Mon, Nov 30, 2015 at 12:30:06PM -0800, Paul E. McKenney wrote: > On Mon, Nov 30, 2015 at 10:39:15AM +1100, Paul Mackerras wrote: > > On Wed, Nov 18, 2015 at 11:13:28AM -0800, Paul E. McKenney wrote: > > > On Fri, Nov 06, 2015 at 01:17:17PM +1100, Alexey Kardashevskiy wrote: > > [snip] > > > > Still, is my approach correct? What does the comment for > > > > lockless_dereference() actally mean - it won't work together with > > > > RCU at all or this is to force people not to use it as > > > > "list_for_each_entry_rcu() should really be used in 99.99% of the > > > > time"? :) > > > > > > Well, it depends... > > > > > > The key difference between lockless_dereference() and rcu_dereference() > > > is that lockless_dereference() won't complain if used outside of > > > an RCU read-side critical section. When there is no RCU read-side > > > critical section, lockless_dereference() cannot rely on RCU's normal > > > action of keeping the data item around. Therefore, when you are using > > > lockless_dereference(), you have to have some mechanism other than RCU > > > to keep the data item around. The usual approach is for the data item > > > to never be freed, for example, if data is only ever added to the list > > > and never removed. Other approaches include reference counting, hazard > > > pointers, hashed arrays of locks, garbage collectors, transactional > > > memory, and so on. > > > > So, the situation is that we have an RCU-protected list, which in this > > case we are traversing without modifying the list. We are in an > > restricted environment (hypervisor real mode) where we can't be > > preempted, both because interrupts are hard-disabled and because our > > caller has done preempt_disable(). In this restricted environment we > > can access the linear mapping (thus kmalloc'd data) but not the > > vmalloc or ioremap regions. > > > > Thus we are not formally in a RCU read-side critical section, though > > we believe that having preemption disabled gives us equivalent > > protection. Probably what we should do is to add a > > rcu_read_lock/unlock pair in a function higher up the call chain > > so that we are actually in a RCU read-side critical section. > > > > Then the only reason not to use list_for_each_entry_rcu would be that > > we don't trust the checking machinery not to ever access vmalloc'd > > data. In other words, we want a list_for_each_entry_nocheck > > or list_for_each_entry_restricted which omits all the lockdep > > checking. That would need a list_entry_rcu_nocheck which would need a > > thing like rcu_dereference_raw that does no lockdep checking - which > > is where I thought you suggested lockless_dereference. > > > > So, what name do you like for these primitives, and where should they > > go? > > I am good with list_for_each_entry_lockless() and with its using > lockless_dereference(). I am also good with this going into > include/linux/rculist.h. That said... > > o The name of the macro was last given as list_entry_lockless() > rather than the name list_for_each_entry_lockless() that was > used in the docbook comment. The latter is preferable because > it is consistent with the other list-traversal macros. > > o The docbook comment (and I quote) "This list-traversal > primitive may safely run concurrently" is clearly inadequate. > The points it needs to make are: > > 1. This locklessly traverses a list. > > 2. It may be used when items are never removed from the list. > > 3. It may also be used in environments where preemption > is implicitly disabled, but where lockdep cannot safely > be invoked. In this case, after an element is removed, > the synchronize_sched(), synchronize_sched_expedited(), > or call_rcu_sched() functions must be used to wait for > the needed grace period. > > o The commit log is also inadequate, though given a good patch, > I can fix this. It needs some of the information that we have > been sending back and forth in this email thread. > > It feels to me like we are going around in circles here. Someone needs > to submit an updated patch to move this forward: I cannot in good faith > accept the November 3rd patch. Are you or Alexey going to send this > patch, or should I produce my best guess as to what you guys need? As in the following? (And yes, I was confused by the fact that the docbook header was in front of a differently-named macro!) Thanx, Paul commit 428ac14295ea49853afa136c9120cc6fb554a030 Author: Alexey Kardashevskiy Date: Sat Dec 5 18:14:19 2015 -0800 list: Add lockless list traversal primitives Although list_for_each_entry_rcu() can in theory be used anywhere preemption is disabled, it can result in calls to
Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
On Mon, Nov 30, 2015 at 12:30:06PM -0800, Paul E. McKenney wrote: > On Mon, Nov 30, 2015 at 10:39:15AM +1100, Paul Mackerras wrote: > > On Wed, Nov 18, 2015 at 11:13:28AM -0800, Paul E. McKenney wrote: > > > On Fri, Nov 06, 2015 at 01:17:17PM +1100, Alexey Kardashevskiy wrote: > > [snip] > > > > Still, is my approach correct? What does the comment for > > > > lockless_dereference() actally mean - it won't work together with > > > > RCU at all or this is to force people not to use it as > > > > "list_for_each_entry_rcu() should really be used in 99.99% of the > > > > time"? :) > > > > > > Well, it depends... > > > > > > The key difference between lockless_dereference() and rcu_dereference() > > > is that lockless_dereference() won't complain if used outside of > > > an RCU read-side critical section. When there is no RCU read-side > > > critical section, lockless_dereference() cannot rely on RCU's normal > > > action of keeping the data item around. Therefore, when you are using > > > lockless_dereference(), you have to have some mechanism other than RCU > > > to keep the data item around. The usual approach is for the data item > > > to never be freed, for example, if data is only ever added to the list > > > and never removed. Other approaches include reference counting, hazard > > > pointers, hashed arrays of locks, garbage collectors, transactional > > > memory, and so on. > > > > So, the situation is that we have an RCU-protected list, which in this > > case we are traversing without modifying the list. We are in an > > restricted environment (hypervisor real mode) where we can't be > > preempted, both because interrupts are hard-disabled and because our > > caller has done preempt_disable(). In this restricted environment we > > can access the linear mapping (thus kmalloc'd data) but not the > > vmalloc or ioremap regions. > > > > Thus we are not formally in a RCU read-side critical section, though > > we believe that having preemption disabled gives us equivalent > > protection. Probably what we should do is to add a > > rcu_read_lock/unlock pair in a function higher up the call chain > > so that we are actually in a RCU read-side critical section. > > > > Then the only reason not to use list_for_each_entry_rcu would be that > > we don't trust the checking machinery not to ever access vmalloc'd > > data. In other words, we want a list_for_each_entry_nocheck > > or list_for_each_entry_restricted which omits all the lockdep > > checking. That would need a list_entry_rcu_nocheck which would need a > > thing like rcu_dereference_raw that does no lockdep checking - which > > is where I thought you suggested lockless_dereference. > > > > So, what name do you like for these primitives, and where should they > > go? > > I am good with list_for_each_entry_lockless() and with its using > lockless_dereference(). I am also good with this going into > include/linux/rculist.h. That said... > > o The name of the macro was last given as list_entry_lockless() > rather than the name list_for_each_entry_lockless() that was > used in the docbook comment. The latter is preferable because > it is consistent with the other list-traversal macros. > > o The docbook comment (and I quote) "This list-traversal > primitive may safely run concurrently" is clearly inadequate. > The points it needs to make are: > > 1. This locklessly traverses a list. > > 2. It may be used when items are never removed from the list. > > 3. It may also be used in environments where preemption > is implicitly disabled, but where lockdep cannot safely > be invoked. In this case, after an element is removed, > the synchronize_sched(), synchronize_sched_expedited(), > or call_rcu_sched() functions must be used to wait for > the needed grace period. > > o The commit log is also inadequate, though given a good patch, > I can fix this. It needs some of the information that we have > been sending back and forth in this email thread. > > It feels to me like we are going around in circles here. Someone needs > to submit an updated patch to move this forward: I cannot in good faith > accept the November 3rd patch. Are you or Alexey going to send this > patch, or should I produce my best guess as to what you guys need? As in the following? (And yes, I was confused by the fact that the docbook header was in front of a differently-named macro!) Thanx, Paul commit 428ac14295ea49853afa136c9120cc6fb554a030 Author: Alexey KardashevskiyDate: Sat Dec 5 18:14:19 2015 -0800 list: Add lockless list traversal primitives Although list_for_each_entry_rcu() can in theory be used anywhere preemption is disabled, it can
Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
On Mon, Nov 30, 2015 at 10:39:15AM +1100, Paul Mackerras wrote: > On Wed, Nov 18, 2015 at 11:13:28AM -0800, Paul E. McKenney wrote: > > On Fri, Nov 06, 2015 at 01:17:17PM +1100, Alexey Kardashevskiy wrote: > [snip] > > > Still, is my approach correct? What does the comment for > > > lockless_dereference() actally mean - it won't work together with > > > RCU at all or this is to force people not to use it as > > > "list_for_each_entry_rcu() should really be used in 99.99% of the > > > time"? :) > > > > Well, it depends... > > > > The key difference between lockless_dereference() and rcu_dereference() > > is that lockless_dereference() won't complain if used outside of > > an RCU read-side critical section. When there is no RCU read-side > > critical section, lockless_dereference() cannot rely on RCU's normal > > action of keeping the data item around. Therefore, when you are using > > lockless_dereference(), you have to have some mechanism other than RCU > > to keep the data item around. The usual approach is for the data item > > to never be freed, for example, if data is only ever added to the list > > and never removed. Other approaches include reference counting, hazard > > pointers, hashed arrays of locks, garbage collectors, transactional > > memory, and so on. > > So, the situation is that we have an RCU-protected list, which in this > case we are traversing without modifying the list. We are in an > restricted environment (hypervisor real mode) where we can't be > preempted, both because interrupts are hard-disabled and because our > caller has done preempt_disable(). In this restricted environment we > can access the linear mapping (thus kmalloc'd data) but not the > vmalloc or ioremap regions. > > Thus we are not formally in a RCU read-side critical section, though > we believe that having preemption disabled gives us equivalent > protection. Probably what we should do is to add a > rcu_read_lock/unlock pair in a function higher up the call chain > so that we are actually in a RCU read-side critical section. > > Then the only reason not to use list_for_each_entry_rcu would be that > we don't trust the checking machinery not to ever access vmalloc'd > data. In other words, we want a list_for_each_entry_nocheck > or list_for_each_entry_restricted which omits all the lockdep > checking. That would need a list_entry_rcu_nocheck which would need a > thing like rcu_dereference_raw that does no lockdep checking - which > is where I thought you suggested lockless_dereference. > > So, what name do you like for these primitives, and where should they > go? I am good with list_for_each_entry_lockless() and with its using lockless_dereference(). I am also good with this going into include/linux/rculist.h. That said... o The name of the macro was last given as list_entry_lockless() rather than the name list_for_each_entry_lockless() that was used in the docbook comment. The latter is preferable because it is consistent with the other list-traversal macros. o The docbook comment (and I quote) "This list-traversal primitive may safely run concurrently" is clearly inadequate. The points it needs to make are: 1. This locklessly traverses a list. 2. It may be used when items are never removed from the list. 3. It may also be used in environments where preemption is implicitly disabled, but where lockdep cannot safely be invoked. In this case, after an element is removed, the synchronize_sched(), synchronize_sched_expedited(), or call_rcu_sched() functions must be used to wait for the needed grace period. o The commit log is also inadequate, though given a good patch, I can fix this. It needs some of the information that we have been sending back and forth in this email thread. It feels to me like we are going around in circles here. Someone needs to submit an updated patch to move this forward: I cannot in good faith accept the November 3rd patch. Are you or Alexey going to send this patch, or should I produce my best guess as to what you guys need? Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
On Mon, Nov 30, 2015 at 10:39:15AM +1100, Paul Mackerras wrote: > On Wed, Nov 18, 2015 at 11:13:28AM -0800, Paul E. McKenney wrote: > > On Fri, Nov 06, 2015 at 01:17:17PM +1100, Alexey Kardashevskiy wrote: > [snip] > > > Still, is my approach correct? What does the comment for > > > lockless_dereference() actally mean - it won't work together with > > > RCU at all or this is to force people not to use it as > > > "list_for_each_entry_rcu() should really be used in 99.99% of the > > > time"? :) > > > > Well, it depends... > > > > The key difference between lockless_dereference() and rcu_dereference() > > is that lockless_dereference() won't complain if used outside of > > an RCU read-side critical section. When there is no RCU read-side > > critical section, lockless_dereference() cannot rely on RCU's normal > > action of keeping the data item around. Therefore, when you are using > > lockless_dereference(), you have to have some mechanism other than RCU > > to keep the data item around. The usual approach is for the data item > > to never be freed, for example, if data is only ever added to the list > > and never removed. Other approaches include reference counting, hazard > > pointers, hashed arrays of locks, garbage collectors, transactional > > memory, and so on. > > So, the situation is that we have an RCU-protected list, which in this > case we are traversing without modifying the list. We are in an > restricted environment (hypervisor real mode) where we can't be > preempted, both because interrupts are hard-disabled and because our > caller has done preempt_disable(). In this restricted environment we > can access the linear mapping (thus kmalloc'd data) but not the > vmalloc or ioremap regions. > > Thus we are not formally in a RCU read-side critical section, though > we believe that having preemption disabled gives us equivalent > protection. Probably what we should do is to add a > rcu_read_lock/unlock pair in a function higher up the call chain > so that we are actually in a RCU read-side critical section. > > Then the only reason not to use list_for_each_entry_rcu would be that > we don't trust the checking machinery not to ever access vmalloc'd > data. In other words, we want a list_for_each_entry_nocheck > or list_for_each_entry_restricted which omits all the lockdep > checking. That would need a list_entry_rcu_nocheck which would need a > thing like rcu_dereference_raw that does no lockdep checking - which > is where I thought you suggested lockless_dereference. > > So, what name do you like for these primitives, and where should they > go? I am good with list_for_each_entry_lockless() and with its using lockless_dereference(). I am also good with this going into include/linux/rculist.h. That said... o The name of the macro was last given as list_entry_lockless() rather than the name list_for_each_entry_lockless() that was used in the docbook comment. The latter is preferable because it is consistent with the other list-traversal macros. o The docbook comment (and I quote) "This list-traversal primitive may safely run concurrently" is clearly inadequate. The points it needs to make are: 1. This locklessly traverses a list. 2. It may be used when items are never removed from the list. 3. It may also be used in environments where preemption is implicitly disabled, but where lockdep cannot safely be invoked. In this case, after an element is removed, the synchronize_sched(), synchronize_sched_expedited(), or call_rcu_sched() functions must be used to wait for the needed grace period. o The commit log is also inadequate, though given a good patch, I can fix this. It needs some of the information that we have been sending back and forth in this email thread. It feels to me like we are going around in circles here. Someone needs to submit an updated patch to move this forward: I cannot in good faith accept the November 3rd patch. Are you or Alexey going to send this patch, or should I produce my best guess as to what you guys need? Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
On Wed, Nov 18, 2015 at 11:13:28AM -0800, Paul E. McKenney wrote: > On Fri, Nov 06, 2015 at 01:17:17PM +1100, Alexey Kardashevskiy wrote: [snip] > > Still, is my approach correct? What does the comment for > > lockless_dereference() actally mean - it won't work together with > > RCU at all or this is to force people not to use it as > > "list_for_each_entry_rcu() should really be used in 99.99% of the > > time"? :) > > Well, it depends... > > The key difference between lockless_dereference() and rcu_dereference() > is that lockless_dereference() won't complain if used outside of > an RCU read-side critical section. When there is no RCU read-side > critical section, lockless_dereference() cannot rely on RCU's normal > action of keeping the data item around. Therefore, when you are using > lockless_dereference(), you have to have some mechanism other than RCU > to keep the data item around. The usual approach is for the data item > to never be freed, for example, if data is only ever added to the list > and never removed. Other approaches include reference counting, hazard > pointers, hashed arrays of locks, garbage collectors, transactional > memory, and so on. So, the situation is that we have an RCU-protected list, which in this case we are traversing without modifying the list. We are in an restricted environment (hypervisor real mode) where we can't be preempted, both because interrupts are hard-disabled and because our caller has done preempt_disable(). In this restricted environment we can access the linear mapping (thus kmalloc'd data) but not the vmalloc or ioremap regions. Thus we are not formally in a RCU read-side critical section, though we believe that having preemption disabled gives us equivalent protection. Probably what we should do is to add a rcu_read_lock/unlock pair in a function higher up the call chain so that we are actually in a RCU read-side critical section. Then the only reason not to use list_for_each_entry_rcu would be that we don't trust the checking machinery not to ever access vmalloc'd data. In other words, we want a list_for_each_entry_nocheck or list_for_each_entry_restricted which omits all the lockdep checking. That would need a list_entry_rcu_nocheck which would need a thing like rcu_dereference_raw that does no lockdep checking - which is where I thought you suggested lockless_dereference. So, what name do you like for these primitives, and where should they go? Thanks, Paul. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
On Wed, Nov 18, 2015 at 11:13:28AM -0800, Paul E. McKenney wrote: > On Fri, Nov 06, 2015 at 01:17:17PM +1100, Alexey Kardashevskiy wrote: [snip] > > Still, is my approach correct? What does the comment for > > lockless_dereference() actally mean - it won't work together with > > RCU at all or this is to force people not to use it as > > "list_for_each_entry_rcu() should really be used in 99.99% of the > > time"? :) > > Well, it depends... > > The key difference between lockless_dereference() and rcu_dereference() > is that lockless_dereference() won't complain if used outside of > an RCU read-side critical section. When there is no RCU read-side > critical section, lockless_dereference() cannot rely on RCU's normal > action of keeping the data item around. Therefore, when you are using > lockless_dereference(), you have to have some mechanism other than RCU > to keep the data item around. The usual approach is for the data item > to never be freed, for example, if data is only ever added to the list > and never removed. Other approaches include reference counting, hazard > pointers, hashed arrays of locks, garbage collectors, transactional > memory, and so on. So, the situation is that we have an RCU-protected list, which in this case we are traversing without modifying the list. We are in an restricted environment (hypervisor real mode) where we can't be preempted, both because interrupts are hard-disabled and because our caller has done preempt_disable(). In this restricted environment we can access the linear mapping (thus kmalloc'd data) but not the vmalloc or ioremap regions. Thus we are not formally in a RCU read-side critical section, though we believe that having preemption disabled gives us equivalent protection. Probably what we should do is to add a rcu_read_lock/unlock pair in a function higher up the call chain so that we are actually in a RCU read-side critical section. Then the only reason not to use list_for_each_entry_rcu would be that we don't trust the checking machinery not to ever access vmalloc'd data. In other words, we want a list_for_each_entry_nocheck or list_for_each_entry_restricted which omits all the lockdep checking. That would need a list_entry_rcu_nocheck which would need a thing like rcu_dereference_raw that does no lockdep checking - which is where I thought you suggested lockless_dereference. So, what name do you like for these primitives, and where should they go? Thanks, Paul. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
On Fri, Nov 06, 2015 at 01:17:17PM +1100, Alexey Kardashevskiy wrote: > On 11/04/2015 01:39 AM, Steven Rostedt wrote: > >On Tue, 3 Nov 2015 17:57:05 +1100 > >Alexey Kardashevskiy wrote: > > > >>This defines list_for_each_entry_lockless. This allows safe list > >>traversing in cases when lockdep() invocation is unwanted like > >>real mode (MMU is off). > >> > >>Signed-off-by: Alexey Kardashevskiy > >>--- > >> > >>This is for VFIO acceleration in POWERKVM for pSeries guests. > >>There is a KVM instance. There also can be some VFIO (PCI passthru) > >>devices attached to a KVM guest. > >> > >>To perform DMA, a pSeries guest registers DMA memory by calling some > >>hypercalls explicitely at the rate close to one-two hcalls per > >>a network packet, i.e. very often. When a guest does a hypercall > >>(which is just an assembly instruction), the host kernel receives it > >>in the real mode (MMU is off). When real mode fails to handle it, > >>it enables MMU and tries handling a hcall in virtual mode. > >> > >>A logical bus ID (LIOBN) is a tagret id for these hypecalls. > >> > >>Each VFIO device belongs to an IOMMU group. Each group has an address > >>translation table. It is allowed to have multiple IOMMU groups (i.e. > >>multiple tables) under the same LIOBN. > >> > >>So effectively every DMA hcall has to update one or more TCE tables > >>attached to the same LIOBN. RCU is used to update/traverse this list > >>safely. > >> > >>Using RCU as is in virtual mode is fine. Lockdep works, etc. > >>list_add_rcu() is used to populate the list; > >>list_del_rcu() + call_rcu() used to remove groups from a list. > >>These operations can happen in runtim as a result of PCI hotplug/unplug > >>in guests. > >> > >>Using RCU as is in real mode is not fine as some RCU checks can lock up > >>the system and in real mode we won't even have a chance to see any > >>debug. This is why rcu_read_lock() and rcu_read_unlock() are NOT used. > >> > >>Previous version of this used to define list_for_each_entry_rcu_notrace() > >>but it was proposed to use list_entry_lockless() instead. However > >>the comment for lockless_dereference() suggests this is a good idea > >>if "lifetime is managed by something other than RCU" but it is in my case. > >> > >>So what would be the correct approach here? Thanks. > > > >If the only use case for this so far is in POWERKVM, perhaps it should > >be defined specifically (and in arch/powerpc) and not confuse others > >about using this. > > > >Or, if you do imagine that this can be used in other scenarios, then a > >much deeper comment must be made in the code in the kerneldoc section. > >list_for_each_entry_rcu() should really be used in 99.99% of the time > >in the kernel. This looks to be an extreme exception. I hate to add a > >generic helper for something that will only be used in one location. > > No, I cannot imagine this really and I can move it somewhere in arch/powerpc. > > Still, is my approach correct? What does the comment for > lockless_dereference() actally mean - it won't work together with > RCU at all or this is to force people not to use it as > "list_for_each_entry_rcu() should really be used in 99.99% of the > time"? :) Well, it depends... The key difference between lockless_dereference() and rcu_dereference() is that lockless_dereference() won't complain if used outside of an RCU read-side critical section. When there is no RCU read-side critical section, lockless_dereference() cannot rely on RCU's normal action of keeping the data item around. Therefore, when you are using lockless_dereference(), you have to have some mechanism other than RCU to keep the data item around. The usual approach is for the data item to never be freed, for example, if data is only ever added to the list and never removed. Other approaches include reference counting, hazard pointers, hashed arrays of locks, garbage collectors, transactional memory, and so on. Of these other approaches: 1. Reference counting is quite hard to use. 2. Hazard pointers are easier to use than reference counting, but still requires that races with deletion force the traversal to be retried from the top level. 3. Hashed arrays of locks pose interesting deadlock issues, and also poor locality of reference, which in turn results in not-so-good performance and scalability. 4. As far as I know, garbage collectors are not available in the Linux kernel. 5. Full-blown hardware transactional memory is only available as s390's constrained transactions. (Yes, I know about x86 and powerpc's transactional memory, and in particular, I know that it needs a software fallback, which needs to be some other item on this list.) The software transactional memory implementations that I am aware of suffer from not-so-good performance and scalability. So, in the Linux kernel, it is common to use RCU. In any case, the docbook comment
Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
On Fri, Nov 06, 2015 at 01:17:17PM +1100, Alexey Kardashevskiy wrote: > On 11/04/2015 01:39 AM, Steven Rostedt wrote: > >On Tue, 3 Nov 2015 17:57:05 +1100 > >Alexey Kardashevskiywrote: > > > >>This defines list_for_each_entry_lockless. This allows safe list > >>traversing in cases when lockdep() invocation is unwanted like > >>real mode (MMU is off). > >> > >>Signed-off-by: Alexey Kardashevskiy > >>--- > >> > >>This is for VFIO acceleration in POWERKVM for pSeries guests. > >>There is a KVM instance. There also can be some VFIO (PCI passthru) > >>devices attached to a KVM guest. > >> > >>To perform DMA, a pSeries guest registers DMA memory by calling some > >>hypercalls explicitely at the rate close to one-two hcalls per > >>a network packet, i.e. very often. When a guest does a hypercall > >>(which is just an assembly instruction), the host kernel receives it > >>in the real mode (MMU is off). When real mode fails to handle it, > >>it enables MMU and tries handling a hcall in virtual mode. > >> > >>A logical bus ID (LIOBN) is a tagret id for these hypecalls. > >> > >>Each VFIO device belongs to an IOMMU group. Each group has an address > >>translation table. It is allowed to have multiple IOMMU groups (i.e. > >>multiple tables) under the same LIOBN. > >> > >>So effectively every DMA hcall has to update one or more TCE tables > >>attached to the same LIOBN. RCU is used to update/traverse this list > >>safely. > >> > >>Using RCU as is in virtual mode is fine. Lockdep works, etc. > >>list_add_rcu() is used to populate the list; > >>list_del_rcu() + call_rcu() used to remove groups from a list. > >>These operations can happen in runtim as a result of PCI hotplug/unplug > >>in guests. > >> > >>Using RCU as is in real mode is not fine as some RCU checks can lock up > >>the system and in real mode we won't even have a chance to see any > >>debug. This is why rcu_read_lock() and rcu_read_unlock() are NOT used. > >> > >>Previous version of this used to define list_for_each_entry_rcu_notrace() > >>but it was proposed to use list_entry_lockless() instead. However > >>the comment for lockless_dereference() suggests this is a good idea > >>if "lifetime is managed by something other than RCU" but it is in my case. > >> > >>So what would be the correct approach here? Thanks. > > > >If the only use case for this so far is in POWERKVM, perhaps it should > >be defined specifically (and in arch/powerpc) and not confuse others > >about using this. > > > >Or, if you do imagine that this can be used in other scenarios, then a > >much deeper comment must be made in the code in the kerneldoc section. > >list_for_each_entry_rcu() should really be used in 99.99% of the time > >in the kernel. This looks to be an extreme exception. I hate to add a > >generic helper for something that will only be used in one location. > > No, I cannot imagine this really and I can move it somewhere in arch/powerpc. > > Still, is my approach correct? What does the comment for > lockless_dereference() actally mean - it won't work together with > RCU at all or this is to force people not to use it as > "list_for_each_entry_rcu() should really be used in 99.99% of the > time"? :) Well, it depends... The key difference between lockless_dereference() and rcu_dereference() is that lockless_dereference() won't complain if used outside of an RCU read-side critical section. When there is no RCU read-side critical section, lockless_dereference() cannot rely on RCU's normal action of keeping the data item around. Therefore, when you are using lockless_dereference(), you have to have some mechanism other than RCU to keep the data item around. The usual approach is for the data item to never be freed, for example, if data is only ever added to the list and never removed. Other approaches include reference counting, hazard pointers, hashed arrays of locks, garbage collectors, transactional memory, and so on. Of these other approaches: 1. Reference counting is quite hard to use. 2. Hazard pointers are easier to use than reference counting, but still requires that races with deletion force the traversal to be retried from the top level. 3. Hashed arrays of locks pose interesting deadlock issues, and also poor locality of reference, which in turn results in not-so-good performance and scalability. 4. As far as I know, garbage collectors are not available in the Linux kernel. 5. Full-blown hardware transactional memory is only available as s390's constrained transactions. (Yes, I know about x86 and powerpc's transactional memory, and in particular, I know that it needs a software fallback, which needs to be some other item on this list.) The software transactional memory implementations that I am aware of suffer from not-so-good performance and scalability. So, in the Linux kernel, it is common to use RCU. In
Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
On 11/04/2015 01:39 AM, Steven Rostedt wrote: On Tue, 3 Nov 2015 17:57:05 +1100 Alexey Kardashevskiy wrote: This defines list_for_each_entry_lockless. This allows safe list traversing in cases when lockdep() invocation is unwanted like real mode (MMU is off). Signed-off-by: Alexey Kardashevskiy --- This is for VFIO acceleration in POWERKVM for pSeries guests. There is a KVM instance. There also can be some VFIO (PCI passthru) devices attached to a KVM guest. To perform DMA, a pSeries guest registers DMA memory by calling some hypercalls explicitely at the rate close to one-two hcalls per a network packet, i.e. very often. When a guest does a hypercall (which is just an assembly instruction), the host kernel receives it in the real mode (MMU is off). When real mode fails to handle it, it enables MMU and tries handling a hcall in virtual mode. A logical bus ID (LIOBN) is a tagret id for these hypecalls. Each VFIO device belongs to an IOMMU group. Each group has an address translation table. It is allowed to have multiple IOMMU groups (i.e. multiple tables) under the same LIOBN. So effectively every DMA hcall has to update one or more TCE tables attached to the same LIOBN. RCU is used to update/traverse this list safely. Using RCU as is in virtual mode is fine. Lockdep works, etc. list_add_rcu() is used to populate the list; list_del_rcu() + call_rcu() used to remove groups from a list. These operations can happen in runtim as a result of PCI hotplug/unplug in guests. Using RCU as is in real mode is not fine as some RCU checks can lock up the system and in real mode we won't even have a chance to see any debug. This is why rcu_read_lock() and rcu_read_unlock() are NOT used. Previous version of this used to define list_for_each_entry_rcu_notrace() but it was proposed to use list_entry_lockless() instead. However the comment for lockless_dereference() suggests this is a good idea if "lifetime is managed by something other than RCU" but it is in my case. So what would be the correct approach here? Thanks. If the only use case for this so far is in POWERKVM, perhaps it should be defined specifically (and in arch/powerpc) and not confuse others about using this. Or, if you do imagine that this can be used in other scenarios, then a much deeper comment must be made in the code in the kerneldoc section. list_for_each_entry_rcu() should really be used in 99.99% of the time in the kernel. This looks to be an extreme exception. I hate to add a generic helper for something that will only be used in one location. No, I cannot imagine this really and I can move it somewhere in arch/powerpc. Still, is my approach correct? What does the comment for lockless_dereference() actally mean - it won't work together with RCU at all or this is to force people not to use it as "list_for_each_entry_rcu() should really be used in 99.99% of the time"? :) -- Steve --- include/linux/rculist.h | 16 1 file changed, 16 insertions(+) diff --git a/include/linux/rculist.h b/include/linux/rculist.h index 17c6b1f..a83a924 100644 --- a/include/linux/rculist.h +++ b/include/linux/rculist.h @@ -308,6 +308,22 @@ static inline void list_splice_init_rcu(struct list_head *list, pos = list_entry_rcu(pos->member.next, typeof(*pos), member)) /** + * list_for_each_entry_lockless - iterate over rcu list of given type + * @pos: the type * to use as a loop cursor. + * @head: the head for your list. + * @member:the name of the list_struct within the struct. + * + * This list-traversal primitive may safely run concurrently + */ +#define list_entry_lockless(ptr, type, member) \ + container_of((typeof(ptr))lockless_dereference(ptr), type, member) + +#define list_for_each_entry_lockless(pos, head, member) \ + for (pos = list_entry_lockless((head)->next, typeof(*pos), member); \ + >member != (head); \ + pos = list_entry_lockless(pos->member.next, typeof(*pos), member)) + +/** * list_for_each_entry_continue_rcu - continue iteration over list of given type * @pos: the type * to use as a loop cursor. * @head: the head for your list. -- Alexey -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
On 11/04/2015 01:39 AM, Steven Rostedt wrote: On Tue, 3 Nov 2015 17:57:05 +1100 Alexey Kardashevskiywrote: This defines list_for_each_entry_lockless. This allows safe list traversing in cases when lockdep() invocation is unwanted like real mode (MMU is off). Signed-off-by: Alexey Kardashevskiy --- This is for VFIO acceleration in POWERKVM for pSeries guests. There is a KVM instance. There also can be some VFIO (PCI passthru) devices attached to a KVM guest. To perform DMA, a pSeries guest registers DMA memory by calling some hypercalls explicitely at the rate close to one-two hcalls per a network packet, i.e. very often. When a guest does a hypercall (which is just an assembly instruction), the host kernel receives it in the real mode (MMU is off). When real mode fails to handle it, it enables MMU and tries handling a hcall in virtual mode. A logical bus ID (LIOBN) is a tagret id for these hypecalls. Each VFIO device belongs to an IOMMU group. Each group has an address translation table. It is allowed to have multiple IOMMU groups (i.e. multiple tables) under the same LIOBN. So effectively every DMA hcall has to update one or more TCE tables attached to the same LIOBN. RCU is used to update/traverse this list safely. Using RCU as is in virtual mode is fine. Lockdep works, etc. list_add_rcu() is used to populate the list; list_del_rcu() + call_rcu() used to remove groups from a list. These operations can happen in runtim as a result of PCI hotplug/unplug in guests. Using RCU as is in real mode is not fine as some RCU checks can lock up the system and in real mode we won't even have a chance to see any debug. This is why rcu_read_lock() and rcu_read_unlock() are NOT used. Previous version of this used to define list_for_each_entry_rcu_notrace() but it was proposed to use list_entry_lockless() instead. However the comment for lockless_dereference() suggests this is a good idea if "lifetime is managed by something other than RCU" but it is in my case. So what would be the correct approach here? Thanks. If the only use case for this so far is in POWERKVM, perhaps it should be defined specifically (and in arch/powerpc) and not confuse others about using this. Or, if you do imagine that this can be used in other scenarios, then a much deeper comment must be made in the code in the kerneldoc section. list_for_each_entry_rcu() should really be used in 99.99% of the time in the kernel. This looks to be an extreme exception. I hate to add a generic helper for something that will only be used in one location. No, I cannot imagine this really and I can move it somewhere in arch/powerpc. Still, is my approach correct? What does the comment for lockless_dereference() actally mean - it won't work together with RCU at all or this is to force people not to use it as "list_for_each_entry_rcu() should really be used in 99.99% of the time"? :) -- Steve --- include/linux/rculist.h | 16 1 file changed, 16 insertions(+) diff --git a/include/linux/rculist.h b/include/linux/rculist.h index 17c6b1f..a83a924 100644 --- a/include/linux/rculist.h +++ b/include/linux/rculist.h @@ -308,6 +308,22 @@ static inline void list_splice_init_rcu(struct list_head *list, pos = list_entry_rcu(pos->member.next, typeof(*pos), member)) /** + * list_for_each_entry_lockless - iterate over rcu list of given type + * @pos: the type * to use as a loop cursor. + * @head: the head for your list. + * @member:the name of the list_struct within the struct. + * + * This list-traversal primitive may safely run concurrently + */ +#define list_entry_lockless(ptr, type, member) \ + container_of((typeof(ptr))lockless_dereference(ptr), type, member) + +#define list_for_each_entry_lockless(pos, head, member) \ + for (pos = list_entry_lockless((head)->next, typeof(*pos), member); \ + >member != (head); \ + pos = list_entry_lockless(pos->member.next, typeof(*pos), member)) + +/** * list_for_each_entry_continue_rcu - continue iteration over list of given type * @pos: the type * to use as a loop cursor. * @head: the head for your list. -- Alexey -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
On Tue, 3 Nov 2015 17:57:05 +1100 Alexey Kardashevskiy wrote: > This defines list_for_each_entry_lockless. This allows safe list > traversing in cases when lockdep() invocation is unwanted like > real mode (MMU is off). > > Signed-off-by: Alexey Kardashevskiy > --- > > This is for VFIO acceleration in POWERKVM for pSeries guests. > There is a KVM instance. There also can be some VFIO (PCI passthru) > devices attached to a KVM guest. > > To perform DMA, a pSeries guest registers DMA memory by calling some > hypercalls explicitely at the rate close to one-two hcalls per > a network packet, i.e. very often. When a guest does a hypercall > (which is just an assembly instruction), the host kernel receives it > in the real mode (MMU is off). When real mode fails to handle it, > it enables MMU and tries handling a hcall in virtual mode. > > A logical bus ID (LIOBN) is a tagret id for these hypecalls. > > Each VFIO device belongs to an IOMMU group. Each group has an address > translation table. It is allowed to have multiple IOMMU groups (i.e. > multiple tables) under the same LIOBN. > > So effectively every DMA hcall has to update one or more TCE tables > attached to the same LIOBN. RCU is used to update/traverse this list > safely. > > Using RCU as is in virtual mode is fine. Lockdep works, etc. > list_add_rcu() is used to populate the list; > list_del_rcu() + call_rcu() used to remove groups from a list. > These operations can happen in runtim as a result of PCI hotplug/unplug > in guests. > > Using RCU as is in real mode is not fine as some RCU checks can lock up > the system and in real mode we won't even have a chance to see any > debug. This is why rcu_read_lock() and rcu_read_unlock() are NOT used. > > Previous version of this used to define list_for_each_entry_rcu_notrace() > but it was proposed to use list_entry_lockless() instead. However > the comment for lockless_dereference() suggests this is a good idea > if "lifetime is managed by something other than RCU" but it is in my case. > > So what would be the correct approach here? Thanks. If the only use case for this so far is in POWERKVM, perhaps it should be defined specifically (and in arch/powerpc) and not confuse others about using this. Or, if you do imagine that this can be used in other scenarios, then a much deeper comment must be made in the code in the kerneldoc section. list_for_each_entry_rcu() should really be used in 99.99% of the time in the kernel. This looks to be an extreme exception. I hate to add a generic helper for something that will only be used in one location. -- Steve > --- > include/linux/rculist.h | 16 > 1 file changed, 16 insertions(+) > > diff --git a/include/linux/rculist.h b/include/linux/rculist.h > index 17c6b1f..a83a924 100644 > --- a/include/linux/rculist.h > +++ b/include/linux/rculist.h > @@ -308,6 +308,22 @@ static inline void list_splice_init_rcu(struct list_head > *list, > pos = list_entry_rcu(pos->member.next, typeof(*pos), member)) > > /** > + * list_for_each_entry_lockless - iterate over rcu list of given type > + * @pos: the type * to use as a loop cursor. > + * @head:the head for your list. > + * @member: the name of the list_struct within the struct. > + * > + * This list-traversal primitive may safely run concurrently > + */ > +#define list_entry_lockless(ptr, type, member) \ > + container_of((typeof(ptr))lockless_dereference(ptr), type, member) > + > +#define list_for_each_entry_lockless(pos, head, member) \ > + for (pos = list_entry_lockless((head)->next, typeof(*pos), member); \ > + >member != (head); \ > + pos = list_entry_lockless(pos->member.next, typeof(*pos), > member)) > + > +/** > * list_for_each_entry_continue_rcu - continue iteration over list of given > type > * @pos: the type * to use as a loop cursor. > * @head:the head for your list. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
On Tue, 3 Nov 2015 17:57:05 +1100 Alexey Kardashevskiywrote: > This defines list_for_each_entry_lockless. This allows safe list > traversing in cases when lockdep() invocation is unwanted like > real mode (MMU is off). > > Signed-off-by: Alexey Kardashevskiy > --- > > This is for VFIO acceleration in POWERKVM for pSeries guests. > There is a KVM instance. There also can be some VFIO (PCI passthru) > devices attached to a KVM guest. > > To perform DMA, a pSeries guest registers DMA memory by calling some > hypercalls explicitely at the rate close to one-two hcalls per > a network packet, i.e. very often. When a guest does a hypercall > (which is just an assembly instruction), the host kernel receives it > in the real mode (MMU is off). When real mode fails to handle it, > it enables MMU and tries handling a hcall in virtual mode. > > A logical bus ID (LIOBN) is a tagret id for these hypecalls. > > Each VFIO device belongs to an IOMMU group. Each group has an address > translation table. It is allowed to have multiple IOMMU groups (i.e. > multiple tables) under the same LIOBN. > > So effectively every DMA hcall has to update one or more TCE tables > attached to the same LIOBN. RCU is used to update/traverse this list > safely. > > Using RCU as is in virtual mode is fine. Lockdep works, etc. > list_add_rcu() is used to populate the list; > list_del_rcu() + call_rcu() used to remove groups from a list. > These operations can happen in runtim as a result of PCI hotplug/unplug > in guests. > > Using RCU as is in real mode is not fine as some RCU checks can lock up > the system and in real mode we won't even have a chance to see any > debug. This is why rcu_read_lock() and rcu_read_unlock() are NOT used. > > Previous version of this used to define list_for_each_entry_rcu_notrace() > but it was proposed to use list_entry_lockless() instead. However > the comment for lockless_dereference() suggests this is a good idea > if "lifetime is managed by something other than RCU" but it is in my case. > > So what would be the correct approach here? Thanks. If the only use case for this so far is in POWERKVM, perhaps it should be defined specifically (and in arch/powerpc) and not confuse others about using this. Or, if you do imagine that this can be used in other scenarios, then a much deeper comment must be made in the code in the kerneldoc section. list_for_each_entry_rcu() should really be used in 99.99% of the time in the kernel. This looks to be an extreme exception. I hate to add a generic helper for something that will only be used in one location. -- Steve > --- > include/linux/rculist.h | 16 > 1 file changed, 16 insertions(+) > > diff --git a/include/linux/rculist.h b/include/linux/rculist.h > index 17c6b1f..a83a924 100644 > --- a/include/linux/rculist.h > +++ b/include/linux/rculist.h > @@ -308,6 +308,22 @@ static inline void list_splice_init_rcu(struct list_head > *list, > pos = list_entry_rcu(pos->member.next, typeof(*pos), member)) > > /** > + * list_for_each_entry_lockless - iterate over rcu list of given type > + * @pos: the type * to use as a loop cursor. > + * @head:the head for your list. > + * @member: the name of the list_struct within the struct. > + * > + * This list-traversal primitive may safely run concurrently > + */ > +#define list_entry_lockless(ptr, type, member) \ > + container_of((typeof(ptr))lockless_dereference(ptr), type, member) > + > +#define list_for_each_entry_lockless(pos, head, member) \ > + for (pos = list_entry_lockless((head)->next, typeof(*pos), member); \ > + >member != (head); \ > + pos = list_entry_lockless(pos->member.next, typeof(*pos), > member)) > + > +/** > * list_for_each_entry_continue_rcu - continue iteration over list of given > type > * @pos: the type * to use as a loop cursor. > * @head:the head for your list. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
This defines list_for_each_entry_lockless. This allows safe list traversing in cases when lockdep() invocation is unwanted like real mode (MMU is off). Signed-off-by: Alexey Kardashevskiy --- This is for VFIO acceleration in POWERKVM for pSeries guests. There is a KVM instance. There also can be some VFIO (PCI passthru) devices attached to a KVM guest. To perform DMA, a pSeries guest registers DMA memory by calling some hypercalls explicitely at the rate close to one-two hcalls per a network packet, i.e. very often. When a guest does a hypercall (which is just an assembly instruction), the host kernel receives it in the real mode (MMU is off). When real mode fails to handle it, it enables MMU and tries handling a hcall in virtual mode. A logical bus ID (LIOBN) is a tagret id for these hypecalls. Each VFIO device belongs to an IOMMU group. Each group has an address translation table. It is allowed to have multiple IOMMU groups (i.e. multiple tables) under the same LIOBN. So effectively every DMA hcall has to update one or more TCE tables attached to the same LIOBN. RCU is used to update/traverse this list safely. Using RCU as is in virtual mode is fine. Lockdep works, etc. list_add_rcu() is used to populate the list; list_del_rcu() + call_rcu() used to remove groups from a list. These operations can happen in runtim as a result of PCI hotplug/unplug in guests. Using RCU as is in real mode is not fine as some RCU checks can lock up the system and in real mode we won't even have a chance to see any debug. This is why rcu_read_lock() and rcu_read_unlock() are NOT used. Previous version of this used to define list_for_each_entry_rcu_notrace() but it was proposed to use list_entry_lockless() instead. However the comment for lockless_dereference() suggests this is a good idea if "lifetime is managed by something other than RCU" but it is in my case. So what would be the correct approach here? Thanks. --- include/linux/rculist.h | 16 1 file changed, 16 insertions(+) diff --git a/include/linux/rculist.h b/include/linux/rculist.h index 17c6b1f..a83a924 100644 --- a/include/linux/rculist.h +++ b/include/linux/rculist.h @@ -308,6 +308,22 @@ static inline void list_splice_init_rcu(struct list_head *list, pos = list_entry_rcu(pos->member.next, typeof(*pos), member)) /** + * list_for_each_entry_lockless - iterate over rcu list of given type + * @pos: the type * to use as a loop cursor. + * @head: the head for your list. + * @member:the name of the list_struct within the struct. + * + * This list-traversal primitive may safely run concurrently + */ +#define list_entry_lockless(ptr, type, member) \ + container_of((typeof(ptr))lockless_dereference(ptr), type, member) + +#define list_for_each_entry_lockless(pos, head, member) \ + for (pos = list_entry_lockless((head)->next, typeof(*pos), member); \ + >member != (head); \ + pos = list_entry_lockless(pos->member.next, typeof(*pos), member)) + +/** * list_for_each_entry_continue_rcu - continue iteration over list of given type * @pos: the type * to use as a loop cursor. * @head: the head for your list. -- 2.5.0.rc3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
This defines list_for_each_entry_lockless. This allows safe list traversing in cases when lockdep() invocation is unwanted like real mode (MMU is off). Signed-off-by: Alexey Kardashevskiy--- This is for VFIO acceleration in POWERKVM for pSeries guests. There is a KVM instance. There also can be some VFIO (PCI passthru) devices attached to a KVM guest. To perform DMA, a pSeries guest registers DMA memory by calling some hypercalls explicitely at the rate close to one-two hcalls per a network packet, i.e. very often. When a guest does a hypercall (which is just an assembly instruction), the host kernel receives it in the real mode (MMU is off). When real mode fails to handle it, it enables MMU and tries handling a hcall in virtual mode. A logical bus ID (LIOBN) is a tagret id for these hypecalls. Each VFIO device belongs to an IOMMU group. Each group has an address translation table. It is allowed to have multiple IOMMU groups (i.e. multiple tables) under the same LIOBN. So effectively every DMA hcall has to update one or more TCE tables attached to the same LIOBN. RCU is used to update/traverse this list safely. Using RCU as is in virtual mode is fine. Lockdep works, etc. list_add_rcu() is used to populate the list; list_del_rcu() + call_rcu() used to remove groups from a list. These operations can happen in runtim as a result of PCI hotplug/unplug in guests. Using RCU as is in real mode is not fine as some RCU checks can lock up the system and in real mode we won't even have a chance to see any debug. This is why rcu_read_lock() and rcu_read_unlock() are NOT used. Previous version of this used to define list_for_each_entry_rcu_notrace() but it was proposed to use list_entry_lockless() instead. However the comment for lockless_dereference() suggests this is a good idea if "lifetime is managed by something other than RCU" but it is in my case. So what would be the correct approach here? Thanks. --- include/linux/rculist.h | 16 1 file changed, 16 insertions(+) diff --git a/include/linux/rculist.h b/include/linux/rculist.h index 17c6b1f..a83a924 100644 --- a/include/linux/rculist.h +++ b/include/linux/rculist.h @@ -308,6 +308,22 @@ static inline void list_splice_init_rcu(struct list_head *list, pos = list_entry_rcu(pos->member.next, typeof(*pos), member)) /** + * list_for_each_entry_lockless - iterate over rcu list of given type + * @pos: the type * to use as a loop cursor. + * @head: the head for your list. + * @member:the name of the list_struct within the struct. + * + * This list-traversal primitive may safely run concurrently + */ +#define list_entry_lockless(ptr, type, member) \ + container_of((typeof(ptr))lockless_dereference(ptr), type, member) + +#define list_for_each_entry_lockless(pos, head, member) \ + for (pos = list_entry_lockless((head)->next, typeof(*pos), member); \ + >member != (head); \ + pos = list_entry_lockless(pos->member.next, typeof(*pos), member)) + +/** * list_for_each_entry_continue_rcu - continue iteration over list of given type * @pos: the type * to use as a loop cursor. * @head: the head for your list. -- 2.5.0.rc3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/