Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference()
On 10/28/2015 09:33 AM, Ingo Molnar wrote: * Tejun Heo wrote: Subject: writeback: don't use list_entry_rcu() for pointer offsetting in bdi_split_work_to_wbs() bdi_split_work_to_wbs() uses list_for_each_entry_rcu_continue() to walk @bdi->wb_list. To set up the initial iteration condition, it uses list_entry_rcu() to calculate the entry pointer corresponding to the list head; however, this isn't an actual RCU dereference and using list_entry_rcu() for it ended up breaking a proposed list_entry_rcu() change because it was feeding an non-lvalue pointer into the macro. Don't use the RCU variant for simple pointer offsetting. Use list_entry() instead. Signed-off-by: Tejun Heo --- fs/fs-writeback.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 29e4599..7378169 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -779,8 +779,8 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi, bool skip_if_busy) { struct bdi_writeback *last_wb = NULL; - struct bdi_writeback *wb = list_entry_rcu(>wb_list, - struct bdi_writeback, bdi_node); + struct bdi_writeback *wb = list_entry(>wb_list, + struct bdi_writeback, bdi_node); might_sleep(); Any objections against me applying this fix to tip:core/rcu so that I can push the recent RCU changes towards linux-next without triggering a build failure? No objection on my side but probably you are waiting for an ack from somebody else. -- Pat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference()
On 10/28/2015 09:33 AM, Ingo Molnar wrote: * Tejun Heowrote: Subject: writeback: don't use list_entry_rcu() for pointer offsetting in bdi_split_work_to_wbs() bdi_split_work_to_wbs() uses list_for_each_entry_rcu_continue() to walk @bdi->wb_list. To set up the initial iteration condition, it uses list_entry_rcu() to calculate the entry pointer corresponding to the list head; however, this isn't an actual RCU dereference and using list_entry_rcu() for it ended up breaking a proposed list_entry_rcu() change because it was feeding an non-lvalue pointer into the macro. Don't use the RCU variant for simple pointer offsetting. Use list_entry() instead. Signed-off-by: Tejun Heo --- fs/fs-writeback.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 29e4599..7378169 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -779,8 +779,8 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi, bool skip_if_busy) { struct bdi_writeback *last_wb = NULL; - struct bdi_writeback *wb = list_entry_rcu(>wb_list, - struct bdi_writeback, bdi_node); + struct bdi_writeback *wb = list_entry(>wb_list, + struct bdi_writeback, bdi_node); might_sleep(); Any objections against me applying this fix to tip:core/rcu so that I can push the recent RCU changes towards linux-next without triggering a build failure? No objection on my side but probably you are waiting for an ack from somebody else. -- Pat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage
On 09/22/2015 10:50 PM, Paul E. McKenney wrote: On Sun, Sep 13, 2015 at 09:10:24AM -0700, Paul E. McKenney wrote: On Sun, Sep 13, 2015 at 12:06:33PM +0200, Patrick Marlier wrote: [ . . . ] Paul, This sounds good to me. It should fix the performance issue (will check with my benchmark). Thank you, looking forward to seeing the results! Just following up -- how is the benchmarking going? Note that in my module I am using the kernel version 3.16.0-31 (I ported your change). Here the results of my benchmark that tests rculist in the case of read only. # 1st column : The number of threads # 2nd : ops/s the original version # 3rd : ops/s your version with lockless_dereference 1 883923 1747554 2 1741441 3543062 3 2462360 5103647 4 3437273 7176608 6 5155803 9812348 8 6718111 13330050 10 8519227 17458294 12 9773632 20298897 14 11555198 23191424 16 11643264 25125712 I get same performance with my patch. So indeed it fixes the performance problem I was seeing. Thanks a lot! -- Pat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage
On 09/22/2015 10:50 PM, Paul E. McKenney wrote: On Sun, Sep 13, 2015 at 09:10:24AM -0700, Paul E. McKenney wrote: On Sun, Sep 13, 2015 at 12:06:33PM +0200, Patrick Marlier wrote: [ . . . ] Paul, This sounds good to me. It should fix the performance issue (will check with my benchmark). Thank you, looking forward to seeing the results! Just following up -- how is the benchmarking going? Note that in my module I am using the kernel version 3.16.0-31 (I ported your change). Here the results of my benchmark that tests rculist in the case of read only. # 1st column : The number of threads # 2nd : ops/s the original version # 3rd : ops/s your version with lockless_dereference 1 883923 1747554 2 1741441 3543062 3 2462360 5103647 4 3437273 7176608 6 5155803 9812348 8 6718111 13330050 10 8519227 17458294 12 9773632 20298897 14 11555198 23191424 16 11643264 25125712 I get same performance with my patch. So indeed it fixes the performance problem I was seeing. Thanks a lot! -- Pat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage
On 09/12/2015 01:05 AM, Paul E. McKenney wrote: On Tue, May 19, 2015 at 03:07:25PM -0700, Paul E. McKenney wrote: On Mon, May 18, 2015 at 09:43:21AM -0400, Steven Rostedt wrote: On Mon, 18 May 2015 12:06:47 +1000 NeilBrown wrote: struct mddev { ... struct list_headdisks; ...} struct list_head { struct list_head *next, *prev; }; The tricky thing is that "list_entry_rcu" before and after the patch is reading the same thing. No it isn't. Before the patch it is passed the address of the 'next' field. After the patch it is passed the contents of the 'next' field. Right. However in your case, the change I proposed is probably wrong I trust you on this side. :) What's your proposal to fix it with the rculist patch? What needs fixing? I don't see anything broken. Maybe there is something in this "rculist patch" that I'm missing. Can you point me at it? Probably some debugging tool like sparse notices that the assignment isn't a true list entry and complains about it. In other words, I think the real fix is to fix the debugging tool to ignore this, because the code is correct, and this is a false positive failure, and is causing more harm than good, because people are sending out broken patches due to it. OK, finally did the history trawling that I should have done to begin with. Back in 2010, Arnd added the __rcu pointer checking in sparse. But the RCU list primitives were used on non-RCU-protected lists, so some casting pain was required to avoid sparse complaints. (Keep in mind that the list_head structure does not mark ->next with __rcu.) Arnd's workaround was to copy the pointer to the local stack, casting it to an __rcu pointer, then use rcu_dereference_raw() to do the needed traversal of an RCU-protected pointer. This of course resulted in an extraneous load from the stack, which Patrick noticed in his performance work, and which motivated him to send the patches. Perhaps what I should do is create an rcu_dereference_nocheck() for use in list traversals, that omits the sparse checking. That should get rid of both the sparse warnings and the strange casts. The code in md probably needs to change in any case, as otherwise we are invoking rcu_dereference_whatever() on a full struct list_head rather than on a single pointer. Or am I missing something here? Finally getting back to this one... I switched to lockless_dereference() instead of rcu_dereference_raw(), and am running it through the testing gamut. Patrick, are you OK with this change? Paul, This sounds good to me. It should fix the performance issue (will check with my benchmark). I think for drivers/md/bitmap.c:next_active_rdev() the problem was fixed but do you know if it also fixed for net/netfilter/core.c:nf_hook_slow()? Thanks. -- Pat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage
On 09/12/2015 01:05 AM, Paul E. McKenney wrote: On Tue, May 19, 2015 at 03:07:25PM -0700, Paul E. McKenney wrote: On Mon, May 18, 2015 at 09:43:21AM -0400, Steven Rostedt wrote: On Mon, 18 May 2015 12:06:47 +1000 NeilBrownwrote: struct mddev { ... struct list_headdisks; ...} struct list_head { struct list_head *next, *prev; }; The tricky thing is that "list_entry_rcu" before and after the patch is reading the same thing. No it isn't. Before the patch it is passed the address of the 'next' field. After the patch it is passed the contents of the 'next' field. Right. However in your case, the change I proposed is probably wrong I trust you on this side. :) What's your proposal to fix it with the rculist patch? What needs fixing? I don't see anything broken. Maybe there is something in this "rculist patch" that I'm missing. Can you point me at it? Probably some debugging tool like sparse notices that the assignment isn't a true list entry and complains about it. In other words, I think the real fix is to fix the debugging tool to ignore this, because the code is correct, and this is a false positive failure, and is causing more harm than good, because people are sending out broken patches due to it. OK, finally did the history trawling that I should have done to begin with. Back in 2010, Arnd added the __rcu pointer checking in sparse. But the RCU list primitives were used on non-RCU-protected lists, so some casting pain was required to avoid sparse complaints. (Keep in mind that the list_head structure does not mark ->next with __rcu.) Arnd's workaround was to copy the pointer to the local stack, casting it to an __rcu pointer, then use rcu_dereference_raw() to do the needed traversal of an RCU-protected pointer. This of course resulted in an extraneous load from the stack, which Patrick noticed in his performance work, and which motivated him to send the patches. Perhaps what I should do is create an rcu_dereference_nocheck() for use in list traversals, that omits the sparse checking. That should get rid of both the sparse warnings and the strange casts. The code in md probably needs to change in any case, as otherwise we are invoking rcu_dereference_whatever() on a full struct list_head rather than on a single pointer. Or am I missing something here? Finally getting back to this one... I switched to lockless_dereference() instead of rcu_dereference_raw(), and am running it through the testing gamut. Patrick, are you OK with this change? Paul, This sounds good to me. It should fix the performance issue (will check with my benchmark). I think for drivers/md/bitmap.c:next_active_rdev() the problem was fixed but do you know if it also fixed for net/netfilter/core.c:nf_hook_slow()? Thanks. -- Pat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage
On 05/18/2015 03:53 PM, Patrick Marlier wrote: On Mon, May 18, 2015 at 4:06 AM, NeilBrown wrote: On Sat, 16 May 2015 19:42:54 +0200 Patrick Marlier wrote: On 05/13/2015 04:58 AM, NeilBrown wrote: On Tue, 12 May 2015 22:38:53 -0400 Steven Rostedt wrote: On Tue, 12 May 2015 15:46:26 -0700 "Paul E. McKenney" wrote: What comes after this is: list_for_each_entry_continue_rcu(rdev, >disks, same_set) { if (rdev->raid_disk >= 0 && Now the original code had: rdev = list_entry_rcu(>disks, struct md_rdev, same_set); Where >disks would return the address of the disks field of mddev which is a list head. Then it would get the 'same_set' offset, which is 0, and rdev is pointing to a makeshift md_rdev struct. But it isn't used, as the list_for_each_entry_continue_rcu() has: #define list_for_each_entry_continue_rcu(pos, head, member)\ for (pos = list_entry_rcu(pos->member.next, typeof(*pos), member); \ >member != (head);\ pos = list_entry_rcu(pos->member.next, typeof(*pos), member)) Thus the first use of pos is pos->member.next or: mddev->disks.next But now you converted it to rdev = mddev->disks.next, which means the first use is: pos = mddev->disks.next->next I think you are skipping the first element here. struct mddev { ... struct list_headdisks; ...} struct list_head { struct list_head *next, *prev; }; The tricky thing is that "list_entry_rcu" before and after the patch is reading the same thing. No it isn't. Before the patch it is passed the address of the 'next' field. After the patch it is passed the contents of the 'next' field. Here I meant "list_entry_rcu" (in include/linux/rculist.h) not the change to drivers/md/bitmap.c. However in your case, the change I proposed is probably wrong I trust you on this side. :) What's your proposal to fix it with the rculist patch? What needs fixing? I don't see anything broken. Maybe there is something in this "rculist patch" that I'm missing. Can you point me at it? Do not apply the patch on drivers/md/bitmap.c but only on include/linux/rculist.h and you will see that the compilation fails. Here an example of the compilation error in drivers/md/bitmap.c In file included from include/linux/sched.h:17:0, from include/linux/blkdev.h:4, from drivers/md/bitmap.c:18: drivers/md/bitmap.c: In function ‘next_active_rdev’: include/linux/compiler.h:469:24: error: lvalue required as unary ‘&’ operand (volatile typeof(x) *)&(x); }) ^ include/linux/kernel.h:800:49: note: in definition of macro ‘container_of’ const typeof( ((type *)0)->member ) *__mptr = (ptr); \ ^ include/linux/compiler.h:470:26: note: in expansion of macro ‘__ACCESS_ONCE’ #define ACCESS_ONCE(x) (*__ACCESS_ONCE(x)) ^ include/linux/rcupdate.h:630:26: note: in expansion of macro ‘ACCESS_ONCE’ typeof(p) _p1 = ACCESS_ONCE(p); \ ^ include/linux/rcupdate.h:587:48: note: in expansion of macro ‘lockless_dereference’ typeof(*p) *p1 = (typeof(*p) *__force)lockless_dereference(p); \ ^ include/linux/rcupdate.h:723:2: note: in expansion of macro ‘__rcu_dereference_check’ __rcu_dereference_check((p), rcu_read_lock_held() || (c), __rcu) ^ include/linux/rcupdate.h:746:32: note: in expansion of macro ‘rcu_dereference_check’ #define rcu_dereference_raw(p) rcu_dereference_check(p, 1) /*@@@ needed? @@@*/ ^ include/linux/rculist.h:251:28: note: in expansion of macro ‘rcu_dereference_raw’ container_of((typeof(ptr))rcu_dereference_raw(ptr), type, member); \ ^ drivers/md/bitmap.c:184:10: note: in expansion of macro ‘list_entry_rcu’ rdev = list_entry_rcu(>disks, struct md_rdev, same_set); ^ Thanks. -- Pat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage
On Mon, May 18, 2015 at 4:06 AM, NeilBrown wrote: > On Sat, 16 May 2015 19:42:54 +0200 Patrick Marlier > wrote: >> On 05/13/2015 04:58 AM, NeilBrown wrote: >> > On Tue, 12 May 2015 22:38:53 -0400 Steven Rostedt >> > wrote: >> >> On Tue, 12 May 2015 15:46:26 -0700 >> >> "Paul E. McKenney" wrote: >> >> What comes after this is: >> >> >> >>list_for_each_entry_continue_rcu(rdev, >disks, same_set) { >> >>if (rdev->raid_disk >= 0 && >> >> >> >> Now the original code had: >> >> >> >>rdev = list_entry_rcu(>disks, struct md_rdev, same_set); >> >> >> >> Where >disks would return the address of the disks field of >> >> mddev which is a list head. Then it would get the 'same_set' offset, >> >> which is 0, and rdev is pointing to a makeshift md_rdev struct. But it >> >> isn't used, as the list_for_each_entry_continue_rcu() has: >> >> >> >> #define list_for_each_entry_continue_rcu(pos, head, member) >> >> \ >> >>for (pos = list_entry_rcu(pos->member.next, typeof(*pos), member); \ >> >> >member != (head);\ >> >> pos = list_entry_rcu(pos->member.next, typeof(*pos), member)) >> >> >> >> Thus the first use of pos is pos->member.next or: >> >> >> >>mddev->disks.next >> >> >> >> But now you converted it to rdev = mddev->disks.next, which means the >> >> first use is: >> >> >> >>pos = mddev->disks.next->next >> >> >> >> I think you are skipping the first element here. >> >> >> struct mddev { >> ... >> struct list_headdisks; >> ...} >> >> struct list_head { >> struct list_head *next, *prev; >> }; >> >> The tricky thing is that "list_entry_rcu" before and after the patch is >> reading the same thing. > > No it isn't. > Before the patch it is passed the address of the 'next' field. After the > patch it is passed the contents of the 'next' field. Here I meant "list_entry_rcu" (in include/linux/rculist.h) not the change to drivers/md/bitmap.c. >> However in your case, the change I proposed is probably wrong I trust >> you on this side. :) What's your proposal to fix it with the rculist patch? > > What needs fixing? I don't see anything broken. > > Maybe there is something in this "rculist patch" that I'm missing. Can you > point me at it? Do not apply the patch on drivers/md/bitmap.c but only on include/linux/rculist.h and you will see that the compilation fails. -- Pat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage
On Mon, May 18, 2015 at 4:06 AM, NeilBrown ne...@suse.de wrote: On Sat, 16 May 2015 19:42:54 +0200 Patrick Marlier patrick.marl...@gmail.com wrote: On 05/13/2015 04:58 AM, NeilBrown wrote: On Tue, 12 May 2015 22:38:53 -0400 Steven Rostedt rost...@goodmis.org wrote: On Tue, 12 May 2015 15:46:26 -0700 Paul E. McKenney paul...@linux.vnet.ibm.com wrote: What comes after this is: list_for_each_entry_continue_rcu(rdev, mddev-disks, same_set) { if (rdev-raid_disk = 0 Now the original code had: rdev = list_entry_rcu(mddev-disks, struct md_rdev, same_set); Where mddev-disks would return the address of the disks field of mddev which is a list head. Then it would get the 'same_set' offset, which is 0, and rdev is pointing to a makeshift md_rdev struct. But it isn't used, as the list_for_each_entry_continue_rcu() has: #define list_for_each_entry_continue_rcu(pos, head, member) \ for (pos = list_entry_rcu(pos-member.next, typeof(*pos), member); \ pos-member != (head);\ pos = list_entry_rcu(pos-member.next, typeof(*pos), member)) Thus the first use of pos is pos-member.next or: mddev-disks.next But now you converted it to rdev = mddev-disks.next, which means the first use is: pos = mddev-disks.next-next I think you are skipping the first element here. struct mddev { ... struct list_headdisks; ...} struct list_head { struct list_head *next, *prev; }; The tricky thing is that list_entry_rcu before and after the patch is reading the same thing. No it isn't. Before the patch it is passed the address of the 'next' field. After the patch it is passed the contents of the 'next' field. Here I meant list_entry_rcu (in include/linux/rculist.h) not the change to drivers/md/bitmap.c. However in your case, the change I proposed is probably wrong I trust you on this side. :) What's your proposal to fix it with the rculist patch? What needs fixing? I don't see anything broken. Maybe there is something in this rculist patch that I'm missing. Can you point me at it? Do not apply the patch on drivers/md/bitmap.c but only on include/linux/rculist.h and you will see that the compilation fails. -- Pat -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage
On 05/18/2015 03:53 PM, Patrick Marlier wrote: On Mon, May 18, 2015 at 4:06 AM, NeilBrown ne...@suse.de wrote: On Sat, 16 May 2015 19:42:54 +0200 Patrick Marlier patrick.marl...@gmail.com wrote: On 05/13/2015 04:58 AM, NeilBrown wrote: On Tue, 12 May 2015 22:38:53 -0400 Steven Rostedt rost...@goodmis.org wrote: On Tue, 12 May 2015 15:46:26 -0700 Paul E. McKenney paul...@linux.vnet.ibm.com wrote: What comes after this is: list_for_each_entry_continue_rcu(rdev, mddev-disks, same_set) { if (rdev-raid_disk = 0 Now the original code had: rdev = list_entry_rcu(mddev-disks, struct md_rdev, same_set); Where mddev-disks would return the address of the disks field of mddev which is a list head. Then it would get the 'same_set' offset, which is 0, and rdev is pointing to a makeshift md_rdev struct. But it isn't used, as the list_for_each_entry_continue_rcu() has: #define list_for_each_entry_continue_rcu(pos, head, member)\ for (pos = list_entry_rcu(pos-member.next, typeof(*pos), member); \ pos-member != (head);\ pos = list_entry_rcu(pos-member.next, typeof(*pos), member)) Thus the first use of pos is pos-member.next or: mddev-disks.next But now you converted it to rdev = mddev-disks.next, which means the first use is: pos = mddev-disks.next-next I think you are skipping the first element here. struct mddev { ... struct list_headdisks; ...} struct list_head { struct list_head *next, *prev; }; The tricky thing is that list_entry_rcu before and after the patch is reading the same thing. No it isn't. Before the patch it is passed the address of the 'next' field. After the patch it is passed the contents of the 'next' field. Here I meant list_entry_rcu (in include/linux/rculist.h) not the change to drivers/md/bitmap.c. However in your case, the change I proposed is probably wrong I trust you on this side. :) What's your proposal to fix it with the rculist patch? What needs fixing? I don't see anything broken. Maybe there is something in this rculist patch that I'm missing. Can you point me at it? Do not apply the patch on drivers/md/bitmap.c but only on include/linux/rculist.h and you will see that the compilation fails. Here an example of the compilation error in drivers/md/bitmap.c In file included from include/linux/sched.h:17:0, from include/linux/blkdev.h:4, from drivers/md/bitmap.c:18: drivers/md/bitmap.c: In function ‘next_active_rdev’: include/linux/compiler.h:469:24: error: lvalue required as unary ‘’ operand (volatile typeof(x) *)(x); }) ^ include/linux/kernel.h:800:49: note: in definition of macro ‘container_of’ const typeof( ((type *)0)-member ) *__mptr = (ptr); \ ^ include/linux/compiler.h:470:26: note: in expansion of macro ‘__ACCESS_ONCE’ #define ACCESS_ONCE(x) (*__ACCESS_ONCE(x)) ^ include/linux/rcupdate.h:630:26: note: in expansion of macro ‘ACCESS_ONCE’ typeof(p) _p1 = ACCESS_ONCE(p); \ ^ include/linux/rcupdate.h:587:48: note: in expansion of macro ‘lockless_dereference’ typeof(*p) *p1 = (typeof(*p) *__force)lockless_dereference(p); \ ^ include/linux/rcupdate.h:723:2: note: in expansion of macro ‘__rcu_dereference_check’ __rcu_dereference_check((p), rcu_read_lock_held() || (c), __rcu) ^ include/linux/rcupdate.h:746:32: note: in expansion of macro ‘rcu_dereference_check’ #define rcu_dereference_raw(p) rcu_dereference_check(p, 1) /*@@@ needed? @@@*/ ^ include/linux/rculist.h:251:28: note: in expansion of macro ‘rcu_dereference_raw’ container_of((typeof(ptr))rcu_dereference_raw(ptr), type, member); \ ^ drivers/md/bitmap.c:184:10: note: in expansion of macro ‘list_entry_rcu’ rdev = list_entry_rcu(mddev-disks, struct md_rdev, same_set); ^ Thanks. -- Pat -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage
On 05/13/2015 04:58 AM, NeilBrown wrote: On Tue, 12 May 2015 22:38:53 -0400 Steven Rostedt wrote: On Tue, 12 May 2015 15:46:26 -0700 "Paul E. McKenney" wrote: From: Patrick Marlier Signed-off-by: Patrick Marlier Signed-off-by: Paul E. McKenney --- drivers/md/bitmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c index 2bc56e2a3526..32901772e4ee 100644 --- a/drivers/md/bitmap.c +++ b/drivers/md/bitmap.c @@ -181,7 +181,7 @@ static struct md_rdev *next_active_rdev(struct md_rdev *rdev, struct mddev *mdde rcu_read_lock(); if (rdev == NULL) /* start at the beginning */ - rdev = list_entry_rcu(>disks, struct md_rdev, same_set); + rdev = list_entry_rcu(mddev->disks.next, struct md_rdev, same_set); Hmm, this changes the semantics. The original code looks nasty, I first thought it was broken, but it seems to work out of sheer luck (or clever hack) Definitely a clever hack - no question of "luck" here :-) It might makes sense to change it to use list_for_each_entry_from_rcu() if (rdev == NULL) rdev = list_entry_rcu(mddev->disks.next, struct md_rdev, same_set); else { rdev_dec_pending(rdev, mddev); rdev = list_next_entry_rcu(rdev->same_set.next, struct md_rdev, same_set); } list_for_each_entry_from_rcu(rdev, ) but there isn't a "list_next_entry_rcu" Also, it would have been polity to at least 'cc' them Maintainer of this code in the original patch - no? Sure my bad. I hesitated to CC maintainers. I was almost sure that it will be rejected so I wanted to avoid noise. Thanks, NeilBrown else { /* release the previous rdev and start from there. */ rdev_dec_pending(rdev, mddev); What comes after this is: list_for_each_entry_continue_rcu(rdev, >disks, same_set) { if (rdev->raid_disk >= 0 && Now the original code had: rdev = list_entry_rcu(>disks, struct md_rdev, same_set); Where >disks would return the address of the disks field of mddev which is a list head. Then it would get the 'same_set' offset, which is 0, and rdev is pointing to a makeshift md_rdev struct. But it isn't used, as the list_for_each_entry_continue_rcu() has: #define list_for_each_entry_continue_rcu(pos, head, member) \ for (pos = list_entry_rcu(pos->member.next, typeof(*pos), member); \ >member != (head); \ pos = list_entry_rcu(pos->member.next, typeof(*pos), member)) Thus the first use of pos is pos->member.next or: mddev->disks.next But now you converted it to rdev = mddev->disks.next, which means the first use is: pos = mddev->disks.next->next I think you are skipping the first element here. struct mddev { ... struct list_headdisks; ...} struct list_head { struct list_head *next, *prev; }; The tricky thing is that "list_entry_rcu" before and after the patch is reading the same thing. However in your case, the change I proposed is probably wrong I trust you on this side. :) What's your proposal to fix it with the rculist patch? PS: In the rculist patch I proposed, I avoid the store and the atomic reload in the stack variable __ptr. (yeap, the rcu_dereference_raw/ACCESS_ONCE is a bit confusing because it implicitly do & on the parameter). Thanks. -- Pat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage
On 05/13/2015 04:58 AM, NeilBrown wrote: On Tue, 12 May 2015 22:38:53 -0400 Steven Rostedt rost...@goodmis.org wrote: On Tue, 12 May 2015 15:46:26 -0700 Paul E. McKenney paul...@linux.vnet.ibm.com wrote: From: Patrick Marlier patrick.marl...@gmail.com Signed-off-by: Patrick Marlier patrick.marl...@gmail.com Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com --- drivers/md/bitmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c index 2bc56e2a3526..32901772e4ee 100644 --- a/drivers/md/bitmap.c +++ b/drivers/md/bitmap.c @@ -181,7 +181,7 @@ static struct md_rdev *next_active_rdev(struct md_rdev *rdev, struct mddev *mdde rcu_read_lock(); if (rdev == NULL) /* start at the beginning */ - rdev = list_entry_rcu(mddev-disks, struct md_rdev, same_set); + rdev = list_entry_rcu(mddev-disks.next, struct md_rdev, same_set); Hmm, this changes the semantics. The original code looks nasty, I first thought it was broken, but it seems to work out of sheer luck (or clever hack) Definitely a clever hack - no question of luck here :-) It might makes sense to change it to use list_for_each_entry_from_rcu() if (rdev == NULL) rdev = list_entry_rcu(mddev-disks.next, struct md_rdev, same_set); else { rdev_dec_pending(rdev, mddev); rdev = list_next_entry_rcu(rdev-same_set.next, struct md_rdev, same_set); } list_for_each_entry_from_rcu(rdev, ) but there isn't a list_next_entry_rcu Also, it would have been polity to at least 'cc' them Maintainer of this code in the original patch - no? Sure my bad. I hesitated to CC maintainers. I was almost sure that it will be rejected so I wanted to avoid noise. Thanks, NeilBrown else { /* release the previous rdev and start from there. */ rdev_dec_pending(rdev, mddev); What comes after this is: list_for_each_entry_continue_rcu(rdev, mddev-disks, same_set) { if (rdev-raid_disk = 0 Now the original code had: rdev = list_entry_rcu(mddev-disks, struct md_rdev, same_set); Where mddev-disks would return the address of the disks field of mddev which is a list head. Then it would get the 'same_set' offset, which is 0, and rdev is pointing to a makeshift md_rdev struct. But it isn't used, as the list_for_each_entry_continue_rcu() has: #define list_for_each_entry_continue_rcu(pos, head, member) \ for (pos = list_entry_rcu(pos-member.next, typeof(*pos), member); \ pos-member != (head); \ pos = list_entry_rcu(pos-member.next, typeof(*pos), member)) Thus the first use of pos is pos-member.next or: mddev-disks.next But now you converted it to rdev = mddev-disks.next, which means the first use is: pos = mddev-disks.next-next I think you are skipping the first element here. struct mddev { ... struct list_headdisks; ...} struct list_head { struct list_head *next, *prev; }; The tricky thing is that list_entry_rcu before and after the patch is reading the same thing. However in your case, the change I proposed is probably wrong I trust you on this side. :) What's your proposal to fix it with the rculist patch? PS: In the rculist patch I proposed, I avoid the store and the atomic reload in the stack variable __ptr. (yeap, the rcu_dereference_raw/ACCESS_ONCE is a bit confusing because it implicitly do on the parameter). Thanks. -- Pat -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] rculist: Fix list_entry_rcu to read ptr with rcu_dereference_raw
Strange I don't get any conflict. Maybe due to my email client so I attached the patches to this email. Thanks, -- Patrick On Sat, Mar 28, 2015 at 11:47 AM, Paul E. McKenney wrote: > On Sat, Mar 28, 2015 at 03:42:10AM -0700, Paul E. McKenney wrote: >> On Wed, Mar 25, 2015 at 04:01:24PM +0100, Patrick Marlier wrote: >> > On 03/25/2015 03:30 PM, Paul E. McKenney wrote: >> > >On Tue, Mar 24, 2015 at 11:31:38AM +0100, Patrick Marlier wrote: >> > >>Change to read effectively ptr with rcu_dereference_raw and not the >> > >>__ptr variable on the stack. >> > >> >> > >>Signed-off-by: Patrick Marlier >> > >Avoiding an extra load could be worthwhile in a number of situations, >> > >agreed. >> > Not only a load. It adds a store and a load on the stack and I think >> > this creates a dependency in the processor pipeline. >> > >> > >However, won't this change cause sparse to complain if invoked on a >> > >non-RCU-protected pointer? The ability to use list-RCU API >> > >members on both RCU and non-RCU pointers was one of the points >> > >of the previous commit, right? >> > Probably we can put back the cast but I am not familiar enough with >> > the RCU API. >> > >> > Also, the problem here is that you probably want ACCESS_ONCE to >> > happen on the content of 'ptr' and not on the stack variable >> > '__ptr'. >> > >> > (you have to follow this chain: rcu_dereference_raw -> >> > rcu_dereference_check -> __rcu_dereference_check -> >> > lockless_dereference -> ACCESS_ONCE) >> > >> > #define lockless_dereference(p) \ >> > ({ \ >> > typeof(p) _p1 = ACCESS_ONCE(p); \ >> > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \ >> > (_p1); \ >> > }) >> > >> > #define __ACCESS_ONCE(x) ({ \ >> > __maybe_unused typeof(x) __var = (__force typeof(x)) 0; \ >> > (volatile typeof(x) *)&(x); }) >> > #define ACCESS_ONCE(x) (*__ACCESS_ONCE(x)) >> > >> > Note that ACCESS_ONCE is doing "&" on x. >> > >> > IMHO, I would prefer saving some useless instructions for better >> > performance rather than giving too much flexibility on the API (also >> > pretty sure the cast can be still done). >> >> OK, what I am going to do is to apply your patches for testing purposes. >> If there are no complaints, they will likely go into v4.3 or thereabouts. > > Except that I hit conflicts. Could you please rebase to rcu/dev at > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git? > > Thanx, Paul > From 8ac818d418068105623e43bbd289d9553c182e6c Mon Sep 17 00:00:00 2001 From: Patrick Marlier Date: Tue, 24 Mar 2015 11:16:55 +0100 Subject: [PATCH 1/3] rculist: Fix list_entry_rcu to read ptr with rcu_dereference_raw Change to read effectively ptr with rcu_dereference_raw and not the __ptr variable on the stack. Signed-off-by: Patrick Marlier --- include/linux/rculist.h | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/include/linux/rculist.h b/include/linux/rculist.h index a18b16f..9d9baea 100644 --- a/include/linux/rculist.h +++ b/include/linux/rculist.h @@ -247,10 +247,7 @@ static inline void list_splice_init_rcu(struct list_head *list, * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock(). */ #define list_entry_rcu(ptr, type, member) \ -({ \ - typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \ - container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \ -}) + container_of((typeof(ptr))rcu_dereference_raw(ptr), type, member) /** * Where are list_empty_rcu() and list_first_entry_rcu()? -- 2.1.0 From 3ab5f342939f768b693708bd32ef3f350af3b5a6 Mon Sep 17 00:00:00 2001 From: Patrick Marlier Date: Tue, 24 Mar 2015 11:21:05 +0100 Subject: [PATCH 2/3] netfilter: fix list_entry_rcu usage. Signed-off-by: Patrick Marlier --- net/netfilter/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/netfilter/core.c b/net/netfilter/core.c index fea9ef5..05bd311 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -174,7 +174,7 @@ int nf_hook_slow(u_int8_t pf, unsigned int hook, struct sk_buff *skb, /* We may already have this, but read-locks nest anyway */ rcu_read_lock(); - elem = list_entry_rcu(_hooks[pf][hook], struct nf_hook_ops, list); + elem = list_entry_rcu(nf_hooks[pf][hook].next, struct nf_hook_ops, list); next_hook: verdict = nf_iterate(_hooks[pf][hook], skb, hook, indev, outdev, , okfn, hook_thresh); --
Re: [PATCH 1/3] rculist: Fix list_entry_rcu to read ptr with rcu_dereference_raw
Strange I don't get any conflict. Maybe due to my email client so I attached the patches to this email. Thanks, -- Patrick On Sat, Mar 28, 2015 at 11:47 AM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: On Sat, Mar 28, 2015 at 03:42:10AM -0700, Paul E. McKenney wrote: On Wed, Mar 25, 2015 at 04:01:24PM +0100, Patrick Marlier wrote: On 03/25/2015 03:30 PM, Paul E. McKenney wrote: On Tue, Mar 24, 2015 at 11:31:38AM +0100, Patrick Marlier wrote: Change to read effectively ptr with rcu_dereference_raw and not the __ptr variable on the stack. Signed-off-by: Patrick Marlier patrick.marl...@gmail.com Avoiding an extra load could be worthwhile in a number of situations, agreed. Not only a load. It adds a store and a load on the stack and I think this creates a dependency in the processor pipeline. However, won't this change cause sparse to complain if invoked on a non-RCU-protected pointer? The ability to use list-RCU API members on both RCU and non-RCU pointers was one of the points of the previous commit, right? Probably we can put back the cast but I am not familiar enough with the RCU API. Also, the problem here is that you probably want ACCESS_ONCE to happen on the content of 'ptr' and not on the stack variable '__ptr'. (you have to follow this chain: rcu_dereference_raw - rcu_dereference_check - __rcu_dereference_check - lockless_dereference - ACCESS_ONCE) #define lockless_dereference(p) \ ({ \ typeof(p) _p1 = ACCESS_ONCE(p); \ smp_read_barrier_depends(); /* Dependency order vs. p above. */ \ (_p1); \ }) #define __ACCESS_ONCE(x) ({ \ __maybe_unused typeof(x) __var = (__force typeof(x)) 0; \ (volatile typeof(x) *)(x); }) #define ACCESS_ONCE(x) (*__ACCESS_ONCE(x)) Note that ACCESS_ONCE is doing on x. IMHO, I would prefer saving some useless instructions for better performance rather than giving too much flexibility on the API (also pretty sure the cast can be still done). OK, what I am going to do is to apply your patches for testing purposes. If there are no complaints, they will likely go into v4.3 or thereabouts. Except that I hit conflicts. Could you please rebase to rcu/dev at git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git? Thanx, Paul From 8ac818d418068105623e43bbd289d9553c182e6c Mon Sep 17 00:00:00 2001 From: Patrick Marlier patrick.marl...@gmail.com Date: Tue, 24 Mar 2015 11:16:55 +0100 Subject: [PATCH 1/3] rculist: Fix list_entry_rcu to read ptr with rcu_dereference_raw Change to read effectively ptr with rcu_dereference_raw and not the __ptr variable on the stack. Signed-off-by: Patrick Marlier patrick.marl...@gmail.com --- include/linux/rculist.h | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/include/linux/rculist.h b/include/linux/rculist.h index a18b16f..9d9baea 100644 --- a/include/linux/rculist.h +++ b/include/linux/rculist.h @@ -247,10 +247,7 @@ static inline void list_splice_init_rcu(struct list_head *list, * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock(). */ #define list_entry_rcu(ptr, type, member) \ -({ \ - typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \ - container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \ -}) + container_of((typeof(ptr))rcu_dereference_raw(ptr), type, member) /** * Where are list_empty_rcu() and list_first_entry_rcu()? -- 2.1.0 From 3ab5f342939f768b693708bd32ef3f350af3b5a6 Mon Sep 17 00:00:00 2001 From: Patrick Marlier patrick.marl...@gmail.com Date: Tue, 24 Mar 2015 11:21:05 +0100 Subject: [PATCH 2/3] netfilter: fix list_entry_rcu usage. Signed-off-by: Patrick Marlier patrick.marl...@gmail.com --- net/netfilter/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/netfilter/core.c b/net/netfilter/core.c index fea9ef5..05bd311 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -174,7 +174,7 @@ int nf_hook_slow(u_int8_t pf, unsigned int hook, struct sk_buff *skb, /* We may already have this, but read-locks nest anyway */ rcu_read_lock(); - elem = list_entry_rcu(nf_hooks[pf][hook], struct nf_hook_ops, list); + elem = list_entry_rcu(nf_hooks[pf][hook].next, struct nf_hook_ops, list); next_hook: verdict = nf_iterate(nf_hooks[pf][hook], skb, hook, indev, outdev, elem, okfn, hook_thresh); -- 2.1.0 From 84f0428e2c9172692aba727636a643efb6994752 Mon Sep 17 00:00:00 2001 From: Patrick Marlier patrick.marl...@gmail.com Date: Tue, 24 Mar 2015 11:22:10 +0100 Subject: [PATCH 3/3] md/bitmap: fix list_entry_rcu usage. Signed-off-by: Patrick Marlier patrick.marl...@gmail.com --- drivers/md/bitmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c index 3a57679..ed00e46 100644 --- a/drivers/md/bitmap.c +++ b/drivers/md/bitmap.c @@ -181,7 +181,7
Re: [PATCH 2/3] netfilter: fix list_entry_rcu usage.
On 03/25/2015 03:36 PM, Paul E. McKenney wrote: On Tue, Mar 24, 2015 at 11:31:44AM +0100, Patrick Marlier wrote: >Signed-off-by: Patrick Marlier >--- > net/netfilter/core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > >diff --git a/net/netfilter/core.c b/net/netfilter/core.c >index fea9ef5..05bd311 100644 >--- a/net/netfilter/core.c >+++ b/net/netfilter/core.c >@@ -174,7 +174,7 @@ int nf_hook_slow(u_int8_t pf, unsigned int hook, >struct sk_buff *skb, > /* We may already have this, but read-locks nest anyway */ > rcu_read_lock(); > >- elem = list_entry_rcu(_hooks[pf][hook], struct nf_hook_ops, list); >+ elem = list_entry_rcu(nf_hooks[pf][hook].next, struct nf_hook_ops, list); And this departs from the list_entry() API. Is this really a good idea? No opinion on that but AFAIK there are only 2 spots in the whole kernel source where the ".next" is not used with list_entry_rcu. -- Patrick Marlier -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] rculist: Fix list_entry_rcu to read ptr with rcu_dereference_raw
On 03/25/2015 03:30 PM, Paul E. McKenney wrote: On Tue, Mar 24, 2015 at 11:31:38AM +0100, Patrick Marlier wrote: Change to read effectively ptr with rcu_dereference_raw and not the __ptr variable on the stack. Signed-off-by: Patrick Marlier Avoiding an extra load could be worthwhile in a number of situations, agreed. Not only a load. It adds a store and a load on the stack and I think this creates a dependency in the processor pipeline. However, won't this change cause sparse to complain if invoked on a non-RCU-protected pointer? The ability to use list-RCU API members on both RCU and non-RCU pointers was one of the points of the previous commit, right? Probably we can put back the cast but I am not familiar enough with the RCU API. Also, the problem here is that you probably want ACCESS_ONCE to happen on the content of 'ptr' and not on the stack variable '__ptr'. (you have to follow this chain: rcu_dereference_raw -> rcu_dereference_check -> __rcu_dereference_check -> lockless_dereference -> ACCESS_ONCE) #define lockless_dereference(p) \ ({ \ typeof(p) _p1 = ACCESS_ONCE(p); \ smp_read_barrier_depends(); /* Dependency order vs. p above. */ \ (_p1); \ }) #define __ACCESS_ONCE(x) ({ \ __maybe_unused typeof(x) __var = (__force typeof(x)) 0; \ (volatile typeof(x) *)&(x); }) #define ACCESS_ONCE(x) (*__ACCESS_ONCE(x)) Note that ACCESS_ONCE is doing "&" on x. IMHO, I would prefer saving some useless instructions for better performance rather than giving too much flexibility on the API (also pretty sure the cast can be still done). -- Patrick Marlier -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/3] netfilter: fix list_entry_rcu usage.
On 03/25/2015 03:36 PM, Paul E. McKenney wrote: On Tue, Mar 24, 2015 at 11:31:44AM +0100, Patrick Marlier wrote: Signed-off-by: Patrick Marlierpatrick.marl...@gmail.com --- net/netfilter/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/netfilter/core.c b/net/netfilter/core.c index fea9ef5..05bd311 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -174,7 +174,7 @@ int nf_hook_slow(u_int8_t pf, unsigned int hook, struct sk_buff *skb, /* We may already have this, but read-locks nest anyway */ rcu_read_lock(); - elem = list_entry_rcu(nf_hooks[pf][hook], struct nf_hook_ops, list); + elem = list_entry_rcu(nf_hooks[pf][hook].next, struct nf_hook_ops, list); And this departs from the list_entry() API. Is this really a good idea? No opinion on that but AFAIK there are only 2 spots in the whole kernel source where the .next is not used with list_entry_rcu. -- Patrick Marlier -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] rculist: Fix list_entry_rcu to read ptr with rcu_dereference_raw
On 03/25/2015 03:30 PM, Paul E. McKenney wrote: On Tue, Mar 24, 2015 at 11:31:38AM +0100, Patrick Marlier wrote: Change to read effectively ptr with rcu_dereference_raw and not the __ptr variable on the stack. Signed-off-by: Patrick Marlier patrick.marl...@gmail.com Avoiding an extra load could be worthwhile in a number of situations, agreed. Not only a load. It adds a store and a load on the stack and I think this creates a dependency in the processor pipeline. However, won't this change cause sparse to complain if invoked on a non-RCU-protected pointer? The ability to use list-RCU API members on both RCU and non-RCU pointers was one of the points of the previous commit, right? Probably we can put back the cast but I am not familiar enough with the RCU API. Also, the problem here is that you probably want ACCESS_ONCE to happen on the content of 'ptr' and not on the stack variable '__ptr'. (you have to follow this chain: rcu_dereference_raw - rcu_dereference_check - __rcu_dereference_check - lockless_dereference - ACCESS_ONCE) #define lockless_dereference(p) \ ({ \ typeof(p) _p1 = ACCESS_ONCE(p); \ smp_read_barrier_depends(); /* Dependency order vs. p above. */ \ (_p1); \ }) #define __ACCESS_ONCE(x) ({ \ __maybe_unused typeof(x) __var = (__force typeof(x)) 0; \ (volatile typeof(x) *)(x); }) #define ACCESS_ONCE(x) (*__ACCESS_ONCE(x)) Note that ACCESS_ONCE is doing on x. IMHO, I would prefer saving some useless instructions for better performance rather than giving too much flexibility on the API (also pretty sure the cast can be still done). -- Patrick Marlier -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/3] md/bitmap: fix list_entry_rcu usage.
Signed-off-by: Patrick Marlier --- drivers/md/bitmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c index 3a57679..ed00e46 100644 --- a/drivers/md/bitmap.c +++ b/drivers/md/bitmap.c @@ -181,7 +181,7 @@ static struct md_rdev *next_active_rdev(struct md_rdev *rdev, struct mddev *mdde rcu_read_lock(); if (rdev == NULL) /* start at the beginning */ - rdev = list_entry_rcu(>disks, struct md_rdev, same_set); + rdev = list_entry_rcu(mddev->disks.next, struct md_rdev, same_set); else { /* release the previous rdev and start from there. */ rdev_dec_pending(rdev, mddev); -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/3] rculist: Fix list_entry_rcu to read ptr with rcu_dereference_raw
Change to read effectively ptr with rcu_dereference_raw and not the __ptr variable on the stack. Signed-off-by: Patrick Marlier --- include/linux/rculist.h | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/include/linux/rculist.h b/include/linux/rculist.h index a18b16f..9d9baea 100644 --- a/include/linux/rculist.h +++ b/include/linux/rculist.h @@ -247,10 +247,7 @@ static inline void list_splice_init_rcu(struct list_head *list, * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock(). */ #define list_entry_rcu(ptr, type, member) \ -({ \ - typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \ - container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \ -}) + container_of((typeof(ptr))rcu_dereference_raw(ptr), type, member) /** * Where are list_empty_rcu() and list_first_entry_rcu()? -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/3] netfilter: fix list_entry_rcu usage.
Signed-off-by: Patrick Marlier --- net/netfilter/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/netfilter/core.c b/net/netfilter/core.c index fea9ef5..05bd311 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -174,7 +174,7 @@ int nf_hook_slow(u_int8_t pf, unsigned int hook, struct sk_buff *skb, /* We may already have this, but read-locks nest anyway */ rcu_read_lock(); - elem = list_entry_rcu(_hooks[pf][hook], struct nf_hook_ops, list); + elem = list_entry_rcu(nf_hooks[pf][hook].next, struct nf_hook_ops, list); next_hook: verdict = nf_iterate(_hooks[pf][hook], skb, hook, indev, outdev, , okfn, hook_thresh); -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/3] rculist: Fix list_entry_rcu to read ptr with, rcu_dereference_raw
Hi Paul and Josh, While running some benchmarks with rcu, I noticed that rculist had a performance issue compared to the non-rcu list. Having a closer look at the rlulist, I found that there is a bad usage of rcu_dereference_raw in list_entry_rcu(). Indeed, "typeof (*ptr) *__ptr = ptr;" reads ptr and "rcu_dereference_raw(__ptr)" forces read on the stack variable __ptr. However, ptr should be read with rcu_dereference_raw not the __ptr. This was introduced some years ago with commit 67bdbffd (2010-02-25). I am proposing to revert this part of the patch but probably this can be fixed differently. Fixing this showed 2 problems in the usage of list_entry_rcu. Since it touches other parts of kernel, I guess it requires ack from other maintainers. Thanks. PS: I will Cc the others maintainers when somebody confirms the problem. Signed-off-by: Patrick Marlier Cc: "Paul E. McKenney" Cc: Josh Triplett Cc: Arnd Bergmann -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/3] md/bitmap: fix list_entry_rcu usage.
Signed-off-by: Patrick Marlier patrick.marl...@gmail.com --- drivers/md/bitmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c index 3a57679..ed00e46 100644 --- a/drivers/md/bitmap.c +++ b/drivers/md/bitmap.c @@ -181,7 +181,7 @@ static struct md_rdev *next_active_rdev(struct md_rdev *rdev, struct mddev *mdde rcu_read_lock(); if (rdev == NULL) /* start at the beginning */ - rdev = list_entry_rcu(mddev-disks, struct md_rdev, same_set); + rdev = list_entry_rcu(mddev-disks.next, struct md_rdev, same_set); else { /* release the previous rdev and start from there. */ rdev_dec_pending(rdev, mddev); -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/3] rculist: Fix list_entry_rcu to read ptr with rcu_dereference_raw
Change to read effectively ptr with rcu_dereference_raw and not the __ptr variable on the stack. Signed-off-by: Patrick Marlier patrick.marl...@gmail.com --- include/linux/rculist.h | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/include/linux/rculist.h b/include/linux/rculist.h index a18b16f..9d9baea 100644 --- a/include/linux/rculist.h +++ b/include/linux/rculist.h @@ -247,10 +247,7 @@ static inline void list_splice_init_rcu(struct list_head *list, * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock(). */ #define list_entry_rcu(ptr, type, member) \ -({ \ - typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \ - container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \ -}) + container_of((typeof(ptr))rcu_dereference_raw(ptr), type, member) /** * Where are list_empty_rcu() and list_first_entry_rcu()? -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/3] netfilter: fix list_entry_rcu usage.
Signed-off-by: Patrick Marlier patrick.marl...@gmail.com --- net/netfilter/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/netfilter/core.c b/net/netfilter/core.c index fea9ef5..05bd311 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -174,7 +174,7 @@ int nf_hook_slow(u_int8_t pf, unsigned int hook, struct sk_buff *skb, /* We may already have this, but read-locks nest anyway */ rcu_read_lock(); - elem = list_entry_rcu(nf_hooks[pf][hook], struct nf_hook_ops, list); + elem = list_entry_rcu(nf_hooks[pf][hook].next, struct nf_hook_ops, list); next_hook: verdict = nf_iterate(nf_hooks[pf][hook], skb, hook, indev, outdev, elem, okfn, hook_thresh); -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/3] rculist: Fix list_entry_rcu to read ptr with, rcu_dereference_raw
Hi Paul and Josh, While running some benchmarks with rcu, I noticed that rculist had a performance issue compared to the non-rcu list. Having a closer look at the rlulist, I found that there is a bad usage of rcu_dereference_raw in list_entry_rcu(). Indeed, typeof (*ptr) *__ptr = ptr; reads ptr and rcu_dereference_raw(__ptr) forces read on the stack variable __ptr. However, ptr should be read with rcu_dereference_raw not the __ptr. This was introduced some years ago with commit 67bdbffd (2010-02-25). I am proposing to revert this part of the patch but probably this can be fixed differently. Fixing this showed 2 problems in the usage of list_entry_rcu. Since it touches other parts of kernel, I guess it requires ack from other maintainers. Thanks. PS: I will Cc the others maintainers when somebody confirms the problem. Signed-off-by: Patrick Marlier patrick.marl...@gmail.com Cc: Paul E. McKenney paul...@linux.vnet.ibm.com Cc: Josh Triplett j...@joshtriplett.org Cc: Arnd Bergmann a...@arndb.de -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/