Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference()

2015-10-28 Thread Patrick Marlier



On 10/28/2015 09:33 AM, Ingo Molnar wrote:


* Tejun Heo  wrote:


Subject: writeback: don't use list_entry_rcu() for pointer offsetting in 
bdi_split_work_to_wbs()

bdi_split_work_to_wbs() uses list_for_each_entry_rcu_continue() to
walk @bdi->wb_list.  To set up the initial iteration condition, it
uses list_entry_rcu() to calculate the entry pointer corresponding to
the list head; however, this isn't an actual RCU dereference and using
list_entry_rcu() for it ended up breaking a proposed list_entry_rcu()
change because it was feeding an non-lvalue pointer into the macro.

Don't use the RCU variant for simple pointer offsetting.  Use
list_entry() instead.

Signed-off-by: Tejun Heo 
---
  fs/fs-writeback.c |4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 29e4599..7378169 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -779,8 +779,8 @@ static void bdi_split_work_to_wbs(struct backing_dev_info 
*bdi,
  bool skip_if_busy)
  {
struct bdi_writeback *last_wb = NULL;
-   struct bdi_writeback *wb = list_entry_rcu(>wb_list,
-   struct bdi_writeback, bdi_node);
+   struct bdi_writeback *wb = list_entry(>wb_list,
+ struct bdi_writeback, bdi_node);

might_sleep();


Any objections against me applying this fix to tip:core/rcu so that I can push 
the
recent RCU changes towards linux-next without triggering a build failure?


No objection on my side but probably you are waiting for an ack from 
somebody else.

--
Pat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference()

2015-10-28 Thread Patrick Marlier



On 10/28/2015 09:33 AM, Ingo Molnar wrote:


* Tejun Heo  wrote:


Subject: writeback: don't use list_entry_rcu() for pointer offsetting in 
bdi_split_work_to_wbs()

bdi_split_work_to_wbs() uses list_for_each_entry_rcu_continue() to
walk @bdi->wb_list.  To set up the initial iteration condition, it
uses list_entry_rcu() to calculate the entry pointer corresponding to
the list head; however, this isn't an actual RCU dereference and using
list_entry_rcu() for it ended up breaking a proposed list_entry_rcu()
change because it was feeding an non-lvalue pointer into the macro.

Don't use the RCU variant for simple pointer offsetting.  Use
list_entry() instead.

Signed-off-by: Tejun Heo 
---
  fs/fs-writeback.c |4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 29e4599..7378169 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -779,8 +779,8 @@ static void bdi_split_work_to_wbs(struct backing_dev_info 
*bdi,
  bool skip_if_busy)
  {
struct bdi_writeback *last_wb = NULL;
-   struct bdi_writeback *wb = list_entry_rcu(>wb_list,
-   struct bdi_writeback, bdi_node);
+   struct bdi_writeback *wb = list_entry(>wb_list,
+ struct bdi_writeback, bdi_node);

might_sleep();


Any objections against me applying this fix to tip:core/rcu so that I can push 
the
recent RCU changes towards linux-next without triggering a build failure?


No objection on my side but probably you are waiting for an ack from 
somebody else.

--
Pat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage

2015-09-23 Thread Patrick Marlier


On 09/22/2015 10:50 PM, Paul E. McKenney wrote:

On Sun, Sep 13, 2015 at 09:10:24AM -0700, Paul E. McKenney wrote:

On Sun, Sep 13, 2015 at 12:06:33PM +0200, Patrick Marlier wrote:


[ . . . ]


Paul,

This sounds good to me. It should fix the performance issue (will
check with my benchmark).


Thank you, looking forward to seeing the results!


Just following up -- how is the benchmarking going?


Note that in my module I am using the kernel version 3.16.0-31 (I ported 
your change).
Here the results of my benchmark that tests rculist in the case of read 
only.


# 1st column : The number of threads
# 2nd : ops/s the original version
# 3rd : ops/s your version with lockless_dereference
1 883923 1747554
2 1741441 3543062
3 2462360 5103647
4 3437273 7176608
6 5155803 9812348
8 6718111 13330050
10 8519227 17458294
12 9773632 20298897
14 11555198 23191424
16 11643264 25125712

I get same performance with my patch.
So indeed it fixes the performance problem I was seeing.

Thanks a lot!
--
Pat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage

2015-09-23 Thread Patrick Marlier


On 09/22/2015 10:50 PM, Paul E. McKenney wrote:

On Sun, Sep 13, 2015 at 09:10:24AM -0700, Paul E. McKenney wrote:

On Sun, Sep 13, 2015 at 12:06:33PM +0200, Patrick Marlier wrote:


[ . . . ]


Paul,

This sounds good to me. It should fix the performance issue (will
check with my benchmark).


Thank you, looking forward to seeing the results!


Just following up -- how is the benchmarking going?


Note that in my module I am using the kernel version 3.16.0-31 (I ported 
your change).
Here the results of my benchmark that tests rculist in the case of read 
only.


# 1st column : The number of threads
# 2nd : ops/s the original version
# 3rd : ops/s your version with lockless_dereference
1 883923 1747554
2 1741441 3543062
3 2462360 5103647
4 3437273 7176608
6 5155803 9812348
8 6718111 13330050
10 8519227 17458294
12 9773632 20298897
14 11555198 23191424
16 11643264 25125712

I get same performance with my patch.
So indeed it fixes the performance problem I was seeing.

Thanks a lot!
--
Pat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage

2015-09-13 Thread Patrick Marlier


On 09/12/2015 01:05 AM, Paul E. McKenney wrote:

On Tue, May 19, 2015 at 03:07:25PM -0700, Paul E. McKenney wrote:

On Mon, May 18, 2015 at 09:43:21AM -0400, Steven Rostedt wrote:

On Mon, 18 May 2015 12:06:47 +1000
NeilBrown  wrote:



struct mddev {
...
struct list_headdisks;
...}

struct list_head {
  struct list_head *next, *prev;
};

The tricky thing is that "list_entry_rcu" before and after the patch is
reading the same thing.


No it isn't.
Before the patch it is passed the address of the 'next' field.  After the
patch it is passed the contents of the 'next' field.


Right.






However in your case, the change I proposed is probably wrong I trust
you on this side. :) What's your proposal to fix it with the rculist patch?


What needs fixing?  I don't see anything broken.

Maybe there is something in this "rculist patch" that I'm missing.  Can you
point me at it?



Probably some debugging tool like sparse notices that the assignment
isn't a true list entry and complains about it. In other words, I think
the real fix is to fix the debugging tool to ignore this, because the
code is correct, and this is a false positive failure, and is causing
more harm than good, because people are sending out broken patches due
to it.


OK, finally did the history trawling that I should have done to begin with.

Back in 2010, Arnd added the __rcu pointer checking in sparse.
But the RCU list primitives were used on non-RCU-protected lists, so
some casting pain was required to avoid sparse complaints.  (Keep in
mind that the list_head structure does not mark ->next with __rcu.)
Arnd's workaround was to copy the pointer to the local stack, casting
it to an __rcu pointer, then use rcu_dereference_raw() to do the needed
traversal of an RCU-protected pointer.

This of course resulted in an extraneous load from the stack, which
Patrick noticed in his performance work, and which motivated him to send
the patches.

Perhaps what I should do is create an rcu_dereference_nocheck() for use
in list traversals, that omits the sparse checking.  That should get rid
of both the sparse warnings and the strange casts.

The code in md probably needs to change in any case, as otherwise we are
invoking rcu_dereference_whatever() on a full struct list_head rather
than on a single pointer.  Or am I missing something here?


Finally getting back to this one...

I switched to lockless_dereference() instead of rcu_dereference_raw(),
and am running it through the testing gamut.  Patrick, are you OK with
this change?


Paul,

This sounds good to me. It should fix the performance issue (will check 
with my benchmark).
I think for drivers/md/bitmap.c:next_active_rdev() the problem was fixed 
but do you know if it also fixed for net/netfilter/core.c:nf_hook_slow()?


Thanks.
--
Pat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage

2015-09-13 Thread Patrick Marlier


On 09/12/2015 01:05 AM, Paul E. McKenney wrote:

On Tue, May 19, 2015 at 03:07:25PM -0700, Paul E. McKenney wrote:

On Mon, May 18, 2015 at 09:43:21AM -0400, Steven Rostedt wrote:

On Mon, 18 May 2015 12:06:47 +1000
NeilBrown  wrote:



struct mddev {
...
struct list_headdisks;
...}

struct list_head {
  struct list_head *next, *prev;
};

The tricky thing is that "list_entry_rcu" before and after the patch is
reading the same thing.


No it isn't.
Before the patch it is passed the address of the 'next' field.  After the
patch it is passed the contents of the 'next' field.


Right.






However in your case, the change I proposed is probably wrong I trust
you on this side. :) What's your proposal to fix it with the rculist patch?


What needs fixing?  I don't see anything broken.

Maybe there is something in this "rculist patch" that I'm missing.  Can you
point me at it?



Probably some debugging tool like sparse notices that the assignment
isn't a true list entry and complains about it. In other words, I think
the real fix is to fix the debugging tool to ignore this, because the
code is correct, and this is a false positive failure, and is causing
more harm than good, because people are sending out broken patches due
to it.


OK, finally did the history trawling that I should have done to begin with.

Back in 2010, Arnd added the __rcu pointer checking in sparse.
But the RCU list primitives were used on non-RCU-protected lists, so
some casting pain was required to avoid sparse complaints.  (Keep in
mind that the list_head structure does not mark ->next with __rcu.)
Arnd's workaround was to copy the pointer to the local stack, casting
it to an __rcu pointer, then use rcu_dereference_raw() to do the needed
traversal of an RCU-protected pointer.

This of course resulted in an extraneous load from the stack, which
Patrick noticed in his performance work, and which motivated him to send
the patches.

Perhaps what I should do is create an rcu_dereference_nocheck() for use
in list traversals, that omits the sparse checking.  That should get rid
of both the sparse warnings and the strange casts.

The code in md probably needs to change in any case, as otherwise we are
invoking rcu_dereference_whatever() on a full struct list_head rather
than on a single pointer.  Or am I missing something here?


Finally getting back to this one...

I switched to lockless_dereference() instead of rcu_dereference_raw(),
and am running it through the testing gamut.  Patrick, are you OK with
this change?


Paul,

This sounds good to me. It should fix the performance issue (will check 
with my benchmark).
I think for drivers/md/bitmap.c:next_active_rdev() the problem was fixed 
but do you know if it also fixed for net/netfilter/core.c:nf_hook_slow()?


Thanks.
--
Pat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage

2015-05-18 Thread Patrick Marlier


On 05/18/2015 03:53 PM, Patrick Marlier wrote:

On Mon, May 18, 2015 at 4:06 AM, NeilBrown  wrote:

On Sat, 16 May 2015 19:42:54 +0200 Patrick Marlier
 wrote:

On 05/13/2015 04:58 AM, NeilBrown wrote:

On Tue, 12 May 2015 22:38:53 -0400 Steven Rostedt  wrote:

On Tue, 12 May 2015 15:46:26 -0700
"Paul E. McKenney"  wrote:
What comes after this is:

list_for_each_entry_continue_rcu(rdev, >disks, same_set) {
if (rdev->raid_disk >= 0 &&

Now the original code had:

rdev = list_entry_rcu(>disks, struct md_rdev, same_set);

Where >disks would return the address of the disks field of
mddev which is a list head. Then it would get the 'same_set' offset,
which is 0, and rdev is pointing to a makeshift md_rdev struct. But it
isn't used, as the list_for_each_entry_continue_rcu() has:

#define list_for_each_entry_continue_rcu(pos, head, member)\
for (pos = list_entry_rcu(pos->member.next, typeof(*pos), member); \
 >member != (head);\
 pos = list_entry_rcu(pos->member.next, typeof(*pos), member))

Thus the first use of pos is pos->member.next or:

mddev->disks.next

But now you converted it to rdev = mddev->disks.next, which means the
first use is:

pos = mddev->disks.next->next

I think you are skipping the first element here.



struct mddev {
...
   struct list_headdisks;
...}

struct list_head {
  struct list_head *next, *prev;
};

The tricky thing is that "list_entry_rcu" before and after the patch is
reading the same thing.


No it isn't.
Before the patch it is passed the address of the 'next' field.  After the
patch it is passed the contents of the 'next' field.


Here I meant "list_entry_rcu" (in include/linux/rculist.h) not the
change to drivers/md/bitmap.c.



However in your case, the change I proposed is probably wrong I trust
you on this side. :) What's your proposal to fix it with the rculist patch?


What needs fixing?  I don't see anything broken.

Maybe there is something in this "rculist patch" that I'm missing.  Can you
point me at it?


Do not apply the patch on drivers/md/bitmap.c but only on
include/linux/rculist.h and you will see that the compilation fails.


Here an example of the compilation error in drivers/md/bitmap.c

In file included from include/linux/sched.h:17:0,
 from include/linux/blkdev.h:4,
 from drivers/md/bitmap.c:18:
drivers/md/bitmap.c: In function ‘next_active_rdev’:
include/linux/compiler.h:469:24: error: lvalue required as unary ‘&’ operand
  (volatile typeof(x) *)&(x); })
^
include/linux/kernel.h:800:49: note: in definition of macro ‘container_of’
  const typeof( ((type *)0)->member ) *__mptr = (ptr); \
 ^
include/linux/compiler.h:470:26: note: in expansion of macro ‘__ACCESS_ONCE’
 #define ACCESS_ONCE(x) (*__ACCESS_ONCE(x))
  ^
include/linux/rcupdate.h:630:26: note: in expansion of macro ‘ACCESS_ONCE’
  typeof(p) _p1 = ACCESS_ONCE(p); \
  ^
include/linux/rcupdate.h:587:48: note: in expansion of macro 
‘lockless_dereference’

  typeof(*p) *p1 = (typeof(*p) *__force)lockless_dereference(p); \
^
include/linux/rcupdate.h:723:2: note: in expansion of macro 
‘__rcu_dereference_check’

  __rcu_dereference_check((p), rcu_read_lock_held() || (c), __rcu)
  ^
include/linux/rcupdate.h:746:32: note: in expansion of macro 
‘rcu_dereference_check’
 #define rcu_dereference_raw(p) rcu_dereference_check(p, 1) /*@@@ 
needed? @@@*/

^
include/linux/rculist.h:251:28: note: in expansion of macro 
‘rcu_dereference_raw’

  container_of((typeof(ptr))rcu_dereference_raw(ptr), type, member); \
^
drivers/md/bitmap.c:184:10: note: in expansion of macro ‘list_entry_rcu’
   rdev = list_entry_rcu(>disks, struct md_rdev, same_set);
  ^

Thanks.
--
Pat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage

2015-05-18 Thread Patrick Marlier
On Mon, May 18, 2015 at 4:06 AM, NeilBrown  wrote:
> On Sat, 16 May 2015 19:42:54 +0200 Patrick Marlier
>  wrote:
>> On 05/13/2015 04:58 AM, NeilBrown wrote:
>> > On Tue, 12 May 2015 22:38:53 -0400 Steven Rostedt  
>> > wrote:
>> >> On Tue, 12 May 2015 15:46:26 -0700
>> >> "Paul E. McKenney"  wrote:
>> >> What comes after this is:
>> >>
>> >>list_for_each_entry_continue_rcu(rdev, >disks, same_set) {
>> >>if (rdev->raid_disk >= 0 &&
>> >>
>> >> Now the original code had:
>> >>
>> >>rdev = list_entry_rcu(>disks, struct md_rdev, same_set);
>> >>
>> >> Where >disks would return the address of the disks field of
>> >> mddev which is a list head. Then it would get the 'same_set' offset,
>> >> which is 0, and rdev is pointing to a makeshift md_rdev struct. But it
>> >> isn't used, as the list_for_each_entry_continue_rcu() has:
>> >>
>> >> #define list_for_each_entry_continue_rcu(pos, head, member)   
>> >>  \
>> >>for (pos = list_entry_rcu(pos->member.next, typeof(*pos), member); \
>> >> >member != (head);\
>> >> pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>> >>
>> >> Thus the first use of pos is pos->member.next or:
>> >>
>> >>mddev->disks.next
>> >>
>> >> But now you converted it to rdev = mddev->disks.next, which means the
>> >> first use is:
>> >>
>> >>pos = mddev->disks.next->next
>> >>
>> >> I think you are skipping the first element here.
>>
>>
>> struct mddev {
>> ...
>>   struct list_headdisks;
>> ...}
>>
>> struct list_head {
>>  struct list_head *next, *prev;
>> };
>>
>> The tricky thing is that "list_entry_rcu" before and after the patch is
>> reading the same thing.
>
> No it isn't.
> Before the patch it is passed the address of the 'next' field.  After the
> patch it is passed the contents of the 'next' field.

Here I meant "list_entry_rcu" (in include/linux/rculist.h) not the
change to drivers/md/bitmap.c.


>> However in your case, the change I proposed is probably wrong I trust
>> you on this side. :) What's your proposal to fix it with the rculist patch?
>
> What needs fixing?  I don't see anything broken.
>
> Maybe there is something in this "rculist patch" that I'm missing.  Can you
> point me at it?

Do not apply the patch on drivers/md/bitmap.c but only on
include/linux/rculist.h and you will see that the compilation fails.

--
Pat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage

2015-05-18 Thread Patrick Marlier
On Mon, May 18, 2015 at 4:06 AM, NeilBrown ne...@suse.de wrote:
 On Sat, 16 May 2015 19:42:54 +0200 Patrick Marlier
 patrick.marl...@gmail.com wrote:
 On 05/13/2015 04:58 AM, NeilBrown wrote:
  On Tue, 12 May 2015 22:38:53 -0400 Steven Rostedt rost...@goodmis.org 
  wrote:
  On Tue, 12 May 2015 15:46:26 -0700
  Paul E. McKenney paul...@linux.vnet.ibm.com wrote:
  What comes after this is:
 
 list_for_each_entry_continue_rcu(rdev, mddev-disks, same_set) {
 if (rdev-raid_disk = 0 
 
  Now the original code had:
 
 rdev = list_entry_rcu(mddev-disks, struct md_rdev, same_set);
 
  Where mddev-disks would return the address of the disks field of
  mddev which is a list head. Then it would get the 'same_set' offset,
  which is 0, and rdev is pointing to a makeshift md_rdev struct. But it
  isn't used, as the list_for_each_entry_continue_rcu() has:
 
  #define list_for_each_entry_continue_rcu(pos, head, member)   
   \
 for (pos = list_entry_rcu(pos-member.next, typeof(*pos), member); \
  pos-member != (head);\
  pos = list_entry_rcu(pos-member.next, typeof(*pos), member))
 
  Thus the first use of pos is pos-member.next or:
 
 mddev-disks.next
 
  But now you converted it to rdev = mddev-disks.next, which means the
  first use is:
 
 pos = mddev-disks.next-next
 
  I think you are skipping the first element here.


 struct mddev {
 ...
   struct list_headdisks;
 ...}

 struct list_head {
  struct list_head *next, *prev;
 };

 The tricky thing is that list_entry_rcu before and after the patch is
 reading the same thing.

 No it isn't.
 Before the patch it is passed the address of the 'next' field.  After the
 patch it is passed the contents of the 'next' field.

Here I meant list_entry_rcu (in include/linux/rculist.h) not the
change to drivers/md/bitmap.c.


 However in your case, the change I proposed is probably wrong I trust
 you on this side. :) What's your proposal to fix it with the rculist patch?

 What needs fixing?  I don't see anything broken.

 Maybe there is something in this rculist patch that I'm missing.  Can you
 point me at it?

Do not apply the patch on drivers/md/bitmap.c but only on
include/linux/rculist.h and you will see that the compilation fails.

--
Pat
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage

2015-05-18 Thread Patrick Marlier


On 05/18/2015 03:53 PM, Patrick Marlier wrote:

On Mon, May 18, 2015 at 4:06 AM, NeilBrown ne...@suse.de wrote:

On Sat, 16 May 2015 19:42:54 +0200 Patrick Marlier
patrick.marl...@gmail.com wrote:

On 05/13/2015 04:58 AM, NeilBrown wrote:

On Tue, 12 May 2015 22:38:53 -0400 Steven Rostedt rost...@goodmis.org wrote:

On Tue, 12 May 2015 15:46:26 -0700
Paul E. McKenney paul...@linux.vnet.ibm.com wrote:
What comes after this is:

list_for_each_entry_continue_rcu(rdev, mddev-disks, same_set) {
if (rdev-raid_disk = 0 

Now the original code had:

rdev = list_entry_rcu(mddev-disks, struct md_rdev, same_set);

Where mddev-disks would return the address of the disks field of
mddev which is a list head. Then it would get the 'same_set' offset,
which is 0, and rdev is pointing to a makeshift md_rdev struct. But it
isn't used, as the list_for_each_entry_continue_rcu() has:

#define list_for_each_entry_continue_rcu(pos, head, member)\
for (pos = list_entry_rcu(pos-member.next, typeof(*pos), member); \
 pos-member != (head);\
 pos = list_entry_rcu(pos-member.next, typeof(*pos), member))

Thus the first use of pos is pos-member.next or:

mddev-disks.next

But now you converted it to rdev = mddev-disks.next, which means the
first use is:

pos = mddev-disks.next-next

I think you are skipping the first element here.



struct mddev {
...
   struct list_headdisks;
...}

struct list_head {
  struct list_head *next, *prev;
};

The tricky thing is that list_entry_rcu before and after the patch is
reading the same thing.


No it isn't.
Before the patch it is passed the address of the 'next' field.  After the
patch it is passed the contents of the 'next' field.


Here I meant list_entry_rcu (in include/linux/rculist.h) not the
change to drivers/md/bitmap.c.



However in your case, the change I proposed is probably wrong I trust
you on this side. :) What's your proposal to fix it with the rculist patch?


What needs fixing?  I don't see anything broken.

Maybe there is something in this rculist patch that I'm missing.  Can you
point me at it?


Do not apply the patch on drivers/md/bitmap.c but only on
include/linux/rculist.h and you will see that the compilation fails.


Here an example of the compilation error in drivers/md/bitmap.c

In file included from include/linux/sched.h:17:0,
 from include/linux/blkdev.h:4,
 from drivers/md/bitmap.c:18:
drivers/md/bitmap.c: In function ‘next_active_rdev’:
include/linux/compiler.h:469:24: error: lvalue required as unary ‘’ operand
  (volatile typeof(x) *)(x); })
^
include/linux/kernel.h:800:49: note: in definition of macro ‘container_of’
  const typeof( ((type *)0)-member ) *__mptr = (ptr); \
 ^
include/linux/compiler.h:470:26: note: in expansion of macro ‘__ACCESS_ONCE’
 #define ACCESS_ONCE(x) (*__ACCESS_ONCE(x))
  ^
include/linux/rcupdate.h:630:26: note: in expansion of macro ‘ACCESS_ONCE’
  typeof(p) _p1 = ACCESS_ONCE(p); \
  ^
include/linux/rcupdate.h:587:48: note: in expansion of macro 
‘lockless_dereference’

  typeof(*p) *p1 = (typeof(*p) *__force)lockless_dereference(p); \
^
include/linux/rcupdate.h:723:2: note: in expansion of macro 
‘__rcu_dereference_check’

  __rcu_dereference_check((p), rcu_read_lock_held() || (c), __rcu)
  ^
include/linux/rcupdate.h:746:32: note: in expansion of macro 
‘rcu_dereference_check’
 #define rcu_dereference_raw(p) rcu_dereference_check(p, 1) /*@@@ 
needed? @@@*/

^
include/linux/rculist.h:251:28: note: in expansion of macro 
‘rcu_dereference_raw’

  container_of((typeof(ptr))rcu_dereference_raw(ptr), type, member); \
^
drivers/md/bitmap.c:184:10: note: in expansion of macro ‘list_entry_rcu’
   rdev = list_entry_rcu(mddev-disks, struct md_rdev, same_set);
  ^

Thanks.
--
Pat
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage

2015-05-16 Thread Patrick Marlier



On 05/13/2015 04:58 AM, NeilBrown wrote:

On Tue, 12 May 2015 22:38:53 -0400 Steven Rostedt  wrote:


On Tue, 12 May 2015 15:46:26 -0700
"Paul E. McKenney"  wrote:


From: Patrick Marlier 

Signed-off-by: Patrick Marlier 
Signed-off-by: Paul E. McKenney 
---
  drivers/md/bitmap.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 2bc56e2a3526..32901772e4ee 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -181,7 +181,7 @@ static struct md_rdev *next_active_rdev(struct md_rdev 
*rdev, struct mddev *mdde
rcu_read_lock();
if (rdev == NULL)
/* start at the beginning */
-   rdev = list_entry_rcu(>disks, struct md_rdev, same_set);
+   rdev = list_entry_rcu(mddev->disks.next, struct md_rdev, 
same_set);


Hmm, this changes the semantics.

The original code looks nasty, I first thought it was broken, but it
seems to work out of sheer luck (or clever hack)


Definitely a clever hack - no question of "luck" here :-)

It might makes sense to change it to use list_for_each_entry_from_rcu()

   if (rdev == NULL)
  rdev = list_entry_rcu(mddev->disks.next, struct md_rdev, same_set);
   else {
  rdev_dec_pending(rdev, mddev);
  rdev = list_next_entry_rcu(rdev->same_set.next, struct md_rdev, same_set);
   }
   list_for_each_entry_from_rcu(rdev, )

but there isn't a "list_next_entry_rcu"


Also, it would have been polity to at least 'cc' them Maintainer of this code
in the original patch - no?


Sure my bad. I hesitated to CC maintainers. I was almost sure that it 
will be rejected so I wanted to avoid noise.





Thanks,
NeilBrown




else {
/* release the previous rdev and start from there. */
rdev_dec_pending(rdev, mddev);



What comes after this is:

list_for_each_entry_continue_rcu(rdev, >disks, same_set) {
if (rdev->raid_disk >= 0 &&

Now the original code had:

   rdev = list_entry_rcu(>disks, struct md_rdev, same_set);

Where >disks would return the address of the disks field of
mddev which is a list head. Then it would get the 'same_set' offset,
which is 0, and rdev is pointing to a makeshift md_rdev struct. But it
isn't used, as the list_for_each_entry_continue_rcu() has:

#define list_for_each_entry_continue_rcu(pos, head, member) \
for (pos = list_entry_rcu(pos->member.next, typeof(*pos), member); \
 >member != (head); \
 pos = list_entry_rcu(pos->member.next, typeof(*pos), member))

Thus the first use of pos is pos->member.next or:

   mddev->disks.next

But now you converted it to rdev = mddev->disks.next, which means the
first use is:

   pos = mddev->disks.next->next

I think you are skipping the first element here.



struct mddev {
...
struct list_headdisks;
...}

struct list_head {
struct list_head *next, *prev;
};

The tricky thing is that "list_entry_rcu" before and after the patch is 
reading the same thing.


However in your case, the change I proposed is probably wrong I trust 
you on this side. :) What's your proposal to fix it with the rculist patch?


PS: In the rculist patch I proposed, I avoid the store and the atomic 
reload in the stack variable __ptr. (yeap, the 
rcu_dereference_raw/ACCESS_ONCE is a bit confusing because it implicitly 
do & on the parameter).


Thanks.
--
Pat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH tip/core/rcu 3/4] md/bitmap: Fix list_entry_rcu usage

2015-05-16 Thread Patrick Marlier



On 05/13/2015 04:58 AM, NeilBrown wrote:

On Tue, 12 May 2015 22:38:53 -0400 Steven Rostedt rost...@goodmis.org wrote:


On Tue, 12 May 2015 15:46:26 -0700
Paul E. McKenney paul...@linux.vnet.ibm.com wrote:


From: Patrick Marlier patrick.marl...@gmail.com

Signed-off-by: Patrick Marlier patrick.marl...@gmail.com
Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
---
  drivers/md/bitmap.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 2bc56e2a3526..32901772e4ee 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -181,7 +181,7 @@ static struct md_rdev *next_active_rdev(struct md_rdev 
*rdev, struct mddev *mdde
rcu_read_lock();
if (rdev == NULL)
/* start at the beginning */
-   rdev = list_entry_rcu(mddev-disks, struct md_rdev, same_set);
+   rdev = list_entry_rcu(mddev-disks.next, struct md_rdev, 
same_set);


Hmm, this changes the semantics.

The original code looks nasty, I first thought it was broken, but it
seems to work out of sheer luck (or clever hack)


Definitely a clever hack - no question of luck here :-)

It might makes sense to change it to use list_for_each_entry_from_rcu()

   if (rdev == NULL)
  rdev = list_entry_rcu(mddev-disks.next, struct md_rdev, same_set);
   else {
  rdev_dec_pending(rdev, mddev);
  rdev = list_next_entry_rcu(rdev-same_set.next, struct md_rdev, same_set);
   }
   list_for_each_entry_from_rcu(rdev, )

but there isn't a list_next_entry_rcu


Also, it would have been polity to at least 'cc' them Maintainer of this code
in the original patch - no?


Sure my bad. I hesitated to CC maintainers. I was almost sure that it 
will be rejected so I wanted to avoid noise.





Thanks,
NeilBrown




else {
/* release the previous rdev and start from there. */
rdev_dec_pending(rdev, mddev);



What comes after this is:

list_for_each_entry_continue_rcu(rdev, mddev-disks, same_set) {
if (rdev-raid_disk = 0 

Now the original code had:

   rdev = list_entry_rcu(mddev-disks, struct md_rdev, same_set);

Where mddev-disks would return the address of the disks field of
mddev which is a list head. Then it would get the 'same_set' offset,
which is 0, and rdev is pointing to a makeshift md_rdev struct. But it
isn't used, as the list_for_each_entry_continue_rcu() has:

#define list_for_each_entry_continue_rcu(pos, head, member) \
for (pos = list_entry_rcu(pos-member.next, typeof(*pos), member); \
 pos-member != (head); \
 pos = list_entry_rcu(pos-member.next, typeof(*pos), member))

Thus the first use of pos is pos-member.next or:

   mddev-disks.next

But now you converted it to rdev = mddev-disks.next, which means the
first use is:

   pos = mddev-disks.next-next

I think you are skipping the first element here.



struct mddev {
...
struct list_headdisks;
...}

struct list_head {
struct list_head *next, *prev;
};

The tricky thing is that list_entry_rcu before and after the patch is 
reading the same thing.


However in your case, the change I proposed is probably wrong I trust 
you on this side. :) What's your proposal to fix it with the rculist patch?


PS: In the rculist patch I proposed, I avoid the store and the atomic 
reload in the stack variable __ptr. (yeap, the 
rcu_dereference_raw/ACCESS_ONCE is a bit confusing because it implicitly 
do  on the parameter).


Thanks.
--
Pat
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] rculist: Fix list_entry_rcu to read ptr with rcu_dereference_raw

2015-04-13 Thread Patrick Marlier
Strange I don't get any conflict.
Maybe due to my email client so I attached the patches to this email.

Thanks,
--
Patrick

On Sat, Mar 28, 2015 at 11:47 AM, Paul E. McKenney
 wrote:
> On Sat, Mar 28, 2015 at 03:42:10AM -0700, Paul E. McKenney wrote:
>> On Wed, Mar 25, 2015 at 04:01:24PM +0100, Patrick Marlier wrote:
>> > On 03/25/2015 03:30 PM, Paul E. McKenney wrote:
>> > >On Tue, Mar 24, 2015 at 11:31:38AM +0100, Patrick Marlier wrote:
>> > >>Change to read effectively ptr with rcu_dereference_raw and not the
>> > >>__ptr variable on the stack.
>> > >>
>> > >>Signed-off-by: Patrick Marlier 
>> > >Avoiding an extra load could be worthwhile in a number of situations,
>> > >agreed.
>> > Not only a load. It adds a store and a load on the stack and I think
>> > this creates a dependency in the processor pipeline.
>> >
>> > >However, won't this change cause sparse to complain if invoked on a
>> > >non-RCU-protected pointer?  The ability to use list-RCU API
>> > >members on both RCU and non-RCU pointers was one of the points
>> > >of the previous commit, right?
>> > Probably we can put back the cast but I am not familiar enough with
>> > the RCU API.
>> >
>> > Also, the problem here is that you probably want ACCESS_ONCE to
>> > happen on the content of 'ptr' and not on the stack variable
>> > '__ptr'.
>> >
>> > (you have to follow this chain: rcu_dereference_raw ->
>> > rcu_dereference_check -> __rcu_dereference_check ->
>> > lockless_dereference -> ACCESS_ONCE)
>> >
>> > #define lockless_dereference(p) \
>> > ({ \
>> > typeof(p) _p1 = ACCESS_ONCE(p); \
>> > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
>> > (_p1); \
>> > })
>> >
>> > #define __ACCESS_ONCE(x) ({ \
>> >  __maybe_unused typeof(x) __var = (__force typeof(x)) 0; \
>> > (volatile typeof(x) *)&(x); })
>> > #define ACCESS_ONCE(x) (*__ACCESS_ONCE(x))
>> >
>> > Note that ACCESS_ONCE is doing "&" on x.
>> >
>> > IMHO, I would prefer saving some useless instructions for better
>> > performance rather than giving too much flexibility on the API (also
>> > pretty sure the cast can be still done).
>>
>> OK, what I am going to do is to apply your patches for testing purposes.
>> If there are no complaints, they will likely go into v4.3 or thereabouts.
>
> Except that I hit conflicts.  Could you please rebase to rcu/dev at
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git?
>
> Thanx, Paul
>
From 8ac818d418068105623e43bbd289d9553c182e6c Mon Sep 17 00:00:00 2001
From: Patrick Marlier 
Date: Tue, 24 Mar 2015 11:16:55 +0100
Subject: [PATCH 1/3] rculist: Fix list_entry_rcu to read ptr with
 rcu_dereference_raw

Change to read effectively ptr with rcu_dereference_raw and not the
__ptr variable on the stack.

Signed-off-by: Patrick Marlier 
---
 include/linux/rculist.h | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index a18b16f..9d9baea 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -247,10 +247,7 @@ static inline void list_splice_init_rcu(struct list_head *list,
  * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
  */
 #define list_entry_rcu(ptr, type, member) \
-({ \
-	typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
-	container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \
-})
+	container_of((typeof(ptr))rcu_dereference_raw(ptr), type, member)
 
 /**
  * Where are list_empty_rcu() and list_first_entry_rcu()?
-- 
2.1.0

From 3ab5f342939f768b693708bd32ef3f350af3b5a6 Mon Sep 17 00:00:00 2001
From: Patrick Marlier 
Date: Tue, 24 Mar 2015 11:21:05 +0100
Subject: [PATCH 2/3] netfilter: fix list_entry_rcu usage.

Signed-off-by: Patrick Marlier 
---
 net/netfilter/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index fea9ef5..05bd311 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -174,7 +174,7 @@ int nf_hook_slow(u_int8_t pf, unsigned int hook, struct sk_buff *skb,
 	/* We may already have this, but read-locks nest anyway */
 	rcu_read_lock();
 
-	elem = list_entry_rcu(_hooks[pf][hook], struct nf_hook_ops, list);
+	elem = list_entry_rcu(nf_hooks[pf][hook].next, struct nf_hook_ops, list);
 next_hook:
 	verdict = nf_iterate(_hooks[pf][hook], skb, hook, indev,
 			 outdev, , okfn, hook_thresh);
-- 

Re: [PATCH 1/3] rculist: Fix list_entry_rcu to read ptr with rcu_dereference_raw

2015-04-13 Thread Patrick Marlier
Strange I don't get any conflict.
Maybe due to my email client so I attached the patches to this email.

Thanks,
--
Patrick

On Sat, Mar 28, 2015 at 11:47 AM, Paul E. McKenney
paul...@linux.vnet.ibm.com wrote:
 On Sat, Mar 28, 2015 at 03:42:10AM -0700, Paul E. McKenney wrote:
 On Wed, Mar 25, 2015 at 04:01:24PM +0100, Patrick Marlier wrote:
  On 03/25/2015 03:30 PM, Paul E. McKenney wrote:
  On Tue, Mar 24, 2015 at 11:31:38AM +0100, Patrick Marlier wrote:
  Change to read effectively ptr with rcu_dereference_raw and not the
  __ptr variable on the stack.
  
  Signed-off-by: Patrick Marlier patrick.marl...@gmail.com
  Avoiding an extra load could be worthwhile in a number of situations,
  agreed.
  Not only a load. It adds a store and a load on the stack and I think
  this creates a dependency in the processor pipeline.
 
  However, won't this change cause sparse to complain if invoked on a
  non-RCU-protected pointer?  The ability to use list-RCU API
  members on both RCU and non-RCU pointers was one of the points
  of the previous commit, right?
  Probably we can put back the cast but I am not familiar enough with
  the RCU API.
 
  Also, the problem here is that you probably want ACCESS_ONCE to
  happen on the content of 'ptr' and not on the stack variable
  '__ptr'.
 
  (you have to follow this chain: rcu_dereference_raw -
  rcu_dereference_check - __rcu_dereference_check -
  lockless_dereference - ACCESS_ONCE)
 
  #define lockless_dereference(p) \
  ({ \
  typeof(p) _p1 = ACCESS_ONCE(p); \
  smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
  (_p1); \
  })
 
  #define __ACCESS_ONCE(x) ({ \
   __maybe_unused typeof(x) __var = (__force typeof(x)) 0; \
  (volatile typeof(x) *)(x); })
  #define ACCESS_ONCE(x) (*__ACCESS_ONCE(x))
 
  Note that ACCESS_ONCE is doing  on x.
 
  IMHO, I would prefer saving some useless instructions for better
  performance rather than giving too much flexibility on the API (also
  pretty sure the cast can be still done).

 OK, what I am going to do is to apply your patches for testing purposes.
 If there are no complaints, they will likely go into v4.3 or thereabouts.

 Except that I hit conflicts.  Could you please rebase to rcu/dev at
 git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git?

 Thanx, Paul

From 8ac818d418068105623e43bbd289d9553c182e6c Mon Sep 17 00:00:00 2001
From: Patrick Marlier patrick.marl...@gmail.com
Date: Tue, 24 Mar 2015 11:16:55 +0100
Subject: [PATCH 1/3] rculist: Fix list_entry_rcu to read ptr with
 rcu_dereference_raw

Change to read effectively ptr with rcu_dereference_raw and not the
__ptr variable on the stack.

Signed-off-by: Patrick Marlier patrick.marl...@gmail.com
---
 include/linux/rculist.h | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index a18b16f..9d9baea 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -247,10 +247,7 @@ static inline void list_splice_init_rcu(struct list_head *list,
  * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
  */
 #define list_entry_rcu(ptr, type, member) \
-({ \
-	typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
-	container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \
-})
+	container_of((typeof(ptr))rcu_dereference_raw(ptr), type, member)
 
 /**
  * Where are list_empty_rcu() and list_first_entry_rcu()?
-- 
2.1.0

From 3ab5f342939f768b693708bd32ef3f350af3b5a6 Mon Sep 17 00:00:00 2001
From: Patrick Marlier patrick.marl...@gmail.com
Date: Tue, 24 Mar 2015 11:21:05 +0100
Subject: [PATCH 2/3] netfilter: fix list_entry_rcu usage.

Signed-off-by: Patrick Marlier patrick.marl...@gmail.com
---
 net/netfilter/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index fea9ef5..05bd311 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -174,7 +174,7 @@ int nf_hook_slow(u_int8_t pf, unsigned int hook, struct sk_buff *skb,
 	/* We may already have this, but read-locks nest anyway */
 	rcu_read_lock();
 
-	elem = list_entry_rcu(nf_hooks[pf][hook], struct nf_hook_ops, list);
+	elem = list_entry_rcu(nf_hooks[pf][hook].next, struct nf_hook_ops, list);
 next_hook:
 	verdict = nf_iterate(nf_hooks[pf][hook], skb, hook, indev,
 			 outdev, elem, okfn, hook_thresh);
-- 
2.1.0

From 84f0428e2c9172692aba727636a643efb6994752 Mon Sep 17 00:00:00 2001
From: Patrick Marlier patrick.marl...@gmail.com
Date: Tue, 24 Mar 2015 11:22:10 +0100
Subject: [PATCH 3/3] md/bitmap: fix list_entry_rcu usage.

Signed-off-by: Patrick Marlier patrick.marl...@gmail.com
---
 drivers/md/bitmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 3a57679..ed00e46 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -181,7 +181,7

Re: [PATCH 2/3] netfilter: fix list_entry_rcu usage.

2015-03-25 Thread Patrick Marlier

On 03/25/2015 03:36 PM, Paul E. McKenney wrote:

On Tue, Mar 24, 2015 at 11:31:44AM +0100, Patrick Marlier wrote:

>Signed-off-by: Patrick Marlier
>---
>  net/netfilter/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/net/netfilter/core.c b/net/netfilter/core.c
>index fea9ef5..05bd311 100644
>--- a/net/netfilter/core.c
>+++ b/net/netfilter/core.c
>@@ -174,7 +174,7 @@ int nf_hook_slow(u_int8_t pf, unsigned int hook,
>struct sk_buff *skb,
> /* We may already have this, but read-locks nest anyway */
> rcu_read_lock();
>
>-   elem = list_entry_rcu(_hooks[pf][hook], struct nf_hook_ops, list);
>+   elem = list_entry_rcu(nf_hooks[pf][hook].next, struct nf_hook_ops, list);

And this departs from the list_entry() API.  Is this really a good idea?


No opinion on that but AFAIK there are only 2 spots in the whole kernel 
source where the ".next" is not used with list_entry_rcu.

--
Patrick Marlier
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] rculist: Fix list_entry_rcu to read ptr with rcu_dereference_raw

2015-03-25 Thread Patrick Marlier

On 03/25/2015 03:30 PM, Paul E. McKenney wrote:

On Tue, Mar 24, 2015 at 11:31:38AM +0100, Patrick Marlier wrote:

Change to read effectively ptr with rcu_dereference_raw and not the
__ptr variable on the stack.

Signed-off-by: Patrick Marlier 

Avoiding an extra load could be worthwhile in a number of situations,
agreed.
Not only a load. It adds a store and a load on the stack and I think 
this creates a dependency in the processor pipeline.



However, won't this change cause sparse to complain if invoked on a
non-RCU-protected pointer?  The ability to use list-RCU API
members on both RCU and non-RCU pointers was one of the points
of the previous commit, right?
Probably we can put back the cast but I am not familiar enough with the 
RCU API.


Also, the problem here is that you probably want ACCESS_ONCE to happen 
on the content of 'ptr' and not on the stack variable '__ptr'.


(you have to follow this chain: rcu_dereference_raw -> 
rcu_dereference_check -> __rcu_dereference_check -> lockless_dereference 
-> ACCESS_ONCE)


#define lockless_dereference(p) \
({ \
typeof(p) _p1 = ACCESS_ONCE(p); \
smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
(_p1); \
})

#define __ACCESS_ONCE(x) ({ \
 __maybe_unused typeof(x) __var = (__force typeof(x)) 0; \
(volatile typeof(x) *)&(x); })
#define ACCESS_ONCE(x) (*__ACCESS_ONCE(x))

Note that ACCESS_ONCE is doing "&" on x.

IMHO, I would prefer saving some useless instructions for better 
performance rather than giving too much flexibility on the API (also 
pretty sure the cast can be still done).

--
Patrick Marlier
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] netfilter: fix list_entry_rcu usage.

2015-03-25 Thread Patrick Marlier

On 03/25/2015 03:36 PM, Paul E. McKenney wrote:

On Tue, Mar 24, 2015 at 11:31:44AM +0100, Patrick Marlier wrote:

Signed-off-by: Patrick Marlierpatrick.marl...@gmail.com
---
  net/netfilter/core.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index fea9ef5..05bd311 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -174,7 +174,7 @@ int nf_hook_slow(u_int8_t pf, unsigned int hook,
struct sk_buff *skb,
 /* We may already have this, but read-locks nest anyway */
 rcu_read_lock();

-   elem = list_entry_rcu(nf_hooks[pf][hook], struct nf_hook_ops, list);
+   elem = list_entry_rcu(nf_hooks[pf][hook].next, struct nf_hook_ops, list);

And this departs from the list_entry() API.  Is this really a good idea?


No opinion on that but AFAIK there are only 2 spots in the whole kernel 
source where the .next is not used with list_entry_rcu.

--
Patrick Marlier
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] rculist: Fix list_entry_rcu to read ptr with rcu_dereference_raw

2015-03-25 Thread Patrick Marlier

On 03/25/2015 03:30 PM, Paul E. McKenney wrote:

On Tue, Mar 24, 2015 at 11:31:38AM +0100, Patrick Marlier wrote:

Change to read effectively ptr with rcu_dereference_raw and not the
__ptr variable on the stack.

Signed-off-by: Patrick Marlier patrick.marl...@gmail.com

Avoiding an extra load could be worthwhile in a number of situations,
agreed.
Not only a load. It adds a store and a load on the stack and I think 
this creates a dependency in the processor pipeline.



However, won't this change cause sparse to complain if invoked on a
non-RCU-protected pointer?  The ability to use list-RCU API
members on both RCU and non-RCU pointers was one of the points
of the previous commit, right?
Probably we can put back the cast but I am not familiar enough with the 
RCU API.


Also, the problem here is that you probably want ACCESS_ONCE to happen 
on the content of 'ptr' and not on the stack variable '__ptr'.


(you have to follow this chain: rcu_dereference_raw - 
rcu_dereference_check - __rcu_dereference_check - lockless_dereference 
- ACCESS_ONCE)


#define lockless_dereference(p) \
({ \
typeof(p) _p1 = ACCESS_ONCE(p); \
smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
(_p1); \
})

#define __ACCESS_ONCE(x) ({ \
 __maybe_unused typeof(x) __var = (__force typeof(x)) 0; \
(volatile typeof(x) *)(x); })
#define ACCESS_ONCE(x) (*__ACCESS_ONCE(x))

Note that ACCESS_ONCE is doing  on x.

IMHO, I would prefer saving some useless instructions for better 
performance rather than giving too much flexibility on the API (also 
pretty sure the cast can be still done).

--
Patrick Marlier
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/3] md/bitmap: fix list_entry_rcu usage.

2015-03-24 Thread Patrick Marlier

Signed-off-by: Patrick Marlier 
---
 drivers/md/bitmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 3a57679..ed00e46 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -181,7 +181,7 @@ static struct md_rdev *next_active_rdev(struct 
md_rdev *rdev, struct mddev *mdde

rcu_read_lock();
if (rdev == NULL)
/* start at the beginning */
-   rdev = list_entry_rcu(>disks, struct md_rdev, same_set);
+   rdev = list_entry_rcu(mddev->disks.next, struct md_rdev, same_set);
else {
/* release the previous rdev and start from there. */
rdev_dec_pending(rdev, mddev);
--
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/3] rculist: Fix list_entry_rcu to read ptr with rcu_dereference_raw

2015-03-24 Thread Patrick Marlier

Change to read effectively ptr with rcu_dereference_raw and not the
__ptr variable on the stack.

Signed-off-by: Patrick Marlier 
---
 include/linux/rculist.h | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index a18b16f..9d9baea 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -247,10 +247,7 @@ static inline void list_splice_init_rcu(struct 
list_head *list,
  * primitives such as list_add_rcu() as long as it's guarded by 
rcu_read_lock().

  */
 #define list_entry_rcu(ptr, type, member) \
-({ \
-   typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
-   container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \
-})
+   container_of((typeof(ptr))rcu_dereference_raw(ptr), type, member)

 /**
  * Where are list_empty_rcu() and list_first_entry_rcu()?
--
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/3] netfilter: fix list_entry_rcu usage.

2015-03-24 Thread Patrick Marlier

Signed-off-by: Patrick Marlier 
---
 net/netfilter/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index fea9ef5..05bd311 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -174,7 +174,7 @@ int nf_hook_slow(u_int8_t pf, unsigned int hook, 
struct sk_buff *skb,

/* We may already have this, but read-locks nest anyway */
rcu_read_lock();

-   elem = list_entry_rcu(_hooks[pf][hook], struct nf_hook_ops, list);
+   elem = list_entry_rcu(nf_hooks[pf][hook].next, struct nf_hook_ops, 
list);

 next_hook:
verdict = nf_iterate(_hooks[pf][hook], skb, hook, indev,
 outdev, , okfn, hook_thresh);
--
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/3] rculist: Fix list_entry_rcu to read ptr with, rcu_dereference_raw

2015-03-24 Thread Patrick Marlier

Hi Paul and Josh,

While running some benchmarks with rcu, I noticed that rculist had a 
performance issue compared to the non-rcu list.
Having a closer look at the rlulist, I found that there is a bad usage 
of rcu_dereference_raw in list_entry_rcu().
Indeed, "typeof (*ptr) *__ptr = ptr;" reads ptr and 
"rcu_dereference_raw(__ptr)" forces read on the stack variable __ptr.

However, ptr should be read with rcu_dereference_raw not the __ptr.

This was introduced some years ago with commit 67bdbffd (2010-02-25).
I am proposing to revert this part of the patch but probably this can be 
fixed differently.


Fixing this showed 2 problems in the usage of list_entry_rcu.
Since it touches other parts of kernel, I guess it requires ack from 
other maintainers.


Thanks.

PS: I will Cc the others maintainers when somebody confirms the problem.

Signed-off-by: Patrick Marlier 
Cc: "Paul E. McKenney" 
Cc: Josh Triplett 
Cc: Arnd Bergmann 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/3] md/bitmap: fix list_entry_rcu usage.

2015-03-24 Thread Patrick Marlier

Signed-off-by: Patrick Marlier patrick.marl...@gmail.com
---
 drivers/md/bitmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 3a57679..ed00e46 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -181,7 +181,7 @@ static struct md_rdev *next_active_rdev(struct 
md_rdev *rdev, struct mddev *mdde

rcu_read_lock();
if (rdev == NULL)
/* start at the beginning */
-   rdev = list_entry_rcu(mddev-disks, struct md_rdev, same_set);
+   rdev = list_entry_rcu(mddev-disks.next, struct md_rdev, same_set);
else {
/* release the previous rdev and start from there. */
rdev_dec_pending(rdev, mddev);
--
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/3] rculist: Fix list_entry_rcu to read ptr with rcu_dereference_raw

2015-03-24 Thread Patrick Marlier

Change to read effectively ptr with rcu_dereference_raw and not the
__ptr variable on the stack.

Signed-off-by: Patrick Marlier patrick.marl...@gmail.com
---
 include/linux/rculist.h | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index a18b16f..9d9baea 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -247,10 +247,7 @@ static inline void list_splice_init_rcu(struct 
list_head *list,
  * primitives such as list_add_rcu() as long as it's guarded by 
rcu_read_lock().

  */
 #define list_entry_rcu(ptr, type, member) \
-({ \
-   typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
-   container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \
-})
+   container_of((typeof(ptr))rcu_dereference_raw(ptr), type, member)

 /**
  * Where are list_empty_rcu() and list_first_entry_rcu()?
--
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/3] netfilter: fix list_entry_rcu usage.

2015-03-24 Thread Patrick Marlier

Signed-off-by: Patrick Marlier patrick.marl...@gmail.com
---
 net/netfilter/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index fea9ef5..05bd311 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -174,7 +174,7 @@ int nf_hook_slow(u_int8_t pf, unsigned int hook, 
struct sk_buff *skb,

/* We may already have this, but read-locks nest anyway */
rcu_read_lock();

-   elem = list_entry_rcu(nf_hooks[pf][hook], struct nf_hook_ops, list);
+   elem = list_entry_rcu(nf_hooks[pf][hook].next, struct nf_hook_ops, 
list);

 next_hook:
verdict = nf_iterate(nf_hooks[pf][hook], skb, hook, indev,
 outdev, elem, okfn, hook_thresh);
--
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/3] rculist: Fix list_entry_rcu to read ptr with, rcu_dereference_raw

2015-03-24 Thread Patrick Marlier

Hi Paul and Josh,

While running some benchmarks with rcu, I noticed that rculist had a 
performance issue compared to the non-rcu list.
Having a closer look at the rlulist, I found that there is a bad usage 
of rcu_dereference_raw in list_entry_rcu().
Indeed, typeof (*ptr) *__ptr = ptr; reads ptr and 
rcu_dereference_raw(__ptr) forces read on the stack variable __ptr.

However, ptr should be read with rcu_dereference_raw not the __ptr.

This was introduced some years ago with commit 67bdbffd (2010-02-25).
I am proposing to revert this part of the patch but probably this can be 
fixed differently.


Fixing this showed 2 problems in the usage of list_entry_rcu.
Since it touches other parts of kernel, I guess it requires ack from 
other maintainers.


Thanks.

PS: I will Cc the others maintainers when somebody confirms the problem.

Signed-off-by: Patrick Marlier patrick.marl...@gmail.com
Cc: Paul E. McKenney paul...@linux.vnet.ibm.com
Cc: Josh Triplett j...@joshtriplett.org
Cc: Arnd Bergmann a...@arndb.de
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/