Re: [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate

2020-06-18 Thread Joel Fernandes
On Thu, Jun 18, 2020 at 04:09:34PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 03:11:19PM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 18, 2020 at 04:29:49PM -0400, Joel Fernandes (Google) wrote:
> > 
> > First, this looks like a very nice optimization, thank you!

Thanks!

> > > Cc: ure...@gmail.com
> > > Signed-off-by: Joel Fernandes (Google) 
> 
> As discussed over IRC, I updated the patch as shown below.  Does that
> work for you?

Yes, that works for me. Thanks!

 - Joel


>   Thanx, Paul
> 
> 
> 
> commit ec037e1f438074eb16fd68a63d699fc419c9ba0c
> Author: Joel Fernandes (Google) 
> Date:   Thu Jun 18 16:29:49 2020 -0400
> 
> rcu/segcblist: Prevent useless GP start if no CBs to accelerate
> 
> The rcu_segcblist_accelerate() function returns true iff it is necessary
> to request another grace period.  A tracing session showed that this
> function unnecessarily requests grace periods.
> 
> For exmaple, consider the following sequence of events:
> 1. Callbacks are queued only on the NEXT segment of CPU A's callback list.
> 2. CPU A runs RCU_SOFTIRQ, accelerating these callbacks from NEXT to WAIT.
> 3. Thus rcu_segcblist_accelerate() returns true, requesting grace period 
> N.
> 4. RCU's grace-period kthread wakes up on CPU B and starts grace period N.
> 4. CPU A notices the new grace period and invokes RCU_SOFTIRQ.
> 5. CPU A's RCU_SOFTIRQ again invokes rcu_segcblist_accelerate(), but
>there are no new callbacks.  However, rcu_segcblist_accelerate()
>nevertheless (uselessly) requests a new grace period N+1.
> 
> This extra grace period results in additional lock contention and also
> additional wakeups, all for no good reason.
> 
> This commit therefore adds a check to rcu_segcblist_accelerate() that
> prevents the return of true when there are no new callbacks.
> 
> This change reduces the number of grace periods (GPs) and wakeups in each
> of eleven five-second rcutorture runs as follows:
> 
> ++---+---+
> | #  | Number of GPs | Number of Wakeups |
> ++=+=+=+=+
> | 1  | With| Without | With| Without |
> ++-+-+-+-+
> | 2  |  75 |  89 | 113 | 119 |
> ++-+-+-+-+
> | 3  |  62 |  91 | 105 | 123 |
> ++-+-+-+-+
> | 4  |  60 |  79 |  98 | 110 |
> ++-+-+-+-+
> | 5  |  63 |  79 |  99 | 112 |
> ++-+-+-+-+
> | 6  |  57 |  89 |  96 | 123 |
> ++-+-+-+-+
> | 7  |  64 |  85 |  97 | 118 |
> ++-+-+-+-+
> | 8  |  58 |  83 |  98 | 113 |
> ++-+-+-+-+
> | 9  |  57 |  77 |  89 | 104 |
> ++-+-+-+-+
> | 10 |  66 |  82 |  98 | 119 |
> ++-+-+-+-+
> | 11 |  52 |  82 |  83 | 117 |
> ++-+-+-+-+
> 
> The reduction in the number of wakeups ranges from 5% to 40%.
> 
> Cc: ure...@gmail.com
> [ paulmck: Rework commit log and comment. ]
> Signed-off-by: Joel Fernandes (Google) 
> Signed-off-by: Paul E. McKenney 
> 
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index 9a0f661..2d2a6b6b9 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -475,8 +475,16 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist 
> *rsclp, unsigned long seq)
>* Also advance to the oldest segment of callbacks whose
>* ->gp_seq[] completion is at or after that passed in via "seq",
>* skipping any empty segments.
> +  *
> +  * Note that segment "i" (and any lower-numbered segments
> +  * containing older callbacks) will be unaffected, and their
> +  * grace-period numbers remain unchanged.  For example, if i ==
> +  * WAIT_TAIL, then neither WAIT_TAIL nor DONE_TAIL will be touched.
> +  * Instead, the CBs in NEXT_TAIL will be merged with those in
> +  * NEXT_READY_TAIL and the grace-period number of NEXT_READY_TAIL
> +  * would be updated.  NEXT_TAIL would then be empty.
>*/
> - if (++i >= RCU_NEXT_TAIL)
> + if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL)
>   return false;
>  
>   /*


Re: [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate

2020-06-18 Thread Paul E. McKenney
On Thu, Jun 18, 2020 at 03:11:19PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 04:29:49PM -0400, Joel Fernandes (Google) wrote:
> 
> First, this looks like a very nice optimization, thank you!
> 
> > rcu_segcblist_accelerate() returns true if a GP is to be
> > started/requested and false if not. During tracing, I found that it is
> > asking that GPs be requested
> 
> s/GPs/unnecessary GPs/?  Plus "." at end of the sentence.
> 
> > The exact flow seems to be something like:
> > 1. Callbacks are queued on CPU A - into the NEXT list.
> > 2. softirq runs on CPU A, accelerate all CBs from NEXT->WAIT and request a 
> > GP X.
> > 3. GP thread wakes up on another CPU, starts the GP X and requests QS from 
> > CPU A.
> > 4. CPU A's softirq runs again presumably because of #3.
> 
> Yes, that is one reason RCU softirq might run again.
> 
> > 5. CPU A's softirq now does acceleration again, this time no CBs are
> >accelerated since last attempt, but it still requests GP X+1 which
> >could be useless.
> 
> I can't think of a case where this request helps.  How about: "but
> it still unnecessarily requests GP X+1"?
> 
> > The fix is, prevent the useless GP start if we detect no CBs are there
> > to accelerate.
> > 
> > With this, we have the following improvement in short runs of
> > rcutorture (5 seconds each):
> > ++---+---+
> > | #  | Number of GPs | Number of Wakeups |
> > ++=+=+=+=+
> > | 1  | With| Without | With| Without |
> > ++-+-+-+-+
> > | 2  |  75 |  89 | 113 | 119 |
> > ++-+-+-+-+
> > | 3  |  62 |  91 | 105 | 123 |
> > ++-+-+-+-+
> > | 4  |  60 |  79 |  98 | 110 |
> > ++-+-+-+-+
> > | 5  |  63 |  79 |  99 | 112 |
> > ++-+-+-+-+
> > | 6  |  57 |  89 |  96 | 123 |
> > ++-+-+-+-+
> > | 7  |  64 |  85 |  97 | 118 |
> > ++-+-+-+-+
> > | 8  |  58 |  83 |  98 | 113 |
> > ++-+-+-+-+
> > | 9  |  57 |  77 |  89 | 104 |
> > ++-+-+-+-+
> > | 10 |  66 |  82 |  98 | 119 |
> > ++-+-+-+-+
> > | 11 |  52 |  82 |  83 | 117 |
> > ++-+-+-+-+
> 
> So the reductions in wakeups ranges from 5% to 40%, with almost a 20%
> overall reduction in wakeups across all the runs.  That should be of
> some use to someone.  ;-)
> 
> I do run rcutorture quite a bit, but is there a more real-world
> benchmark that could be tried?
> 
> > Cc: ure...@gmail.com
> > Signed-off-by: Joel Fernandes (Google) 

As discussed over IRC, I updated the patch as shown below.  Does that
work for you?

Thanx, Paul



commit ec037e1f438074eb16fd68a63d699fc419c9ba0c
Author: Joel Fernandes (Google) 
Date:   Thu Jun 18 16:29:49 2020 -0400

rcu/segcblist: Prevent useless GP start if no CBs to accelerate

The rcu_segcblist_accelerate() function returns true iff it is necessary
to request another grace period.  A tracing session showed that this
function unnecessarily requests grace periods.

For exmaple, consider the following sequence of events:
1. Callbacks are queued only on the NEXT segment of CPU A's callback list.
2. CPU A runs RCU_SOFTIRQ, accelerating these callbacks from NEXT to WAIT.
3. Thus rcu_segcblist_accelerate() returns true, requesting grace period N.
4. RCU's grace-period kthread wakes up on CPU B and starts grace period N.
4. CPU A notices the new grace period and invokes RCU_SOFTIRQ.
5. CPU A's RCU_SOFTIRQ again invokes rcu_segcblist_accelerate(), but
   there are no new callbacks.  However, rcu_segcblist_accelerate()
   nevertheless (uselessly) requests a new grace period N+1.

This extra grace period results in additional lock contention and also
additional wakeups, all for no good reason.

This commit therefore adds a check to rcu_segcblist_accelerate() that
prevents the return of true when there are no new callbacks.

This change reduces the number of grace periods (GPs) and wakeups in each
of eleven five-second rcutorture runs as follows:

++---+---+
| #  | Number of GPs | Number of Wakeups |
++=+=+=+=+
| 1  | With| Without | With| Without |
++-+-+-+-+
| 2  |  75 |  89 | 113 | 119 |

Re: [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate

2020-06-18 Thread Paul E. McKenney
On Thu, Jun 18, 2020 at 04:29:49PM -0400, Joel Fernandes (Google) wrote:

First, this looks like a very nice optimization, thank you!

> rcu_segcblist_accelerate() returns true if a GP is to be
> started/requested and false if not. During tracing, I found that it is
> asking that GPs be requested

s/GPs/unnecessary GPs/?  Plus "." at end of the sentence.

> The exact flow seems to be something like:
> 1. Callbacks are queued on CPU A - into the NEXT list.
> 2. softirq runs on CPU A, accelerate all CBs from NEXT->WAIT and request a GP 
> X.
> 3. GP thread wakes up on another CPU, starts the GP X and requests QS from 
> CPU A.
> 4. CPU A's softirq runs again presumably because of #3.

Yes, that is one reason RCU softirq might run again.

> 5. CPU A's softirq now does acceleration again, this time no CBs are
>accelerated since last attempt, but it still requests GP X+1 which
>could be useless.

I can't think of a case where this request helps.  How about: "but
it still unnecessarily requests GP X+1"?

> The fix is, prevent the useless GP start if we detect no CBs are there
> to accelerate.
> 
> With this, we have the following improvement in short runs of
> rcutorture (5 seconds each):
> ++---+---+
> | #  | Number of GPs | Number of Wakeups |
> ++=+=+=+=+
> | 1  | With| Without | With| Without |
> ++-+-+-+-+
> | 2  |  75 |  89 | 113 | 119 |
> ++-+-+-+-+
> | 3  |  62 |  91 | 105 | 123 |
> ++-+-+-+-+
> | 4  |  60 |  79 |  98 | 110 |
> ++-+-+-+-+
> | 5  |  63 |  79 |  99 | 112 |
> ++-+-+-+-+
> | 6  |  57 |  89 |  96 | 123 |
> ++-+-+-+-+
> | 7  |  64 |  85 |  97 | 118 |
> ++-+-+-+-+
> | 8  |  58 |  83 |  98 | 113 |
> ++-+-+-+-+
> | 9  |  57 |  77 |  89 | 104 |
> ++-+-+-+-+
> | 10 |  66 |  82 |  98 | 119 |
> ++-+-+-+-+
> | 11 |  52 |  82 |  83 | 117 |
> ++-+-+-+-+

So the reductions in wakeups ranges from 5% to 40%, with almost a 20%
overall reduction in wakeups across all the runs.  That should be of
some use to someone.  ;-)

I do run rcutorture quite a bit, but is there a more real-world
benchmark that could be tried?

> Cc: ure...@gmail.com
> Signed-off-by: Joel Fernandes (Google) 
> ---
>  kernel/rcu/rcu_segcblist.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index 9a0f66133b4b3..4782cf17bf4f9 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -475,8 +475,15 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist 
> *rsclp, unsigned long seq)
>* Also advance to the oldest segment of callbacks whose
>* ->gp_seq[] completion is at or after that passed in via "seq",
>* skipping any empty segments.
> +  *
> +  * Note that "i" is the youngest segment of the list after which
> +  * any older segments than "i" would not be mutated or assigned
> +  * GPs. For example, if i == WAIT_TAIL, then neither WAIT_TAIL,
> +  * nor DONE_TAIL will be touched. Only CBs in NEXT_TAIL will be
> +  * merged with NEXT_READY_TAIL and the GP numbers of both of
> +  * them would be updated.

In this case, only the GP number of NEXT_READY_TAIL would be updated,
correct?  Or am I missing something subtle in the loop just past the
end of this patch?

Thanx, Paul

>*/
> - if (++i >= RCU_NEXT_TAIL)
> + if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL)
>   return false;
>  
>   /*
> -- 
> 2.27.0.111.gc72c7da667-goog
> 


[PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate

2020-06-18 Thread Joel Fernandes (Google)
rcu_segcblist_accelerate() returns true if a GP is to be
started/requested and false if not. During tracing, I found that it is
asking that GPs be requested

The exact flow seems to be something like:
1. Callbacks are queued on CPU A - into the NEXT list.
2. softirq runs on CPU A, accelerate all CBs from NEXT->WAIT and request a GP X.
3. GP thread wakes up on another CPU, starts the GP X and requests QS from CPU 
A.
4. CPU A's softirq runs again presumably because of #3.
5. CPU A's softirq now does acceleration again, this time no CBs are
   accelerated since last attempt, but it still requests GP X+1 which
   could be useless.

The fix is, prevent the useless GP start if we detect no CBs are there
to accelerate.

With this, we have the following improvement in short runs of
rcutorture (5 seconds each):
++---+---+
| #  | Number of GPs | Number of Wakeups |
++=+=+=+=+
| 1  | With| Without | With| Without |
++-+-+-+-+
| 2  |  75 |  89 | 113 | 119 |
++-+-+-+-+
| 3  |  62 |  91 | 105 | 123 |
++-+-+-+-+
| 4  |  60 |  79 |  98 | 110 |
++-+-+-+-+
| 5  |  63 |  79 |  99 | 112 |
++-+-+-+-+
| 6  |  57 |  89 |  96 | 123 |
++-+-+-+-+
| 7  |  64 |  85 |  97 | 118 |
++-+-+-+-+
| 8  |  58 |  83 |  98 | 113 |
++-+-+-+-+
| 9  |  57 |  77 |  89 | 104 |
++-+-+-+-+
| 10 |  66 |  82 |  98 | 119 |
++-+-+-+-+
| 11 |  52 |  82 |  83 | 117 |
++-+-+-+-+

Cc: ure...@gmail.com
Signed-off-by: Joel Fernandes (Google) 
---
 kernel/rcu/rcu_segcblist.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 9a0f66133b4b3..4782cf17bf4f9 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -475,8 +475,15 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, 
unsigned long seq)
 * Also advance to the oldest segment of callbacks whose
 * ->gp_seq[] completion is at or after that passed in via "seq",
 * skipping any empty segments.
+*
+* Note that "i" is the youngest segment of the list after which
+* any older segments than "i" would not be mutated or assigned
+* GPs. For example, if i == WAIT_TAIL, then neither WAIT_TAIL,
+* nor DONE_TAIL will be touched. Only CBs in NEXT_TAIL will be
+* merged with NEXT_READY_TAIL and the GP numbers of both of
+* them would be updated.
 */
-   if (++i >= RCU_NEXT_TAIL)
+   if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL)
return false;
 
/*
-- 
2.27.0.111.gc72c7da667-goog