Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

2020-06-10 Thread Joel Fernandes
On Wed, Jun 10, 2020 at 04:21:42PM -0700, Paul E. McKenney wrote:
[...]
> > > 8.  To softirq 3.  Either GP or CB kthread for the transitioning
> > >     CPU advances to next.
> > >     At this point, the no-CBs setup is fully shut down.
> > > 9.  To softirq 4.  Transitioning code advances to next,
> > >     which is the first, "In softirq".
> > >     (This one -might- be unnecessary, but...)
> > > 
> > > All transitions are of course with the ->nocb_lock held.
> > > 
> > > When there is only one CPU during early boot near rcu_init() time,
> > > the transition from "In softirq" to "No-CB" can remain instantaneous.
> > > 
> > > This has the advantage of not slowing things down just because there
> > > is an RCU callback flood in progress.  It also uses an explicit
> > > protocol that should (give or take bugs) maintain full safety both
> > > in protection of ->cblist and in dealing with RCU callback floods.
> > > 
> > > Thoughts?
> > 
> > Agreed. And I really like that it details the whole process in a very
> > explicit way.
> > 
> > Thanks!
> 
> Glad you like it!  And of course please adjust it as needed, up to and
> including doing something completely different that works better.  ;-)

Makes sense to me too, thanks!

 - Joel



Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

2020-06-10 Thread Paul E. McKenney
On Thu, Jun 11, 2020 at 12:12:46AM +0200, Frederic Weisbecker wrote:
> On Wed, Jun 10, 2020 at 07:02:10AM -0700, Paul E. McKenney wrote:
> > And just to argue against myself...
> > 
> > Another approach is to maintain explicit multiple states for each
> > ->cblist, perhaps something like this:
> > 
> > 1.  In softirq.  Transition code advances to next.
> > 2.  To no-CB 1.  Either GP or CB kthread for the transitioning
> >     CPU advances to next.  Note that the fact that the
> >     transition code runs on the transitioning CPU means that
> >     the RCU softirq handler doesn't need to be involved.
> > 3.  To no-CB 2.  Either GP or CB kthread for the transitioning
> >     CPU advances to next.
> 
> Just to clarify, if GP has set NO_CB2 in (2), we want CB to set NO_CB3
> in (3), right? OTOH if CB has set NO_CB2 in (2), we want GP to set NO_CB3
> in (3), right?
> 
> The point being to make sure that both threads acknowledge the transition?

Exactly!

> > 4.  To no-CB 3.  Transitioning code advances to next.
> >     At this point, the no-CBs setup is fully functional.
> 
> And softirq can stop processing callbacks from that point on.

You got it!

> > 5.  No-CB.  Transitioning code advances to next.
> >     Again, the fact that the transitioning code is running
> >     on the transitioning CPU with interrupts disabled means
> >     that the RCU softirq handler need not be explicitly
> >     involved.
> > 6.  To softirq 1.  The RCU softirq handler for the transitioning
> >     CPU advances to next.
> >     At this point, the RCU softirq handler is processing callbacks.
> > 7.  To softirq 2.  Either GP or CB kthread for the transitioning
> >     CPU advances to next.
> >     At this point, the softirq handler is processing callbacks.
> 
> SOFTIRQ2 should be part of what happens in SOFTIRQ1. The transitioning
> CPU sets SOFTIRQ1, which is immediately visible to local softirqs,
> and wakes up CB/GP. CB/GP sets SOFTIRQ2, CB/GP sets SOFTIRQ3 and
> we go back to transitioning code that sets IN_SOFTIRQ.
> 
> Or did I miss something?

I was perhaps being overly paranoid.  You might well be right.

> > 8.  To softirq 3.  Either GP or CB kthread for the transitioning
> >     CPU advances to next.
> >     At this point, the no-CBs setup is fully shut down.
> > 9.  To softirq 4.  Transitioning code advances to next,
> >     which is the first, "In softirq".
> >     (This one -might- be unnecessary, but...)
> > 
> > All transitions are of course with the ->nocb_lock held.
> > 
> > When there is only one CPU during early boot near rcu_init() time,
> > the transition from "In softirq" to "No-CB" can remain instantaneous.
> > 
> > This has the advantage of not slowing things down just because there
> > is an RCU callback flood in progress.  It also uses an explicit
> > protocol that should (give or take bugs) maintain full safety both
> > in protection of ->cblist and in dealing with RCU callback floods.
> > 
> > Thoughts?
> 
> Agreed. And I really like that it details the whole process in a very
> explicit way.
> 
> Thanks!

Glad you like it!  And of course please adjust it as needed, up to and
including doing something completely different that works better.  ;-)

Thanx, Paul


Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

2020-06-10 Thread Frederic Weisbecker
On Wed, Jun 10, 2020 at 07:02:10AM -0700, Paul E. McKenney wrote:
> And just to argue against myself...
> 
> Another approach is to maintain explicit multiple states for each
> ->cblist, perhaps something like this:
> 
> 1.  In softirq.  Transition code advances to next.
> 2.  To no-CB 1.  Either GP or CB kthread for the transitioning
>     CPU advances to next.  Note that the fact that the
>     transition code runs on the transitioning CPU means that
>     the RCU softirq handler doesn't need to be involved.
> 3.  To no-CB 2.  Either GP or CB kthread for the transitioning
>     CPU advances to next.

Just to clarify, if GP has set NO_CB2 in (2), we want CB to set NO_CB3
in (3), right? OTOH if CB has set NO_CB2 in (2), we want GP to set NO_CB3
in (3), right?

The point being to make sure that both threads acknowledge the transition?

> 4.  To no-CB 3.  Transitioning code advances to next.
>     At this point, the no-CBs setup is fully functional.

And softirq can stop processing callbacks from that point on.

> 5.  No-CB.  Transitioning code advances to next.
>     Again, the fact that the transitioning code is running
>     on the transitioning CPU with interrupts disabled means
>     that the RCU softirq handler need not be explicitly
>     involved.
> 6.  To softirq 1.  The RCU softirq handler for the transitioning
>     CPU advances to next.
>     At this point, the RCU softirq handler is processing callbacks.
> 7.  To softirq 2.  Either GP or CB kthread for the transitioning
>     CPU advances to next.
>     At this point, the softirq handler is processing callbacks.

SOFTIRQ2 should be part of what happens in SOFTIRQ1. The transitioning
CPU sets SOFTIRQ1, which is immediately visible to local softirqs,
and wakes up CB/GP. CB/GP sets SOFTIRQ2, CB/GP sets SOFTIRQ3 and
we go back to transitioning code that sets IN_SOFTIRQ.

Or did I miss something?


> 8.  To softirq 3.  Either GP or CB kthread for the transitioning
>     CPU advances to next.
>     At this point, the no-CBs setup is fully shut down.
> 9.  To softirq 4.  Transitioning code advances to next,
>     which is the first, "In softirq".
>     (This one -might- be unnecessary, but...)
> 
> All transitions are of course with the ->nocb_lock held.
> 
> When there is only one CPU during early boot near rcu_init() time,
> the transition from "In softirq" to "No-CB" can remain instantaneous.
> 
> This has the advantage of not slowing things down just because there
> is an RCU callback flood in progress.  It also uses an explicit
> protocol that should (give or take bugs) maintain full safety both
> in protection of ->cblist and in dealing with RCU callback floods.
> 
> Thoughts?

Agreed. And I really like that it details the whole process in a very
explicit way.

Thanks!


Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

2020-06-10 Thread Paul E. McKenney
On Wed, Jun 10, 2020 at 03:12:39PM +0200, Frederic Weisbecker wrote:
> On Tue, Jun 09, 2020 at 11:02:27AM -0700, Paul E. McKenney wrote:
> > > > > And anyway we still want to unconditionally lock on many places,
> > > > > regardless of the offloaded state. I don't know how we could have
> > > > > a magic helper do the unconditional lock on some places and the
> > > > > conditional on others.
> > > > 
> > > > I was assuming (perhaps incorrectly) that an intermediate phase between
> > > > not-offloaded and offloaded would take care of all of those cases.
> > > 
> > > Perhaps partly but I fear that won't be enough.
> > 
> > One approach is to rely on RCU read-side critical sections surrounding
> > the lock acquisition and to stay in the intermediate phase until a grace
> > period completes, preferably call_rcu() instead of synchronize_rcu().
> > 
> > This of course means refusing to do a transition if the CPU is still
> > in the intermediate state from a prior transition.
> 
> That sounds good. But using synchronize_rcu() would be far easier. We
> need to keep the hotplug and rcu barrier locked during the transition.
> 
> > > Also I've been thinking that rcu_nocb_lock() should meet any of these
> > > requirements:
> > > 
> > > * hotplug is locked
> > > * rcu barrier is locked
> > > * rnp is locked
> > > 
> > > Because checking the offloaded state (when nocb isn't locked yet) of
> > > an rdp without any of the above locks held is racy. And that should
> > > be easy to check and prevent copy-pasta accidents.
> > > 
> > > What do you think?
> > 
> > An RCU read-side critical section might be simpler.
> 
> Ok I think I can manage that.

And just to argue against myself...

Another approach is to maintain explicit multiple states for each
->cblist, perhaps something like this:

1.  In softirq.  Transition code advances to next.
2.  To no-CB 1.  Either GP or CB kthread for the transitioning
    CPU advances to next.  Note that the fact that the
    transition code runs on the transitioning CPU means that
    the RCU softirq handler doesn't need to be involved.
3.  To no-CB 2.  Either GP or CB kthread for the transitioning
    CPU advances to next.
4.  To no-CB 3.  Transitioning code advances to next.
    At this point, the no-CBs setup is fully functional.
5.  No-CB.  Transitioning code advances to next.
    Again, the fact that the transitioning code is running
    on the transitioning CPU with interrupts disabled means
    that the RCU softirq handler need not be explicitly
    involved.
6.  To softirq 1.  The RCU softirq handler for the transitioning
    CPU advances to next.
    At this point, the RCU softirq handler is processing callbacks.
7.  To softirq 2.  Either GP or CB kthread for the transitioning
    CPU advances to next.
    At this point, the softirq handler is processing callbacks.
8.  To softirq 3.  Either GP or CB kthread for the transitioning
    CPU advances to next.
    At this point, the no-CBs setup is fully shut down.
9.  To softirq 4.  Transitioning code advances to next,
    which is the first, "In softirq".
    (This one -might- be unnecessary, but...)

All transitions are of course with the ->nocb_lock held.

When there is only one CPU during early boot near rcu_init() time,
the transition from "In softirq" to "No-CB" can remain instantaneous.

This has the advantage of not slowing things down just because there
is an RCU callback flood in progress.  It also uses an explicit
protocol that should (give or take bugs) maintain full safety both
in protection of ->cblist and in dealing with RCU callback floods.

Thoughts?

Thanx, Paul


Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

2020-06-10 Thread Frederic Weisbecker
On Tue, Jun 09, 2020 at 11:02:27AM -0700, Paul E. McKenney wrote:
> > > > And anyway we still want to unconditionally lock on many places,
> > > > regardless of the offloaded state. I don't know how we could have
> > > > a magic helper do the unconditional lock on some places and the
> > > > conditional on others.
> > > 
> > > I was assuming (perhaps incorrectly) that an intermediate phase between
> > > not-offloaded and offloaded would take care of all of those cases.
> > 
> > Perhaps partly but I fear that won't be enough.
> 
> One approach is to rely on RCU read-side critical sections surrounding
> the lock acquisition and to stay in the intermediate phase until a grace
> period completes, preferably call_rcu() instead of synchronize_rcu().
> 
> This of course means refusing to do a transition if the CPU is still
> in the intermediate state from a prior transition.

That sounds good. But using synchronize_rcu() would be far easier. We
need to keep the hotplug and rcu barrier locked during the transition.

> > Also I've been thinking that rcu_nocb_lock() should meet any of these
> > requirements:
> > 
> > * hotplug is locked
> > * rcu barrier is locked
> > * rnp is locked
> > 
> > Because checking the offloaded state (when nocb isn't locked yet) of
> > an rdp without any of the above locks held is racy. And that should
> > be easy to check and prevent copy-pasta accidents.
> > 
> > What do you think?
> 
> An RCU read-side critical section might be simpler.

Ok I think I can manage that.

Thanks.


Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

2020-06-09 Thread Paul E. McKenney
On Mon, Jun 08, 2020 at 02:57:17PM +0200, Frederic Weisbecker wrote:
> On Thu, Jun 04, 2020 at 09:36:55AM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 04, 2020 at 01:41:22PM +0200, Frederic Weisbecker wrote:
> > > On Fri, May 22, 2020 at 10:57:39AM -0700, Paul E. McKenney wrote:
> > > > On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> > > > > Reviewed-by: Joel Fernandes (Google) 
> > > > 
> > > > Thank you for looking this over, Joel!
> > > > 
> > > > Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
> > > > right thing", even when things are changing?  If it is feasible, that
> > > > would prevent any number of "interesting" copy-pasta and "just now
> > > > became common code" bugs down the road.
> > > 
> > > This won't be pretty:
> > > 
> > > locked = rcu_nocb_lock();
> > > rcu_nocb_unlock(locked);
> > 
> > I was thinking in terms of a bit in the rcu_data structure into
> > which rcu_nocb_lock() and friends stored the status, and from which
> > rcu_nocb_unlock() and friends retrieved that same status.  Sort of like
> > how preemptible RCU uses the ->rcu_read_lock_nesting field in task_struct.
> > 
> > As noted, this does require reworking the hotplug code to avoid the
> > current holding of two such locks concurrently, which I am happy to do
> > if that helps.
> > 
> > Or am I missing a subtle (or not-so-subtle) twist here?
> 
> So, during a CPU round, the nocb-gp kthread locks the corresponding rdp-nocb
> and then ignores it if it is not offloaded. Surely there is a smarter way to
> do that though. Well that's something we'll need to optimize at some point
> anyway.
> 
> Also we must then make sure that the nocb timers won't ever fire while we
> switch to offloaded state or they may fiddle with internals without locking
> nocb.

Understood, the synchronization requirements during the switchover will
be tricky.  But that will be the case no matter how we set it up.
If possible, it would be good if that trickiness could be concentrated
rather than spread far and wide.

> > > And anyway we still want to unconditionally lock on many places,
> > > regardless of the offloaded state. I don't know how we could have
> > > a magic helper do the unconditional lock on some places and the
> > > conditional on others.
> > 
> > I was assuming (perhaps incorrectly) that an intermediate phase between
> > not-offloaded and offloaded would take care of all of those cases.
> 
> Perhaps partly but I fear that won't be enough.

One approach is to rely on RCU read-side critical sections surrounding
the lock acquisition and to stay in the intermediate phase until a grace
period completes, preferably call_rcu() instead of synchronize_rcu().

This of course means refusing to do a transition if the CPU is still
in the intermediate state from a prior transition.

> > > Also the point of turning the lock helpers into primitives is to make
> > > it clearer as to where we really need unconditional locking and where
> > > we allow it to be conditional. A gift to reviewers :-)
> > 
> > Unless and until someone does a copy-pasta, thus unconditionally
> > doing the wrong thing.  ;-)
> 
> Yeah but the cost is more complicated code and synchronization to
> maintain the use of those all-in-one locking APIs everywhere. And
> current state is already not that simple :-)
> 
> Perhaps we should rename rcu_nocb_lock() into rcu_nocb_lock_cond() to
> prevent accidents?

We do need something to avoid accidents.  ;-)

> Also I've been thinking that rcu_nocb_lock() should meet any of these
> requirements:
> 
> * hotplug is locked
> * rcu barrier is locked
> * rnp is locked
> 
> Because checking the offloaded state (when nocb isn't locked yet) of
> an rdp without any of the above locks held is racy. And that should
> be easy to check and prevent copy-pasta accidents.
> 
> What do you think?

An RCU read-side critical section might be simpler.

Thanx, Paul


Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

2020-06-08 Thread Frederic Weisbecker
On Thu, Jun 04, 2020 at 09:36:55AM -0700, Paul E. McKenney wrote:
> On Thu, Jun 04, 2020 at 01:41:22PM +0200, Frederic Weisbecker wrote:
> > On Fri, May 22, 2020 at 10:57:39AM -0700, Paul E. McKenney wrote:
> > > On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> > > > Reviewed-by: Joel Fernandes (Google) 
> > > 
> > > Thank you for looking this over, Joel!
> > > 
> > > Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
> > > right thing", even when things are changing?  If it is feasible, that
> > > would prevent any number of "interesting" copy-pasta and "just now became
> > > common code" bugs down the road.
> > 
> > This won't be pretty:
> > 
> > locked = rcu_nocb_lock();
> > rcu_nocb_unlock(locked);
> 
> I was thinking in terms of a bit in the rcu_data structure into
> which rcu_nocb_lock() and friends stored the status, and from which
> rcu_nocb_unlock() and friends retrieved that same status.  Sort of like
> how preemptible RCU uses the ->rcu_read_lock_nesting field in task_struct.
> 
> As noted, this does require reworking the hotplug code to avoid the
> current holding of two such locks concurrently, which I am happy to do
> if that helps.
> 
> Or am I missing a subtle (or not-so-subtle) twist here?

So, during a CPU round, the nocb-gp kthread locks the corresponding rdp-nocb
and then ignores it if it is not offloaded. Surely there is a smarter way to do
that though. Well that's something we'll need to optimize at some point anyway.

Also we must then make sure that the nocb timers won't ever fire while we
switch to offloaded state or they may fiddle with internals without locking
nocb.

> 
> > And anyway we still want to unconditionally lock on many places,
> > regardless of the offloaded state. I don't know how we could have
> > a magic helper do the unconditional lock on some places and the
> > conditional on others.
> 
> I was assuming (perhaps incorrectly) that an intermediate phase between
> not-offloaded and offloaded would take care of all of those cases.

Perhaps partly but I fear that won't be enough.

> 
> > Also the point of turning the lock helpers into primitives is to make
> > it clearer as to where we really need unconditional locking and where
> > we allow it to be conditional. A gift to reviewers :-)
> 
> Unless and until someone does a copy-pasta, thus unconditionally
> doing the wrong thing.  ;-)

Yeah but the cost is more complicated code and synchronization to
maintain the use of those all-in-one locking APIs everywhere. And
current state is already not that simple :-)

Perhaps we should rename rcu_nocb_lock() into rcu_nocb_lock_cond() to
prevent accidents?

Also I've been thinking that rcu_nocb_lock() should meet any of these
requirements:

* hotplug is locked
* rcu barrier is locked
* rnp is locked

Because checking the offloaded state (when nocb isn't locked yet) of
an rdp without any of the above locks held is racy. And that should
be easy to check and prevent copy-pasta accidents.

What do you think?


Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

2020-06-04 Thread Paul E. McKenney
On Thu, Jun 04, 2020 at 01:41:22PM +0200, Frederic Weisbecker wrote:
> On Fri, May 22, 2020 at 10:57:39AM -0700, Paul E. McKenney wrote:
> > On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> > > Reviewed-by: Joel Fernandes (Google) 
> > 
> > Thank you for looking this over, Joel!
> > 
> > Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
> > right thing", even when things are changing?  If it is feasible, that
> > would prevent any number of "interesting" copy-pasta and "just now became
> > common code" bugs down the road.
> 
> This won't be pretty:
> 
> locked = rcu_nocb_lock();
> rcu_nocb_unlock(locked);

I was thinking in terms of a bit in the rcu_data structure into
which rcu_nocb_lock() and friends stored the status, and from which
rcu_nocb_unlock() and friends retrieved that same status.  Sort of like
how preemptible RCU uses the ->rcu_read_lock_nesting field in task_struct.

As noted, this does require reworking the hotplug code to avoid the
current holding of two such locks concurrently, which I am happy to do
if that helps.

Or am I missing a subtle (or not-so-subtle) twist here?

> And anyway we still want to unconditionally lock on many places,
> regardless of the offloaded state. I don't know how we could have
> a magic helper do the unconditional lock on some places and the
> conditional on others.

I was assuming (perhaps incorrectly) that an intermediate phase between
not-offloaded and offloaded would take care of all of those cases.

> Also the point of turning the lock helpers into primitives is to make
> it clearer as to where we really need unconditional locking and where
> we allow it to be conditional. A gift to reviewers :-)

Unless and until someone does a copy-pasta, thus unconditionally
doing the wrong thing.  ;-)

If we cannot avoid different spellings of ->cblist in different places,
such is life, but I do want to make sure that we have fully considered
the alternatives.

Thanx, Paul


Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

2020-06-04 Thread Frederic Weisbecker
On Fri, May 22, 2020 at 10:57:39AM -0700, Paul E. McKenney wrote:
> On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> > Reviewed-by: Joel Fernandes (Google) 
> 
> Thank you for looking this over, Joel!
> 
> Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
> right thing", even when things are changing?  If it is feasible, that
> would prevent any number of "interesting" copy-pasta and "just now became
> common code" bugs down the road.

This won't be pretty:

locked = rcu_nocb_lock();
rcu_nocb_unlock(locked);

And anyway we still want to unconditionally lock on many places,
regardless of the offloaded state. I don't know how we could have
a magic helper do the unconditional lock on some places and the
conditional on others.

Also the point of turning the lock helpers into primitives is to make
it clearer as to where we really need unconditional locking and where
we allow it to be conditional. A gift to reviewers :-)

Thanks.


Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

2020-05-26 Thread Paul E. McKenney
On Tue, May 26, 2020 at 08:45:42PM -0400, Joel Fernandes wrote:
> Hi Paul,
> 
> On Tue, May 26, 2020 at 6:29 PM Paul E. McKenney  wrote:
> >
> > On Tue, May 26, 2020 at 05:27:56PM -0400, Joel Fernandes wrote:
> > > On Tue, May 26, 2020 at 02:09:47PM -0700, Paul E. McKenney wrote:
> > > [...]
> > > > > > > BTW, I'm really itching to give it a try to make the scheduler
> > > > > > > more deadlock resilient (that is, if the scheduler wake up path
> > > > > > > detects a deadlock, then it defers the wake up using timers, or
> > > > > > > irq_work on its own instead of passing the burden of doing so to
> > > > > > > the callers). Thoughts?
> > > > > >
> > > > > > I have used similar approaches within RCU, but on the other hand the
> > > > > > scheduler often has tighter latency constraints than RCU does.  So I
> > > > > > think that is a better question for the scheduler maintainers than
> > > > > > it is for me.  ;-)
> > > > >
> > > > > Ok, it definitely keeps coming up in my radar first with the
> > > > > rcu_read_unlock_special() stuff, and now the nocb ;-). Perhaps it
> > > > > could also be good for a conference discussion!
> > > >
> > > > Again, please understand that RCU has way looser latency constraints
> > > > than the scheduler does.  Adding half a jiffy to wakeup latency might
> > > > not go over well, especially in the real-time application area.
> > >
> > > Yeah, agreed that the "deadlock detection" code should be pretty
> > > lightweight if/when it is written.
> >
> > In addition, to even stand a chance, you would need to use hrtimers.
> > The half-jiffy (at a minimum) delay from any other deferral mechanism
> > that I know of would be the kiss of death, especially from the viewpoint
> > of the real-time guys.
> 
> Just to make sure we are talking about the same kind of overhead - the
> deferring is only needed if the rq lock is already held (detected by
> trylocking). So there's no overhead in the common case other than the
> trylock possibly being slightly more expensive than the regular
> locking. Also, once the scheduler defers it, it uses the same kind of
> mechanism that other deferral mechanisms use to overcome this deadlock
> (timers, irq_work etc), so the overhead then would be no different
> than what we have now - the RT users would already have the wake up
> latency in current kernels without this idea implemented. Did I miss
> something?

Aggressive real-time applications care deeply about the uncommon case.

Thanx, Paul

> > > > But what did the scheduler maintainers say about this idea?
> > >
> > > Last I remember when it came up during the rcu_read_unlock_special()
> > > deadlock discussions, there's no way for infra like RCU to know that it
> > > was invoked from the scheduler.
> > >
> > > The idea I am bringing up now (about the scheduler itself detecting a
> > > recursion) was never brought up (not yet) with the sched maintainers (at
> > > least not by me).
> >
> > It might be good to bounce it off of them sooner rather than later.
> 
> Ok, I did that now over IRC. Thank you!
> 
>  - Joel


Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

2020-05-26 Thread Joel Fernandes
Hi Paul,

On Tue, May 26, 2020 at 6:29 PM Paul E. McKenney  wrote:
>
> On Tue, May 26, 2020 at 05:27:56PM -0400, Joel Fernandes wrote:
> > On Tue, May 26, 2020 at 02:09:47PM -0700, Paul E. McKenney wrote:
> > [...]
> > > > > > BTW, I'm really itching to give it a try to make the scheduler more
> > > > > > deadlock resilient (that is, if the scheduler wake up path detects a
> > > > > > deadlock, then it defers the wake up using timers, or irq_work on
> > > > > > its own instead of passing the burden of doing so to the callers).
> > > > > > Thoughts?
> > > > >
> > > > > I have used similar approaches within RCU, but on the other hand the
> > > > > scheduler often has tighter latency constraints than RCU does.  So I
> > > > > think that is a better question for the scheduler maintainers than it
> > > > > is for me.  ;-)
> > > >
> > > > Ok, it definitely keeps coming up in my radar first with the
> > > > rcu_read_unlock_special() stuff, and now the nocb ;-). Perhaps it
> > > > could also be good for a conference discussion!
> > >
> > > Again, please understand that RCU has way looser latency constraints
> > > than the scheduler does.  Adding half a jiffy to wakeup latency might
> > > not go over well, especially in the real-time application area.
> >
> > Yeah, agreed that the "deadlock detection" code should be pretty
> > lightweight if/when it is written.
>
> In addition, to even stand a chance, you would need to use hrtimers.
> The half-jiffy (at a minimum) delay from any other deferral mechanism
> that I know of would be the kiss of death, especially from the viewpoint
> of the real-time guys.

Just to make sure we are talking about the same kind of overhead - the
deferring is only needed if the rq lock is already held (detected by
trylocking). So there's no overhead in the common case other than the
trylock possibly being slightly more expensive than the regular
locking. Also, once the scheduler defers it, it uses the same kind of
mechanism that other deferral mechanisms use to overcome this deadlock
(timers, irq_work etc), so the overhead then would be no different
than what we have now - the RT users would already have the wake up
latency in current kernels without this idea implemented. Did I miss
something?

> > > But what did the scheduler maintainers say about this idea?
> >
> > Last I remember when it came up during the rcu_read_unlock_special()
> > deadlock discussions, there's no way for infra like RCU to know that it
> > was invoked from the scheduler.
> >
> > The idea I am bringing up now (about the scheduler itself detecting a
> > recursion) was never brought up (not yet) with the sched maintainers (at
> > least not by me).
>
> It might be good to bounce it off of them sooner rather than later.

Ok, I did that now over IRC. Thank you!

 - Joel


Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

2020-05-26 Thread Paul E. McKenney
On Tue, May 26, 2020 at 05:27:56PM -0400, Joel Fernandes wrote:
> On Tue, May 26, 2020 at 02:09:47PM -0700, Paul E. McKenney wrote:
> [...]
> > > > BTW, I'm really itching to give it a try to make the scheduler more
> > > > deadlock resilient (that is, if the scheduler wake up path detects a
> > > > deadlock, then it defers the wake up using timers, or irq_work on its
> > > > own instead of passing the burden of doing so to the callers). Thoughts?
> > > > 
> > > > I have used similar approaches within RCU, but on the other hand the
> > > > scheduler often has tighter latency constraints than RCU does.  So I
> > > > think that is a better question for the scheduler maintainers than it
> > > > is for me.  ;-)
> > > 
> > > Ok, it definitely keeps coming up in my radar first with the
> > > rcu_read_unlock_special() stuff, and now the nocb ;-). Perhaps it
> > > could also be good for a conference discussion!
> > 
> > Again, please understand that RCU has way looser latency constraints
> > than the scheduler does.  Adding half a jiffy to wakeup latency might
> > not go over well, especially in the real-time application area.
> 
> Yeah, agreed that the "deadlock detection" code should be pretty lightweight
> if/when it is written.

In addition, to even stand a chance, you would need to use hrtimers.
The half-jiffy (at a minimum) delay from any other deferral mechanism
that I know of would be the kiss of death, especially from the viewpoint
of the real-time guys.

> > But what did the scheduler maintainers say about this idea?
> 
> Last I remember when it came up during the rcu_read_unlock_special() deadlock
> discussions, there's no way for infra like RCU to know that it was
> invoked from the scheduler.
> 
> The idea I am bringing up now (about the scheduler itself detecting a
> recursion) was never brought up (not yet) with the sched maintainers (at
> least not by me).

It might be good to bounce it off of them sooner rather than later.

Thanx, Paul


Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

2020-05-26 Thread Joel Fernandes
On Tue, May 26, 2020 at 02:09:47PM -0700, Paul E. McKenney wrote:
[...]
> > > > BTW, I'm really itching to give it a try to make the scheduler more
> > > > deadlock resilient (that is, if the scheduler wake up path detects a
> > > > deadlock, then it defers the wake up using timers, or irq_work on its
> > > > own instead of passing the burden of doing so to the callers). Thoughts?
> > > 
> > > I have used similar approaches within RCU, but on the other hand the
> > > scheduler often has tighter latency constraints than RCU does.  So I
> > > think that is a better question for the scheduler maintainers than it
> > > is for me.  ;-)
> > 
> > Ok, it definitely keeps coming up in my radar first with the
> > rcu_read_unlock_special() stuff, and now the nocb ;-). Perhaps it could also
> > be good for a conference discussion!
> 
> Again, please understand that RCU has way looser latency constraints
> than the scheduler does.  Adding half a jiffy to wakeup latency might
> not go over well, especially in the real-time application area.

Yeah, agreed that the "deadlock detection" code should be pretty light weight
if/when it is written.

> But what did the scheduler maintainers say about this idea?

Last I remember, when it came up during the rcu_read_unlock_special() deadlock
discussions, there was no way for infra like RCU to know that it was
invoked from the scheduler.

The idea I am bringing up now (about the scheduler itself detecting a
recursion) was never brought up (not yet) with the sched maintainers (at
least not by me).

thanks,

 - Joel





Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

2020-05-26 Thread Paul E. McKenney
On Tue, May 26, 2020 at 04:18:40PM -0400, Joel Fernandes wrote:
> On Tue, May 26, 2020 at 09:29:46AM -0700, Paul E. McKenney wrote:
> > On Tue, May 26, 2020 at 11:21:37AM -0400, Joel Fernandes wrote:
> > > Hi Paul,
> > > 
> > > On Fri, May 22, 2020 at 10:57:39AM -0700, Paul E. McKenney wrote:
> > > > On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> > > > > On Wed, May 13, 2020 at 06:47:05PM +0200, Frederic Weisbecker wrote:
> > > > > > Pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> > > > > > timers) can unconditionally lock rdp->nocb_lock as they always 
> > > > > > execute
> > > > > > in the context of an offloaded rdp.
> > > > > > 
> > > > > > This also prepare for toggling CPUs to/from callback's offloaded 
> > > > > > mode
> > > > > > where the offloaded state will possibly change when rdp->nocb_lock
> > > > > > isn't taken. We'll still want the entrypoints to lock the rdp in any
> > > > > > case.
> > > > > 
> > > > > Suggested rewrite for change log:
> > > > > 
> > > > > Make pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, 
> > > > > nocb
> > > > > timers) unconditionally lock rdp->nocb_lock as they always execute in 
> > > > > the
> > > > > context of an offloaded rdp.
> > > > > 
> > > > > This prepares for future toggling of CPUs to/from callback's 
> > > > > offloaded mode
> > > > > where the offloaded state can change when rdp->nocb_lock is not held. 
> > > > > We'll
> > > > > still want the entrypoints to lock the rdp in any case.
> > > > > 
> > > > > 
> > > > > Also, can we inline rcu_nocb_lock_irqsave() into
> > > > > do_nocb_deferred_wakeup_common() since that's the only user, and then 
> > > > > delete
> > > > > rcu_nocb_lock_irqsave() and the corresponding unlock? That would also 
> > > > > remove
> > > > > confusion about which API to use for nocb locking (i.e. whether to 
> > > > > directly
> > > > > acquire lock or call rcu_nocb_lock_irqsave()).
> > > > > 
> > > > > Reviewed-by: Joel Fernandes (Google) 
> > > > 
> > > > Thank you for looking this over, Joel!
> > > > 
> > > > Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
> > > > right thing", even when things are changing?
> > > 
> > > One way to prevent things from changing could be to employ Steven's
> > > poor-man's RCU to mediate the "changing state" itself assuming the switch
> > > from the fast-path to the slow-path does not happen often. :) So whichever
> > > path is changing the state needs to wait for that poor-man's GP.
> > 
> > That should not be needed, given that acquiring ->nocb_lock on the CPU
> > corresponding to that lock suffices in both cases.  The trick is making
> > sure that the release matches the acquire.
> 
> Revisiting what you meant by "when things are changing", I'm assuming you
> meant a CPU is dynamically switched from non-offloaded mode to
> offloaded-mode. Please correct me if I'm wrong.

Both.  It does no good to acquire by disabling IRQs and then release by
releasing a lock that you do not hold.  Nor does it help to acquire the
lock and then fail to release it, instead merely re-enabling IRQs.

Aside from the case where two locks are held, a per-CPU variable (as
in another field in the rcu_data structure) would work.  And that case
could probably be reworked to hold only one lock at a time.

> Assuming that's true, you asked how do we "do the right thing" in the
> lock/unlock APIs. I was also suggesting getting rid of them and directly
> acquiring/releasing the spinlock, like Frederic does. It sounds like
> that is not good enough and you want an API that can do conditional locking
> (and the said complexity is hidden behind the API). Allow me to read more
> code and see if I can understand that / how to do that.

Indeed, the non-nohz_full code should not acquire the spinlock.

> > > Any case, are you concerned with the performance issues with the
> > > unconditional locking and that's why you suggest still keeping it 
> > > conditional?
> > 
> > My concerns are more about maintainability.
> 
> Ok.
> 
> > > Also, coming back to my point of inline the helper into the last caller -
> > > do_nocb_deferred_wakeup_common(). I think since we don't need to check if 
> > > the
> > > rdp is offloaded and do any conditional locking. The timer can be called 
> > > only
> > > with offloaded rdp. So we can directly do the unconditional spinlock 
> > > instead
> > > of using the rcu_nocb_lock helper there.
> > 
> > Indeed we can.  But should we?
> 
> Yeah may be not, in the event that we could do conditional locking and
> benefit.

Underneath an API, as current mainline does.

> > > > would prevent any number of "interesting" copy-pasta and "just now 
> > > > became
> > > > common code" bugs down the road.  And because irqs are disabled while
> > > > holding the lock, it should be possible to keep state on a per-CPU 
> > > > basis.
> > > 
> > > Agreed, that would be nice. Though if we could keep simple, that'd be nice
> > > too.
> > 
> > 

Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

2020-05-26 Thread Joel Fernandes
On Tue, May 26, 2020 at 09:29:46AM -0700, Paul E. McKenney wrote:
> On Tue, May 26, 2020 at 11:21:37AM -0400, Joel Fernandes wrote:
> > Hi Paul,
> > 
> > On Fri, May 22, 2020 at 10:57:39AM -0700, Paul E. McKenney wrote:
> > > On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> > > > On Wed, May 13, 2020 at 06:47:05PM +0200, Frederic Weisbecker wrote:
> > > > > Pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> > > > > timers) can unconditionally lock rdp->nocb_lock as they always execute
> > > > > in the context of an offloaded rdp.
> > > > > 
> > > > > This also prepare for toggling CPUs to/from callback's offloaded mode
> > > > > where the offloaded state will possibly change when rdp->nocb_lock
> > > > > isn't taken. We'll still want the entrypoints to lock the rdp in any
> > > > > case.
> > > > 
> > > > Suggested rewrite for change log:
> > > > 
> > > > Make pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> > > > timers) unconditionally lock rdp->nocb_lock as they always execute in 
> > > > the
> > > > context of an offloaded rdp.
> > > > 
> > > > This prepares for future toggling of CPUs to/from callback's offloaded 
> > > > mode
> > > > where the offloaded state can change when rdp->nocb_lock is not held. 
> > > > We'll
> > > > still want the entrypoints to lock the rdp in any case.
> > > > 
> > > > 
> > > > Also, can we inline rcu_nocb_lock_irqsave() into
> > > > do_nocb_deferred_wakeup_common() since that's the only user, and then 
> > > > delete
> > > > rcu_nocb_lock_irqsave() and the corresponding unlock? That would also 
> > > > remove
> > > > confusion about which API to use for nocb locking (i.e. whether to 
> > > > directly
> > > > acquire lock or call rcu_nocb_lock_irqsave()).
> > > > 
> > > > Reviewed-by: Joel Fernandes (Google) 
> > > 
> > > Thank you for looking this over, Joel!
> > > 
> > > Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
> > > right thing", even when things are changing?
> > 
> > One way to prevent things from changing could be to employ Steven's
> > poor-man's RCU to mediate the "changing state" itself assuming the switch
> > from the fast-path to the slow-path does not happen often. :) So whichever
> > path is changing the state needs to wait for that poor-man's GP.
> 
> That should not be needed, given that acquiring ->nocb_lock on the CPU
> corresponding to that lock suffices in both cases.  The trick is making
> sure that the release matches the acquire.

Revisiting what you meant by "when things are changing", I'm assuming you
meant a CPU is dynamically switched from non-offloaded mode to
offloaded-mode. Please correct me if I'm wrong.

Assuming that's true, you asked how do we "do the right thing" in the
lock/unlock APIs. I was also suggesting getting rid of them and directly
acquiring/releasing the spinlock, like Frederic does. It sounds like
that is not good enough and you want an API that can do conditional locking
(and the said complexity is hidden behind the API). Allow me to read more
code and see if I can understand that / how to do that.

> > Any case, are you concerned with the performance issues with the
> > unconditional locking and that's why you suggest still keeping it 
> > conditional?
> 
> My concerns are more about maintainability.

Ok.

> > Also, coming back to my point of inline the helper into the last caller -
> > do_nocb_deferred_wakeup_common(). I think since we don't need to check if 
> > the
> > rdp is offloaded and do any conditional locking. The timer can be called 
> > only
> > with offloaded rdp. So we can directly do the unconditional spinlock instead
> > of using the rcu_nocb_lock helper there.
> 
> Indeed we can.  But should we?

Yeah may be not, in the event that we could do conditional locking and
benefit.

> > > would prevent any number of "interesting" copy-pasta and "just now became
> > > common code" bugs down the road.  And because irqs are disabled while
> > > holding the lock, it should be possible to keep state on a per-CPU basis.
> > 
> > Agreed, that would be nice. Though if we could keep simple, that'd be nice
> > too.
> 
> Having one set of functions/macros that are always used to protect
> the ->cblist, no matter what the context, is a very attractive form of
> simple.  ;-)

I was thinking that the API is already raw_spin_lock/unlock(), but I'll
revisit everything.

> > > The ugliest scenario is callback adoption, where there are two ->cblist
> > > structures in need of being locked.  In that case, changes are excluded
> > > (because that is in CPU hotplug code), but is it possible to take
> > > advantage of that reasonably?
> > 
> > Could you describe this a bit more? Thanks.
> 
> Right now, callbacks are merged directly from the outgoing CPU's ->cblist
> to the surviving CPU's ->cblist.  This means that both ->cblist structures
> must be locked at the same time, which would require additional state.
> After all, if only the one ->cblist were to be 

Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

2020-05-26 Thread Paul E. McKenney
On Tue, May 26, 2020 at 11:21:37AM -0400, Joel Fernandes wrote:
> Hi Paul,
> 
> On Fri, May 22, 2020 at 10:57:39AM -0700, Paul E. McKenney wrote:
> > On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> > > On Wed, May 13, 2020 at 06:47:05PM +0200, Frederic Weisbecker wrote:
> > > > Pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> > > > timers) can unconditionally lock rdp->nocb_lock as they always execute
> > > > in the context of an offloaded rdp.
> > > > 
> > > > This also prepare for toggling CPUs to/from callback's offloaded mode
> > > > where the offloaded state will possibly change when rdp->nocb_lock
> > > > isn't taken. We'll still want the entrypoints to lock the rdp in any
> > > > case.
> > > 
> > > Suggested rewrite for change log:
> > > 
> > > Make pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> > > timers) unconditionally lock rdp->nocb_lock as they always execute in the
> > > context of an offloaded rdp.
> > > 
> > > This prepares for future toggling of CPUs to/from callback's offloaded 
> > > mode
> > > where the offloaded state can change when rdp->nocb_lock is not held. 
> > > We'll
> > > still want the entrypoints to lock the rdp in any case.
> > > 
> > > 
> > > Also, can we inline rcu_nocb_lock_irqsave() into
> > > do_nocb_deferred_wakeup_common() since that's the only user, and then 
> > > delete
> > > rcu_nocb_lock_irqsave() and the corresponding unlock? That would also 
> > > remove
> > > confusion about which API to use for nocb locking (i.e. whether to 
> > > directly
> > > acquire lock or call rcu_nocb_lock_irqsave()).
> > > 
> > > Reviewed-by: Joel Fernandes (Google) 
> > 
> > Thank you for looking this over, Joel!
> > 
> > Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
> > right thing", even when things are changing?
> 
> One way to prevent things from changing could be to employ Steven's
> poor-man's RCU to mediate the "changing state" itself assuming the switch
> from the fast-path to the slow-path does not happen often. :) So whichever
> path is changing the state needs to wait for that poor-man's GP.

That should not be needed, given that acquiring ->nocb_lock on the CPU
corresponding to that lock suffices in both cases.  The trick is making
sure that the release matches the acquire.

> Any case, are you concerned with the performance issues with the
> unconditional locking and that's why you suggest still keeping it conditional?

My concerns are more about maintainability.

> Also, coming back to my point of inline the helper into the last caller -
> do_nocb_deferred_wakeup_common(). I think since we don't need to check if the
> rdp is offloaded and do any conditional locking. The timer can be called only
> with offloaded rdp. So we can directly do the unconditional spinlock instead
> of using the rcu_nocb_lock helper there.

Indeed we can.  But should we?

> > would prevent any number of "interesting" copy-pasta and "just now became
> > common code" bugs down the road.  And because irqs are disabled while
> > holding the lock, it should be possible to keep state on a per-CPU basis.
> 
> Agreed, that would be nice. Though if we could keep simple, that'd be nice
> too.

Having one set of functions/macros that are always used to protect
the ->cblist, no matter what the context, is a very attractive form of
simple.  ;-)

> > The ugliest scenario is callback adoption, where there are two ->cblist
> > structures in need of being locked.  In that case, changes are excluded
> > (because that is in CPU hotplug code), but is it possible to take
> > advantage of that reasonably?
> 
> Could you describe this a bit more? Thanks.

Right now, callbacks are merged directly from the outgoing CPU's ->cblist
to the surviving CPU's ->cblist.  This means that both ->cblist structures
must be locked at the same time, which would require additional state.
After all, if only the one ->cblist were to be locked at a given time,
a per-CPU variable could be used to track what method should be used to
unlock the ->cblist.

This could be restructured to use an intermediate private ->cblist,
but care would be required with the counts, as it is a very bad idea
for a large group of callbacks to become invisible.  (This is not a
problem for rcu_barrier(), which excludes CPU hotplug, but it could
be a serious problem for callback-flood-mitigation mechanisms.  Yes,
they are heuristic, but...)

> > Maybe these changes are the best we can do, but it would be good to
> > if the same primitive locked a ->cblist regardless of context.
> 
> Here you are comparing 2 primitives. Do you mean that just IRQs being
> disabled is another primitive, and rcu_nocb_lock is another one?

I am not sure what this question means, but I am advocating looking
into retaining a single wrapper that decides instead of direct use of
the underlying primitives.

Or are you instead asking why there are two different methods of
protecting the ->cblist 

Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

2020-05-26 Thread Joel Fernandes
Hi Paul,

On Fri, May 22, 2020 at 10:57:39AM -0700, Paul E. McKenney wrote:
> On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> > On Wed, May 13, 2020 at 06:47:05PM +0200, Frederic Weisbecker wrote:
> > > Pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> > > timers) can unconditionally lock rdp->nocb_lock as they always execute
> > > in the context of an offloaded rdp.
> > > 
> > > This also prepare for toggling CPUs to/from callback's offloaded mode
> > > where the offloaded state will possibly change when rdp->nocb_lock
> > > isn't taken. We'll still want the entrypoints to lock the rdp in any
> > > case.
> > 
> > Suggested rewrite for change log:
> > 
> > Make pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> > timers) unconditionally lock rdp->nocb_lock as they always execute in the
> > context of an offloaded rdp.
> > 
> > This prepares for future toggling of CPUs to/from callback's offloaded mode
> > where the offloaded state can change when rdp->nocb_lock is not held. We'll
> > still want the entrypoints to lock the rdp in any case.
> > 
> > 
> > Also, can we inline rcu_nocb_lock_irqsave() into
> > do_nocb_deferred_wakeup_common() since that's the only user, and then delete
> > rcu_nocb_lock_irqsave() and the corresponding unlock? That would also remove
> > confusion about which API to use for nocb locking (i.e. whether to directly
> > acquire lock or call rcu_nocb_lock_irqsave()).
> > 
> > Reviewed-by: Joel Fernandes (Google) 
> 
> Thank you for looking this over, Joel!
> 
> Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
> right thing", even when things are changing?

One way to prevent things from changing could be to employ Steven's
poor-man's RCU to mediate the "changing state" itself assuming the switch
from the fast-path to the slow-path does not happen often. :) So whichever
path is changing the state needs to wait for that poor-man's GP.

Any case, are you concerned with the performance issues with the
unconditional locking and that's why you suggest still keeping it conditional?

Also, coming back to my point about inlining the helper into its last
caller, do_nocb_deferred_wakeup_common(): since the timer can run only
with an offloaded rdp, we don't need to check whether the rdp is
offloaded or do any conditional locking, so we can directly take the
spinlock unconditionally instead of using the rcu_nocb_lock helper there.

> would prevent any number of "interesting" copy-pasta and "just now became
> common code" bugs down the road.  And because irqs are disabled while
> holding the lock, it should be possible to keep state on a per-CPU basis.

Agreed, that would be nice. Though if we could keep simple, that'd be nice
too.

> The ugliest scenario is callback adoption, where there are two ->cblist
> structures in need of being locked.  In that case, changes are excluded
> (because that is in CPU hotplug code), but is it possible to take
> advantage of that reasonably?

Could you describe this a bit more? Thanks.

> Maybe these changes are the best we can do, but it would be good to
> if the same primitive locked a ->cblist regardless of context.

Here you are comparing 2 primitives. Do you mean that just IRQs being
disabled is another primitive, and rcu_nocb_lock is another one?

BTW, I'm really itching to give it a try to make the scheduler more deadlock
resilient (that is, if the scheduler wake up path detects a deadlock, then it
defers the wake up using timers, or irq_work on its own instead of passing
the burden of doing so to the callers). Thoughts?

thanks,

 - Joel


> Can that be made to work reasonably?
> 
>   Thanx, Paul
> 
> > thanks,
> > 
> >  - Joel
> > 
> > 
> > > 
> > > Signed-off-by: Frederic Weisbecker 
> > > Cc: Paul E. McKenney 
> > > Cc: Josh Triplett 
> > > Cc: Steven Rostedt 
> > > Cc: Mathieu Desnoyers 
> > > Cc: Lai Jiangshan 
> > > Cc: Joel Fernandes 
> > > ---
> > >  kernel/rcu/tree_plugin.h | 14 +++++++-------
> > >  1 file changed, 7 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index 097635c41135..523570469864 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -1909,7 +1909,7 @@ static void do_nocb_bypass_wakeup_timer(struct 
> > > timer_list *t)
> > >   struct rcu_data *rdp = from_timer(rdp, t, nocb_bypass_timer);
> > >  
> > >   trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Timer"));
> > > - rcu_nocb_lock_irqsave(rdp, flags);
> > > + raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > >   smp_mb__after_spinlock(); /* Timer expire before wakeup. */
> > >   __call_rcu_nocb_wake(rdp, true, flags);
> > >  }
> > > @@ -1942,7 +1942,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > >*/
> > >   for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
> > >   trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
> 

Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

2020-05-22 Thread Paul E. McKenney
On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> On Wed, May 13, 2020 at 06:47:05PM +0200, Frederic Weisbecker wrote:
> > Pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> > timers) can unconditionally lock rdp->nocb_lock as they always execute
> > in the context of an offloaded rdp.
> > 
> > This also prepare for toggling CPUs to/from callback's offloaded mode
> > where the offloaded state will possibly change when rdp->nocb_lock
> > isn't taken. We'll still want the entrypoints to lock the rdp in any
> > case.
> 
> Suggested rewrite for change log:
> 
> Make pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> timers) unconditionally lock rdp->nocb_lock as they always execute in the
> context of an offloaded rdp.
> 
> This prepares for future toggling of CPUs to/from callback's offloaded mode
> where the offloaded state can change when rdp->nocb_lock is not held. We'll
> still want the entrypoints to lock the rdp in any case.
> 
> 
> Also, can we inline rcu_nocb_lock_irqsave() into
> do_nocb_deferred_wakeup_common() since that's the only user, and then delete
> rcu_nocb_lock_irqsave() and the corresponding unlock? That would also remove
> confusion about which API to use for nocb locking (i.e. whether to directly
> acquire lock or call rcu_nocb_lock_irqsave()).
> 
> Reviewed-by: Joel Fernandes (Google) 

Thank you for looking this over, Joel!

Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
right thing", even when things are changing?  If it is feasible, that
would prevent any number of "interesting" copy-pasta and "just now became
common code" bugs down the road.  And because irqs are disabled while
holding the lock, it should be possible to keep state on a per-CPU basis.

The ugliest scenario is callback adoption, where there are two ->cblist
structures in need of being locked.  In that case, changes are excluded
(because that is in CPU hotplug code), but is it possible to take
advantage of that reasonably?

Maybe these changes are the best we can do, but it would be good to
if the same primitive locked a ->cblist regardless of context.

Can that be made to work reasonably?

Thanx, Paul

> thanks,
> 
>  - Joel
> 
> 
> > 
> > Signed-off-by: Frederic Weisbecker 
> > Cc: Paul E. McKenney 
> > Cc: Josh Triplett 
> > Cc: Steven Rostedt 
> > Cc: Mathieu Desnoyers 
> > Cc: Lai Jiangshan 
> > Cc: Joel Fernandes 
> > ---
> >  kernel/rcu/tree_plugin.h | 14 +++++++-------
> >  1 file changed, 7 insertions(+), 7 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 097635c41135..523570469864 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -1909,7 +1909,7 @@ static void do_nocb_bypass_wakeup_timer(struct 
> > timer_list *t)
> > struct rcu_data *rdp = from_timer(rdp, t, nocb_bypass_timer);
> >  
> > trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Timer"));
> > -   rcu_nocb_lock_irqsave(rdp, flags);
> > +   raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > smp_mb__after_spinlock(); /* Timer expire before wakeup. */
> > __call_rcu_nocb_wake(rdp, true, flags);
> >  }
> > @@ -1942,7 +1942,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> >  */
> > for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
> > trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
> > -   rcu_nocb_lock_irqsave(rdp, flags);
> > +   raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> > if (bypass_ncbs &&
> > (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
> > @@ -1951,7 +1951,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > (void)rcu_nocb_try_flush_bypass(rdp, j);
> > bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> > } else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
> > -   rcu_nocb_unlock_irqrestore(rdp, flags);
> > +   raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > continue; /* No callbacks here, try next. */
> > }
> > if (bypass_ncbs) {
> > @@ -1996,7 +1996,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > } else {
> > needwake = false;
> > }
> > -   rcu_nocb_unlock_irqrestore(rdp, flags);
> > +   raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > if (needwake) {
> > swake_up_one(&rdp->nocb_cb_wq);
> > gotcbs = true;
> > @@ -2084,7 +2084,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> > rcu_do_batch(rdp);
> > local_bh_enable();
> > lockdep_assert_irqs_enabled();
> > -   rcu_nocb_lock_irqsave(rdp, flags);
> > +   raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > if (rcu_segcblist_nextgp(&rdp->cblist, 

Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

2020-05-20 Thread Joel Fernandes
On Wed, May 13, 2020 at 06:47:05PM +0200, Frederic Weisbecker wrote:
> Pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> timers) can unconditionally lock rdp->nocb_lock as they always execute
> in the context of an offloaded rdp.
> 
> This also prepare for toggling CPUs to/from callback's offloaded mode
> where the offloaded state will possibly change when rdp->nocb_lock
> isn't taken. We'll still want the entrypoints to lock the rdp in any
> case.

Suggested rewrite for change log:

Make pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
timers) unconditionally lock rdp->nocb_lock as they always execute in the
context of an offloaded rdp.

This prepares for future toggling of CPUs to/from callback's offloaded mode
where the offloaded state can change when rdp->nocb_lock is not held. We'll
still want the entrypoints to lock the rdp in any case.


Also, can we inline rcu_nocb_lock_irqsave() into
do_nocb_deferred_wakeup_common() since that's the only user, and then delete
rcu_nocb_lock_irqsave() and the corresponding unlock? That would also remove
confusion about which API to use for nocb locking (i.e. whether to directly
acquire lock or call rcu_nocb_lock_irqsave()).

Reviewed-by: Joel Fernandes (Google) 

thanks,

 - Joel


> 
> Signed-off-by: Frederic Weisbecker 
> Cc: Paul E. McKenney 
> Cc: Josh Triplett 
> Cc: Steven Rostedt 
> Cc: Mathieu Desnoyers 
> Cc: Lai Jiangshan 
> Cc: Joel Fernandes 
> ---
>  kernel/rcu/tree_plugin.h | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 097635c41135..523570469864 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1909,7 +1909,7 @@ static void do_nocb_bypass_wakeup_timer(struct 
> timer_list *t)
>   struct rcu_data *rdp = from_timer(rdp, t, nocb_bypass_timer);
>  
>   trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Timer"));
> - rcu_nocb_lock_irqsave(rdp, flags);
> + raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
>   smp_mb__after_spinlock(); /* Timer expire before wakeup. */
>   __call_rcu_nocb_wake(rdp, true, flags);
>  }
> @@ -1942,7 +1942,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
>*/
>   for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
>   trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
> - rcu_nocb_lock_irqsave(rdp, flags);
> + raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
>   bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
>   if (bypass_ncbs &&
>   (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
> @@ -1951,7 +1951,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
>   (void)rcu_nocb_try_flush_bypass(rdp, j);
>   bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
>   } else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
> - rcu_nocb_unlock_irqrestore(rdp, flags);
> + raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
>   continue; /* No callbacks here, try next. */
>   }
>   if (bypass_ncbs) {
> @@ -1996,7 +1996,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
>   } else {
>   needwake = false;
>   }
> - rcu_nocb_unlock_irqrestore(rdp, flags);
> + raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
>   if (needwake) {
>   swake_up_one(&rdp->nocb_cb_wq);
>   gotcbs = true;
> @@ -2084,7 +2084,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
>   rcu_do_batch(rdp);
>   local_bh_enable();
>   lockdep_assert_irqs_enabled();
> - rcu_nocb_lock_irqsave(rdp, flags);
> + raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
>   if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
>   rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
>   raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
> @@ -2092,7 +2092,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
>   raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
>   }
>   if (rcu_segcblist_ready_cbs(&rdp->cblist)) {
> - rcu_nocb_unlock_irqrestore(rdp, flags);
> + raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
>   if (needwake_gp)
>   rcu_gp_kthread_wake();
>   return;
> @@ -2100,7 +2100,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
>  
>   trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("CBSleep"));
>   WRITE_ONCE(rdp->nocb_cb_sleep, true);
> - rcu_nocb_unlock_irqrestore(rdp, flags);
> + raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
>   if (needwake_gp)
>   rcu_gp_kthread_wake();
>   swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> -- 
> 2.25.0
>