Re: [PATCH tip/core/rcu 03/17] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

2020-07-15 Thread Uladzislau Rezki
On Wed, Jul 15, 2020 at 04:16:22PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-07-15 15:38:08 [+0200], Uladzislau Rezki wrote:
> > > As of -rc3 it should complain about printk() which is why it is still 
> > > disabled by default.
> > >
> > Have you tried to trigger the "complaint" you are talking about?
> 
> No, but it is wrong because a raw_spinlock_t is acquired followed by a
> spinlock_t.
>
Right. According to the documentation, CONFIG_PROVE_RAW_LOCK_NESTING is
used to detect invalid raw_spinlock vs. spinlock nesting.
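
For reference, a minimal illustration of the nesting it reports (not
from the patch; the lock names here are made up):

    #include <linux/spinlock.h>

    static DEFINE_RAW_SPINLOCK(outer_raw); /* spinning lock, even on RT */
    static DEFINE_SPINLOCK(inner);         /* sleeping lock on RT */

    static void bad_nesting(void)
    {
            raw_spin_lock(&outer_raw);
            spin_lock(&inner);      /* invalid wait context -> lockdep splat */
            spin_unlock(&inner);
            raw_spin_unlock(&outer_raw);
    }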

> 
> > I expect to get some trace dump when CONFIG_PROVE_RAW_LOCK_NESTING=y.
> 
> You should get one if you haven't received any splat earlier (like from
> the printk code, because it only triggers once).
> 
Got it.

Thanks! 

--
Vlad Rezki


Re: [PATCH tip/core/rcu 03/17] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

2020-07-15 Thread Sebastian Andrzej Siewior
On 2020-07-15 15:38:08 [+0200], Uladzislau Rezki wrote:
> > As of -rc3 it should complain about printk() which is why it is still 
> > disabled by default.
> >
> Have you tried to trigger the "complaint" you are talking about?

No, but it is wrong because a raw_spinlock_t is acquired followed by a
spinlock_t.

> I expect to get some trace dump when CONFIG_PROVE_RAW_LOCK_NESTING=y.

You should get one if you haven't received any splat earlier (like from
the printk code, because it only triggers once).

> Thank you.

Sebastian


Re: [PATCH tip/core/rcu 03/17] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

2020-07-15 Thread Uladzislau Rezki
On Tue, Jun 30, 2020 at 06:45:43PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-06-24 13:12:12 [-0700], paul...@kernel.org wrote:
> > From: "Joel Fernandes (Google)" 
> > 
> > To keep the kfree_rcu() code working in purely atomic sections on RT,
> > such as non-threaded IRQ handlers and raw spinlock sections, avoid
> > calling into the page allocator which uses sleeping locks on RT.
> > 
> > In fact, even if the caller is preemptible, the kfree_rcu() code is
> > not, as the krcp->lock is a raw spinlock.
> > 
> > Calling into the page allocator is optional and avoiding it should be
> > Ok, especially with the page pre-allocation support in future patches.
> > Such pre-allocation would further avoid the need for a dynamically
> > allocated page in the first place.
> > 
> > Cc: Sebastian Andrzej Siewior 
> > Reviewed-by: Uladzislau Rezki 
> > Co-developed-by: Uladzislau Rezki 
> > Signed-off-by: Uladzislau Rezki 
> > Signed-off-by: Joel Fernandes (Google) 
> > Signed-off-by: Uladzislau Rezki (Sony) 
> > Signed-off-by: Paul E. McKenney 
> > ---
> >  kernel/rcu/tree.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 64592b4..dbdd509 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3184,6 +3184,18 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> > if (!bnode) {
> > WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
> >  
> > +   /*
> > +* To keep this path working on raw non-preemptible
> > +* sections, prevent the optional entry into the
> > +* allocator as it uses sleeping locks. In fact, even
> > +* if the caller of kfree_rcu() is preemptible, this
> > +* path still is not, as krcp->lock is a raw spinlock.
> > +* With additional page pre-allocation in the works,
> > +* hitting this return is going to be much less likely.
> > +*/
> > +   if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > +   return false;
> 
> This is not going to work together with the "wait context validator"
> (CONFIG_PROVE_RAW_LOCK_NESTING). As of -rc3 it should complain about
> printk() which is why it is still disabled by default.
>
Have you tried to trigger the "complaint" you are talking about?
I expect to get some trace dump when CONFIG_PROVE_RAW_LOCK_NESTING=y.

Thank you.

--
Vlad Rezki


Re: [PATCH tip/core/rcu 03/17] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

2020-07-08 Thread Uladzislau Rezki
On Tue, Jul 07, 2020 at 02:45:31PM -0400, Joel Fernandes wrote:
> On Tue, Jul 07, 2020 at 07:34:41PM +0200, Uladzislau Rezki wrote:
> > On Mon, Jul 06, 2020 at 02:06:45PM -0700, Paul E. McKenney wrote:
> > > On Thu, Jul 02, 2020 at 10:19:08PM +0200, Sebastian Andrzej Siewior wrote:
> > > > On 2020-07-02 09:48:26 [-0700], Paul E. McKenney wrote:
> > > > > On Thu, Jul 02, 2020 at 04:12:16PM +0200, Sebastian Andrzej Siewior 
> > > > > wrote:
> > > > > > On 2020-06-30 11:35:34 [-0700], Paul E. McKenney wrote:
> > > > > > > > This is not going to work together with the "wait context 
> > > > > > > > validator"
> > > > > > > > (CONFIG_PROVE_RAW_LOCK_NESTING). As of -rc3 it should complain 
> > > > > > > > about
> > > > > > > > printk() which is why it is still disabled by default.
> > > > > > > 
> > > > > > > Fixing that should be "interesting".  In particular, RCU CPU stall
> > > > > > > warnings rely on the raw spin lock to reduce false positives due
> > > > > > > to race conditions.  Some thought will be required here.
> > > > > > 
> > > > > > I don't get this part. Can you explain/give me an example where to 
> > > > > > look
> > > > > > at?
> > > > > 
> > > > > Starting from the scheduler-clock interrupt's call into RCU,
> > > > > we have rcu_sched_clock_irq() which calls rcu_pending() which
> > > > > calls check_cpu_stall() which calls either print_cpu_stall() or
> > > > > print_other_cpu_stall(), depending on whether the stall is happening 
> > > > > on
> > > > > the current CPU or on some other CPU, respectively.
> > > > > 
> > > > > Both of these last functions acquire the rcu_node structure's raw 
> > > > > ->lock
> > > > > and expect to do printk()s while holding it.
> > > > 
> > > > …
> > > > > Thoughts?
> > > > 
> > > > Okay. So in the RT queue there is a printk() rewrite which fixes this
> > > > kind of thing. Upstream the printk() interface is still broken in this
> > > > regard and therefore CONFIG_PROVE_RAW_LOCK_NESTING is disabled.
> > > > [Earlier the workqueue would also trigger a warning but this has been
> > > > fixed as of v5.8-rc1.]
> > > > This was just me explaining why this is bad, what debug function would
> > > > report it, and why it is not enabled by default.
> > > 
> > > Whew!!!  ;-)
> > > 
> > > > > > > > So assume that this is fixed and enabled then on !PREEMPT_RT it 
> > > > > > > > will
> > > > > > > > complain that you have a raw_spinlock_t acquired (the one from 
> > > > > > > > patch
> > > > > > > > 02/17) and attempt to acquire a spinlock_t in the memory 
> > > > > > > > allocator.
> > > > > > > 
> > > > > > > Given that the slab allocator doesn't acquire any locks until it 
> > > > > > > gets
> > > > > > > a fair way in, wouldn't it make sense to allow a "shallow" 
> > > > > > > allocation
> > > > > > > while a raw spinlock is held?  This would require yet another 
> > > > > > > GFP_ flag,
> > > > > > > but that won't make all that much of a difference in the total.  
> > > > > > > ;-)
> > > > > > 
> > > > > > That would be one way of dealing with it. But we could go back to
> > > > > > spinlock_t and keep the memory allocation even for RT as is. I
> > > > > > don't see a downside of this. And we would worry about kfree_rcu()
> > > > > > from a real IRQ-off region once we get to it.
> > > > > 
> > > > > Once we get to it, your thought would be to do per-CPU queuing of
> > > > > memory from IRQ-off kfree_rcu(), and have IRQ work or some such clean
> > > > > up after it?  Or did you have some other trick in mind?
> > > > 
> > > > So for now I would very much like to revert the raw_spinlock_t back to
> > > > the spinlock_t and add a migrate_disable() just to avoid the tiny
> > > > possible migration between obtaining the CPU-ptr and acquiring the lock
> > > > (I think Joel was afraid of a performance hit).
> > > 
> > > Performance is indeed a concern here.
> > > 
> > > > Should we get to a *real* use case where someone must invoke kfree_rcu()
> > > > from a hard-IRQ-off region then we can think what makes sense. per-CPU
> > > > queues and IRQ-work would be one way of dealing with it.
> > > 
> > > It looks like workqueues can also be used, at least in their current
> > > form.  And timers.
> > > 
> > > Vlad, Joel, thoughts?
> > >
> > Some high level thoughts:
> > 
> > Currently everything is done in workqueue context, which means all
> > freeing happens there. For the RT kernel we can invoke the page
> > allocator only for the single-argument kfree_rcu() case (though we skip
> > it). For the double-argument case it is impossible, which is why a
> > simple path is used instead, linking the rcu_head structures among each
> > other for further reclaim in wq context. As of now, for RT, everything
> > is already deferred.
> > 
> > If we revert to spinlock_t, then calling kfree_rcu() from hard-IRQ
> > context becomes broken, even though we think that for the RT kernel it
> > will never happen. Therefore I do not see a clear motivation or benefit
> > for reverting to spinlock_t.
> > 
> > IMHO, if we can avoid such a drawback I would go that way, i.e. I would
> > not like to think about what to do with it when it becomes broken.

Re: [PATCH tip/core/rcu 03/17] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

2020-07-07 Thread Joel Fernandes
On Tue, Jul 07, 2020 at 07:34:41PM +0200, Uladzislau Rezki wrote:
> On Mon, Jul 06, 2020 at 02:06:45PM -0700, Paul E. McKenney wrote:
> > On Thu, Jul 02, 2020 at 10:19:08PM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2020-07-02 09:48:26 [-0700], Paul E. McKenney wrote:
> > > > On Thu, Jul 02, 2020 at 04:12:16PM +0200, Sebastian Andrzej Siewior 
> > > > wrote:
> > > > > On 2020-06-30 11:35:34 [-0700], Paul E. McKenney wrote:
> > > > > > > This is not going to work together with the "wait context 
> > > > > > > validator"
> > > > > > > (CONFIG_PROVE_RAW_LOCK_NESTING). As of -rc3 it should complain 
> > > > > > > about
> > > > > > > printk() which is why it is still disabled by default.
> > > > > > 
> > > > > > Fixing that should be "interesting".  In particular, RCU CPU stall
> > > > > > warnings rely on the raw spin lock to reduce false positives due
> > > > > > to race conditions.  Some thought will be required here.
> > > > > 
> > > > > I don't get this part. Can you explain/give me an example where to 
> > > > > look
> > > > > at?
> > > > 
> > > > Starting from the scheduler-clock interrupt's call into RCU,
> > > > we have rcu_sched_clock_irq() which calls rcu_pending() which
> > > > calls check_cpu_stall() which calls either print_cpu_stall() or
> > > > print_other_cpu_stall(), depending on whether the stall is happening on
> > > > the current CPU or on some other CPU, respectively.
> > > > 
> > > > Both of these last functions acquire the rcu_node structure's raw ->lock
> > > > and expect to do printk()s while holding it.
> > > 
> > > …
> > > > Thoughts?
> > > 
> > > Okay. So in the RT queue there is a printk() rewrite which fixes this
> > > kind of thing. Upstream the printk() interface is still broken in this
> > > regard and therefore CONFIG_PROVE_RAW_LOCK_NESTING is disabled.
> > > [Earlier the workqueue would also trigger a warning but this has been
> > > fixed as of v5.8-rc1.]
> > > This was just me explaining why this is bad, what debug function would
> > > report it, and why it is not enabled by default.
> > 
> > Whew!!!  ;-)
> > 
> > > > > > > So assume that this is fixed and enabled then on !PREEMPT_RT it 
> > > > > > > will
> > > > > > > complain that you have a raw_spinlock_t acquired (the one from 
> > > > > > > patch
> > > > > > > 02/17) and attempt to acquire a spinlock_t in the memory 
> > > > > > > allocator.
> > > > > > 
> > > > > > Given that the slab allocator doesn't acquire any locks until it 
> > > > > > gets
> > > > > > a fair way in, wouldn't it make sense to allow a "shallow" 
> > > > > > allocation
> > > > > > while a raw spinlock is held?  This would require yet another GFP_ 
> > > > > > flag,
> > > > > > but that won't make all that much of a difference in the total.  ;-)
> > > > > 
> > > > > That would be one way of dealing with it. But we could go back to
> > > > > spinlock_t and keep the memory allocation even for RT as is. I don't
> > > > > see a downside of this. And we would worry about kfree_rcu() from a
> > > > > real IRQ-off region once we get to it.
> > > > 
> > > > Once we get to it, your thought would be to do per-CPU queuing of
> > > > memory from IRQ-off kfree_rcu(), and have IRQ work or some such clean
> > > > up after it?  Or did you have some other trick in mind?
> > > 
> > > So for now I would very much like to revert the raw_spinlock_t back to
> > > the spinlock_t and add a migrate_disable() just to avoid the tiny
> > > possible migration between obtaining the CPU-ptr and acquiring the lock
> > > (I think Joel was afraid of a performance hit).
> > 
> > Performance is indeed a concern here.
> > 
> > > Should we get to a *real* use case where someone must invoke kfree_rcu()
> > > from a hard-IRQ-off region then we can think what makes sense. per-CPU
> > > queues and IRQ-work would be one way of dealing with it.
> > 
> > It looks like workqueues can also be used, at least in their current
> > form.  And timers.
> > 
> > Vlad, Joel, thoughts?
> >
> Some high level thoughts:
> 
> Currently everything is done in workqueue context, which means all
> freeing happens there. For the RT kernel we can invoke the page
> allocator only for the single-argument kfree_rcu() case (though we skip
> it). For the double-argument case it is impossible, which is why a
> simple path is used instead, linking the rcu_head structures among each
> other for further reclaim in wq context. As of now, for RT, everything
> is already deferred.
> 
> If we revert to spinlock_t, then calling kfree_rcu() from hard-IRQ
> context becomes broken, even though we think that for the RT kernel it
> will never happen. Therefore I do not see a clear motivation or benefit
> for reverting to spinlock_t.
> 
> IMHO, if we can avoid such a drawback I would go that way, i.e. I would
> not like to think about what to do with it when it becomes broken.

I am also of Vlad's opinion. It seems to me that doing the migrate_disable()
before dropping the raw spinlock should suffice.

Note that in the future, we may have other users of this path such as
a 

Re: [PATCH tip/core/rcu 03/17] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

2020-07-07 Thread Uladzislau Rezki
On Mon, Jul 06, 2020 at 02:06:45PM -0700, Paul E. McKenney wrote:
> On Thu, Jul 02, 2020 at 10:19:08PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2020-07-02 09:48:26 [-0700], Paul E. McKenney wrote:
> > > On Thu, Jul 02, 2020 at 04:12:16PM +0200, Sebastian Andrzej Siewior wrote:
> > > > On 2020-06-30 11:35:34 [-0700], Paul E. McKenney wrote:
> > > > > > This is not going to work together with the "wait context validator"
> > > > > > (CONFIG_PROVE_RAW_LOCK_NESTING). As of -rc3 it should complain about
> > > > > > printk() which is why it is still disabled by default.
> > > > > 
> > > > > Fixing that should be "interesting".  In particular, RCU CPU stall
> > > > > warnings rely on the raw spin lock to reduce false positives due
> > > > > to race conditions.  Some thought will be required here.
> > > > 
> > > > I don't get this part. Can you explain/give me an example where to look
> > > > at?
> > > 
> > > Starting from the scheduler-clock interrupt's call into RCU,
> > > we have rcu_sched_clock_irq() which calls rcu_pending() which
> > > calls check_cpu_stall() which calls either print_cpu_stall() or
> > > print_other_cpu_stall(), depending on whether the stall is happening on
> > > the current CPU or on some other CPU, respectively.
> > > 
> > > Both of these last functions acquire the rcu_node structure's raw ->lock
> > > and expect to do printk()s while holding it.
> > 
> > …
> > > Thoughts?
> > 
> > Okay. So in the RT queue there is a printk() rewrite which fixes this
> > kind of thing. Upstream the printk() interface is still broken in this
> > regard and therefore CONFIG_PROVE_RAW_LOCK_NESTING is disabled.
> > [Earlier the workqueue would also trigger a warning but this has been
> > fixed as of v5.8-rc1.]
> > This was just me explaining why this is bad, what debug function would
> > report it, and why it is not enabled by default.
> 
> Whew!!!  ;-)
> 
> > > > > > So assume that this is fixed and enabled then on !PREEMPT_RT it will
> > > > > > complain that you have a raw_spinlock_t acquired (the one from patch
> > > > > > 02/17) and attempt to acquire a spinlock_t in the memory allocator.
> > > > > 
> > > > > Given that the slab allocator doesn't acquire any locks until it gets
> > > > > a fair way in, wouldn't it make sense to allow a "shallow" allocation
> > > > > while a raw spinlock is held?  This would require yet another GFP_ 
> > > > > flag,
> > > > > but that won't make all that much of a difference in the total.  ;-)
> > > > 
> > > > That would be one way of dealing with it. But we could go back to
> > > > spinlock_t and keep the memory allocation even for RT as is. I don't see
> > > > a downside of this. And we would worry about kfree_rcu() from a real
> > > > IRQ-off region once we get to it.
> > > 
> > > Once we get to it, your thought would be to do per-CPU queuing of
> > > memory from IRQ-off kfree_rcu(), and have IRQ work or some such clean
> > > up after it?  Or did you have some other trick in mind?
> > 
> > So for now I would very much like to revert the raw_spinlock_t back to
> > the spinlock_t and add a migrate_disable() just to avoid the tiny
> > possible migration between obtaining the CPU-ptr and acquiring the lock
> > (I think Joel was afraid of a performance hit).
> 
> Performance is indeed a concern here.
> 
> > Should we get to a *real* use case where someone must invoke kfree_rcu()
> > from a hard-IRQ-off region then we can think what makes sense. per-CPU
> > queues and IRQ-work would be one way of dealing with it.
> 
> It looks like workqueues can also be used, at least in their current
> form.  And timers.
> 
> Vlad, Joel, thoughts?
>
Some high level thoughts:

Currently everything is done in workqueue context, which means all
freeing happens there. For the RT kernel we can invoke the page
allocator only for the single-argument kfree_rcu() case (though we skip
it). For the double-argument case it is impossible, which is why a
simple path is used instead, linking the rcu_head structures among each
other for further reclaim in wq context. As of now, for RT, everything
is already deferred.
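
(For reference, the two forms contrasted above, with a made-up struct
for illustration: the double-argument form has an rcu_head embedded in
the object, so the fallback path can chain objects through it; the
single-argument form introduced by this series has no rcu_head to link
through, so a page may be needed.)

    struct foo {
            int data;
            struct rcu_head rh;
    };

    kfree_rcu(p, rh);   /* double argument: rcu_head "rh" can be chained */
    kvfree_rcu(ptr);    /* single argument: no rcu_head available */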

If we revert to spinlock_t, then calling kfree_rcu() from hard-IRQ
context becomes broken, even though we think that for the RT kernel it
will never happen. Therefore I do not see a clear motivation or benefit
for reverting to spinlock_t.

IMHO, if we can avoid such a drawback I would go that way, i.e. I would
not like to think about what to do with it when it becomes broken.

Thanks!

--
Vlad Rezki


Re: [PATCH tip/core/rcu 03/17] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

2020-07-07 Thread Sebastian Andrzej Siewior
On 2020-07-06 16:29:49 [-0400], Joel Fernandes wrote:
> Anyway, I think as we discussed on IRC, your proposal will work for both
> kernels while allowing us to keep the internal lock as raw.

Why can't we use migrate_disable() and a spinlock_t?

> thanks,
> 
>  - Joel

Sebastian


Re: [PATCH tip/core/rcu 03/17] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

2020-07-06 Thread Paul E. McKenney
On Thu, Jul 02, 2020 at 10:19:08PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-07-02 09:48:26 [-0700], Paul E. McKenney wrote:
> > On Thu, Jul 02, 2020 at 04:12:16PM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2020-06-30 11:35:34 [-0700], Paul E. McKenney wrote:
> > > > > This is not going to work together with the "wait context validator"
> > > > > (CONFIG_PROVE_RAW_LOCK_NESTING). As of -rc3 it should complain about
> > > > > printk() which is why it is still disabled by default.
> > > > 
> > > > Fixing that should be "interesting".  In particular, RCU CPU stall
> > > > warnings rely on the raw spin lock to reduce false positives due
> > > > to race conditions.  Some thought will be required here.
> > > 
> > > I don't get this part. Can you explain/give me an example where to look
> > > at?
> > 
> > Starting from the scheduler-clock interrupt's call into RCU,
> > we have rcu_sched_clock_irq() which calls rcu_pending() which
> > calls check_cpu_stall() which calls either print_cpu_stall() or
> > print_other_cpu_stall(), depending on whether the stall is happening on
> > the current CPU or on some other CPU, respectively.
> > 
> > Both of these last functions acquire the rcu_node structure's raw ->lock
> > and expect to do printk()s while holding it.
> 
> …
> > Thoughts?
> 
> Okay. So in the RT queue there is a printk() rewrite which fixes this
> kind of thing. Upstream the printk() interface is still broken in this
> regard and therefore CONFIG_PROVE_RAW_LOCK_NESTING is disabled.
> [Earlier the workqueue would also trigger a warning but this has been
> fixed as of v5.8-rc1.]
> This was just me explaining why this is bad, what debug function would
> report it, and why it is not enabled by default.

Whew!!!  ;-)

> > > > > So assume that this is fixed and enabled then on !PREEMPT_RT it will
> > > > > complain that you have a raw_spinlock_t acquired (the one from patch
> > > > > 02/17) and attempt to acquire a spinlock_t in the memory allocator.
> > > > 
> > > > Given that the slab allocator doesn't acquire any locks until it gets
> > > > a fair way in, wouldn't it make sense to allow a "shallow" allocation
> > > > while a raw spinlock is held?  This would require yet another GFP_ flag,
> > > > but that won't make all that much of a difference in the total.  ;-)
> > > 
> > > That would be one way of dealing with it. But we could go back to
> > > spinlock_t and keep the memory allocation even for RT as is. I don't see
> > > a downside of this. And we would worry about kfree_rcu() from a real
> > > IRQ-off region once we get to it.
> > 
> > Once we get to it, your thought would be to do per-CPU queuing of
> > memory from IRQ-off kfree_rcu(), and have IRQ work or some such clean
> > up after it?  Or did you have some other trick in mind?
> 
> So for now I would very much like to revert the raw_spinlock_t back to
> the spinlock_t and add a migrate_disable() just to avoid the tiny
> possible migration between obtaining the CPU-ptr and acquiring the lock
> (I think Joel was afraid of a performance hit).

Performance is indeed a concern here.

> Should we get to a *real* use case where someone must invoke kfree_rcu()
> from a hard-IRQ-off region then we can think what makes sense. per-CPU
> queues and IRQ-work would be one way of dealing with it.

It looks like workqueues can also be used, at least in their current
form.  And timers.

Vlad, Joel, thoughts?

Thanx, Paul


Re: [PATCH tip/core/rcu 03/17] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

2020-07-06 Thread Joel Fernandes
On Mon, Jul 06, 2020 at 09:55:57PM +0200, Uladzislau Rezki wrote:
[...]
> > > Another way of fixing it is just dropping the lock, letting the page
> > > allocator do an allocation without our "upper/local" lock held. I made
> > > a proposal like that once upon a time, so maybe it is time to highlight
> > > it again:
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 21c2fa5bd8c3..249f10a89bb9 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3278,9 +3278,11 @@ static void kfree_rcu_monitor(struct work_struct *work)
> > >  }
> > > 
> > >  static inline bool
> > > -kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > > +kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> > > +   void *ptr, unsigned long *flags)
> > >  {
> > > struct kvfree_rcu_bulk_data *bnode;
> > > +   struct kfree_rcu_cpu *tmp;
> > > int idx;
> > > 
> > > if (unlikely(!krcp->initialized))
> > > @@ -3306,6 +3308,9 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > > if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > > return false;
> > > 
> > > +   migrate_disable();
> > > +   krc_this_cpu_unlock(krcp, *flags);
> > 
> > If I remember, the issue here is that migrate_disable is not
> > implemented on a non-RT kernel due to issues with starvation.
> > 
> It is implemented. Please have a look at linux/preempt.h for the regular kernel.

Yeah, sorry, I meant it is not implemented in the right way, in the sense
that it disables migration with the preempt-disable hammer.
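
For context, the !PREEMPT_RT inlines in include/linux/preempt.h at the
time were roughly the following (paraphrased, not a verbatim copy):

    static __always_inline void migrate_disable(void)
    {
            preempt_disable();      /* the "preempt-disable hammer" */
    }

    static __always_inline void migrate_enable(void)
    {
            preempt_enable();
    }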

Anyway, I think as we discussed on IRC, your proposal will work for both
kernels while allowing us to keep the internal lock as raw.

thanks,

 - Joel



Re: [PATCH tip/core/rcu 03/17] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

2020-07-06 Thread Uladzislau Rezki
On Mon, Jul 06, 2020 at 03:42:32PM -0400, Joel Fernandes wrote:
> On Thu, Jul 02, 2020 at 09:45:06PM +0200, Uladzislau Rezki wrote:
> > On Thu, Jul 02, 2020 at 04:12:16PM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2020-06-30 11:35:34 [-0700], Paul E. McKenney wrote:
> > > > > This is not going to work together with the "wait context validator"
> > > > > (CONFIG_PROVE_RAW_LOCK_NESTING). As of -rc3 it should complain about
> > > > > printk() which is why it is still disabled by default.
> > > > 
> > > > Fixing that should be "interesting".  In particular, RCU CPU stall
> > > > warnings rely on the raw spin lock to reduce false positives due
> > > > to race conditions.  Some thought will be required here.
> > > 
> > > I don't get this part. Can you explain/give me an example where to look
> > > at?
> > > 
> > > > > So assume that this is fixed and enabled then on !PREEMPT_RT it will
> > > > > complain that you have a raw_spinlock_t acquired (the one from patch
> > > > > 02/17) and attempt to acquire a spinlock_t in the memory allocator.
> > > > 
> > > > Given that the slab allocator doesn't acquire any locks until it gets
> > > > a fair way in, wouldn't it make sense to allow a "shallow" allocation
> > > > while a raw spinlock is held?  This would require yet another GFP_ flag,
> > > > but that won't make all that much of a difference in the total.  ;-)
> > > 
> > > That would be one way of dealing with it. But we could go back to
> > > spinlock_t and keep the memory allocation even for RT as is. I don't see
> > > a downside of this. And we would worry about kfree_rcu() from a real
> > > IRQ-off region once we get to it.
> > > 
> 
> Sorry for my late reply as the day job and family demanded a lot last week...
> 
> > Another way of fixing it is just dropping the lock, letting the page
> > allocator do an allocation without our "upper/local" lock held. I made
> > a proposal like that once upon a time, so maybe it is time to highlight
> > it again:
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 21c2fa5bd8c3..249f10a89bb9 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3278,9 +3278,11 @@ static void kfree_rcu_monitor(struct work_struct *work)
> >  }
> > 
> >  static inline bool
> > -kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > +kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> > +   void *ptr, unsigned long *flags)
> >  {
> > struct kvfree_rcu_bulk_data *bnode;
> > +   struct kfree_rcu_cpu *tmp;
> > int idx;
> > 
> > if (unlikely(!krcp->initialized))
> > @@ -3306,6 +3308,9 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > return false;
> > 
> > +   migrate_disable();
> > +   krc_this_cpu_unlock(krcp, *flags);
> 
> If I remember, the issue here is that migrate_disable is not implemented on a
> non-RT kernel due to issues with starvation.
> 
It is implemented. Please have a look at linux/preempt.h for the regular kernel.

--
Vlad Rezki


Re: [PATCH tip/core/rcu 03/17] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

2020-07-06 Thread Joel Fernandes
On Thu, Jul 02, 2020 at 09:45:06PM +0200, Uladzislau Rezki wrote:
> On Thu, Jul 02, 2020 at 04:12:16PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2020-06-30 11:35:34 [-0700], Paul E. McKenney wrote:
> > > > This is not going to work together with the "wait context validator"
> > > > (CONFIG_PROVE_RAW_LOCK_NESTING). As of -rc3 it should complain about
> > > > printk() which is why it is still disabled by default.
> > > 
> > > Fixing that should be "interesting".  In particular, RCU CPU stall
> > > warnings rely on the raw spin lock to reduce false positives due
> > > to race conditions.  Some thought will be required here.
> > 
> > I don't get this part. Can you explain/give me an example where to look
> > at?
> > 
> > > > So assume that this is fixed and enabled then on !PREEMPT_RT it will
> > > > complain that you have a raw_spinlock_t acquired (the one from patch
> > > > 02/17) and attempt to acquire a spinlock_t in the memory allocator.
> > > 
> > > Given that the slab allocator doesn't acquire any locks until it gets
> > > a fair way in, wouldn't it make sense to allow a "shallow" allocation
> > > while a raw spinlock is held?  This would require yet another GFP_ flag,
> > > but that won't make all that much of a difference in the total.  ;-)
> > 
> > That would be one way of dealing with it. But we could go back to
> > spinlock_t and keep the memory allocation even for RT as is. I don't see
> > a downside of this. And we would worry about kfree_rcu() from a real
> > IRQ-off region once we get to it.
> > 

Sorry for my late reply as the day job and family demanded a lot last week...

> Another way of fixing it is just dropping the lock, letting the page
> allocator do an allocation without our "upper/local" lock held. I made
> a proposal like that once upon a time, so maybe it is time to highlight
> it again:
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 21c2fa5bd8c3..249f10a89bb9 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3278,9 +3278,11 @@ static void kfree_rcu_monitor(struct work_struct *work)
>  }
> 
>  static inline bool
> -kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> +kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> +   void *ptr, unsigned long *flags)
>  {
> struct kvfree_rcu_bulk_data *bnode;
> +   struct kfree_rcu_cpu *tmp;
> int idx;
> 
> if (unlikely(!krcp->initialized))
> @@ -3306,6 +3308,9 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> if (IS_ENABLED(CONFIG_PREEMPT_RT))
> return false;
> 
> +   migrate_disable();
> +   krc_this_cpu_unlock(krcp, *flags);

If I remember, the issue here is that migrate_disable is not implemented on a
non-RT kernel due to issues with starvation.

What about using mempools? If we can allocate from one without entering
the page allocator, that would be one way. I am not sure whether there
are other issues, such as the mempool allocation itself acquiring
regular spinlocks.
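
A rough sketch of that idea (illustrative only: krc_page_pool and the
reserve size are invented here, and whether a GFP_NOWAIT mempool_alloc()
really avoids all regular spinlocks is exactly the open question above):

    #include <linux/mempool.h>
    #include <linux/mm.h>

    static mempool_t *krc_page_pool;

    static int __init krc_pool_init(void)
    {
            /* Pre-reserve a handful of order-0 pages in process context. */
            krc_page_pool = mempool_create_page_pool(16, 0);
            return krc_page_pool ? 0 : -ENOMEM;
    }

    /* In the allocation path, instead of __get_free_page(): */
    struct page *page = mempool_alloc(krc_page_pool,
                                      GFP_NOWAIT | __GFP_NOWARN);
    bnode = page ? (struct kvfree_rcu_bulk_data *)page_address(page) : NULL;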

We could also just do what Sebastian is suggesting, which is to revert
to regular spinlocks and allow this code to sleep, while worrying about
atomic callers later. For RT, could such atomic callers perhaps do their
allocations differently, such as allocating in advance or later on in
process context?

thanks,

 - Joel


> +
> /*
>  * NOTE: For one argument of kvfree_rcu() we can
>  * drop the lock and get the page in sleepable
> @@ -3315,6 +3320,12 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
>  */
> bnode = (struct kvfree_rcu_bulk_data *)
> __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> +
> +   tmp = krc_this_cpu_lock(flags);
> +   migrate_enable();
> +
> +   /* Sanity check, just in case. */
> +   WARN_ON(tmp != krcp);
> }
>  
> /* Switch to emergency path. */
> @@ -3386,7 +3397,7 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>  * Under high memory pressure GFP_NOWAIT can fail,
>  * in that case the emergency path is maintained.
>  */
> -   success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
> +   success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr, &flags);
> if (!success) {
> if (head == NULL)
> // Inline if kvfree_rcu(one_arg) call.
> 
> 
> --
> Vlad Rezki


Re: [PATCH tip/core/rcu 03/17] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

2020-07-02 Thread Sebastian Andrzej Siewior
On 2020-07-02 09:48:26 [-0700], Paul E. McKenney wrote:
> On Thu, Jul 02, 2020 at 04:12:16PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2020-06-30 11:35:34 [-0700], Paul E. McKenney wrote:
> > > > This is not going to work together with the "wait context validator"
> > > > (CONFIG_PROVE_RAW_LOCK_NESTING). As of -rc3 it should complain about
> > > > printk() which is why it is still disabled by default.
> > > 
> > > Fixing that should be "interesting".  In particular, RCU CPU stall
> > > warnings rely on the raw spin lock to reduce false positives due
> > > to race conditions.  Some thought will be required here.
> > 
> > I don't get this part. Can you explain/give me an example where to look
> > at?
> 
> Starting from the scheduler-clock interrupt's call into RCU,
> we have rcu_sched_clock_irq() which calls rcu_pending() which
> calls check_cpu_stall() which calls either print_cpu_stall() or
> print_other_cpu_stall(), depending on whether the stall is happening on
> the current CPU or on some other CPU, respectively.
> 
> Both of these last functions acquire the rcu_node structure's raw ->lock
> and expect to do printk()s while holding it.

…
> Thoughts?

Okay. So in the RT queue there is a printk() rewrite which fixes this
kind of thing. Upstream the printk() interface is still broken in this
regard and therefore CONFIG_PROVE_RAW_LOCK_NESTING is disabled.
[Earlier the workqueue would also trigger a warning but this has been
fixed as of v5.8-rc1.]
This was just me explaining why this is bad, what debug function would
report it, and why it is not enabled by default.

> > > > So assume that this is fixed and enabled then on !PREEMPT_RT it will
> > > > complain that you have a raw_spinlock_t acquired (the one from patch
> > > > 02/17) and attempt to acquire a spinlock_t in the memory allocator.
> > > 
> > > Given that the slab allocator doesn't acquire any locks until it gets
> > > a fair way in, wouldn't it make sense to allow a "shallow" allocation
> > > while a raw spinlock is held?  This would require yet another GFP_ flag,
> > > but that won't make all that much of a difference in the total.  ;-)
> > 
> > That would be one way of dealing with it. But we could go back to
> > spinlock_t and keep the memory allocation even for RT as is. I don't see
> > a downside of this. And we would worry about kfree_rcu() from a real
> > IRQ-off region once we get to it.
> 
> Once we get to it, your thought would be to do per-CPU queuing of
> memory from IRQ-off kfree_rcu(), and have IRQ work or some such clean
> up after it?  Or did you have some other trick in mind?

So for now I would very much like to revert the raw_spinlock_t back to
the spinlock_t and add a migrate_disable() just to avoid the tiny
possible migration between obtaining the CPU-ptr and acquiring the lock
(I think Joel was afraid of a performance hit).

Should we get to a *real* use case where someone must invoke kfree_rcu()
from a hard-IRQ-off region then we can think what makes sense. per-CPU
queues and IRQ-work would be one way of dealing with it.
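
A minimal sketch of the revert-plus-migrate_disable() combination
suggested above, reusing the krc_this_cpu_lock() helper names and the
per-CPU "krc" variable from tree.c (this only illustrates the idea, it
is not the actual revert):

    static struct kfree_rcu_cpu *krc_this_cpu_lock(unsigned long *flags)
    {
            struct kfree_rcu_cpu *krcp;

            migrate_disable();      /* stay on this CPU, preemptible on RT */
            krcp = this_cpu_ptr(&krc);
            spin_lock_irqsave(&krcp->lock, *flags); /* spinlock_t, not raw */
            return krcp;
    }

    static void krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp,
                                    unsigned long flags)
    {
            spin_unlock_irqrestore(&krcp->lock, flags);
            migrate_enable();
    }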

>   Thanx, Paul
> 
Sebastian


Re: [PATCH tip/core/rcu 03/17] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

2020-07-02 Thread Uladzislau Rezki
On Thu, Jul 02, 2020 at 04:12:16PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-06-30 11:35:34 [-0700], Paul E. McKenney wrote:
> > > This is not going to work together with the "wait context validator"
> > > (CONFIG_PROVE_RAW_LOCK_NESTING). As of -rc3 it should complain about
> > > printk() which is why it is still disabled by default.
> > 
> > Fixing that should be "interesting".  In particular, RCU CPU stall
> > warnings rely on the raw spin lock to reduce false positives due
> > to race conditions.  Some thought will be required here.
> 
> I don't get this part. Can you explain/give me an example where to look
> at?
> 
> > > So assume that this is fixed and enabled then on !PREEMPT_RT it will
> > > complain that you have a raw_spinlock_t acquired (the one from patch
> > > 02/17) and attempt to acquire a spinlock_t in the memory allocator.
> > 
> > Given that the slab allocator doesn't acquire any locks until it gets
> > a fair way in, wouldn't it make sense to allow a "shallow" allocation
> > while a raw spinlock is held?  This would require yet another GFP_ flag,
> > but that won't make all that much of a difference in the total.  ;-)
> 
> That would be one way of dealing with it. But we could go back to
> spinlock_t and keep the memory allocation even for RT as is. I don't see
> a downside of this. And we would worry about kfree_rcu() from a real
> IRQ-off region once we get to it.
> 
Another way of fixing it is just dropping the lock, letting the page
allocator do an allocation without our "upper/local" lock held. I made a
proposal like that once upon a time, so maybe it is time to highlight
it again:


diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 21c2fa5bd8c3..249f10a89bb9 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3278,9 +3278,11 @@ static void kfree_rcu_monitor(struct work_struct *work)
 }

 static inline bool
-kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
+kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
+   void *ptr, unsigned long *flags)
 {
struct kvfree_rcu_bulk_data *bnode;
+   struct kfree_rcu_cpu *tmp;
int idx;

if (unlikely(!krcp->initialized))
@@ -3306,6 +3308,9 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
if (IS_ENABLED(CONFIG_PREEMPT_RT))
return false;

+   migrate_disable();
+   krc_this_cpu_unlock(krcp, *flags);
+
/*
 * NOTE: For one argument of kvfree_rcu() we can
 * drop the lock and get the page in sleepable
@@ -3315,6 +3320,12 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
 */
bnode = (struct kvfree_rcu_bulk_data *)
__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
+
+   tmp = krc_this_cpu_lock(flags);
+   migrate_enable();
+
+   /* Sanity check, just in case. */
+   WARN_ON(tmp != krcp);
}
 
/* Switch to emergency path. */
@@ -3386,7 +3397,7 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 * Under high memory pressure GFP_NOWAIT can fail,
 * in that case the emergency path is maintained.
 */
-   success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
+   success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr, &flags);
if (!success) {
if (head == NULL)
// Inline if kvfree_rcu(one_arg) call.


--
Vlad Rezki


Re: [PATCH tip/core/rcu 03/17] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

2020-07-02 Thread Paul E. McKenney
On Thu, Jul 02, 2020 at 04:12:16PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-06-30 11:35:34 [-0700], Paul E. McKenney wrote:
> > > This is not going to work together with the "wait context validator"
> > > (CONFIG_PROVE_RAW_LOCK_NESTING). As of -rc3 it should complain about
> > > printk() which is why it is still disabled by default.
> > 
> > Fixing that should be "interesting".  In particular, RCU CPU stall
> > warnings rely on the raw spin lock to reduce false positives due
> > to race conditions.  Some thought will be required here.
> 
> I don't get this part. Can you explain/give me an example where to look
> at?

Starting from the scheduler-clock interrupt's call into RCU,
we have rcu_sched_clock_irq() which calls rcu_pending() which
calls check_cpu_stall() which calls either print_cpu_stall() or
print_other_cpu_stall(), depending on whether the stall is happening on
the current CPU or on some other CPU, respectively.

Both of these last functions acquire the rcu_node structure's raw ->lock
and expect to do printk()s while holding it.

Not that the lock matters all that much, given that all of this code is
in hardirq context, which is necessary if RCU CPU stall warnings are
to be at all reliable.  Yeah, yeah, they would be even more reliable if
invoked in NMI context but that would prevent them from acquiring the
->lock that is used to resolve races between multiple concurrent stall
warnings and between a stall warning and the end of that stall.  ;-)
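
Condensed, the pattern in question looks roughly like this (heavily
simplified from kernel/rcu/tree_stall.h, not the literal code):

    static void print_cpu_stall_sketch(struct rcu_node *rnp)
    {
            unsigned long flags;

            raw_spin_lock_irqsave_rcu_node(rnp, flags);
            pr_err("INFO: rcu_sched self-detected stall on CPU\n");
            /*
             * ...more printk()s of stall state while the raw lock is
             * held, which is what the wait-context validator objects to
             * once printk() itself acquires a spinlock_t.
             */
            raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
    }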

I could imagine accumulating the data in hardirq context and then
using a workqueue or some such to print it all out, but that
requires that workqueues be fully operational for RCU CPU stall
warnings to appear.  :-(

Thoughts?

> > > So assume that this is fixed and enabled then on !PREEMPT_RT it will
> > > complain that you have a raw_spinlock_t acquired (the one from patch
> > > 02/17) and attempt to acquire a spinlock_t in the memory allocator.
> > 
> > Given that the slab allocator doesn't acquire any locks until it gets
> > a fair way in, wouldn't it make sense to allow a "shallow" allocation
> > while a raw spinlock is held?  This would require yet another GFP_ flag,
> > but that won't make all that much of a difference in the total.  ;-)
> 
> That would be one way of dealing with it. But we could go back to
> spinlock_t and keep the memory allocation even for RT as is. I don't see
> a downside of this. And we would worry about kfree_rcu() from a real
> IRQ-off region once we get to it.

Once we get to it, your thought would be to do per-CPU queuing of
memory from IRQ-off kfree_rcu(), and have IRQ work or some such clean
up after it?  Or did you have some other trick in mind?
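
For concreteness, one rough shape such a scheme could take (entirely
illustrative; none of these names exist in tree.c, and each CPU's work
would be set up at boot with init_irq_work()):

    #include <linux/irq_work.h>
    #include <linux/llist.h>
    #include <linux/percpu.h>

    struct krc_deferred {
            struct llist_head objs; /* lock-free, usable with IRQs off */
            struct irq_work work;
    };
    static DEFINE_PER_CPU(struct krc_deferred, krc_deferred);

    static void krc_deferred_drain(struct irq_work *w)
    {
            struct krc_deferred *kd =
                    container_of(w, struct krc_deferred, work);
            struct llist_node *n = llist_del_all(&kd->objs);

            /* ...walk 'n' and hand each object to the normal
             * kfree_rcu() machinery... */
    }

    /* Called from a hard-IRQ-off kfree_rcu(): */
    static void krc_deferred_queue(struct llist_node *obj)
    {
            struct krc_deferred *kd = raw_cpu_ptr(&krc_deferred);

            llist_add(obj, &kd->objs);
            irq_work_queue(&kd->work); /* safe with interrupts disabled */
    }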

Thanx, Paul

> > > > bnode = (struct kfree_rcu_bulk_data *)
> > > > __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> > > > }
> 
> Sebastian


Re: [PATCH tip/core/rcu 03/17] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

2020-07-02 Thread Sebastian Andrzej Siewior
On 2020-06-30 11:35:34 [-0700], Paul E. McKenney wrote:
> > This is not going to work together with the "wait context validator"
> > (CONFIG_PROVE_RAW_LOCK_NESTING). As of -rc3 it should complain about
> > printk() which is why it is still disabled by default.
> 
> Fixing that should be "interesting".  In particular, RCU CPU stall
> warnings rely on the raw spin lock to reduce false positives due
> to race conditions.  Some thought will be required here.

I don't get this part. Can you explain/give me an example where to look
at?

> > So assume that this is fixed and enabled then on !PREEMPT_RT it will
> > complain that you have a raw_spinlock_t acquired (the one from patch
> > 02/17) and attempt to acquire a spinlock_t in the memory allocator.
> 
> Given that the slab allocator doesn't acquire any locks until it gets
> a fair way in, wouldn't it make sense to allow a "shallow" allocation
> while a raw spinlock is held?  This would require yet another GFP_ flag,
> but that won't make all that much of a difference in the total.  ;-)

That would be one way of dealing with it. But we could go back to
spinlock_t and keep the memory allocation even for RT as is. I don't see
a downside of this. And we would worry about kfree_rcu() from a real
IRQ-off region once we get to it.

>   Thanx, Paul
> 
> > >   bnode = (struct kfree_rcu_bulk_data *)
> > >   __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> > >   }

Sebastian


Re: [PATCH tip/core/rcu 03/17] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

2020-06-30 Thread Paul E. McKenney
On Tue, Jun 30, 2020 at 06:45:43PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-06-24 13:12:12 [-0700], paul...@kernel.org wrote:
> > From: "Joel Fernandes (Google)" 
> > 
> > To keep the kfree_rcu() code working in purely atomic sections on RT,
> > such as non-threaded IRQ handlers and raw spinlock sections, avoid
> > calling into the page allocator which uses sleeping locks on RT.
> > 
> > In fact, even if the caller is preemptible, the kfree_rcu() code is
> > not, as the krcp->lock is a raw spinlock.
> > 
> > Calling into the page allocator is optional and avoiding it should be
> > Ok, especially with the page pre-allocation support in future patches.
> > Such pre-allocation would further avoid the need for a dynamically
> > allocated page in the first place.
> > 
> > Cc: Sebastian Andrzej Siewior 
> > Reviewed-by: Uladzislau Rezki 
> > Co-developed-by: Uladzislau Rezki 
> > Signed-off-by: Uladzislau Rezki 
> > Signed-off-by: Joel Fernandes (Google) 
> > Signed-off-by: Uladzislau Rezki (Sony) 
> > Signed-off-by: Paul E. McKenney 
> > ---
> >  kernel/rcu/tree.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 64592b4..dbdd509 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3184,6 +3184,18 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> > if (!bnode) {
> > WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
> >  
> > +   /*
> > +* To keep this path working on raw non-preemptible
> > +* sections, prevent the optional entry into the
> > +* allocator as it uses sleeping locks. In fact, even
> > +* if the caller of kfree_rcu() is preemptible, this
> > +* path still is not, as krcp->lock is a raw spinlock.
> > +* With additional page pre-allocation in the works,
> > +* hitting this return is going to be much less likely.
> > +*/
> > +   if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > +   return false;
> 
> This is not going to work together with the "wait context validator"
> (CONFIG_PROVE_RAW_LOCK_NESTING). As of -rc3 it should complain about
> printk() which is why it is still disabled by default.

Fixing that should be "interesting".  In particular, RCU CPU stall
warnings rely on the raw spin lock to reduce false positives due
to race conditions.  Some thought will be required here.

> So assume that this is fixed and enabled then on !PREEMPT_RT it will
> complain that you have a raw_spinlock_t acquired (the one from patch
> 02/17) and attempt to acquire a spinlock_t in the memory allocator.

Given that the slab allocator doesn't acquire any locks until it gets
a fair way in, wouldn't it make sense to allow a "shallow" allocation
while a raw spinlock is held?  This would require yet another GFP_ flag,
but that won't make all that much of a difference in the total.  ;-)
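
Illustratively, the call site would simply grow one more flag; the
__GFP_NO_LOCKS name is invented here purely for the example, no such
flag exists in the allocator at this point:

    bnode = (struct kfree_rcu_bulk_data *)
            __get_free_page(GFP_NOWAIT | __GFP_NOWARN | __GFP_NO_LOCKS);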

Thanx, Paul

> > bnode = (struct kfree_rcu_bulk_data *)
> > __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> > }
> 
> Sebastian


Re: [PATCH tip/core/rcu 03/17] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

2020-06-30 Thread Sebastian Andrzej Siewior
On 2020-06-24 13:12:12 [-0700], paul...@kernel.org wrote:
> From: "Joel Fernandes (Google)" 
> 
> To keep the kfree_rcu() code working in purely atomic sections on RT,
> such as non-threaded IRQ handlers and raw spinlock sections, avoid
> calling into the page allocator which uses sleeping locks on RT.
> 
> In fact, even if the caller is preemptible, the kfree_rcu() code is
> not, as the krcp->lock is a raw spinlock.
> 
> Calling into the page allocator is optional and avoiding it should be
> Ok, especially with the page pre-allocation support in future patches.
> Such pre-allocation would further avoid the need for a dynamically
> allocated page in the first place.
> 
> Cc: Sebastian Andrzej Siewior 
> Reviewed-by: Uladzislau Rezki 
> Co-developed-by: Uladzislau Rezki 
> Signed-off-by: Uladzislau Rezki 
> Signed-off-by: Joel Fernandes (Google) 
> Signed-off-by: Uladzislau Rezki (Sony) 
> Signed-off-by: Paul E. McKenney 
> ---
>  kernel/rcu/tree.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 64592b4..dbdd509 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3184,6 +3184,18 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
>   if (!bnode) {
>   WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
>  
> + /*
> +  * To keep this path working on raw non-preemptible
> +  * sections, prevent the optional entry into the
> +  * allocator as it uses sleeping locks. In fact, even
> +  * if the caller of kfree_rcu() is preemptible, this
> +  * path still is not, as krcp->lock is a raw spinlock.
> +  * With additional page pre-allocation in the works,
> +  * hitting this return is going to be much less likely.
> +  */
> + if (IS_ENABLED(CONFIG_PREEMPT_RT))
> + return false;

This is not going to work together with the "wait context validator"
(CONFIG_PROVE_RAW_LOCK_NESTING). As of -rc3 it should complain about
printk() which is why it is still disabled by default.

So assume that this is fixed and enabled then on !PREEMPT_RT it will
complain that you have a raw_spinlock_t acquired (the one from patch
02/17) and attempt to acquire a spinlock_t in the memory allocator.

>   bnode = (struct kfree_rcu_bulk_data *)
>   __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
>   }

Sebastian