Re: [patch 2/4] nohz: Prevent erroneous tick stop invocations

2017-12-29 Thread Frederic Weisbecker
On Wed, Dec 27, 2017 at 09:58:08PM +0100, Thomas Gleixner wrote:
> On Wed, 27 Dec 2017, Thomas Gleixner wrote:
> > Bah, no. We need to move that into the nohz logic somehow to prevent that
> > repetitive expiry yesterday reprogramming. Lemme think about it some more.
> 
> The patch below should be the proper cure.
> 
> Thanks,
> 
>   tglx
> 
> 8<---
> Subject: nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
> From: Thomas Gleixner 
> Date: Fri, 22 Dec 2017 15:51:13 +0100
> 
> From: Thomas Gleixner 
> 
> The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
> subsequently invokes tick_nohz_stop_sched_tick() are:
> 
>   if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
> 
> If need_resched() is not set, but a timer softirq is pending then this is
> an indication that the softirq code punted and delegated the execution to
> softirqd. need_resched() is not true because the current interrupted task
> takes precedence over softirqd.
> 
> Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
> timer interrupts because the timer wheel contains an expired timer, but
> softirqs are not yet executed. So it returns an immediate expiry request,
> which causes the timer to fire immediately again. Lather, rinse and
> repeat.
> 
> Prevent that by adding a check for a pending timer soft interrupt to the
> conditions in tick_nohz_stop_sched_tick() which avoid calling
> get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
> prevents a repetitive programming of an already expired timer.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Peter Zijlstra 
> Cc: Frederic Weisbecker 
> Cc: Sebastian Siewior 
> Cc: sta...@vger.kernel.org
> Cc: Paul McKenney 
> Cc: Anna-Maria Gleixner 
> 
> ---
>  kernel/time/tick-sched.c |    9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -650,6 +650,11 @@ static void tick_nohz_restart(struct tic
>  	ts->next_tick = 0;
>  }
>  
> +static inline bool local_timer_softirq_pending(void)
> +{
> +	return local_softirq_pending() & TIMER_SOFTIRQ;
> +}
> +
>  static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>  					 ktime_t now, int cpu)
>  {
> @@ -666,8 +671,8 @@ static ktime_t tick_nohz_stop_sched_tick
>  	} while (read_seqretry(&jiffies_lock, seq));
>  	ts->last_jiffies = basejiff;
>  
> -	if (rcu_needs_cpu(basemono, &next_rcu) ||
> -	    arch_needs_cpu() || irq_work_needs_cpu()) {
> +	if (rcu_needs_cpu(basemono, &next_rcu) || arch_needs_cpu() ||
> +	    irq_work_needs_cpu() || local_timer_softirq_pending()) {

Much better. This may need a comment though because it's not immediately
obvious why we have this check while softirqs are processed just before
tick_irq_exit().

Thanks.

Acked-by: Frederic Weisbecker 
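
To illustrate why the check can still matter even though softirqs are handled just before tick_irq_exit(), here is a minimal userspace sketch, not kernel code. It assumes the softirq step either runs handlers inline or hands them to the softirq thread and leaves the pending bits set. All names (irq_exit_model, model_invoke_softirq, model_irq_exit, MODEL_TIMER_SOFTIRQ) are hypothetical and exist only for this model.

```c
#include <stdbool.h>

/* Hypothetical bit index for the timer softirq in this toy model. */
enum { MODEL_TIMER_SOFTIRQ = 1 };

struct irq_exit_model {
	unsigned int pending;	/* bitmask of raised, unhandled softirqs */
	bool thread_woken;	/* true once work was handed to the thread */
};

/* Models the softirq step in irq_exit(): run inline or defer. */
static void model_invoke_softirq(struct irq_exit_model *m, bool defer)
{
	if (!m->pending)
		return;
	if (defer) {
		/* Deferred: the pending bits stay set until the thread runs. */
		m->thread_woken = true;
		return;
	}
	/* Handled inline: everything pending is consumed. */
	m->pending = 0;
}

/*
 * Models irq_exit() up to the point where tick_irq_exit() would run.
 * Returns true if the timer softirq is still pending at that point.
 */
static bool model_irq_exit(struct irq_exit_model *m, bool defer)
{
	model_invoke_softirq(m, defer);
	return (m->pending & (1u << MODEL_TIMER_SOFTIRQ)) != 0;
}
```

With inline handling the function returns false, so the extra condition is a no-op. With deferral it returns true, which is the only case where the new check changes the outcome.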


Re: [patch 2/4] nohz: Prevent erroneous tick stop invocations

2017-12-27 Thread Thomas Gleixner
On Wed, 27 Dec 2017, Thomas Gleixner wrote:
> Bah, no. We need to move that into the nohz logic somehow to prevent that
> repetitive expiry yesterday reprogramming. Lemme think about it some more.

The patch below should be the proper cure.

Thanks,

tglx

8<---
Subject: nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
From: Thomas Gleixner 
Date: Fri, 22 Dec 2017 15:51:13 +0100

From: Thomas Gleixner 

The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
subsequently invokes tick_nohz_stop_sched_tick() are:

  if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))

If need_resched() is not set, but a timer softirq is pending then this is
an indication that the softirq code punted and delegated the execution to
softirqd. need_resched() is not true because the current interrupted task
takes precedence over softirqd.

Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
timer interrupts because the timer wheel contains an expired timer, but
softirqs are not yet executed. So it returns an immediate expiry request,
which causes the timer to fire immediately again. Lather, rinse and
repeat.

Prevent that by adding a check for a pending timer soft interrupt to the
conditions in tick_nohz_stop_sched_tick() which avoid calling
get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
prevents a repetitive programming of an already expired timer.

Signed-off-by: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Frederic Weisbecker 
Cc: Sebastian Siewior 
Cc: sta...@vger.kernel.org
Cc: Paul McKenney 
Cc: Anna-Maria Gleixner 

---
 kernel/time/tick-sched.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -650,6 +650,11 @@ static void tick_nohz_restart(struct tic
 	ts->next_tick = 0;
 }
 
+static inline bool local_timer_softirq_pending(void)
+{
+	return local_softirq_pending() & TIMER_SOFTIRQ;
+}
+
 static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 					 ktime_t now, int cpu)
 {
@@ -666,8 +671,8 @@ static ktime_t tick_nohz_stop_sched_tick
 	} while (read_seqretry(&jiffies_lock, seq));
 	ts->last_jiffies = basejiff;
 
-	if (rcu_needs_cpu(basemono, &next_rcu) ||
-	    arch_needs_cpu() || irq_work_needs_cpu()) {
+	if (rcu_needs_cpu(basemono, &next_rcu) || arch_needs_cpu() ||
+	    irq_work_needs_cpu() || local_timer_softirq_pending()) {
 		next_tick = basemono + TICK_NSEC;
 	} else {
 		/*
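
The storm described in the changelog can be sketched as a small userspace model, not kernel code. It assumes a timer wheel whose earliest timer has already expired, with the timer softirq raised but not yet run. Every name here (storm_model, model_next_timer, model_next_event, model_count_immediate_irqs, MODEL_TICK_NSEC, MODEL_TIMER_SOFTIRQ) is hypothetical. The model tests the bit as `1u << MODEL_TIMER_SOFTIRQ` on the assumption that the softirq identifier is a bit index rather than a mask; whether the `& TIMER_SOFTIRQ` in the patch tests the intended bit is worth double-checking against the softirq enum.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TICK_NSEC 4000000ULL	/* assumed tick length (HZ=250) */

enum { MODEL_TIMER_SOFTIRQ = 1 };	/* assumed bit index, not a mask */

struct storm_model {
	uint64_t basemono;	/* time of the current interrupt, in ns */
	uint64_t wheel_expiry;	/* earliest timer in the wheel, in ns */
	unsigned int pending;	/* raised but unhandled softirq bits */
};

/* Stand-in for get_next_timer_interrupt(): expired means "fire now". */
static uint64_t model_next_timer(const struct storm_model *m)
{
	return m->wheel_expiry <= m->basemono ? m->basemono : m->wheel_expiry;
}

static bool model_timer_softirq_pending(const struct storm_model *m)
{
	return (m->pending & (1u << MODEL_TIMER_SOFTIRQ)) != 0;
}

/* Next programmed event; with_check selects the patched condition. */
static uint64_t model_next_event(const struct storm_model *m, bool with_check)
{
	if (with_check && model_timer_softirq_pending(m))
		return m->basemono + MODEL_TICK_NSEC;	/* keep the tick */
	return model_next_timer(m);
}

/*
 * Count back-to-back interrupts that fire without time advancing.
 * The softirq stays deferred, so the state never changes in the loop.
 */
static unsigned int model_count_immediate_irqs(const struct storm_model *m,
					       bool with_check,
					       unsigned int limit)
{
	unsigned int n = 0;

	while (n < limit && model_next_event(m, with_check) <= m->basemono)
		n++;
	return n;
}
```

Without the check the loop only stops at the caller-supplied limit, which is the model's version of "lather, rinse and repeat". With the check the next event is one tick out and no immediate interrupt is counted.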


Re: [patch 2/4] nohz: Prevent erroneous tick stop invocations

2017-12-27 Thread Thomas Gleixner
On Wed, 27 Dec 2017, Thomas Gleixner wrote:
> On Tue, 26 Dec 2017, Frederic Weisbecker wrote:
> 
> > On Fri, Dec 22, 2017 at 03:51:13PM +0100, Thomas Gleixner wrote:
> > > The conditions in irq_exit() to invoke tick_nohz_irq_exit() are:
> > > 
> > >   if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
> > > 
> > > This is too permissive in various aspects:
> > > 
> > >   1) If need_resched() is set, then the tick cannot be stopped whether
> > >  the CPU is idle or in nohz full mode.
> > 
> > That's not exactly true. In nohz full mode the tick is not restarted on the
> > switch from idle to a single task. And if an idle interrupt wakes up a
> > single task and enqueues a timer, we want that timer to be programmed even
> > though we have need_resched().
> 
> Hrmm, so the check for softirq_pending() should be sufficient, right?
> 
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -382,7 +382,8 @@ static inline void tick_irq_exit(void)
>  	int cpu = smp_processor_id();
>  
>  	/* Make sure that timer wheel updates are propagated */
> -	if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
> +	if (((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) &&
> +	if ((idle_cpu(cpu) || tick_nohz_full_cpu(cpu)) &&
> +	    !local_softirq_pending()) {
>  		if (!in_interrupt())
>  			tick_nohz_irq_exit();
>  	}

Bah, no. We need to move that into the nohz logic somehow to prevent that
repetitive expiry yesterday reprogramming. Lemme think about it some more.

Thanks,

tglx


Re: [patch 2/4] nohz: Prevent erroneous tick stop invocations

2017-12-27 Thread Thomas Gleixner
On Tue, 26 Dec 2017, Frederic Weisbecker wrote:

> On Fri, Dec 22, 2017 at 03:51:13PM +0100, Thomas Gleixner wrote:
> > The conditions in irq_exit() to invoke tick_nohz_irq_exit() are:
> > 
> >   if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
> > 
> > This is too permissive in various aspects:
> > 
> >   1) If need_resched() is set, then the tick cannot be stopped whether
> >  the CPU is idle or in nohz full mode.
> 
> That's not exactly true. In nohz full mode the tick is not restarted on the
> switch from idle to a single task. And if an idle interrupt wakes up a
> single task and enqueues a timer, we want that timer to be programmed even
> though we have need_resched().

Hrmm, so the check for softirq_pending() should be sufficient, right?

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -382,7 +382,8 @@ static inline void tick_irq_exit(void)
 	int cpu = smp_processor_id();
 
 	/* Make sure that timer wheel updates are propagated */
-	if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
+	if (((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) &&
+	if ((idle_cpu(cpu) || tick_nohz_full_cpu(cpu)) &&
+	    !local_softirq_pending()) {
 		if (!in_interrupt())
 			tick_nohz_irq_exit();
 	}
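
The effect of the extra term can be sketched as plain predicates in a userspace model, not kernel code. The diff above carries two candidate "if" lines; this sketch assumes the first one, which keeps the existing condition and adds the softirq check. The function and parameter names are hypothetical stand-ins for idle_cpu(), need_resched(), tick_nohz_full_cpu() and local_softirq_pending().

```c
#include <stdbool.h>

/* The existing condition in tick_irq_exit(), as quoted in the thread. */
static bool cond_original(bool idle, bool resched, bool nohz_full)
{
	return (idle && !resched) || nohz_full;
}

/* Assumed reading of the proposal: same condition, plus no softirq pending. */
static bool cond_with_softirq_check(bool idle, bool resched, bool nohz_full,
				    bool softirq_pending)
{
	return cond_original(idle, resched, nohz_full) && !softirq_pending;
}
```

Under this reading, a nohz full CPU with need_resched() set and no softirq pending still reaches tick_nohz_irq_exit(), which matches the case raised in the quoted reply. Any pending softirq, timer or otherwise, suppresses the call.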






Re: [patch 2/4] nohz: Prevent erroneous tick stop invocations

2017-12-26 Thread Frederic Weisbecker
On Fri, Dec 22, 2017 at 03:51:13PM +0100, Thomas Gleixner wrote:
> The conditions in irq_exit() to invoke tick_nohz_irq_exit() are:
> 
>   if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
> 
> This is too permissive in various aspects:
> 
>   1) If need_resched() is set, then the tick cannot be stopped whether
>  the CPU is idle or in nohz full mode.

That's not exactly true. In nohz full mode the tick is not restarted on the
switch from idle to a single task. And if an idle interrupt wakes up a
single task and enqueues a timer, we want that timer to be programmed even
though we have need_resched().

