Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq

2018-08-30 Thread Thomas Gleixner
On Thu, 30 Aug 2018, John Crispin wrote:

> > Sorry for disturbing you all, but what is the conclusion here for 4.14.y?
> > - take Thomas's patch 
> > https://lore.kernel.org/patchwork/patch/969521/#1162900
> > - revert commit 2d898915ccf4838c04531c51a598469e921a5eb5
> 
> Hi Frederic,
> 
> I reported this very issue to tglx last night and he asked me to verify his
> proposed patch, which I just did. I can confirm that the patch fixes the issue
> on 4.14.67 and Greg should add it to the stable queue please.
> 
> Tested-by: John Crispin 

Let me whip up a proper patch with changelog.

Thanks,

tglx


Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq

2018-08-30 Thread John Crispin
> Sorry for disturbing you all, but what is the conclusion here for 4.14.y?
> - take Thomas's patch https://lore.kernel.org/patchwork/patch/969521/#1162900
> - revert commit 2d898915ccf4838c04531c51a598469e921a5eb5

Hi Frederic,

I reported this very issue to tglx last night and he asked me to verify his
proposed patch, which I just did. I can confirm that the patch fixes the issue
on 4.14.67 and Greg should add it to the stable queue please.

Tested-by: John Crispin 

Thanks,
John



Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq

2018-08-28 Thread Grygorii Strashko



On 08/24/2018 01:41 PM, Frederic Weisbecker wrote:
> On Fri, Aug 24, 2018 at 11:10:44AM -0500, Grygorii Strashko wrote:
>> Yes. i do not see local_softirq_pending messages any more
>>
>> But one question, just to clarify: after the patch "nohz: Fix missing tick
>> reprog while interrupting inline timer softirq",
>> tick_nohz_irq_exit() will be called several times in the case of nested
>> interrupts (min 2):
>> gic_handle_irq
>>  |- irq_exit
>>     |- preempt_count_sub(HARDIRQ_OFFSET);
>>     |- __do_softirq
>>
>>        |- gic_handle_irq()
>>           |- irq_exit()
>>              |- tick_irq_exit()
>>                 if (!in_irq())
>>                     tick_nohz_irq_exit(); <-- [1]
>>     |- tick_irq_exit()
>>        if (!in_irq())
>>            tick_nohz_irq_exit(); <-- [2]
>>
>> Is it correct? In 4.14 tick_nohz_irq_exit() is much more complex than in
>> LKML now, and this is a hot path.
> 
> That's correct and it's indeed more costly in 4.14 as then the tick is going
> to be programmed twice.
> 

Sorry for disturbing you all, but what is the conclusion here for 4.14.y?
- take Thomas's patch https://lore.kernel.org/patchwork/patch/969521/#1162900
- revert commit 2d898915ccf4838c04531c51a598469e921a5eb5


-- 
regards,
-grygorii


Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq

2018-08-24 Thread Frederic Weisbecker
On Fri, Aug 24, 2018 at 11:10:44AM -0500, Grygorii Strashko wrote:
> Yes. i do not see local_softirq_pending messages any more
> 
> But one question, just to clarify: after the patch "nohz: Fix missing tick
> reprog while interrupting inline timer softirq",
> tick_nohz_irq_exit() will be called several times in the case of nested
> interrupts (min 2):
> gic_handle_irq
>  |- irq_exit
>     |- preempt_count_sub(HARDIRQ_OFFSET);
>     |- __do_softirq
>
>        |- gic_handle_irq()
>           |- irq_exit()
>              |- tick_irq_exit()
>                 if (!in_irq())
>                     tick_nohz_irq_exit(); <-- [1]
>     |- tick_irq_exit()
>        if (!in_irq())
>            tick_nohz_irq_exit(); <-- [2]
>
> Is it correct? In 4.14 tick_nohz_irq_exit() is much more complex than in
> LKML now, and this is a hot path.

That's correct and it's indeed more costly in 4.14 as then the tick is going
to be programmed twice.


Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq

2018-08-24 Thread Grygorii Strashko



On 08/24/2018 01:17 AM, Greg KH wrote:
> On Thu, Aug 23, 2018 at 05:57:06PM -0500, Grygorii Strashko wrote:
>> Hi
>>
>> On 07/31/2018 05:52 PM, Frederic Weisbecker wrote:
>>> Before updating the full nohz tick or the idle time on IRQ exit, we
>>> check first if we are not in a nesting interrupt, whether the inner
>>> interrupt is a hard or a soft IRQ.
>>>
>>> There is a historical reason for that: the dyntick idle mode used to
>>> reprogram the tick on IRQ exit, after softirq processing, and there was
>>> no point in doing that job in the outer nesting interrupt because the
>>> tick update will be performed through the end of the inner interrupt
>>> eventually, with even potential new timer updates.
>>>
>>> One corner case could show up though: if an idle tick interrupts a softirq
>>> executing inline in the idle loop (through a call to local_bh_enable())
>>> after we entered in dynticks mode, the IRQ won't reprogram the tick
>>> because it assumes the softirq executes on an inner IRQ-tail. As a
>>> result we might put the CPU in sleep mode with the tick completely
>>> stopped whereas a timer can still be enqueued. Indeed there is no tick
>>> reprogramming in local_bh_enable(). We probably assumed there was no bh
>>> disabled section in idle, although there didn't seem to be debug code
>>> ensuring that.
>>>
>>> Nowadays the nesting interrupt optimization still stands but only concerns
>>> full dynticks. The tick is stopped on IRQ exit in full dynticks mode
>>> and we want to wait for the end of the inner IRQ to reprogram the tick.
>>> But in_interrupt() doesn't make a difference between softirqs executing
>>> on IRQ tail and those executing inline. What was to be considered a
>>> corner case in dynticks-idle mode now becomes a serious opportunity for
>>> a bug in full dynticks mode: if a tick interrupts a task executing
>>> softirq inline, the tick reprogramming will be ignored and we may exit
>>> to userspace after local_bh_enable() with an enqueued timer that will
>>> never fire.
>>>
>>> To fix this, simply keep reprogramming the tick if we are in a hardirq
>>> interrupting softirq. We can still figure out a way later to restore
>>> this optimization while excluding inline softirq processing.
>>>
>>> Reported-by: Anna-Maria Gleixner 
>>> Signed-off-by: Frederic Weisbecker 
>>> Cc: Thomas Gleixner 
>>> Cc: Ingo Molnar 
>>> Tested-by: Anna-Maria Gleixner 
>>> ---
>>>  kernel/softirq.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/softirq.c b/kernel/softirq.c
>>> index 900dcfe..0980a81 100644
>>> --- a/kernel/softirq.c
>>> +++ b/kernel/softirq.c
>>> @@ -386,7 +386,7 @@ static inline void tick_irq_exit(void)
>>>
>>>  	/* Make sure that timer wheel updates are propagated */
>>>  	if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
>>> -		if (!in_interrupt())
>>> +		if (!in_irq())
>>>  			tick_nohz_irq_exit();
>>>  	}
>>>  #endif
>>>
>>
>> This patch was backported to stable linux-4.14.y and it causes a
>> regression: a flood of "NOHZ: local_softirq_pending" messages on all TI
>> boards during boot (NFS boot):
>>
>> [4.179796] NOHZ: local_softirq_pending 2c2 in sirq 256
>> [4.185051] NOHZ: local_softirq_pending 2c2 in sirq 256
>>
>> The same is not reproducible with LKML, seemingly due to changes in
>> tick-sched.c: __tick_nohz_idle_enter()/tick_nohz_irq_exit().
> 
> What changes do you think fixed this?

Not sure, but it seems to be this set of changes from Rafael J. Wysocki:

ff7de62 nohz: Avoid duplication of code related to got_idle_tick
296bb1e cpuidle: menu: Refine idle state selection for running tick
554c8aa sched: idle: Select idle state before stopping the tick
23a8d88 time: tick-sched: Split tick_nohz_stop_sched_tick()
45f1ff5 cpuidle: Return nohz hint from cpuidle_select()
2aaf709 sched: idle: Do not stop the tick upfront in the idle loop
0e77676 time: tick-sched: Reorganize idle tick management code
b7eaf1a cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely

> 
>> I've generated a backtrace from can_stop_idle_tick() (see below) and it
>> seems this patch makes the tick_nohz_irq_exit() call unconditional in the
>> case of a nested interrupt:
>>
>> gic_handle_irq
>>  |- irq_exit
>>     |- preempt_count_sub(HARDIRQ_OFFSET); <-- [1]
>>     |- __do_softirq
>>
>>        |- gic_handle_irq()
>>           |- irq_exit()
>>              |- tick_irq_exit()
>>                 if (!in_irq()) <-- My understanding is that this condition
>>                                    will always be true due to [1]
>>                     tick_nohz_irq_exit();
>>                      |- __tick_nohz_idle_enter()
>>                         |- can_stop_idle_tick()
>>
>> Sorry, I'm not sure if my conclusion is right or how it can be fixed.
> 
> Any pointers to a patch that might need to be backported would be
> appreciated.
> 

commit 
Author: Frederic Weisbecker 
Date: Fri Aug 3 15:31:34 2018 +0200

nohz: Fix missing tick reprogram when 

Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq

2018-08-24 Thread Grygorii Strashko



On 08/24/2018 02:01 AM, Thomas Gleixner wrote:
> On Fri, 24 Aug 2018, Greg KH wrote:
>> On Thu, Aug 23, 2018 at 05:57:06PM -0500, Grygorii Strashko wrote:
>>> This patch was backported to stable linux-4.14.y and it causes a
>>> regression: a flood of "NOHZ: local_softirq_pending" messages on all TI
>>> boards during boot (NFS boot):
>>>
>>> [4.179796] NOHZ: local_softirq_pending 2c2 in sirq 256
>>> [4.185051] NOHZ: local_softirq_pending 2c2 in sirq 256
> 
> This printout is weird. Did you add something here?

yes. 

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index da74d2f..a5fad1c 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -910,8 +910,9 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 
 		if (ratelimit < 100 &&
 		    (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
-			pr_warn("NOHZ: local_softirq_pending %02x in sirq %d\n",
+			pr_warn("NOHZ: local_softirq_pending %02x in sirq %lu\n",
 				(unsigned int) local_softirq_pending(), in_softirq());
+			WARN_ON_ONCE(true);
 			ratelimit++;
 		}

> 
>>> The same is not reproducible with LKML, seemingly due to changes in
>>> tick-sched.c: __tick_nohz_idle_enter()/tick_nohz_irq_exit().
>>
>> What changes do you think fixed this?
>>
>>> I've generated a backtrace from can_stop_idle_tick() (see below) and it
>>> seems this patch makes the tick_nohz_irq_exit() call unconditional in the
>>> case of a nested interrupt:
>>>
>>> gic_handle_irq
>>>  |- irq_exit
>>>     |- preempt_count_sub(HARDIRQ_OFFSET); <-- [1]
>>>     |- __do_softirq
>>>
>>>        |- gic_handle_irq()
>>>           |- irq_exit()
>>>              |- tick_irq_exit()
>>>                 if (!in_irq()) <-- My understanding is that this condition
>>>                                    will always be true due to [1]
> 
> Correct, but that's not the problem. The issue is that this happens in a
> softirq disabled region. Does the below fix it?
> 
> Thanks,
> 
>   tglx
> 
> 8<
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 5b33e2f5c0ed..6aab9d54a331 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -888,7 +888,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
>   if (unlikely(local_softirq_pending() && cpu_online(cpu))) {
>   static int ratelimit;
>   
> - if (ratelimit < 10 &&
> + if (ratelimit < 10 && !in_softirq() &&
>   (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
>   pr_warn("NOHZ: local_softirq_pending %02x\n",
>   (unsigned int) local_softirq_pending());
> 
> 

Yes. i do not see local_softirq_pending messages any more

But one question, just to clarify: after the patch "nohz: Fix missing tick
reprog while interrupting inline timer softirq",
tick_nohz_irq_exit() will be called several times in the case of nested
interrupts (min 2):
gic_handle_irq
 |- irq_exit
    |- preempt_count_sub(HARDIRQ_OFFSET);
    |- __do_softirq

       |- gic_handle_irq()
          |- irq_exit()
             |- tick_irq_exit()
                if (!in_irq())
                    tick_nohz_irq_exit(); <-- [1]
    |- tick_irq_exit()
       if (!in_irq())
           tick_nohz_irq_exit(); <-- [2]

Is it correct? In 4.14 tick_nohz_irq_exit() is much more complex than in
LKML now, and this is a hot path.


-- 
regards,
-grygorii


Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq

2018-08-24 Thread Frederic Weisbecker
On Fri, Aug 24, 2018 at 09:01:02AM +0200, Thomas Gleixner wrote:
> On Fri, 24 Aug 2018, Greg KH wrote:
> > On Thu, Aug 23, 2018 at 05:57:06PM -0500, Grygorii Strashko wrote:
> > > This patch was backported to stable linux-4.14.y and it causes a
> > > regression: a flood of "NOHZ: local_softirq_pending" messages on all TI
> > > boards during boot (NFS boot):
> > > 
> > > [4.179796] NOHZ: local_softirq_pending 2c2 in sirq 256
> > > [4.185051] NOHZ: local_softirq_pending 2c2 in sirq 256
> 
> This printout is weird. Did you add something here?
> 
> > > The same is not reproducible with LKML, seemingly due to changes in
> > > tick-sched.c: __tick_nohz_idle_enter()/tick_nohz_irq_exit().
> > 
> > What changes do you think fixed this?
> > 
> > > I've generated a backtrace from can_stop_idle_tick() (see below) and it
> > > seems this patch makes the tick_nohz_irq_exit() call unconditional in the
> > > case of a nested interrupt:
> > >
> > > gic_handle_irq
> > >  |- irq_exit
> > >     |- preempt_count_sub(HARDIRQ_OFFSET); <-- [1]
> > >     |- __do_softirq
> > >
> > >        |- gic_handle_irq()
> > >           |- irq_exit()
> > >              |- tick_irq_exit()
> > >                 if (!in_irq()) <-- My understanding is that this condition
> > >                                    will always be true due to [1]
> 
> Correct, but that's not the problem. The issue is that this happens in a
> softirq disabled region. Does the below fix it?
> 
> Thanks,
> 
>   tglx
> 
> 8<
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 5b33e2f5c0ed..6aab9d54a331 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -888,7 +888,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
>   if (unlikely(local_softirq_pending() && cpu_online(cpu))) {
>   static int ratelimit;
>  
> - if (ratelimit < 10 &&
> + if (ratelimit < 10 && !in_softirq() &&
>   (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
>   pr_warn("NOHZ: local_softirq_pending %02x\n",
>   (unsigned int) local_softirq_pending());
> 
> 

Good catch! In 4.14 Rafael hadn't yet changed the path where we stop the idle
tick. We were still stopping it from irq exit and so we could do that while
interrupting a softirq. So we may need to backport this along with "nohz: Fix
missing tick reprogram when interrupting an inline softirq".


Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq

2018-08-24 Thread Thomas Gleixner
On Fri, 24 Aug 2018, Greg KH wrote:
> On Thu, Aug 23, 2018 at 05:57:06PM -0500, Grygorii Strashko wrote:
> > This patch was backported to stable linux-4.14.y and it causes a
> > regression: a flood of "NOHZ: local_softirq_pending" messages on all TI
> > boards during boot (NFS boot):
> > 
> > [4.179796] NOHZ: local_softirq_pending 2c2 in sirq 256
> > [4.185051] NOHZ: local_softirq_pending 2c2 in sirq 256

This printout is weird. Did you add something here?

> > The same is not reproducible with LKML, seemingly due to changes in
> > tick-sched.c: __tick_nohz_idle_enter()/tick_nohz_irq_exit().
> 
> What changes do you think fixed this?
> 
> > I've generated a backtrace from can_stop_idle_tick() (see below) and it
> > seems this patch makes the tick_nohz_irq_exit() call unconditional in the
> > case of a nested interrupt:
> >
> > gic_handle_irq
> >  |- irq_exit
> >     |- preempt_count_sub(HARDIRQ_OFFSET); <-- [1]
> >     |- __do_softirq
> >
> >        |- gic_handle_irq()
> >           |- irq_exit()
> >              |- tick_irq_exit()
> >                 if (!in_irq()) <-- My understanding is that this condition
> >                                    will always be true due to [1]

Correct, but that's not the problem. The issue is that this happens in a
softirq disabled region. Does the below fix it?

Thanks,

tglx

8<
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 5b33e2f5c0ed..6aab9d54a331 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -888,7 +888,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 	if (unlikely(local_softirq_pending() && cpu_online(cpu))) {
 		static int ratelimit;
 
-		if (ratelimit < 10 &&
+		if (ratelimit < 10 && !in_softirq() &&
 		    (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
 			pr_warn("NOHZ: local_softirq_pending %02x\n",
 				(unsigned int) local_softirq_pending());




Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq

2018-08-24 Thread Greg KH
On Thu, Aug 23, 2018 at 05:57:06PM -0500, Grygorii Strashko wrote:
> Hi
> 
> On 07/31/2018 05:52 PM, Frederic Weisbecker wrote:
> > Before updating the full nohz tick or the idle time on IRQ exit, we
> > check first if we are not in a nesting interrupt, whether the inner
> > interrupt is a hard or a soft IRQ.
> > 
> > There is a historical reason for that: the dyntick idle mode used to
> > reprogram the tick on IRQ exit, after softirq processing, and there was
> > no point in doing that job in the outer nesting interrupt because the
> > tick update will be performed through the end of the inner interrupt
> > eventually, with even potential new timer updates.
> > 
> > One corner case could show up though: if an idle tick interrupts a softirq
> > executing inline in the idle loop (through a call to local_bh_enable())
> > after we entered in dynticks mode, the IRQ won't reprogram the tick
> > because it assumes the softirq executes on an inner IRQ-tail. As a
> > result we might put the CPU in sleep mode with the tick completely
> > stopped whereas a timer can still be enqueued. Indeed there is no tick
> > reprogramming in local_bh_enable(). We probably assumed there was no bh
> > disabled section in idle, although there didn't seem to be debug code
> > ensuring that.
> > 
> > Nowadays the nesting interrupt optimization still stands but only concerns
> > full dynticks. The tick is stopped on IRQ exit in full dynticks mode
> > and we want to wait for the end of the inner IRQ to reprogram the tick.
> > But in_interrupt() doesn't make a difference between softirqs executing
> > on IRQ tail and those executing inline. What was to be considered a
> > corner case in dynticks-idle mode now becomes a serious opportunity for
> > a bug in full dynticks mode: if a tick interrupts a task executing
> > softirq inline, the tick reprogramming will be ignored and we may exit
> > to userspace after local_bh_enable() with an enqueued timer that will
> > never fire.
> > 
> > To fix this, simply keep reprogramming the tick if we are in a hardirq
> > interrupting softirq. We can still figure out a way later to restore
> > this optimization while excluding inline softirq processing.
> > 
> > Reported-by: Anna-Maria Gleixner 
> > Signed-off-by: Frederic Weisbecker 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Tested-by: Anna-Maria Gleixner 
> > ---
> >   kernel/softirq.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/softirq.c b/kernel/softirq.c
> > index 900dcfe..0980a81 100644
> > --- a/kernel/softirq.c
> > +++ b/kernel/softirq.c
> > @@ -386,7 +386,7 @@ static inline void tick_irq_exit(void)
> >  
> >  	/* Make sure that timer wheel updates are propagated */
> >  	if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
> > -		if (!in_interrupt())
> > +		if (!in_irq())
> >  			tick_nohz_irq_exit();
> >  	}
> >  #endif
> > 
> 
> This patch was backported to stable linux-4.14.y and it causes a
> regression: a flood of "NOHZ: local_softirq_pending" messages on all TI
> boards during boot (NFS boot):
> 
> [4.179796] NOHZ: local_softirq_pending 2c2 in sirq 256
> [4.185051] NOHZ: local_softirq_pending 2c2 in sirq 256
> 
> The same is not reproducible with LKML; this seems to be due to changes in
> tick-sched.c (__tick_nohz_idle_enter()/tick_nohz_irq_exit()).

What changes do you think fixed this?

> I've generated a backtrace from can_stop_idle_tick() (see below), and it
> seems this patch makes the tick_nohz_irq_exit() call unconditional in the
> case of a nested interrupt:
> 
> gic_handle_irq
>  |- irq_exit
>     |- preempt_count_sub(HARDIRQ_OFFSET); <-- [1]
>     |- __do_softirq
> 
>        |- gic_handle_irq()
>           |- irq_exit()
>              |- tick_irq_exit()
>                 if (!in_irq()) <-- My understanding is that this condition
>                                    will always be true due to [1]
>                     tick_nohz_irq_exit();
>                     |- __tick_nohz_idle_enter()
>                        |- can_stop_idle_tick()
> 
> Sorry, I'm not sure whether my conclusion is right or how it can be fixed.

Any pointers to a patch that might need to be backported would be
appreciated.

thanks,

greg k-h


Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq

2018-08-23 Thread Grygorii Strashko
Hi

On 07/31/2018 05:52 PM, Frederic Weisbecker wrote:
> Before updating the full nohz tick or the idle time on IRQ exit, we
> check first if we are not in a nesting interrupt, whether the inner
> interrupt is a hard or a soft IRQ.
> 
> There is a historical reason for that: the dyntick idle mode used to
> reprogram the tick on IRQ exit, after softirq processing, and there was
> no point in doing that job in the outer nesting interrupt because the
> tick update will be performed through the end of the inner interrupt
> eventually, with even potential new timer updates.
> 
> One corner case could show up though: if an idle tick interrupts a softirq
> executing inline in the idle loop (through a call to local_bh_enable())
> after we entered in dynticks mode, the IRQ won't reprogram the tick
> because it assumes the softirq executes on an inner IRQ-tail. As a
> result we might put the CPU in sleep mode with the tick completely
> stopped whereas a timer can still be enqueued. Indeed there is no tick
> reprogramming in local_bh_enable(). We probably assumed there was no bh
> disabled section in idle, although there didn't seem to be debug code
> ensuring that.
> 
> Nowadays the nesting interrupt optimization still stands but only concerns
> full dynticks. The tick is stopped on IRQ exit in full dynticks mode
> and we want to wait for the end of the inner IRQ to reprogram the tick.
> But in_interrupt() doesn't make a difference between softirqs executing
> on IRQ tail and those executing inline. What was to be considered a
> corner case in dynticks-idle mode now becomes a serious opportunity for
> a bug in full dynticks mode: if a tick interrupts a task executing
> softirq inline, the tick reprogramming will be ignored and we may exit
> to userspace after local_bh_enable() with an enqueued timer that will
> never fire.
> 
> To fix this, simply keep reprogramming the tick if we are in a hardirq
> interrupting softirq. We can still figure out a way later to restore
> this optimization while excluding inline softirq processing.
> 
> Reported-by: Anna-Maria Gleixner 
> Signed-off-by: Frederic Weisbecker 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Tested-by: Anna-Maria Gleixner 
> ---
>   kernel/softirq.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 900dcfe..0980a81 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -386,7 +386,7 @@ static inline void tick_irq_exit(void)
>  
>  	/* Make sure that timer wheel updates are propagated */
>  	if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
> -		if (!in_interrupt())
> +		if (!in_irq())
>  			tick_nohz_irq_exit();
>  	}
>  #endif
> 

This patch was backported to stable linux-4.14.y and it causes a regression:
a flood of "NOHZ: local_softirq_pending" messages on all TI boards during
boot (NFS boot):

[4.179796] NOHZ: local_softirq_pending 2c2 in sirq 256
[4.185051] NOHZ: local_softirq_pending 2c2 in sirq 256
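
To make the mask in these messages easier to read, here is a small
user-space sketch that decodes a pending mask into softirq names. It assumes
the softirq enum order of the 4.14 tree (HI first, RCU last), written from
memory, and decode_pending() is a hypothetical helper, not a kernel
function. The value is treated as hexadecimal because the pr_warn() format
in tick-sched.c is %02x.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Softirq names in enum order, as recalled from include/linux/interrupt.h. */
static const char *const softirq_names[] = {
	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
	"IRQ_POLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
};

/*
 * Write the names of all known bits set in 'pending' into 'buf', separated
 * by '|', and return 'buf'. Bits above RCU are ignored; output that does
 * not fit is cut off at the last complete write.
 */
static char *decode_pending(unsigned int pending, char *buf, size_t len)
{
	size_t n = sizeof(softirq_names) / sizeof(softirq_names[0]);
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (size_t i = 0; i < n; i++) {
		int w;

		if (!(pending & (1U << i)))
			continue;
		w = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", softirq_names[i]);
		if (w < 0 || (size_t)w >= len - used)
			break;	/* buffer too small, stop here */
		used += (size_t)w;
	}
	return buf;
}
```

Under these assumptions 0x2c2 decodes to TIMER|TASKLET|SCHED|RCU and 0x40
(seen in the backtrace further down) decodes to TASKLET.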

The same is not reproducible with LKML; this seems to be due to changes in
tick-sched.c (__tick_nohz_idle_enter()/tick_nohz_irq_exit()).

I've generated a backtrace from can_stop_idle_tick() (see below), and it
seems this patch makes the tick_nohz_irq_exit() call unconditional in the
case of a nested interrupt:

gic_handle_irq
 |- irq_exit
    |- preempt_count_sub(HARDIRQ_OFFSET); <-- [1]
    |- __do_softirq

       |- gic_handle_irq()
          |- irq_exit()
             |- tick_irq_exit()
                if (!in_irq()) <-- My understanding is that this condition
                                   will always be true due to [1]
                    tick_nohz_irq_exit();
                    |- __tick_nohz_idle_enter()
                       |- can_stop_idle_tick()

Sorry, I'm not sure whether my conclusion is right or how it can be fixed.
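
The preempt_count arithmetic behind the call chain above can be replayed in
a few lines of user-space C. This is a rough sketch, not kernel code: the
offsets and masks are written from memory of include/linux/preempt.h in
4.14, the *_model helpers and replay_nested_irq() are invented names, and
the idle_cpu()/need_resched() part of the tick_irq_exit() condition is left
out.

```c
#include <assert.h>
#include <stdbool.h>

/* preempt_count layout: bits 8-15 softirq, bits 16-19 hardirq, bit 20 NMI. */
#define SOFTIRQ_OFFSET	(1U << 8)
#define HARDIRQ_OFFSET	(1U << 16)
#define SOFTIRQ_MASK	(0xffU << 8)
#define HARDIRQ_MASK	(0xfU << 16)
#define NMI_MASK	(0x1U << 20)

static bool in_irq_model(unsigned int pc)
{
	return (pc & HARDIRQ_MASK) != 0;
}

static bool in_interrupt_model(unsigned int pc)
{
	return (pc & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) != 0;
}

struct trace {
	unsigned int count_at_check;	/* preempt_count seen by the guard */
	bool old_guard_calls_tick;	/* !in_interrupt(), before the patch */
	bool new_guard_calls_tick;	/* !in_irq(), after the patch */
};

/* Replay: outer IRQ exits, softirq starts, nested IRQ enters and exits. */
static struct trace replay_nested_irq(void)
{
	unsigned int pc = 0;
	struct trace t;

	pc += HARDIRQ_OFFSET;	/* outer gic_handle_irq(): irq_enter() */
	pc -= HARDIRQ_OFFSET;	/* outer irq_exit(), step [1] */
	pc += SOFTIRQ_OFFSET;	/* __do_softirq() starts serving softirqs */

	pc += HARDIRQ_OFFSET;	/* nested gic_handle_irq(): irq_enter() */
	pc -= HARDIRQ_OFFSET;	/* nested irq_exit(), before tick_irq_exit() */

	t.count_at_check = pc;
	t.old_guard_calls_tick = !in_interrupt_model(pc);
	t.new_guard_calls_tick = !in_irq_model(pc);
	return t;
}
```

In this model the guard sees a count of 256 (SOFTIRQ_OFFSET): the old
!in_interrupt() test skips tick_nohz_irq_exit(), while the new !in_irq()
test calls it. That would agree with the "in sirq 256" field in the
messages, if that field is softirq_count().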


[3.842320] NOHZ: local_softirq_pending 40 in sirq 256
[3.847485] ------------[ cut here ]------------
[3.852133] WARNING: CPU: 0 PID: 0 at kernel/time/tick-sched.c:915 __tick_nohz_idle_enter+0x4b8/0x568
[3.861393] Modules linked in:
[3.864469] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.66-01768-gc26f664-dirty #311
[3.872506] Hardware name: Generic DRA74X (Flattened Device Tree)
[3.878623] Backtrace: 
[3.881091] [] (dump_backtrace) from [] (show_stack+0x18/0x1c)
[3.888696]  r7:0009 r6:600f0193 r5: r4:c0c5fca4
[3.894386] [] (show_stack) from [] (dump_stack+0x8c/0xa0)
[3.901645] [] (dump_stack) from [] (__warn+0xec/0x104)
[3.908638]  r7:0009 r6:c0996d08 r5: r4:
[3.914329] [] (__warn) from [] (warn_slowpath_null+0x28/0x30)
[3.921933]  r9: r8:e4e1f7de r7:c0c8c1d8 r6:c0c65180 r5: r4:eed408e8
[3.929715] [] (warn_slowpath_null) from [] (__tick_nohz_idle_enter+0x4b8/0x568)
[3.938890] [] (__tick_nohz_idle_enter) from [] (tick_nohz_irq_exit+0x2c/0x30)
[3.947890]  r10:c0c01f50 r9:c0c0 

Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq

2018-08-03 Thread Thomas Gleixner
On Wed, 1 Aug 2018, Frederic Weisbecker wrote:
> On Wed, Aug 01, 2018 at 07:46:10PM +0200, Thomas Gleixner wrote:
> 
> In fact I should remove this whole paragraph, it's about code history that's
> not relevant anymore and it confuses the whole explanation which should
> concern nohz_full only.

Care to send an updated version?

Thanks,

tglx


Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq

2018-08-01 Thread Frederic Weisbecker
On Wed, Aug 01, 2018 at 07:46:10PM +0200, Thomas Gleixner wrote:
> On Wed, 1 Aug 2018, Frederic Weisbecker wrote:
> > Before updating the full nohz tick or the idle time on IRQ exit, we
> > check first if we are not in a nesting interrupt, whether the inner
> > interrupt is a hard or a soft IRQ.
> > 
> > There is a historical reason for that: the dyntick idle mode used to
> > reprogram the tick on IRQ exit, after softirq processing, and there was
> > no point in doing that job in the outer nesting interrupt because the
> > tick update will be performed through the end of the inner interrupt
> > eventually, with even potential new timer updates.
> > 
> > One corner case could show up though: if an idle tick interrupts a softirq
> > executing inline in the idle loop (through a call to local_bh_enable())
> 
> Where does this happen? Why is anything in the idle loop doing a
> local_bh_disable/enable() pair?
> 
> Or are you talking about NOHZ FULL and arbitrary task context?

It's about the idle loop. But I'm not aware of any example in practice; this
is purely theoretical and, more importantly, it doesn't concern upstream
anymore, since we don't stop the tick from IRQ-tail in dynticks-idle mode
after Rafael's changes.

> 
> > after we entered in dynticks mode, the IRQ won't reprogram the tick
> > because it assumes the softirq executes on an inner IRQ-tail. As a
> > result we might put the CPU in sleep mode with the tick completely
> > stopped whereas a timer can still be enqueued. Indeed there is no tick
> > reprogramming in local_bh_enable(). We probably assumed there was no bh
> > disabled section in idle, although there didn't seem to be debug code
> > ensuring that.

In fact I should remove this whole paragraph, it's about code history that's
not relevant anymore and it confuses the whole explanation which should
concern nohz_full only.

> > 
> > Nowadays the nesting interrupt optimization still stands but only concerns
> > full dynticks. The tick is stopped on IRQ exit in full dynticks mode
> > and we want to wait for the end of the inner IRQ to reprogram the tick.
> > But in_interrupt() doesn't make a difference between softirqs executing
> > on IRQ tail and those executing inline. What was to be considered a
> > corner case in dynticks-idle mode now becomes a serious opportunity for
> > a bug in full dynticks mode: if a tick interrupts a task executing
> > softirq inline, the tick reprogramming will be ignored and we may exit
> > to userspace after local_bh_enable() with an enqueued timer that will
> > never fire.
> > 
> > To fix this, simply keep reprogramming the tick if we are in a hardirq
> > interrupting softirq. We can still figure out a way later to restore
> > this optimization while excluding inline softirq processing.
> 
> I'm not really happy with that 'fix' because what happens if:
> 
>   
>   local_bh_enable()
> do_softirq()
>   --> interrupt()
>tick_nohz_irq_exit();
>   arm_timer();
> 
> So if that new timer is the only one on the CPU, what is going to arm the
> timer hardware which was just switched off in tick_nohz_irq_exit()?
> 
> I haven't looked deep enough, but a simple unconditional call to
> tick_irq_exit() at the end of do_softirq() might do the trick.

Nope, it should be OK: nohz_full is supposed to support timers queued on the
fly while the tick is stopped. We issue a self-IPI if necessary:

internal_add_timer() -> trigger_dyntick_cpu() -> wake_up_nohz_cpu()
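
As a very rough illustration of that idea (not the kernel implementation),
the toy model below kicks a CPU whose tick is stopped when a timer is queued
that expires earlier than anything it knows about. All names (toy_cpu,
toy_add_timer, toy_kick, toy_handle_kick) are invented for the sketch, jiffies
wrap-around is ignored, and restarting the tick in the kick handler is a
simplification of the real re-evaluation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy per-CPU state; the fields are illustrative, not kernel structures. */
struct toy_cpu {
	bool nohz_full;			/* CPU runs in full dynticks mode */
	bool tick_stopped;		/* tick is currently switched off */
	bool kick_pending;		/* a self-IPI has been requested */
	unsigned long next_expiry;	/* earliest known timer expiry */
};

/* Stand-in for the wake_up_nohz_cpu() step: request a self-IPI. */
static void toy_kick(struct toy_cpu *cpu)
{
	cpu->kick_pending = true;
}

/* Stand-in for internal_add_timer() -> trigger_dyntick_cpu(). */
static void toy_add_timer(struct toy_cpu *cpu, unsigned long expires)
{
	bool earlier;

	assert(cpu != NULL);
	earlier = expires < cpu->next_expiry;
	if (earlier)
		cpu->next_expiry = expires;
	/* Only a stopped tick that would miss the new timer needs a kick. */
	if (cpu->nohz_full && cpu->tick_stopped && earlier)
		toy_kick(cpu);
}

/* Stand-in for the IPI handler: the tick gets reprogrammed. */
static void toy_handle_kick(struct toy_cpu *cpu)
{
	assert(cpu != NULL);
	if (!cpu->kick_pending)
		return;
	cpu->kick_pending = false;
	cpu->tick_stopped = false;	/* tick now covers next_expiry */
}
```

In the scenario Thomas describes, the timer armed after tick_nohz_irq_exit()
would go through toy_add_timer(), set kick_pending, and the following
toy_handle_kick() would bring the tick back.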

Thanks.


Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq

2018-08-01 Thread Thomas Gleixner
On Wed, 1 Aug 2018, Frederic Weisbecker wrote:
> Before updating the full nohz tick or the idle time on IRQ exit, we
> check first if we are not in a nesting interrupt, whether the inner
> interrupt is a hard or a soft IRQ.
> 
> There is a historical reason for that: the dyntick idle mode used to
> reprogram the tick on IRQ exit, after softirq processing, and there was
> no point in doing that job in the outer nesting interrupt because the
> tick update will be performed through the end of the inner interrupt
> eventually, with even potential new timer updates.
> 
> One corner case could show up though: if an idle tick interrupts a softirq
> executing inline in the idle loop (through a call to local_bh_enable())

Where does this happen? Why is anything in the idle loop doing a
local_bh_disable/enable() pair?

Or are you talking about NOHZ FULL and arbitrary task context?

> after we entered in dynticks mode, the IRQ won't reprogram the tick
> because it assumes the softirq executes on an inner IRQ-tail. As a
> result we might put the CPU in sleep mode with the tick completely
> stopped whereas a timer can still be enqueued. Indeed there is no tick
> reprogramming in local_bh_enable(). We probably assumed there was no bh
> disabled section in idle, although there didn't seem to be debug code
> ensuring that.
> 
> Nowadays the nesting interrupt optimization still stands but only concerns
> full dynticks. The tick is stopped on IRQ exit in full dynticks mode
> and we want to wait for the end of the inner IRQ to reprogram the tick.
> But in_interrupt() doesn't make a difference between softirqs executing
> on IRQ tail and those executing inline. What was to be considered a
> corner case in dynticks-idle mode now becomes a serious opportunity for
> a bug in full dynticks mode: if a tick interrupts a task executing
> softirq inline, the tick reprogramming will be ignored and we may exit
> to userspace after local_bh_enable() with an enqueued timer that will
> never fire.
> 
> To fix this, simply keep reprogramming the tick if we are in a hardirq
> interrupting softirq. We can still figure out a way later to restore
> this optimization while excluding inline softirq processing.

I'm not really happy with that 'fix' because what happens if:

  
  local_bh_enable()
do_softirq()
  --> interrupt()
 tick_nohz_irq_exit();
  arm_timer();

So if that new timer is the only one on the CPU, what is going to arm the
timer hardware which was just switched off in tick_nohz_irq_exit()?

I haven't looked deep enough, but a simple unconditional call to
tick_irq_exit() at the end of do_softirq() might do the trick.

Thanks,

tglx





  


Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq

2018-08-01 Thread Anna-Maria Gleixner
On Wed, 1 Aug 2018, Frederic Weisbecker wrote:

> Before updating the full nohz tick or the idle time on IRQ exit, we
> check first if we are not in a nesting interrupt, whether the inner
> interrupt is a hard or a soft IRQ.
> 
> There is a historical reason for that: the dyntick idle mode used to
> reprogram the tick on IRQ exit, after softirq processing, and there was
> no point in doing that job in the outer nesting interrupt because the
> tick update will be performed through the end of the inner interrupt
> eventually, with even potential new timer updates.
> 
> One corner case could show up though: if an idle tick interrupts a softirq
> executing inline in the idle loop (through a call to local_bh_enable())
> after we entered dynticks mode, the IRQ won't reprogram the tick
> because it assumes the softirq executes on an inner IRQ-tail. As a
> result we might put the CPU in sleep mode with the tick completely
> stopped whereas a timer can still be enqueued. Indeed there is no tick
> reprogramming in local_bh_enable(). We probably assumed there was no bh
> disabled section in idle, although there didn't seem to be debug code
> ensuring that.
> 
> Nowadays the nesting interrupt optimization still stands but only concerns
> full dynticks. The tick is stopped on IRQ exit in full dynticks mode
> and we want to wait for the end of the inner IRQ to reprogram the tick.
> But in_interrupt() doesn't make a difference between softirqs executing
> on IRQ tail and those executing inline. What was to be considered a
> corner case in dynticks-idle mode now becomes a serious opportunity for
> a bug in full dynticks mode: if a tick interrupts a task executing
> softirq inline, the tick reprogramming will be ignored and we may exit
> to userspace after local_bh_enable() with an enqueued timer that will
> never fire.
> 
> To fix this, simply keep reprogramming the tick if we are in a hardirq
> interrupting softirq. We can still figure out a way later to restore
> this optimization while excluding inline softirq processing.
> 
> Reported-by: Anna-Maria Gleixner 
> Signed-off-by: Frederic Weisbecker 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 

Tested-by: Anna-Maria Gleixner 

Thanks,

Anna-Maria

