subject:"\[PATCH\] sched\/fair\: Avoid divide by zero when rebalancing domains"

Re: [PATCH] sched/fair: Avoid divide by zero when rebalancing domains

2018-08-17 Thread Valentin Schneider

Hi,

On 17/08/18 11:27, Matt Fleming wrote:
> On Thu, 05 Jul, at 05:54:02PM, Valentin Schneider wrote:
>> On 05/07/18 14:27, Matt Fleming wrote:
>>> On Thu, 05 Jul, at 11:10:42AM, Valentin Schneider wrote:
 Hi,

 On 04/07/18 15:24, Matt Fleming wrote:
> It's possible that the CPU doing nohz idle balance hasn't had its own
> load updated for many seconds. This can lead to huge deltas between
> rq->avg_stamp and rq->clock when rebalancing, and has been seen to
> cause the following crash:
>
>  divide error:  [#1] SMP
>  Call Trace:
>   [] update_sd_lb_stats+0xe8/0x560
>>
>> My confusion comes from not seeing where that crash happens. Would you mind
>> sharing the associated line number? I can feel the "how did I not see this"
>> from there but it can't be helped :(
> 
> The divide by zero comes from scale_rt_capacity() where 'total' is a
> u64 but gets truncated when passed to div_u64() since the divisor
> parameter is u32.
> 

Ah, nasty one. Interestingly enough that bit has been changed quite recently,
so I don't think you can get a div by 0 in there anymore - see
523e979d3164 ("sched/core: Use PELT for scale_rt_capacity()") and subsequent
cleanups.

> Sure, you could use div64_u64() instead, but the real issue is that
> the load hasn't been updated for a very long time and that we're
> trying to balance the domains with stale data.
> 

Yeah I agree with that. However, the problem is with cpu_load - blocked load
on nohz CPUs will be periodically updated until entirely decayed. And if we
end up getting rid of cpu_load (depends on how [1] goes), then there's
nothing left to do. But we're not there yet...

[1]: 
https://lore.kernel.org/lkml/20180809135753.21077-1-dietmar.eggem...@arm.com/

Re: [PATCH] sched/fair: Avoid divide by zero when rebalancing domains

2018-08-17 Thread Matt Fleming

On Thu, 05 Jul, at 05:54:02PM, Valentin Schneider wrote:
> On 05/07/18 14:27, Matt Fleming wrote:
> > On Thu, 05 Jul, at 11:10:42AM, Valentin Schneider wrote:
> >> Hi,
> >>
> >> On 04/07/18 15:24, Matt Fleming wrote:
> >>> It's possible that the CPU doing nohz idle balance hasn't had its own
> >>> load updated for many seconds. This can lead to huge deltas between
> >>> rq->avg_stamp and rq->clock when rebalancing, and has been seen to
> >>> cause the following crash:
> >>>
> >>>  divide error:  [#1] SMP
> >>>  Call Trace:
> >>>   [] update_sd_lb_stats+0xe8/0x560
> 
> My confusion comes from not seeing where that crash happens. Would you mind
> sharing the associated line number? I can feel the "how did I not see this"
> from there but it can't be helped :(

The divide by zero comes from scale_rt_capacity() where 'total' is a
u64 but gets truncated when passed to div_u64() since the divisor
parameter is u32.

Sure, you could use div64_u64() instead, but the real issue is that
the load hasn't been updated for a very long time and that we're
trying to balance the domains with stale data.

Re: [PATCH] sched/fair: Avoid divide by zero when rebalancing domains

2018-07-05 Thread Valentin Schneider

On 05/07/18 14:27, Matt Fleming wrote:
> On Thu, 05 Jul, at 11:10:42AM, Valentin Schneider wrote:
>> Hi,
>>
>> On 04/07/18 15:24, Matt Fleming wrote:
>>> It's possible that the CPU doing nohz idle balance hasn't had its own
>>> load updated for many seconds. This can lead to huge deltas between
>>> rq->avg_stamp and rq->clock when rebalancing, and has been seen to
>>> cause the following crash:
>>>
>>>  divide error:  [#1] SMP
>>>  Call Trace:
>>>   [] update_sd_lb_stats+0xe8/0x560

My confusion comes from not seeing where that crash happens. Would you mind
sharing the associated line number? I can feel the "how did I not see this"
from there but it can't be helped :(

>>>   [] find_busiest_group+0x2d/0x4b0
>>>   [] load_balance+0x170/0x950
>>>   [] rebalance_domains+0x13f/0x290
>>>   [] __do_softirq+0xec/0x300
>>>   [] irq_exit+0xfa/0x110
>>>   [] reschedule_interrupt+0xc9/0xd0
>>>
>>
>> Do you have some sort of reproducer for that crash? If not I guess I can cook
>> something up with a quiet userspace & rt-app, though I've never seen that one
>> on arm64.
>  
> Unfortunately no, I don't have a reproduction case. Would love to have
> one to verify the patch though.
> 
>>> Make sure we update the rq clock and load before balancing.
>>>
>>> Cc: Ingo Molnar 
>>> Cc: Mike Galbraith 
>>> Cc: Peter Zijlstra 
>>> Signed-off-by: Matt Fleming 
>>> ---
>>>  kernel/sched/fair.c | 10 ++
>>>  1 file changed, 10 insertions(+)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 2f0a0be4d344..2c81662c858a 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -9597,6 +9597,16 @@ static bool _nohz_idle_balance(struct rq *this_rq, 
>>> unsigned int flags,
>>>  */
>>> smp_mb();
>>>  
>>> +   /*
>>> +* Ensure this_rq's clock and load are up-to-date before we
>>> +* rebalance since it's possible that they haven't been
>>> +* updated for multiple schedule periods, i.e. many seconds.
>>> +*/
>>> +   raw_spin_lock_irq(&this_rq->lock);
>>> +   update_rq_clock(this_rq);
>>> +   cpu_load_update_idle(this_rq);
>>> +   raw_spin_unlock_irq(&this_rq->lock);
>>> +
>>
>> I'm failing to understand why the updates further down below are seemingly
>> not enough. After we've potentially done 
>>
>> update_rq_clock(rq);
>> cpu_load_update_idle(rq);
>>
>> for all nohz cpus != this_cpu, we still end up doing:
>>
>> if (idle != CPU_NEWLY_IDLE) {
>>  update_blocked_averages(this_cpu);
>>  has_blocked_load |= this_rq->has_blocked_load;
>> }
>>
>> which should properly update this_rq's clock and load before we attempt to do
>> any balancing on it.
>  
> But cpu_load_update_idle() and update_blocked_averages() are not the same
> thing.
> 

Right, we don't do any rq->cpu_load[] update for this_rq in the current nohz
code (i.e. by using update_blocked_averages()), which I think we do want to
do. I'm just miserably failing to find how not doing this leads to a div by 0.

Re: [PATCH] sched/fair: Avoid divide by zero when rebalancing domains

2018-07-05 Thread Matt Fleming

On Thu, 05 Jul, at 11:10:42AM, Valentin Schneider wrote:
> Hi,
> 
> On 04/07/18 15:24, Matt Fleming wrote:
> > It's possible that the CPU doing nohz idle balance hasn't had its own
> > load updated for many seconds. This can lead to huge deltas between
> > rq->avg_stamp and rq->clock when rebalancing, and has been seen to
> > cause the following crash:
> > 
> >  divide error:  [#1] SMP
> >  Call Trace:
> >   [] update_sd_lb_stats+0xe8/0x560
> >   [] find_busiest_group+0x2d/0x4b0
> >   [] load_balance+0x170/0x950
> >   [] rebalance_domains+0x13f/0x290
> >   [] __do_softirq+0xec/0x300
> >   [] irq_exit+0xfa/0x110
> >   [] reschedule_interrupt+0xc9/0xd0
> > 
> 
> Do you have some sort of reproducer for that crash? If not I guess I can cook
> something up with a quiet userspace & rt-app, though I've never seen that one
> on arm64.
 
Unfortunately no, I don't have a reproduction case. Would love to have
one to verify the patch though.

> > Make sure we update the rq clock and load before balancing.
> > 
> > Cc: Ingo Molnar 
> > Cc: Mike Galbraith 
> > Cc: Peter Zijlstra 
> > Signed-off-by: Matt Fleming 
> > ---
> >  kernel/sched/fair.c | 10 ++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 2f0a0be4d344..2c81662c858a 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9597,6 +9597,16 @@ static bool _nohz_idle_balance(struct rq *this_rq, 
> > unsigned int flags,
> >  */
> > smp_mb();
> >  
> > +   /*
> > +* Ensure this_rq's clock and load are up-to-date before we
> > +* rebalance since it's possible that they haven't been
> > +* updated for multiple schedule periods, i.e. many seconds.
> > +*/
> > +   raw_spin_lock_irq(&this_rq->lock);
> > +   update_rq_clock(this_rq);
> > +   cpu_load_update_idle(this_rq);
> > +   raw_spin_unlock_irq(&this_rq->lock);
> > +
> 
> I'm failing to understand why the updates further down below are seemingly
> not enough. After we've potentially done 
> 
> update_rq_clock(rq);
> cpu_load_update_idle(rq);
> 
> for all nohz cpus != this_cpu, we still end up doing:
> 
> if (idle != CPU_NEWLY_IDLE) {
>   update_blocked_averages(this_cpu);
>   has_blocked_load |= this_rq->has_blocked_load;
> }
> 
> which should properly update this_rq's clock and load before we attempt to do
> any balancing on it.
 
But cpu_load_update_idle() and update_blocked_averages() are not the same
thing.

Re: [PATCH] sched/fair: Avoid divide by zero when rebalancing domains

2018-07-05 Thread Valentin Schneider

Hi,

On 04/07/18 15:24, Matt Fleming wrote:
> It's possible that the CPU doing nohz idle balance hasn't had its own
> load updated for many seconds. This can lead to huge deltas between
> rq->avg_stamp and rq->clock when rebalancing, and has been seen to
> cause the following crash:
> 
>  divide error:  [#1] SMP
>  Call Trace:
>   [] update_sd_lb_stats+0xe8/0x560
>   [] find_busiest_group+0x2d/0x4b0
>   [] load_balance+0x170/0x950
>   [] rebalance_domains+0x13f/0x290
>   [] __do_softirq+0xec/0x300
>   [] irq_exit+0xfa/0x110
>   [] reschedule_interrupt+0xc9/0xd0
> 

Do you have some sort of reproducer for that crash? If not I guess I can cook
something up with a quiet userspace & rt-app, though I've never seen that one
on arm64.

> Make sure we update the rq clock and load before balancing.
> 
> Cc: Ingo Molnar 
> Cc: Mike Galbraith 
> Cc: Peter Zijlstra 
> Signed-off-by: Matt Fleming 
> ---
>  kernel/sched/fair.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 2f0a0be4d344..2c81662c858a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9597,6 +9597,16 @@ static bool _nohz_idle_balance(struct rq *this_rq, 
> unsigned int flags,
>*/
>   smp_mb();
>  
> + /*
> +  * Ensure this_rq's clock and load are up-to-date before we
> +  * rebalance since it's possible that they haven't been
> +  * updated for multiple schedule periods, i.e. many seconds.
> +  */
> + raw_spin_lock_irq(&this_rq->lock);
> + update_rq_clock(this_rq);
> + cpu_load_update_idle(this_rq);
> + raw_spin_unlock_irq(&this_rq->lock);
> +

I'm failing to understand why the updates further down below are seemingly
not enough. After we've potentially done 

update_rq_clock(rq);
cpu_load_update_idle(rq);

for all nohz cpus != this_cpu, we still end up doing:

if (idle != CPU_NEWLY_IDLE) {
update_blocked_averages(this_cpu);
has_blocked_load |= this_rq->has_blocked_load;
}

which should properly update this_rq's clock and load before we attempt to do
any balancing on it.

>   for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
>   if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
>   continue;
>

[PATCH] sched/fair: Avoid divide by zero when rebalancing domains

2018-07-04 Thread Matt Fleming

It's possible that the CPU doing nohz idle balance hasn't had its own
load updated for many seconds. This can lead to huge deltas between
rq->avg_stamp and rq->clock when rebalancing, and has been seen to
cause the following crash:

 divide error:  [#1] SMP
 Call Trace:
  [] update_sd_lb_stats+0xe8/0x560
  [] find_busiest_group+0x2d/0x4b0
  [] load_balance+0x170/0x950
  [] rebalance_domains+0x13f/0x290
  [] __do_softirq+0xec/0x300
  [] irq_exit+0xfa/0x110
  [] reschedule_interrupt+0xc9/0xd0

Make sure we update the rq clock and load before balancing.

Cc: Ingo Molnar 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Signed-off-by: Matt Fleming 
---
 kernel/sched/fair.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f0a0be4d344..2c81662c858a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9597,6 +9597,16 @@ static bool _nohz_idle_balance(struct rq *this_rq, 
unsigned int flags,
 */
smp_mb();
 
+   /*
+* Ensure this_rq's clock and load are up-to-date before we
+* rebalance since it's possible that they haven't been
+* updated for multiple schedule periods, i.e. many seconds.
+*/
+   raw_spin_lock_irq(&this_rq->lock);
+   update_rq_clock(this_rq);
+   cpu_load_update_idle(this_rq);
+   raw_spin_unlock_irq(&this_rq->lock);
+
for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
continue;
-- 
2.13.6

Re: [PATCH] sched/fair: Avoid divide by zero when rebalancing domains

Re: [PATCH] sched/fair: Avoid divide by zero when rebalancing domains

Re: [PATCH] sched/fair: Avoid divide by zero when rebalancing domains

Re: [PATCH] sched/fair: Avoid divide by zero when rebalancing domains

Re: [PATCH] sched/fair: Avoid divide by zero when rebalancing domains

[PATCH] sched/fair: Avoid divide by zero when rebalancing domains

6 matches

Site Navigation

Mail list logo

Footer information