Re: [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING

2016-07-13 Thread Peter Zijlstra
On Tue, Jul 12, 2016 at 06:33:56PM +0200, Thomas Gleixner wrote:

> Subject: sched/migration: Correct off by one in load migration
> From: Thomas Gleixner 
> 
> The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
> account that the function is now called from a thread running on the outgoing
> CPU. As a result a CPU unplug leaks a load of 1 into the global load
> accounting mechanism.
> 
> Fix it by adjusting for the currently running thread which calls
> calc_load_migrate().
> 
> Fixes: e9cd8fa4fcfd: "sched/migration: Move calc_load_migrate() into CPU_DYING"
> Reported-by: Anton Blanchard 
> Signed-off-by: Thomas Gleixner 

> +++ b/kernel/sched/loadavg.c
> @@ -78,11 +78,11 @@ void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
>   loads[2] = (avenrun[2] + offset) << shift;
>  }
>  
> -long calc_load_fold_active(struct rq *this_rq)
> +long calc_load_fold_active(struct rq *this_rq, long adjust)
>  {
>   long nr_active, delta = 0;
>  
> - nr_active = this_rq->nr_running;
> + nr_active = this_rq->nr_running - adjust;
>   nr_active += (long)this_rq->nr_uninterruptible;
>  
>   if (nr_active != this_rq->calc_load_active) {

Yeah, I think this is the only sensible approach.

How do you want to route this?


Re: [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING

2016-07-12 Thread Shilpasri G Bhat
Hi,

On 07/12/2016 10:03 PM, Thomas Gleixner wrote:
> Anton,
> 
> On Tue, 12 Jul 2016, Anton Blanchard wrote:
>>> It really does not matter when we fold the load for the outgoing cpu.
>>> It's almost dead anyway, so there is no harm if we fail to fold the
>>> few microseconds which are required for going fully away.
>>
>> We are seeing the load average shoot up when hot unplugging CPUs (+1
>> for every CPU we offline) on ppc64. This reproduces on bare metal as
>> well as inside a KVM guest. A bisect points at this commit.
>>
>> As an example, a completely idle box with 128 CPUs and 112 hot
>> unplugged:
>>
>> # uptime
>>  04:35:30 up  1:23,  2 users,  load average: 112.43, 122.94, 125.54
> 
> Yes, it's an off by one as we now call that from the task which is tearing
> down the cpu. Does the patch below fix it?

Tested your patch: on an idle box I don't see an increase in load average
when offlining the CPUs.

# uptime
01:27:44 up 10 min,  1 user,  load average: 0.00, 0.18, 0.18

# lscpu | grep -Ei "on-line|off-line"
On-line CPU(s) list:   0-127

# ppc64_cpu --cores-on=2

# lscpu | grep -Ei "on-line|off-line"
On-line CPU(s) list:   0-15
Off-line CPU(s) list:  16-127

# sleep 60
# uptime
 01:28:52 up 11 min,  1 user,  load average: 0.11, 0.19, 0.18

Thanks and Regards,
Shilpa

> 
> Thanks,
> 
>   tglx
> 
> 8<--
> 
> Subject: sched/migration: Correct off by one in load migration
> From: Thomas Gleixner 
> 
> The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
> account that the function is now called from a thread running on the outgoing
> CPU. As a result a CPU unplug leaks a load of 1 into the global load
> accounting mechanism.
> 
> Fix it by adjusting for the currently running thread which calls
> calc_load_migrate().
> 
> Fixes: e9cd8fa4fcfd: "sched/migration: Move calc_load_migrate() into CPU_DYING"
> Reported-by: Anton Blanchard 
> Signed-off-by: Thomas Gleixner 
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 51d7105f529a..97ee9ac7e97c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5394,13 +5394,15 @@ void idle_task_exit(void)
>  /*
>   * Since this CPU is going 'away' for a while, fold any nr_active delta
>   * we might have. Assumes we're called after migrate_tasks() so that the
> - * nr_active count is stable.
> + * nr_active count is stable. We need to take the teardown thread which
> + * is calling this into account, so we hand in adjust = 1 to the load
> + * calculation.
>   *
>   * Also see the comment "Global load-average calculations".
>   */
>  static void calc_load_migrate(struct rq *rq)
>  {
> - long delta = calc_load_fold_active(rq);
> + long delta = calc_load_fold_active(rq, 1);
>   if (delta)
>   atomic_long_add(delta, &calc_load_tasks);
>  }
> diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
> index b0b93fd33af9..a2d6eb71f06b 100644
> --- a/kernel/sched/loadavg.c
> +++ b/kernel/sched/loadavg.c
> @@ -78,11 +78,11 @@ void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
>   loads[2] = (avenrun[2] + offset) << shift;
>  }
> 
> -long calc_load_fold_active(struct rq *this_rq)
> +long calc_load_fold_active(struct rq *this_rq, long adjust)
>  {
>   long nr_active, delta = 0;
> 
> - nr_active = this_rq->nr_running;
> + nr_active = this_rq->nr_running - adjust;
>   nr_active += (long)this_rq->nr_uninterruptible;
> 
>   if (nr_active != this_rq->calc_load_active) {
> @@ -188,7 +188,7 @@ void calc_load_enter_idle(void)
>* We're going into NOHZ mode, if there's any pending delta, fold it
>* into the pending idle delta.
>*/
> - delta = calc_load_fold_active(this_rq);
> + delta = calc_load_fold_active(this_rq, 0);
>   if (delta) {
>   int idx = calc_load_write_idx();
> 
> @@ -389,7 +389,7 @@ void calc_global_load_tick(struct rq *this_rq)
>   if (time_before(jiffies, this_rq->calc_load_update))
>   return;
> 
> - delta  = calc_load_fold_active(this_rq);
> + delta  = calc_load_fold_active(this_rq, 0);
>   if (delta)
>   atomic_long_add(delta, &calc_load_tasks);
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 7cbeb92a1cb9..898c0d2f18fe 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -28,7 +28,7 @@ extern unsigned long calc_load_update;
>  extern atomic_long_t calc_load_tasks;
> 
>  extern void calc_global_load_tick(struct rq *this_rq);
> -extern long calc_load_fold_active(struct rq *this_rq);
> +extern long calc_load_fold_active(struct rq *this_rq, long adjust);
> 
>  #ifdef CONFIG_SMP
>  extern void cpu_load_update_active(struct rq *this_rq);
> 
> 
> 



Re: [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING

2016-07-12 Thread Vaidyanathan Srinivasan
* Thomas Gleixner  [2016-07-12 18:33:56]:

> Anton,
> 
> On Tue, 12 Jul 2016, Anton Blanchard wrote:
> > > It really does not matter when we fold the load for the outgoing cpu.
> > > It's almost dead anyway, so there is no harm if we fail to fold the
> > > few microseconds which are required for going fully away.
> > 
> > We are seeing the load average shoot up when hot unplugging CPUs (+1
> > for every CPU we offline) on ppc64. This reproduces on bare metal as
> > well as inside a KVM guest. A bisect points at this commit.
> > 
> > As an example, a completely idle box with 128 CPUs and 112 hot
> > unplugged:
> > 
> > # uptime
> >  04:35:30 up  1:23,  2 users,  load average: 112.43, 122.94, 125.54
> 
> Yes, it's an off by one as we now call that from the task which is tearing
> down the cpu. Does the patch below fix it?

Hi Thomas,

Yes, this patch fixes the issue. I was able to recreate the problem
and verify the fix on 4.7.0-rc7.

> 
> Thanks,
> 
>   tglx
> 
> 8<--
> 
> Subject: sched/migration: Correct off by one in load migration
> From: Thomas Gleixner 
> 
> The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
> account that the function is now called from a thread running on the outgoing
> CPU. As a result a CPU unplug leaks a load of 1 into the global load
> accounting mechanism.
> 
> Fix it by adjusting for the currently running thread which calls
> calc_load_migrate().
> 
> Fixes: e9cd8fa4fcfd: "sched/migration: Move calc_load_migrate() into CPU_DYING"
> Reported-by: Anton Blanchard 

Tested-by: Vaidyanathan Srinivasan 

> Signed-off-by: Thomas Gleixner 
>
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 51d7105f529a..97ee9ac7e97c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5394,13 +5394,15 @@ void idle_task_exit(void)
>  /*
>   * Since this CPU is going 'away' for a while, fold any nr_active delta
>   * we might have. Assumes we're called after migrate_tasks() so that the
> - * nr_active count is stable.
> + * nr_active count is stable. We need to take the teardown thread which
> + * is calling this into account, so we hand in adjust = 1 to the load
> + * calculation.
>   *
>   * Also see the comment "Global load-average calculations".
>   */
>  static void calc_load_migrate(struct rq *rq)
>  {
> - long delta = calc_load_fold_active(rq);
> + long delta = calc_load_fold_active(rq, 1);
>   if (delta)
>   atomic_long_add(delta, &calc_load_tasks);
>  }
> diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
> index b0b93fd33af9..a2d6eb71f06b 100644
> --- a/kernel/sched/loadavg.c
> +++ b/kernel/sched/loadavg.c
> @@ -78,11 +78,11 @@ void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
>   loads[2] = (avenrun[2] + offset) << shift;
>  }
> 
> -long calc_load_fold_active(struct rq *this_rq)
> +long calc_load_fold_active(struct rq *this_rq, long adjust)
>  {
>   long nr_active, delta = 0;
> 
> - nr_active = this_rq->nr_running;
> + nr_active = this_rq->nr_running - adjust;
>   nr_active += (long)this_rq->nr_uninterruptible;

	if (nr_active != this_rq->calc_load_active) {
		delta = nr_active - this_rq->calc_load_active;
		this_rq->calc_load_active = nr_active;
	}

	return delta;

Does the above calculation hold good even if we send adjust=1 and bump
down nr_active? Tested ok though :)
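
For what it's worth, a minimal user-space sketch of that case. It copies the
folding logic quoted above rather than using the kernel code itself, and the
numbers are hypothetical: assume the last tick folded three runnable tasks and
migrate_tasks() has since moved them away, so only the teardown thread is left
on the runqueue (nr_running == 1), as the BUG_ON() in migration_call() asserts.

#include <stdio.h>

struct fake_rq {
	long nr_running;
	long nr_uninterruptible;
	long calc_load_active;
};

/* Same arithmetic as calc_load_fold_active() in the hunk above */
static long fold_active(struct fake_rq *rq, long adjust)
{
	long nr_active, delta = 0;

	nr_active = rq->nr_running - adjust;
	nr_active += rq->nr_uninterruptible;

	if (nr_active != rq->calc_load_active) {
		delta = nr_active - rq->calc_load_active;
		rq->calc_load_active = nr_active;
	}
	return delta;
}

int main(void)
{
	/* Outgoing CPU at CPU_DYING: only the teardown thread remains,
	 * but the last fold contributed 3 to the global count. */
	struct fake_rq rq = { .nr_running = 1, .nr_uninterruptible = 0,
			      .calc_load_active = 3 };

	/* adjust=1: delta = -3, the stale contribution is fully removed */
	printf("adjust=1: delta = %ld\n", fold_active(&rq, 1));

	rq.calc_load_active = 3;
	/* adjust=0: delta = -2, one count stays in calc_load_tasks forever */
	printf("adjust=0: delta = %ld\n", fold_active(&rq, 0));
	return 0;
}

So bumping nr_active down by one only cancels the caller's own contribution;
the computed delta still removes whatever the runqueue had folded earlier, and
the bookkeeping stays consistent.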

--Vaidy



Re: [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING

2016-07-12 Thread Thomas Gleixner
Anton,

On Tue, 12 Jul 2016, Anton Blanchard wrote:
> > It really does not matter when we fold the load for the outgoing cpu.
> > It's almost dead anyway, so there is no harm if we fail to fold the
> > few microseconds which are required for going fully away.
> 
> We are seeing the load average shoot up when hot unplugging CPUs (+1
> for every CPU we offline) on ppc64. This reproduces on bare metal as
> well as inside a KVM guest. A bisect points at this commit.
> 
> As an example, a completely idle box with 128 CPUs and 112 hot
> unplugged:
> 
> # uptime
>  04:35:30 up  1:23,  2 users,  load average: 112.43, 122.94, 125.54

Yes, it's an off by one as we now call that from the task which is tearing
down the cpu. Does the patch below fix it?
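
To make the off by one concrete, a minimal worked example, assuming the
outgoing CPU was otherwise idle (so rq->calc_load_active is 0 from the last
fold) and only the teardown thread is still runnable when calc_load_migrate()
runs:

    nr_active = rq->nr_running + rq->nr_uninterruptible = 1 + 0 = 1
    delta     = nr_active - rq->calc_load_active        = 1 - 0 = 1

That +1 goes into calc_load_tasks and is never folded back out, because the
runqueue is about to disappear. That is the +1 per offlined CPU in the uptime
output above. With the adjustment in the patch below, nr_active is 0 and delta
stays 0, so nothing leaks.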

Thanks,

tglx

8<--

Subject: sched/migration: Correct off by one in load migration
From: Thomas Gleixner 

The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
account that the function is now called from a thread running on the outgoing
CPU. As a result a CPU unplug leaks a load of 1 into the global load
accounting mechanism.

Fix it by adjusting for the currently running thread which calls
calc_load_migrate().

Fixes: e9cd8fa4fcfd: "sched/migration: Move calc_load_migrate() into CPU_DYING"
Reported-by: Anton Blanchard 
Signed-off-by: Thomas Gleixner 
---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 51d7105f529a..97ee9ac7e97c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5394,13 +5394,15 @@ void idle_task_exit(void)
 /*
  * Since this CPU is going 'away' for a while, fold any nr_active delta
  * we might have. Assumes we're called after migrate_tasks() so that the
- * nr_active count is stable.
+ * nr_active count is stable. We need to take the teardown thread which
+ * is calling this into account, so we hand in adjust = 1 to the load
+ * calculation.
  *
  * Also see the comment "Global load-average calculations".
  */
 static void calc_load_migrate(struct rq *rq)
 {
-   long delta = calc_load_fold_active(rq);
+   long delta = calc_load_fold_active(rq, 1);
if (delta)
atomic_long_add(delta, &calc_load_tasks);
 }
diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index b0b93fd33af9..a2d6eb71f06b 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -78,11 +78,11 @@ void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
loads[2] = (avenrun[2] + offset) << shift;
 }
 
-long calc_load_fold_active(struct rq *this_rq)
+long calc_load_fold_active(struct rq *this_rq, long adjust)
 {
long nr_active, delta = 0;
 
-   nr_active = this_rq->nr_running;
+   nr_active = this_rq->nr_running - adjust;
nr_active += (long)this_rq->nr_uninterruptible;
 
if (nr_active != this_rq->calc_load_active) {
@@ -188,7 +188,7 @@ void calc_load_enter_idle(void)
 * We're going into NOHZ mode, if there's any pending delta, fold it
 * into the pending idle delta.
 */
-   delta = calc_load_fold_active(this_rq);
+   delta = calc_load_fold_active(this_rq, 0);
if (delta) {
int idx = calc_load_write_idx();
 
@@ -389,7 +389,7 @@ void calc_global_load_tick(struct rq *this_rq)
if (time_before(jiffies, this_rq->calc_load_update))
return;
 
-   delta  = calc_load_fold_active(this_rq);
+   delta  = calc_load_fold_active(this_rq, 0);
if (delta)
atomic_long_add(delta, &calc_load_tasks);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7cbeb92a1cb9..898c0d2f18fe 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -28,7 +28,7 @@ extern unsigned long calc_load_update;
 extern atomic_long_t calc_load_tasks;
 
 extern void calc_global_load_tick(struct rq *this_rq);
-extern long calc_load_fold_active(struct rq *this_rq);
+extern long calc_load_fold_active(struct rq *this_rq, long adjust);
 
 #ifdef CONFIG_SMP
 extern void cpu_load_update_active(struct rq *this_rq);





Re: [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING

2016-07-11 Thread Anton Blanchard
Hi Thomas,

> It really does not matter when we fold the load for the outgoing cpu.
> It's almost dead anyway, so there is no harm if we fail to fold the
> few microseconds which are required for going fully away.

We are seeing the load average shoot up when hot unplugging CPUs (+1
for every CPU we offline) on ppc64. This reproduces on bare metal as
well as inside a KVM guest. A bisect points at this commit.

As an example, a completely idle box with 128 CPUs and 112 hot
unplugged:

# uptime
 04:35:30 up  1:23,  2 users,  load average: 112.43, 122.94, 125.54

Anton


[patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING

2016-03-10 Thread Thomas Gleixner
It really does not matter when we fold the load for the outgoing cpu. It's
almost dead anyway, so there is no harm if we fail to fold the few
microseconds which are required for going fully away.

Signed-off-by: Thomas Gleixner 
---
 kernel/sched/core.c |3 ---
 1 file changed, 3 deletions(-)

Index: b/kernel/sched/core.c
===================================================================
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5685,9 +5685,6 @@ migration_call(struct notifier_block *nf
 		migrate_tasks(rq);
 		BUG_ON(rq->nr_running != 1); /* the migration thread */
 		raw_spin_unlock_irqrestore(&rq->lock, flags);
-		break;
-
-	case CPU_DEAD:
 		calc_load_migrate(rq);
 		break;
 #endif
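
For reference, a sketch of the resulting CPU_DYING branch of migration_call()
once this hunk is applied, reconstructed from the context lines above; the
case label itself is not part of the hunk, so take it as an assumption:

	case CPU_DYING:
		/* ... */
		migrate_tasks(rq);
		BUG_ON(rq->nr_running != 1);	/* the migration thread */
		raw_spin_unlock_irqrestore(&rq->lock, flags);
		calc_load_migrate(rq);
		break;

That is, the fold now happens right after the runqueue has been emptied on the
dying CPU, instead of later in a separate CPU_DEAD notifier.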



