Re: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu

2014-09-30 Thread Nicolas Pitre
On Tue, 30 Sep 2014, Rik van Riel wrote:

> On 09/04/2014 11:32 AM, Nicolas Pitre wrote:
> > The code in find_idlest_cpu() looks for the CPU with the smallest
> > load. However, if multiple CPUs are idle, the first idle CPU is
> > selected irrespective of the depth of its idle state.
> > 
> > Among the idle CPUs we should pick the one with the shallowest
> > idle state, or the latest to have gone idle if all idle CPUs are in
> > the same state.  The latter applies even when cpuidle is configured
> > out.
> > 
> > This patch doesn't cover the following issues:
> 
> The main thing it does not cover is already running tasks that
> get woken up again, since select_idle_sibling() covers everything
> except for newly forked and newly executed tasks.

True. Now that you bring this up, I remember that Peter mentioned it as 
well.

> I am looking at adding similar logic to select_idle_sibling()

OK thanks.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu

2014-09-30 Thread Rik van Riel

On 09/04/2014 11:32 AM, Nicolas Pitre wrote:
> The code in find_idlest_cpu() looks for the CPU with the smallest
> load. However, if multiple CPUs are idle, the first idle CPU is
> selected irrespective of the depth of its idle state.
> 
> Among the idle CPUs we should pick the one with the shallowest
> idle state, or the latest to have gone idle if all idle CPUs are in
> the same state.  The latter applies even when cpuidle is configured
> out.
> 
> This patch doesn't cover the following issues:

The main thing it does not cover is already running tasks that
get woken up again, since select_idle_sibling() covers everything
except for newly forked and newly executed tasks.

I am looking at adding similar logic to select_idle_sibling()

-- 
All rights reversed


Re: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu

2014-09-19 Thread Peter Zijlstra
On Thu, Sep 04, 2014 at 11:32:10AM -0400, Nicolas Pitre wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bfa3c86d0d..416329e1a6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -23,6 +23,7 @@
>  #include <linux/latencytop.h>
>  #include <linux/sched.h>
>  #include <linux/cpumask.h>
> +#include <linux/cpuidle.h>
>  #include <linux/slab.h>
>  #include <linux/profile.h>
>  #include <linux/interrupt.h>
> @@ -4428,20 +4429,48 @@ static int
>  find_idlest_cpu(struct sched_group *group, struct task_struct *p, int 
> this_cpu)
>  {
>   unsigned long load, min_load = ULONG_MAX;
> - int idlest = -1;
> + unsigned int min_exit_latency = UINT_MAX;
> + u64 latest_idle_timestamp = 0;
> + int least_loaded_cpu = this_cpu;
> + int shallowest_idle_cpu = -1;
>   int i;
>  
>   /* Traverse only the allowed CPUs */
>   for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
> - load = weighted_cpuload(i);
> -
> - if (load < min_load || (load == min_load && i == this_cpu)) {
> - min_load = load;
> - idlest = i;
> + if (idle_cpu(i)) {
> + struct rq *rq = cpu_rq(i);
> + struct cpuidle_state *idle = idle_get_state(rq);
> + if (idle && idle->exit_latency < min_exit_latency) {
> + /*
> +  * We give priority to a CPU whose idle state
> +  * has the smallest exit latency irrespective
> +  * of any idle timestamp.
> +  */
> + min_exit_latency = idle->exit_latency;
> + latest_idle_timestamp = rq->idle_stamp;
> + shallowest_idle_cpu = i;
> + } else if ((!idle || idle->exit_latency == 
> min_exit_latency) &&
> +rq->idle_stamp > latest_idle_timestamp) {
> + /*
> +  * If equal or no active idle state, then
> +  * the most recently idled CPU might have
> +  * a warmer cache.
> +  */
> + latest_idle_timestamp = rq->idle_stamp;
> + shallowest_idle_cpu = i;
> + }
> + cpuidle_put_state(rq);

Right, matching the other changes, I killed that line. The rest looks
ok.

> + } else {
> + load = weighted_cpuload(i);
> + if (load < min_load ||
> + (load == min_load && i == this_cpu)) {
> + min_load = load;
> + least_loaded_cpu = i;
> + }
>   }
>   }
>  
> - return idlest;
> + return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : 
> least_loaded_cpu;
>  }
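The selection policy in the hunk above can be modeled in plain C. This is a userspace sketch under stated assumptions: `struct cpu_model`, `model_find_idlest_cpu` and the flat array layout are made up for illustration and stand in for the kernel's `rq`, `cpuidle_state` and cpumask iteration; it is not the kernel code.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for per-CPU state consulted by the patch:
 * rq->idle_stamp, cpuidle_state->exit_latency, weighted_cpuload(). */
struct cpu_model {
	int idle;                  /* 1 if the CPU is idle */
	int has_state;             /* 1 if a cpuidle state pointer is set */
	unsigned int exit_latency; /* models cpuidle_state->exit_latency */
	uint64_t idle_stamp;       /* models rq->idle_stamp */
	unsigned long load;        /* models weighted_cpuload(i) */
};

static int model_find_idlest_cpu(const struct cpu_model *cpu, int n,
				 int this_cpu)
{
	unsigned long min_load = ULONG_MAX;
	unsigned int min_exit_latency = UINT_MAX;
	uint64_t latest_idle_timestamp = 0;
	int least_loaded_cpu = this_cpu;
	int shallowest_idle_cpu = -1;
	int i;

	for (i = 0; i < n; i++) {
		if (cpu[i].idle) {
			if (cpu[i].has_state &&
			    cpu[i].exit_latency < min_exit_latency) {
				/* Shallowest idle state wins outright. */
				min_exit_latency = cpu[i].exit_latency;
				latest_idle_timestamp = cpu[i].idle_stamp;
				shallowest_idle_cpu = i;
			} else if ((!cpu[i].has_state ||
				    cpu[i].exit_latency == min_exit_latency) &&
				   cpu[i].idle_stamp > latest_idle_timestamp) {
				/* Tie on state depth: the most recently
				 * idled CPU is assumed cache-warmest. */
				latest_idle_timestamp = cpu[i].idle_stamp;
				shallowest_idle_cpu = i;
			}
		} else {
			if (cpu[i].load < min_load ||
			    (cpu[i].load == min_load && i == this_cpu)) {
				min_load = cpu[i].load;
				least_loaded_cpu = i;
			}
		}
	}
	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu
					 : least_loaded_cpu;
}
```

Note the ordering of the two idle branches: a CPU with a known, shallower exit latency always beats timestamp comparisons, and the idle timestamp only breaks ties (or decides when no cpuidle state is available, e.g. cpuidle configured out).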



Re: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu

2014-09-19 Thread Peter Zijlstra
On Thu, Sep 04, 2014 at 11:32:10AM -0400, Nicolas Pitre wrote:
> The code in find_idlest_cpu() looks for the CPU with the smallest load.
> However, if multiple CPUs are idle, the first idle CPU is selected
> irrespective of the depth of its idle state.
> 
> Among the idle CPUs we should pick the one with the shallowest idle
> state, or the latest to have gone idle if all idle CPUs are in the same
> state.  The latter applies even when cpuidle is configured out.
> 
> This patch doesn't cover the following issues:
> 
> - The idle exit latency of a CPU might be larger than the time needed
>   to migrate the waking task to an already running CPU with sufficient
>   capacity, and therefore performance would benefit from task packing
>   in such case (in most cases task packing is about power saving).
> 
> - Some idle states have a non negligible and non abortable entry latency
>   which needs to run to completion before the exit latency can start.
>   A concurrent patch series is making this info available to the cpuidle
>   core.  Once available, the entry latency with the idle timestamp could
>   determine when the exit latency may be effective.
> 
> Those issues will be handled in due course.  In the mean time, what
> is implemented here should improve things already compared to the current
> state of affairs.
> 
> Based on an initial patch from Daniel Lezcano.
> 
> Signed-off-by: Nicolas Pitre 
> ---
>  kernel/sched/fair.c | 43 ---
>  1 file changed, 36 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bfa3c86d0d..416329e1a6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -23,6 +23,7 @@
>  #include <linux/latencytop.h>
>  #include <linux/sched.h>
>  #include <linux/cpumask.h>
> +#include <linux/cpuidle.h>
>  #include <linux/slab.h>
>  #include <linux/profile.h>
>  #include <linux/interrupt.h>
> @@ -4428,20 +4429,48 @@ static int
>  find_idlest_cpu(struct sched_group *group, struct task_struct *p, int 
> this_cpu)

Ah, now I see, you use it in find_idlest_cpu(), that does not indeed
hold rq->lock, but it does already hold rcu_read_lock(), so in that
regard sync_rcu() should be the right primitive.

I suppose we want the same kind of logic in select_idle_sibling() and
that too already has rcu_read_lock().

So I'll replace the kick_all_cpus_sync() with sync_rcu() and add a
WARN_ON(!rcu_read_lock_held()) to idle_get_state(), like the below.

I however do think we need a few words on why we don't need
rcu_assign_pointer() and rcu_dereference() for rq->idle_state -- and I
do indeed think we do not because the idle state data is static.
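The publish/clear handoff being discussed can be sketched in userspace C. This is a model, not the kernel code: `rq_model`, `model_idle_set_state` and `model_idle_get_state` are hypothetical names for the accessors described below, and plain assignment stands in for the RCU-protected pointer (safe to elide here only because, as noted above, the pointed idle-state data is static).

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for struct cpuidle_state and struct rq. */
struct cpuidle_state_model {
	unsigned int exit_latency;
};

struct rq_model {
	struct cpuidle_state_model *idle_state;
};

/* Called when the CPU enters idle: publish the chosen state. */
static void model_idle_set_state(struct rq_model *rq,
				 struct cpuidle_state_model *state)
{
	rq->idle_state = state;
}

/* Called by observers such as find_idlest_cpu(). In the kernel this
 * must run inside rcu_read_lock(), so that synchronize_rcu() in
 * cpuidle_uninstall_idle_handler() can wait out all readers before
 * the pointed idle states go away. */
static struct cpuidle_state_model *model_idle_get_state(struct rq_model *rq)
{
	return rq->idle_state;
}
```

The lifecycle is: set on idle entry, cleared (set to NULL) on idle exit, and only ever dereferenced by a reader that tolerates seeing NULL.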

---
Subject: sched: let the scheduler see CPU idle states
From: Daniel Lezcano 
Date: Thu, 04 Sep 2014 11:32:09 -0400

When the cpu enters idle, it stores the cpuidle state pointer in its
struct rq instance which in turn could be used to make a better decision
when balancing tasks.

As soon as the cpu exits its idle state, the struct rq reference is
cleared.

There are a couple of situations where the idle state pointer could be changed
while it is being consulted:

1. For x86/acpi with dynamic c-states, when a laptop switches from battery
   to AC that could result in removing the deeper idle state. The acpi driver
   triggers:
'acpi_processor_cst_has_changed'
'cpuidle_pause_and_lock'
'cpuidle_uninstall_idle_handler'
'kick_all_cpus_sync'.

All cpus will exit their idle state and the pointed object will be set to
NULL.

2. The cpuidle driver is unloaded. Logically that could happen but not
in practice because the drivers are always compiled in and 95% of them are
not coded to unregister themselves.  In any case, the unloading code must
call 'cpuidle_unregister_device', that calls 'cpuidle_pause_and_lock'
leading to 'kick_all_cpus_sync' as mentioned above.

A race can happen if we use the pointer and then one of these two scenarios
occurs at the same moment.

In order to be safe, the idle state pointer stored in the rq must be
used inside a rcu_read_lock section where we are protected with the
'rcu_barrier' in the 'cpuidle_uninstall_idle_handler' function. The
idle_get_state() and idle_put_state() accessors should be used to that
effect.

Cc: "Rafael J. Wysocki" 
Cc: Ingo Molnar 
Signed-off-by: Daniel Lezcano 
Signed-off-by: Nicolas Pitre 
---
 drivers/cpuidle/cpuidle.c |6 ++
 kernel/sched/idle.c   |6 ++
 kernel/sched/sched.h  |   30 ++
 3 files changed, 42 insertions(+)

--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -225,6 +225,12 @@ void cpuidle_uninstall_idle_handler(void
initialized = 0;
wake_up_all_idle_cpus();
}
+
+   /*
+* Make sure external observers (such as the scheduler)
+* are done looking at pointed idle states.
+*/
+   synchronize_rcu();
 }
 
 /**
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -147,6 +147,9 @@ static void cpuidle_idle_call(void)
 


Re: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu

2014-09-18 Thread Yao Dongdong
On 2014/9/4 23:32, Nicolas Pitre wrote:
> The code in find_idlest_cpu() looks for the CPU with the smallest load.
> However, if multiple CPUs are idle, the first idle CPU is selected
> irrespective of the depth of its idle state.
>
> Among the idle CPUs we should pick the one with the shallowest idle
> state, or the latest to have gone idle if all idle CPUs are in the same
> state.  The latter applies even when cpuidle is configured out.
>
> This patch doesn't cover the following issues:
>
> - The idle exit latency of a CPU might be larger than the time needed
>   to migrate the waking task to an already running CPU with sufficient
>   capacity, and therefore performance would benefit from task packing
>   in such case (in most cases task packing is about power saving).
>
> - Some idle states have a non negligible and non abortable entry latency
>   which needs to run to completion before the exit latency can start.
>   A concurrent patch series is making this info available to the cpuidle
>   core.  Once available, the entry latency with the idle timestamp could
>   determine when the exit latency may be effective.
>
> Those issues will be handled in due course.  In the mean time, what
> is implemented here should improve things already compared to the current
> state of affairs.
>
> Based on an initial patch from Daniel Lezcano.
>
> Signed-off-by: Nicolas Pitre 
> ---
>  kernel/sched/fair.c | 43 ---
>  1 file changed, 36 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bfa3c86d0d..416329e1a6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -23,6 +23,7 @@
>  #include <linux/latencytop.h>
>  #include <linux/sched.h>
>  #include <linux/cpumask.h>
> +#include <linux/cpuidle.h>
>  #include <linux/slab.h>
>  #include <linux/profile.h>
>  #include <linux/interrupt.h>
> @@ -4428,20 +4429,48 @@ static int
>  find_idlest_cpu(struct sched_group *group, struct task_struct *p, int 
> this_cpu)
>  {
>   unsigned long load, min_load = ULONG_MAX;
> - int idlest = -1;
> + unsigned int min_exit_latency = UINT_MAX;
> + u64 latest_idle_timestamp = 0;
> + int least_loaded_cpu = this_cpu;
> + int shallowest_idle_cpu = -1;
>   int i;
>  
>   /* Traverse only the allowed CPUs */
>   for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
> - load = weighted_cpuload(i);
> -
> - if (load < min_load || (load == min_load && i == this_cpu)) {
> - min_load = load;
> - idlest = i;
> + if (idle_cpu(i)) {
> + struct rq *rq = cpu_rq(i);
> + struct cpuidle_state *idle = idle_get_state(rq);
> + if (idle && idle->exit_latency < min_exit_latency) {
> + /*
> +  * We give priority to a CPU whose idle state
> +  * has the smallest exit latency irrespective
> +  * of any idle timestamp.
> +  */
> + min_exit_latency = idle->exit_latency;
> + latest_idle_timestamp = rq->idle_stamp;
> + shallowest_idle_cpu = i;
> + } else if ((!idle || idle->exit_latency == 
> min_exit_latency) &&
> +rq->idle_stamp > latest_idle_timestamp) {
> + /*
> +  * If equal or no active idle state, then
> +  * the most recently idled CPU might have
> +  * a warmer cache.
> +  */
> + latest_idle_timestamp = rq->idle_stamp;
> + shallowest_idle_cpu = i;
> + }
> + cpuidle_put_state(rq);
> + } else {
I think we needn't check the load of busy CPUs once an idle CPU has been found.
What about this?
 } else if (shallowest_idle_cpu == -1) {

> + load = weighted_cpuload(i);
> + if (load < min_load ||
> + (load == min_load && i == this_cpu)) {
> + min_load = load;
> + least_loaded_cpu = i;
> + }
>   }
>   }
>  
> - return idlest;
> + return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : 
> least_loaded_cpu;
>  }
>  
>  /*

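The tweak suggested in the reply above — guard the load branch with `shallowest_idle_cpu == -1` so busy CPUs stop being examined once any idle CPU is known — can be sketched as a simplified userspace model. `cpu_sample` and `pick_cpu` are illustrative names, and the idle-timestamp tiebreak is omitted here for brevity; the result is unchanged because `least_loaded_cpu` is only consulted when no idle CPU exists.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical per-CPU snapshot; not the kernel's data structures. */
struct cpu_sample {
	int idle;                  /* 1 if the CPU is idle */
	unsigned int exit_latency; /* exit latency of its idle state */
	unsigned long load;        /* load if busy */
};

static int pick_cpu(const struct cpu_sample *c, int n, int this_cpu)
{
	unsigned long min_load = ULONG_MAX;
	unsigned int min_exit_latency = UINT_MAX;
	int least_loaded_cpu = this_cpu;
	int shallowest_idle_cpu = -1;
	int i;

	for (i = 0; i < n; i++) {
		if (c[i].idle) {
			if (c[i].exit_latency < min_exit_latency) {
				min_exit_latency = c[i].exit_latency;
				shallowest_idle_cpu = i;
			}
		} else if (shallowest_idle_cpu == -1) {
			/* The suggested guard: only track load while
			 * no idle CPU has been seen yet. */
			if (c[i].load < min_load) {
				min_load = c[i].load;
				least_loaded_cpu = i;
			}
		}
	}
	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu
					 : least_loaded_cpu;
}
```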


Re: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu

2014-09-05 Thread Daniel Lezcano

On 09/04/2014 05:32 PM, Nicolas Pitre wrote:

The code in find_idlest_cpu() looks for the CPU with the smallest load.
However, if multiple CPUs are idle, the first idle CPU is selected
irrespective of the depth of its idle state.

Among the idle CPUs we should pick the one with the shallowest idle
state, or the latest to have gone idle if all idle CPUs are in the same
state.  The latter applies even when cpuidle is configured out.

This patch doesn't cover the following issues:

- The idle exit latency of a CPU might be larger than the time needed
   to migrate the waking task to an already running CPU with sufficient
   capacity, and therefore performance would benefit from task packing
   in such case (in most cases task packing is about power saving).

- Some idle states have a non negligible and non abortable entry latency
   which needs to run to completion before the exit latency can start.
   A concurrent patch series is making this info available to the cpuidle
   core.  Once available, the entry latency with the idle timestamp could
   determine when the exit latency may be effective.

Those issues will be handled in due course.  In the mean time, what
is implemented here should improve things already compared to the current
state of affairs.

Based on an initial patch from Daniel Lezcano.

Signed-off-by: Nicolas Pitre 


Sounds good to me.

Acked-by: Daniel Lezcano 


---
  kernel/sched/fair.c | 43 ---
  1 file changed, 36 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bfa3c86d0d..416329e1a6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -23,6 +23,7 @@
  #include <linux/latencytop.h>
  #include <linux/sched.h>
  #include <linux/cpumask.h>
+#include <linux/cpuidle.h>
  #include <linux/slab.h>
  #include <linux/profile.h>
  #include <linux/interrupt.h>
@@ -4428,20 +4429,48 @@ static int
  find_idlest_cpu(struct sched_group *group, struct task_struct *p, int 
this_cpu)
  {
unsigned long load, min_load = ULONG_MAX;
-   int idlest = -1;
+   unsigned int min_exit_latency = UINT_MAX;
+   u64 latest_idle_timestamp = 0;
+   int least_loaded_cpu = this_cpu;
+   int shallowest_idle_cpu = -1;
int i;

/* Traverse only the allowed CPUs */
for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
-   load = weighted_cpuload(i);
-
-   if (load < min_load || (load == min_load && i == this_cpu)) {
-   min_load = load;
-   idlest = i;
+   if (idle_cpu(i)) {
+   struct rq *rq = cpu_rq(i);
+   struct cpuidle_state *idle = idle_get_state(rq);
+   if (idle && idle->exit_latency < min_exit_latency) {
+   /*
+* We give priority to a CPU whose idle state
+* has the smallest exit latency irrespective
+* of any idle timestamp.
+*/
+   min_exit_latency = idle->exit_latency;
+   latest_idle_timestamp = rq->idle_stamp;
+   shallowest_idle_cpu = i;
+   } else if ((!idle || idle->exit_latency == min_exit_latency) 
&&
+  rq->idle_stamp > latest_idle_timestamp) {
+   /*
+* If equal or no active idle state, then
+* the most recently idled CPU might have
+* a warmer cache.
+*/
+   latest_idle_timestamp = rq->idle_stamp;
+   shallowest_idle_cpu = i;
+   }
+   cpuidle_put_state(rq);
+   } else {
+   load = weighted_cpuload(i);
+   if (load < min_load ||
+   (load == min_load && i == this_cpu)) {
+   min_load = load;
+   least_loaded_cpu = i;
+   }
}
}

-   return idlest;
+   return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : 
least_loaded_cpu;
  }

  /*







Re: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu

2014-09-05 Thread Daniel Lezcano

On 09/04/2014 05:32 PM, Nicolas Pitre wrote:

The code in find_idlest_cpu() looks for the CPU with the smallest load.
However, if multiple CPUs are idle, the first idle CPU is selected
irrespective of the depth of its idle state.

Among the idle CPUs we should pick the one with the shallowest idle
state, or the latest to have gone idle if all idle CPUs are in the same
state.  The latter applies even when cpuidle is configured out.

This patch doesn't cover the following issues:

- The idle exit latency of a CPU might be larger than the time needed
   to migrate the waking task to an already running CPU with sufficient
   capacity, and therefore performance would benefit from task packing
   in such case (in most cases task packing is about power saving).

- Some idle states have a non negligible and non abortable entry latency
   which needs to run to completion before the exit latency can start.
   A concurrent patch series is making this info available to the cpuidle
   core.  Once available, the entry latency with the idle timestamp could
   determine when the exit latency may be effective.

Those issues will be handled in due course.  In the meantime, what is
implemented here should already improve things compared to the current
state of affairs.

Based on an initial patch from Daniel Lezcano.

Signed-off-by: Nicolas Pitre n...@linaro.org


Sounds good to me.

Acked-by: Daniel Lezcano daniel.lezc...@linaro.org


---
  kernel/sched/fair.c | 43 ++++++++++++++++++++++++++++++++++++-------
  1 file changed, 36 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bfa3c86d0d..416329e1a6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -23,6 +23,7 @@
  #include <linux/latencytop.h>
  #include <linux/sched.h>
  #include <linux/cpumask.h>
+#include <linux/cpuidle.h>
  #include <linux/slab.h>
  #include <linux/profile.h>
  #include <linux/interrupt.h>
@@ -4428,20 +4429,48 @@ static int
  find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
  {
 	unsigned long load, min_load = ULONG_MAX;
-	int idlest = -1;
+	unsigned int min_exit_latency = UINT_MAX;
+	u64 latest_idle_timestamp = 0;
+	int least_loaded_cpu = this_cpu;
+	int shallowest_idle_cpu = -1;
 	int i;
 
 	/* Traverse only the allowed CPUs */
 	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
-		load = weighted_cpuload(i);
-
-		if (load < min_load || (load == min_load && i == this_cpu)) {
-			min_load = load;
-			idlest = i;
+		if (idle_cpu(i)) {
+			struct rq *rq = cpu_rq(i);
+			struct cpuidle_state *idle = idle_get_state(rq);
+			if (idle && idle->exit_latency < min_exit_latency) {
+				/*
+				 * We give priority to a CPU whose idle state
+				 * has the smallest exit latency irrespective
+				 * of any idle timestamp.
+				 */
+				min_exit_latency = idle->exit_latency;
+				latest_idle_timestamp = rq->idle_stamp;
+				shallowest_idle_cpu = i;
+			} else if ((!idle || idle->exit_latency == min_exit_latency) &&
+				   rq->idle_stamp > latest_idle_timestamp) {
+				/*
+				 * If equal or no active idle state, then
+				 * the most recently idled CPU might have
+				 * a warmer cache.
+				 */
+				latest_idle_timestamp = rq->idle_stamp;
+				shallowest_idle_cpu = i;
+			}
+			cpuidle_put_state(rq);
+		} else {
+			load = weighted_cpuload(i);
+			if (load < min_load ||
+			    (load == min_load && i == this_cpu)) {
+				min_load = load;
+				least_loaded_cpu = i;
+			}
 		}
 	}
 
-	return idlest;
+	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
  }

  /*






[PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu

2014-09-04 Thread Nicolas Pitre
The code in find_idlest_cpu() looks for the CPU with the smallest load.
However, if multiple CPUs are idle, the first idle CPU is selected
irrespective of the depth of its idle state.

Among the idle CPUs we should pick the one with the shallowest idle
state, or the latest to have gone idle if all idle CPUs are in the same
state.  The latter applies even when cpuidle is configured out.

This patch doesn't cover the following issues:

- The idle exit latency of a CPU might be larger than the time needed
  to migrate the waking task to an already running CPU with sufficient
  capacity, and therefore performance would benefit from task packing
  in such case (in most cases task packing is about power saving).

- Some idle states have a non negligible and non abortable entry latency
  which needs to run to completion before the exit latency can start.
  A concurrent patch series is making this info available to the cpuidle
  core.  Once available, the entry latency with the idle timestamp could
  determine when the exit latency may be effective.

Those issues will be handled in due course.  In the meantime, what is
implemented here should already improve things compared to the current
state of affairs.

Based on an initial patch from Daniel Lezcano.

Signed-off-by: Nicolas Pitre 
---
 kernel/sched/fair.c | 43 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 36 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bfa3c86d0d..416329e1a6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -23,6 +23,7 @@
 #include <linux/latencytop.h>
 #include <linux/sched.h>
 #include <linux/cpumask.h>
+#include <linux/cpuidle.h>
 #include <linux/slab.h>
 #include <linux/profile.h>
 #include <linux/interrupt.h>
@@ -4428,20 +4429,48 @@ static int
 find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 {
unsigned long load, min_load = ULONG_MAX;
-   int idlest = -1;
+   unsigned int min_exit_latency = UINT_MAX;
+   u64 latest_idle_timestamp = 0;
+   int least_loaded_cpu = this_cpu;
+   int shallowest_idle_cpu = -1;
int i;
 
/* Traverse only the allowed CPUs */
for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
-   load = weighted_cpuload(i);
-
-   if (load < min_load || (load == min_load && i == this_cpu)) {
-   min_load = load;
-   idlest = i;
+   if (idle_cpu(i)) {
+   struct rq *rq = cpu_rq(i);
+   struct cpuidle_state *idle = idle_get_state(rq);
+   if (idle && idle->exit_latency < min_exit_latency) {
+   /*
+* We give priority to a CPU whose idle state
+* has the smallest exit latency irrespective
+* of any idle timestamp.
+*/
+   min_exit_latency = idle->exit_latency;
+   latest_idle_timestamp = rq->idle_stamp;
+   shallowest_idle_cpu = i;
+   } else if ((!idle || idle->exit_latency == min_exit_latency) &&
+  rq->idle_stamp > latest_idle_timestamp) {
+   /*
+* If equal or no active idle state, then
+* the most recently idled CPU might have
+* a warmer cache.
+*/
+   latest_idle_timestamp = rq->idle_stamp;
+   shallowest_idle_cpu = i;
+   }
+   cpuidle_put_state(rq);
+   } else {
+   load = weighted_cpuload(i);
+   if (load < min_load ||
+   (load == min_load && i == this_cpu)) {
+   min_load = load;
+   least_loaded_cpu = i;
+   }
}
}
 
-   return idlest;
+   return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
 }
 
 /*
-- 
1.8.4.108.g55ea5f6


