Re: VST and Sched Load Balance

2005-04-20 Thread Srivatsa Vaddagiri
On Tue, Apr 19, 2005 at 09:07:49AM -0700, Nish Aravamudan wrote:
> > +   if (jiffies - sd1->last_balance >= interval) {


> Sorry for the late reply, but shouldn't this jiffies comparison be
> done with time_after() or time_before()?

I think it is not needed. The check should be able to handle the overflow case as well.

This probably assumes that you don't sleep longer than (2^32 - 1) jiffies
(which is ~1193 hrs). The current VST implementation lets us sleep way less than that
limit (~896 ms) since it uses a 32-bit number for sampling the TSC. When it is
upgraded to use a 64-bit number, we may have to ensure that this limit (1193 hrs)
is not exceeded.
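
A minimal userspace sketch of the overflow argument above, assuming a 32-bit tick counter and HZ=1000 (both are assumptions, as are the names jiffies32_t, interval_elapsed and hours_to_wrap; none of them come from the patch). Unsigned subtraction is modulo 2^32, so the difference stays correct across a single wrap of the counter:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a 32-bit jiffies counter (assumption for illustration). */
typedef uint32_t jiffies32_t;

/*
 * True when at least `interval` ticks have passed since `last`.
 * The subtraction wraps modulo 2^32, so the result is still the
 * elapsed tick count when `now` has wrapped past zero once.
 */
static bool interval_elapsed(jiffies32_t now, jiffies32_t last,
                             jiffies32_t interval)
{
    return (jiffies32_t)(now - last) >= interval;
}

/* Hours until a 32-bit tick counter wraps at `hz` ticks per second. */
static double hours_to_wrap(unsigned int hz)
{
    return 4294967296.0 / hz / 3600.0;
}
```

With hz = 1000 the second helper gives roughly 1193 hours, which matches the limit quoted above. The check only breaks if more than one full wrap happens between two samples.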

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: VST and Sched Load Balance

2005-04-19 Thread Nish Aravamudan
On 4/7/05, Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote:
> Hi,
> VST patch (http://lwn.net/Articles/118693/) attempts to avoid useless
> regular (local) timer ticks when a CPU is idle.

<snip>

>  linux-2.6.11-vatsa/kernel/sched.c |   52 ++
>  1 files changed, 52 insertions(+)
> 
> diff -puN kernel/sched.c~vst-sched_load_balance kernel/sched.c
> --- linux-2.6.11/kernel/sched.c~vst-sched_load_balance  2005-04-07 17:51:34.0 +0530
> +++ linux-2.6.11-vatsa/kernel/sched.c   2005-04-07 17:56:18.0 +0530

<snip>

> @@ -1796,6 +1817,25 @@ find_busiest_group(struct sched_domain *
>  
>  			nr_cpus++;
>  			avg_load += load;
> +
> +#ifdef CONFIG_VST
> +			if (idle != NOT_IDLE || !grp_sleeping ||
> +					(grp_sleeping && woken))
> +				continue;
> +
> +			sd1 = sd + (i-cpu);
> +			interval = sd1->balance_interval;
> +
> +			/* scale ms to jiffies */
> +			interval = msecs_to_jiffies(interval);
> +			if (unlikely(!interval))
> +				interval = 1;
> +
> +			if (jiffies - sd1->last_balance >= interval) {
> +				woken = 1;
> +				cpu_set(i, wakemask);
> +			}

Sorry for the late reply, but shouldn't this jiffies comparison be
done with time_after() or time_before()?
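
For comparison, a hedged userspace sketch of what the check would look like in the time_after_eq() style. The helper tick_after_eq only mirrors the signed-difference idea behind the kernel macro (the real macro also type-checks its arguments), and balance_due is a hypothetical name, not something from the patch:

```c
#include <stdbool.h>

/*
 * Userspace stand-in for the idea behind time_after_eq(a, b):
 * interpret the unsigned difference as signed, so "a is at or after b"
 * holds even when the counter has wrapped between the two values.
 * Relies on the usual two's-complement conversion to long.
 */
static bool tick_after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* Hypothetical rewrite of the patch's check in that style. */
static bool balance_due(unsigned long now, unsigned long last_balance,
                        unsigned long interval)
{
    return tick_after_eq(now, last_balance + interval);
}
```

Both forms tolerate a single wrap. The signed form is valid for gaps up to half the counter range, the plain unsigned subtraction for gaps up to the full range.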

Thanks,
Nish

Re: VST and Sched Load Balance

2005-04-08 Thread Nick Piggin
Srivatsa Vaddagiri wrote:
> On Thu, Apr 07, 2005 at 05:10:24PM +0200, Ingo Molnar wrote:
>> Interaction with VST is not a big issue right now because this only matters
>> on SMP boxes which is a rare (but not unprecedented) target for embedded
>> platforms.
>
> Well, I don't think VST is targeting just power management in embedded
> platforms. Even (virtualized) servers will benefit from this patch, by
> making use of the (virtual) CPU resources more efficiently.

I still think looking at just using the rebalance backoff would be
a good start.

What would be really nice is to measure the power draw on your favourite
SMP system with your current patches that *don't* schedule ticks to
service rebalancing.

Then measure again with the current rebalance backoff settings (which
will likely be not very good, because some intervals are constrained to
quite small values).

Then we can aim for something like 80-90% of the first (ie perfect)
efficiency rating.
--
SUSE Labs, Novell Inc.

Re: VST and Sched Load Balance

2005-04-07 Thread Srivatsa Vaddagiri
On Thu, Apr 07, 2005 at 05:10:24PM +0200, Ingo Molnar wrote:
> Interaction with VST is not a big issue right now because this only matters 
> on SMP boxes which is a rare (but not unprecedented) target for embedded 
> platforms.  

Well, I don't think VST is targeting just power management in embedded 
platforms. Even (virtualized) servers will benefit from this patch, by
making use of the (virtual) CPU resources more efficiently.

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


Re: VST and Sched Load Balance

2005-04-07 Thread Srivatsa Vaddagiri
[Sorry about sending my response from a different account. Can't seem
to access my ibm account right now]

* Ingo wrote:

> Another, more effective, less intrusive but also more complex approach
> would be to make a distinction between 'totally idle' and 'partially
> idle or busy' system states. When all CPUs are idle then all timer irqs
> may be stopped and full VST logic applies. When at least one CPU is
> busy, all the other CPUs may still be put to sleep completely and
> immediately, but the busy CPU(s) have to take over a 'watchdog' role,
> and need to run the 'do the idle CPUs need new tasks' balancing
> functions. I.e. the scheduling function of other CPUs is migrated to
> busy CPUs. If there are no busy CPUs then there's no work, so this ought
> to be simple on the VST side. This needs some reorganization on the
> scheduler side but ought to be doable as well.


Hmm ..I think this is the approach that I have followed in my patch, where
busy CPUs act as watchdogs and wakeup sleeping CPUs at an appropriate
time. The appropriate time is currently based on the busy CPU's load
being greater than 1 and the sleeping CPU not having balanced for its
minimum balance_interval.

Do you have any other suggestions on how the watchdog function should
be implemented?

- vatsa


Re: VST and Sched Load Balance

2005-04-07 Thread Ingo Molnar

* Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote:

> Hi,
>   VST patch (http://lwn.net/Articles/118693/) attempts to avoid useless 
> regular (local) timer ticks when a CPU is idle.
> 
> I think a potential area which VST may need to address is scheduler 
> load balance. If idle CPUs stop taking local timer ticks for some 
> time, then during that period it could cause the various runqueues to 
> go out of balance, since the idle CPUs will no longer pull tasks from 
> non-idle CPUs.
> 
> Do we care about this imbalance? Especially considering that most 
> implementations will let the idle CPUs sleep only for some max 
> duration (~900 ms in case of x86).

yeah, we care about this imbalance, it would materially change the 
scheduling logic, which side-effect we don't want. Interaction with VST 
is not a big issue right now because this only matters on SMP boxes 
which is a rare (but not unprecedented) target for embedded platforms.  

One solution would be to add an exponential backoff (as Nick 
suggested too), not an unconditional 'we won't fire a timer interrupt for 
the next 10 seconds' logic. It still impacts scheduling though.
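
A rough sketch of the exponential backoff idea, under stated assumptions: the helper next_sleep_ticks and its parameters are hypothetical and this is not the scheduler's actual backoff code. The sleep length doubles each time the CPU wakes and finds nothing to do, is capped at a maximum, and resets as soon as work is found:

```c
#include <stdbool.h>

/*
 * Hypothetical helper: choose the next tickless sleep length in ticks.
 * `cur` is the previous sleep length, `max_ticks` (assumed >= 1) the cap,
 * `found_work` whether the last wakeup actually pulled a task.
 */
static unsigned long next_sleep_ticks(unsigned long cur,
                                      unsigned long max_ticks,
                                      bool found_work)
{
    if (found_work || cur == 0)
        return 1;               /* restart from the shortest sleep */
    if (cur > max_ticks / 2)
        return max_ticks;       /* doubling would exceed the cap */
    return cur * 2;             /* back off exponentially */
}
```

With a cap of 900 ticks (about the ~900 ms x86 limit mentioned in this thread, assuming HZ=1000) the sequence is 1, 2, 4, ... 512, 900, 900.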

Another, more effective, less intrusive but also more complex approach 
would be to make a distinction between 'totally idle' and 'partially 
idle or busy' system states. When all CPUs are idle then all timer irqs 
may be stopped and full VST logic applies. When at least one CPU is 
busy, all the other CPUs may still be put to sleep completely and 
immediately, but the busy CPU(s) have to take over a 'watchdog' role, 
and need to run the 'do the idle CPUs need new tasks' balancing 
functions. I.e. the scheduling function of other CPUs is migrated to 
busy CPUs. If there are no busy CPUs then there's no work, so this ought 
to be simple on the VST side. This needs some reorganization on the 
scheduler side but ought to be doable as well.
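
The 'totally idle' versus 'partially idle or busy' distinction can be sketched with plain bitmasks. This is only an illustration: CPU sets are modelled as 64-bit masks, and the names sys_state, classify and pick_watchdog are hypothetical rather than anything proposed in the thread:

```c
#include <stdint.h>

enum sys_state { TOTALLY_IDLE, PARTIALLY_BUSY };

/* A system is totally idle when every online CPU is also in the idle set. */
static enum sys_state classify(uint64_t online, uint64_t idle)
{
    return (online & ~idle) == 0 ? TOTALLY_IDLE : PARTIALLY_BUSY;
}

/*
 * Pick the lowest-numbered busy CPU to take the 'watchdog' role,
 * or -1 when no CPU is busy (nothing needs balancing then).
 */
static int pick_watchdog(uint64_t online, uint64_t idle)
{
    uint64_t busy = online & ~idle;
    int cpu;

    for (cpu = 0; cpu < 64; cpu++)
        if (busy & ((uint64_t)1 << cpu))
            return cpu;
    return -1;
}
```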

Ingo


Re: VST and Sched Load Balance

2005-04-07 Thread Nick Piggin
Srivatsa Vaddagiri wrote:
> Hmm ..I guess we could restrict the max time a idle CPU will sleep taking
> into account its balance interval. But whatever heuristics we follow to
> maximize balance_interval of about-to-sleep idle CPU, don't we still run the
> risk of idle cpu being woken up and going immediately back to sleep (because
> there was no imbalance)?

Yep.

> Moreover we may be greatly reducing the amount of time a CPU is allowed to
> sleep this way ...

Yes. I was assuming you get some kind of fairly rapidly diminishing
efficiency return curve based on your maximum sleep time. If that is
not so, then I agree this wouldn't be the best method.
--
SUSE Labs, Novell Inc.


Re: VST and Sched Load Balance

2005-04-07 Thread Srivatsa Vaddagiri
On Thu, Apr 07, 2005 at 11:07:55PM +1000, Nick Piggin wrote:
> 3. This is exactly one of the situations that the balancing backoff code
>was designed for. Can you just schedule interrupts to fire when the
>next balance interval has passed? This may require some adjustments to
>the backoff code in order to get good powersaving, but it would be the
>cleanest approach from the scheduler's point of view.


Hmm ..I guess we could restrict the max time an idle CPU will sleep, taking
into account its balance interval. But whatever heuristics we follow to 
maximize the balance_interval of an about-to-sleep idle CPU, don't we still run the 
risk of the idle CPU being woken up and going immediately back to sleep (because 
there was no imbalance)?

Moreover we may be greatly reducing the amount of time a CPU is allowed to 
sleep this way ...
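
The trade-off above can be made concrete with a small sketch. It assumes all values are in ticks and that the sleep is simply clamped to whichever comes first, the next balance deadline or the hardware maximum; the helper name sleep_ticks is hypothetical:

```c
/*
 * Hypothetical helper: how long an idle CPU may sleep if it must be
 * awake again when its next balance is due. All values are in ticks.
 */
static unsigned long sleep_ticks(unsigned long now,
                                 unsigned long last_balance,
                                 unsigned long interval,
                                 unsigned long max_sleep)
{
    unsigned long elapsed = now - last_balance;   /* wrap-safe */
    unsigned long until_due = elapsed >= interval ? 0 : interval - elapsed;

    return until_due < max_sleep ? until_due : max_sleep;
}
```

With a balance interval of 64 ticks and a 900-tick hardware limit, a CPU that balanced 10 ticks ago may sleep only 54 ticks, which is the reduction in sleep time the mail worries about.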






-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


Re: VST and Sched Load Balance

2005-04-07 Thread Nick Piggin
Srivatsa Vaddagiri wrote:
> I think a potential area which VST may need to address is
> scheduler load balance. If idle CPUs stop taking local timer ticks for
> some time, then during that period it could cause the various runqueues to
> go out of balance, since the idle CPUs will no longer pull tasks from
> non-idle CPUs.

Yep.

> Do we care about this imbalance? Especially considering that most
> implementations will let the idle CPUs sleep only for some max duration
> (~900 ms in case of x86).

I think we do care, yes. It could be pretty harmful to sleep for
even a few 10s of ms on a regular basis for some workloads. Although
I guess many of those will be covered by try_to_wake_up events...

Not sure in practice, I would imagine it will hurt some multiprocessor
workloads.

> If we do care about this imbalance, then we could hope that the balance logic
> present in try_to_wake_up and sched_exec may avoid this imbalance, but can we
> bank upon these events to restore the runqueue balance?
>
> If we cannot, then I had something in mind on these lines:
>
> 1. A non-idle CPU (having nr_running > 1) can wakeup a idle sleeping CPU if it
>    finds that the sleeping CPU has not balanced itself for it's
>    "balance_interval" period.
>
> 2. It would be nice to minimize the "cross-domain" wakeups. For ex: we may want
>    to avoid a non-idle CPU in node B sending a wakeup to a idle sleeping CPU in
>    another node A, when this wakeup could have been sent from another non-idle
>    CPU in node A itself.

3. This is exactly one of the situations that the balancing backoff code
   was designed for. Can you just schedule interrupts to fire when the
   next balance interval has passed? This may require some adjustments to
   the backoff code in order to get good powersaving, but it would be the
   cleanest approach from the scheduler's point of view.

Nick
--
SUSE Labs, Novell Inc.


VST and Sched Load Balance

2005-04-07 Thread Srivatsa Vaddagiri
Hi,
VST patch (http://lwn.net/Articles/118693/) attempts to avoid useless 
regular (local) timer ticks when a CPU is idle.

I think a potential area which VST may need to address is 
scheduler load balance. If idle CPUs stop taking local timer ticks for 
some time, then during that period it could cause the various runqueues to 
go out of balance, since the idle CPUs will no longer pull tasks from 
non-idle CPUs. 

Do we care about this imbalance? Especially considering that most 
implementations will let the idle CPUs sleep only for some max duration
(~900 ms in case of x86).

If we do care about this imbalance, then we could hope that the balance logic
present in try_to_wake_up and sched_exec may avoid this imbalance, but can we 
bank upon these events to restore the runqueue balance?

If we cannot, then I had something in mind on these lines:

1. A non-idle CPU (having nr_running > 1) can wake up an idle sleeping CPU if it 
   finds that the sleeping CPU has not balanced itself for its 
   "balance_interval" period.

2. It would be nice to minimize the "cross-domain" wakeups. For ex: we may want 
   to avoid a non-idle CPU in node B sending a wakeup to an idle sleeping CPU in 
   another node A, when this wakeup could have been sent from another non-idle
   CPU in node A itself. 
 
   That is why I have imposed the condition for sending a wakeup only when
   a whole sched_group of CPUs is sleeping in a domain. We wake one of them 
   up. The chosen one is one which has not balanced itself for its 
   "balance_interval" period.

I did think about avoiding all these and putting some hooks in 
wake_up_new_task, to wakeup the sleeping CPUs. But the problem is 
the woken-up CPU may refuse to pull any tasks and go to sleep again
if it has balanced itself in the domain "recently" (balance_interval).
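
The group-wide wakeup condition described above can be sketched in userspace with 64-bit masks standing in for cpumask_t. This is a simplified illustration, not the patch itself: group_sleeping mirrors the cpus_and()/cpus_equal() test, while pick_cpu_to_wake, its array parameters and ncpus are hypothetical names introduced here:

```c
#include <stdbool.h>
#include <stdint.h>

/* A group is sleeping when every CPU in it is also in the nohz mask. */
static bool group_sleeping(uint64_t group_mask, uint64_t nohz_mask)
{
    return (group_mask & nohz_mask) == group_mask;
}

/*
 * Return the first CPU of a fully sleeping group whose balance interval
 * (in ticks) has expired, or -1 if the group is not fully asleep or no
 * CPU in it is overdue. At most one CPU per group is chosen.
 */
static int pick_cpu_to_wake(uint64_t group_mask, uint64_t nohz_mask,
                            const unsigned long *last_balance,
                            const unsigned long *interval,
                            unsigned long now, int ncpus)
{
    int cpu;

    if (!group_sleeping(group_mask, nohz_mask))
        return -1;

    for (cpu = 0; cpu < ncpus && cpu < 64; cpu++) {
        if (!(group_mask & ((uint64_t)1 << cpu)))
            continue;
        if (now - last_balance[cpu] >= interval[cpu])
            return cpu;
    }
    return -1;
}
```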


Comments?

Patch (not fully-tested) against 2.6.11 follows.


---

 linux-2.6.11-vatsa/kernel/sched.c |   52 ++
 1 files changed, 52 insertions(+)

diff -puN kernel/sched.c~vst-sched_load_balance kernel/sched.c
--- linux-2.6.11/kernel/sched.c~vst-sched_load_balance  2005-04-07 17:51:34.0 +0530
+++ linux-2.6.11-vatsa/kernel/sched.c   2005-04-07 17:56:18.0 +0530
@@ -1774,9 +1774,17 @@ find_busiest_group(struct sched_domain *
 {
 	struct sched_group *busiest = NULL, *this = NULL, *group = sd->groups;
 	unsigned long max_load, avg_load, total_load, this_load, total_pwr;
+#ifdef CONFIG_VST
+	int grp_sleeping;
+	cpumask_t tmpmask, wakemask;
+#endif
 
 	max_load = this_load = total_load = total_pwr = 0;
 
+#ifdef CONFIG_VST
+	cpus_clear(wakemask);
+#endif
+
 	do {
 		unsigned long load;
 		int local_group;
@@ -1787,7 +1795,20 @@ find_busiest_group(struct sched_domain *
 		/* Tally up the load of all CPUs in the group */
 		avg_load = 0;
 
+#ifdef CONFIG_VST
+		grp_sleeping = 0;
+		cpus_and(tmpmask, group->cpumask, nohz_cpu_mask);
+		if (cpus_equal(tmpmask, group->cpumask))
+			grp_sleeping = 1;
+#endif
+
 		for_each_cpu_mask(i, group->cpumask) {
+#ifdef CONFIG_VST
+			int cpu = smp_processor_id();
+			struct sched_domain *sd1;
+			unsigned long interval;
+			int woken = 0;
+#endif
 			/* Bias balancing toward cpus of our domain */
 			if (local_group)
 				load = target_load(i);
@@ -1796,6 +1817,25 @@ find_busiest_group(struct sched_domain *
 
 			nr_cpus++;
 			avg_load += load;
+
+#ifdef CONFIG_VST
+			if (idle != NOT_IDLE || !grp_sleeping ||
+					(grp_sleeping && woken))
+				continue;
+
+			sd1 = sd + (i-cpu);
+			interval = sd1->balance_interval;
+
+			/* scale ms to jiffies */
+			interval = msecs_to_jiffies(interval);
+			if (unlikely(!interval))
+				interval = 1;
+
+			if (jiffies - sd1->last_balance >= interval) {
+				woken = 1;
+				cpu_set(i, wakemask);
+			}
+#endif
 		}
 
 		if (!nr_cpus)
@@ -1819,6 +1859,18 @@ nextgroup:
 		group = group->next;
 	} while (group != sd->groups);
 
+#ifdef CONFIG_VST
+	if (idle == NOT_IDLE && this_load > SCHED_LOAD_SCALE) {
+		int i;
+
+		for_each_cpu_mask(i, wakemask) {
+			spin_lock(&cpu_rq(i)->lock);
+			resched_task(cpu_rq(i)->idle);
+			spin_unlock(&cpu_rq(i)->lock);
+		}
+	}
+#endif
+
 	if (!busiest || this_load >= max_load)
