Re: [RFC] Patch: dynticks: idle load balancing

2007-02-07 Thread Siddha, Suresh B
On Tue, Jan 30, 2007 at 01:57:09PM -0800, Siddha, Suresh B wrote:
> Please let me know if you still see this issue with the latest -rt kernel.
> 
> On Tue, Jan 16, 2007 at 12:35:05PM +0100, Ingo Molnar wrote:
> > on the latest -rt kernel, when the dynticks load-balancer is enabled, 
> > then a dual-core Core2 Duo test-system increases its irq rate from the 
> > normal 15/17 per second to 300-400/sec - on a completely idle system(!). 
> > Any idea what's going on? I'll disable the load balancer for now.

Ok, got time to look into this.

The answer is simple: load balancing in recent kernels happens via
SCHED_SOFTIRQ, and in the -rt tree that softirq runs not in the idle
process context but in the context of softirqd for SCHED_SOFTIRQ.

This breaks the dynticks load balancer and the regular idle load
balancing too :(
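
A rough sketch of the failure mode (reconstructed from the
run_rebalance_domains() hunks quoted later in this thread; names and
details are assumptions, not the exact -rt code):

static void run_rebalance_domains(struct softirq_action *h)
{
        int local_cpu = smp_processor_id();

        /*
         * mainline: SCHED_SOFTIRQ runs on the idle task's stack, so an
         * idle cpu passes this test.  -rt: the softirq runs in the
         * softirqd thread, 'current' is softirqd, idle_cpu() returns 0,
         * and both the regular idle balancing and the dynticks ilb
         * paths below are skipped.
         */
        if (!idle_cpu(local_cpu))
                return;

        /* ... idle/ilb balancing work elided ... */
}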

Am on to fixing the problem now :)

thanks,
suresh

Re: [RFC] Patch: dynticks: idle load balancing

2007-01-30 Thread Siddha, Suresh B
Sorry for the delayed response. I was away on vacation.

Please let me know if you still see this issue with the latest -rt kernel.

thanks,
suresh

On Tue, Jan 16, 2007 at 12:35:05PM +0100, Ingo Molnar wrote:
> 
> Suresh,
> 
> on the latest -rt kernel, when the dynticks load-balancer is enabled, 
> then a dual-core Core2 Duo test-system increases its irq rate from the 
> normal 15/17 per second to 300-400/sec - on a completely idle system(!). 
> Any idea what's going on? I'll disable the load balancer for now.
> 
>   Ingo

Re: [RFC] Patch: dynticks: idle load balancing

2007-01-16 Thread Ingo Molnar

Suresh,

on the latest -rt kernel, when the dynticks load-balancer is enabled, 
then a dual-core Core2 Duo test-system increases its irq rate from the 
normal 15/17 per second to 300-400/sec - on a completely idle system(!). 
Any idea what's going on? I'll disable the load balancer for now.

Ingo

Re: [RFC] Patch: dynticks: idle load balancing

2006-12-19 Thread Steven Rostedt
On Mon, 2006-12-11 at 15:53 -0800, Siddha, Suresh B wrote:

> 
> Comments and review feedback welcome. Minimal testing done on a couple of
> i386 platforms. Perf testing yet to be done.

Nice work!

> 
> thanks,
> suresh
> ---


> diff -pNru linux-2.6.19-mm1/include/linux/sched.h linux/include/linux/sched.h
> --- linux-2.6.19-mm1/include/linux/sched.h   2006-12-12 06:39:22.000000000 -0800
> +++ linux/include/linux/sched.h   2006-12-12 06:51:03.000000000 -0800
> @@ -195,6 +195,14 @@ extern void sched_init_smp(void);
>  extern void init_idle(struct task_struct *idle, int cpu);
>  
>  extern cpumask_t nohz_cpu_mask;
> +#ifdef CONFIG_SMP
> +extern int select_notick_load_balancer(int cpu);
> +#else
> +static inline int select_notick_load_balancer(int cpu)

Later on in the actual code, the parameter is named stop_tick, which
makes sense. You should change the name here too so it's not confusing
when looking at the code later on.

> +{
> + return 0;
> +}
> +#endif

[...]

> +
> +/*
> + * This routine will try to nominate the ilb (idle load balancing)
> + * owner among the cpus whose ticks are stopped. ilb owner will do the idle
> + * load balancing on behalf of all those cpus. If all the cpus in the system
> + * go into this tickless mode, then there will be no ilb owner (as there is
> + * no need for one) and all the cpus will sleep till the next wakeup event
> + * arrives...
> + *
> + * For the ilb owner, tick is not stopped. And this tick will be used
> + * for idle load balancing. ilb owner will still be part of
> + * notick.cpu_mask..
> + *
> + * While stopping the tick, this cpu will become the ilb owner if there
> + * is no other owner. And will be the owner till that cpu becomes busy
> + * or if all cpus in the system stop their ticks at which point
> + * there is no need for ilb owner.
> + *
> + * When the ilb owner becomes busy, it nominates another owner, during the
> + * schedule()
> + */
> +int select_notick_load_balancer(int stop_tick)
> +{
> + int cpu = smp_processor_id();
> +

[...]
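
The function body is elided from the quote above; purely as a reading
aid, here is a minimal sketch of what the nomination logic might look
like, reconstructed from the comment block (the locking and the exact
details are assumptions, not the actual patch):

int select_notick_load_balancer(int stop_tick)
{
        int cpu = smp_processor_id();

        if (stop_tick) {
                cpu_set(cpu, notick.cpu_mask);
                /*
                 * Volunteer as ilb owner if there is none -- unless
                 * every cpu is tickless, in which case nobody needs to
                 * balance.  A nonzero return tells the caller to keep
                 * the tick running.
                 */
                if (notick.load_balancer == -1 &&
                    !cpus_equal(notick.cpu_mask, cpu_online_map)) {
                        notick.load_balancer = cpu;
                        return 1;
                }
                return notick.load_balancer == cpu;
        }

        /* becoming busy: give up ownership, then leave the mask */
        if (notick.load_balancer == cpu)
                notick.load_balancer = -1;
        cpu_clear(cpu, notick.cpu_mask);
        return 0;
}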

> +#ifdef CONFIG_NO_HZ
> + if (idle_cpu(local_cpu) && notick.load_balancer == local_cpu &&
> + !cpus_empty(cpus))
> + goto restart;
> +#endif
>  }
>  #else
>  /*
> @@ -3562,6 +3669,21 @@ switch_tasks:
>   ++*switch_count;
>  
>   prepare_task_switch(rq, next);
> +#if defined(CONFIG_HZ) && defined(CONFIG_SMP)

Ah! so this is where the CONFIG_NO_HZ mistake came in ;)


> + if (prev == rq->idle && notick.load_balancer == -1) {
> + /*
> +  * simple selection for now: Nominate the first cpu in
> +  * the notick list to be the next ilb owner.
> +  *
> +  * TBD: Traverse the sched domains and nominate
> +  * the nearest cpu in the notick.cpu_mask.
> +  */
> + int ilb = first_cpu(notick.cpu_mask);
> +
> + if (ilb != NR_CPUS)
> + resched_cpu(ilb);
> + }
> +#endif
>   prev = context_switch(rq, prev, next);


-- Steve

Re: [RFC] Patch: dynticks: idle load balancing

2006-12-19 Thread Siddha, Suresh B
On Tue, Dec 19, 2006 at 09:12:48PM +0100, Ingo Molnar wrote:
>  restart:
>   if (idle_cpu(local_cpu) && notick.load_balancer == local_cpu) {
>   this_cpu = first_cpu(cpus);
> + if (unlikely(this_cpu >= NR_CPUS))
> + return;

oops.

There is a window where local_cpu has been cleared from notick.cpu_mask
but notick.load_balancer still points to local_cpu.
This can also be corrected by first resetting notick.load_balancer
before clearing that cpu from notick.cpu_mask in select_notick_load_balancer().
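
As an illustration (my sketch of the reordering described above, not the
actual fix), the becoming-busy path would do:

        /* in select_notick_load_balancer(), when the tick restarts: */
        if (notick.load_balancer == cpu)
                notick.load_balancer = -1;      /* drop ownership first... */
        cpu_clear(cpu, notick.cpu_mask);        /* ...then leave the mask */

so run_rebalance_domains() can never observe an owner that is no longer
in notick.cpu_mask.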

thanks,
suresh

Re: [RFC] Patch: dynticks: idle load balancing

2006-12-19 Thread Ingo Molnar

find below another bugfix for the dynticks-sched patch. (this bug caused 
crashes under a stress test)

Ingo

---
 kernel/sched.c |2 ++
 1 file changed, 2 insertions(+)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -3015,6 +3015,8 @@ static void run_rebalance_domains(struct
 restart:
if (idle_cpu(local_cpu) && notick.load_balancer == local_cpu) {
this_cpu = first_cpu(cpus);
+   if (unlikely(this_cpu >= NR_CPUS))
+   return;
this_rq = cpu_rq(this_cpu);
cpu_clear(this_cpu, cpus);
}
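
The check matters because of a cpumask API detail: first_cpu() on an
empty mask returns NR_CPUS, so without it this_cpu indexes past the
per-cpu runqueue array. A tiny illustration (hypothetical values):

        cpumask_t cpus = CPU_MASK_NONE;
        int this_cpu = first_cpu(cpus);  /* == NR_CPUS on an empty mask */
        /* cpu_rq(this_cpu) would then read out of bounds -- the
           stress-test crash fixed above */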

Re: [RFC] Patch: dynticks: idle load balancing

2006-12-13 Thread Ingo Molnar

* Siddha, Suresh B <[EMAIL PROTECTED]> wrote:

> > the other problem is load_balance(): there this_rq is locked and you 
> > call resched_cpu() unconditionally.
> 
> But here resched_cpu() was called after double_rq_unlock().

yeah, you are right - the schedule() one is/was the only problem.

Ingo

Re: [RFC] Patch: dynticks: idle load balancing

2006-12-13 Thread Siddha, Suresh B
On Thu, Dec 14, 2006 at 12:31:57AM +0100, Ingo Molnar wrote:
> 
> * Siddha, Suresh B <[EMAIL PROTECTED]> wrote:
> 
> > On Thu, Dec 14, 2006 at 12:13:16AM +0100, Ingo Molnar wrote:
> > > there's another bug as well: in schedule() resched_cpu() is called with 
> > > the current runqueue held in two places, which is deadlock potential. 
> > 
> > resched_cpu() was getting called after prepare_task_switch() which 
> > releases the current runqueue lock. Isn't it?
> 
> no, it doesn't release it. The finish stage is what releases it.

I see.

> the other problem is load_balance(): there this_rq is locked and you 
> call resched_cpu() unconditionally.

But here resched_cpu() was called after double_rq_unlock().

thanks,
suresh

Re: [RFC] Patch: dynticks: idle load balancing

2006-12-13 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> The easiest fix for this is to use trylock - find the patch for that. 

FYI, i have released 2.6.19-rt14 with your patch integrated into it as 
well - it's looking good so far in my testing. (the yum repository will 
be updated in a few minutes.)

Ingo

Re: [RFC] Patch: dynticks: idle load balancing

2006-12-13 Thread Siddha, Suresh B
On Thu, Dec 14, 2006 at 12:13:16AM +0100, Ingo Molnar wrote:
> there's another bug as well: in schedule() resched_cpu() is called with 
> the current runqueue held in two places, which is deadlock potential. 

resched_cpu() was getting called after prepare_task_switch() which
releases the current runqueue lock. Isn't it?

> The easiest fix for this is to use trylock - find the patch for that. 
> This is a hint only anyway - and if a CPU is idle its runqueue will be 

Though I don't see a potential deadlock, I like this optimization.

thanks,
suresh

> lockable. (fixing it via double-locking is easy in the first call site, 
> but the second one looks harder)
> 
>   Ingo
> 
> Index: linux/kernel/sched.c
> ===================================================================
> --- linux.orig/kernel/sched.c
> +++ linux/kernel/sched.c
> @@ -1167,12 +1167,14 @@ static void resched_task(struct task_str
>   if (!tsk_is_polling(p))
>   smp_send_reschedule(cpu);
>  }
> +
>  static void resched_cpu(int cpu)
>  {
>   struct rq *rq = cpu_rq(cpu);
> - unsigned int flags;
> + unsigned long flags;
>  
> - spin_lock_irqsave(&rq->lock, flags);
> + if (!spin_trylock_irqsave(&rq->lock, flags))
> + return;
>   resched_task(cpu_curr(cpu));
>   spin_unlock_irqrestore(&rq->lock, flags);
>  }

Re: [RFC] Patch: dynticks: idle load balancing

2006-12-13 Thread Ingo Molnar

* Siddha, Suresh B <[EMAIL PROTECTED]> wrote:

> On Thu, Dec 14, 2006 at 12:13:16AM +0100, Ingo Molnar wrote:
> > there's another bug as well: in schedule() resched_cpu() is called with 
> > the current runqueue held in two places, which is deadlock potential. 
> 
> resched_cpu() was getting called after prepare_task_switch() which 
> releases the current runqueue lock. Isn't it?

no, it doesn't release it. The finish stage is what releases it.

the other problem is load_balance(): there this_rq is locked and you 
call resched_cpu() unconditionally.

Ingo

Re: [RFC] Patch: dynticks: idle load balancing

2006-12-13 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > Appended patch attempts to fix the process idle load balancing in 
> > the presence of dynticks. cpus for which ticks are stopped will 
> > sleep till the next event wakes them up. Potentially these sleeps can 
> > be for large durations, during which today there is no idle load 
> > balancing being done. There was some discussion (last year) 
> > on this topic on lkml, where two main approaches were getting 
> > debated. One is to back off the idle load balancing for bigger 
> > intervals and the second is a watchdog mechanism where the busy cpu 
> > will trigger the load balance on an idle cpu.  Both of these 
> > mechanisms have their drawbacks.
> 
> nice work! I have added your patch to -rt. Btw., it needs the patch 
> below to work on 64-bit.

there's another bug as well: in schedule() resched_cpu() is called with 
the current runqueue held in two places, which is deadlock potential. 
The easiest fix for this is to use trylock - find the patch for that. 
This is a hint only anyway - and if a CPU is idle its runqueue will be 
lockable. (fixing it via double-locking is easy in the first call site, 
but the second one looks harder)

Ingo

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -1167,12 +1167,14 @@ static void resched_task(struct task_str
if (!tsk_is_polling(p))
smp_send_reschedule(cpu);
 }
+
 static void resched_cpu(int cpu)
 {
struct rq *rq = cpu_rq(cpu);
-   unsigned int flags;
+   unsigned long flags;
 
-   if (!spin_trylock_irqsave(&rq->lock, flags))
+   spin_lock_irqsave(&rq->lock, flags);
+   return;
resched_task(cpu_curr(cpu));
spin_unlock_irqrestore(&rq->lock, flags);
 }
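
To make the deadlock potential concrete, here is one way it could play
out (my sketch, simplified from Ingo's description above):

        /*
         * CPU A, in schedule():           CPU B, same window:
         *   holds rq_A->lock                holds rq_B->lock
         *   resched_cpu(B)                  resched_cpu(A)
         *     spin_lock(&rq_B->lock)          spin_lock(&rq_A->lock)
         *
         * A classic AB-BA deadlock.  spin_trylock_irqsave() breaks the
         * cycle: the nomination just fails, which is acceptable since
         * it is only a hint and an idle cpu's runqueue is normally
         * lockable.
         */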

Re: [RFC] Patch: dynticks: idle load balancing

2006-12-13 Thread Ingo Molnar

* Siddha, Suresh B <[EMAIL PROTECTED]> wrote:

> Appended patch attempts to fix the process idle load balancing in the 
> presence of dynticks. cpus for which ticks are stopped will sleep till 
> the next event wakes them up. Potentially these sleeps can be for large 
> durations, during which today there is no idle load balancing 
> being done. There was some discussion (last year) on this 
> topic on lkml, where two main approaches were getting debated. One is 
> to back off the idle load balancing for bigger intervals and the 
> second is a watchdog mechanism where the busy cpu will trigger the 
> load balance on an idle cpu.  Both of these 
> mechanisms have their drawbacks.

nice work! I have added your patch to -rt. Btw., it needs the patch 
below to work on 64-bit.

Ingo

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -1170,7 +1170,7 @@ static void resched_task(struct task_str
 static void resched_cpu(int cpu)
 {
struct rq *rq = cpu_rq(cpu);
-   unsigned int flags;
+   unsigned long flags;
 
spin_lock_irqsave(&rq->lock, flags);
resched_task(cpu_curr(cpu));
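
For background (to the best of my understanding, not stated in the
thread): the flags argument must be unsigned long because the irq-save
primitives store the cpu's full flags register through it, which is 64
bits wide on x86-64; an unsigned int slot corrupts the saved state, so
the later spin_unlock_irqrestore() restores a bogus value.

        unsigned long flags;    /* full-width slot for the flags register */

        spin_lock_irqsave(&rq->lock, flags);
        /* ... */
        spin_unlock_irqrestore(&rq->lock, flags);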

[RFC] Patch: dynticks: idle load balancing

2006-12-11 Thread Siddha, Suresh B
Appended patch attempts to fix the process idle load balancing in the
presence of dynticks. cpus for which ticks are stopped will sleep
till the next event wakes them up. Potentially these sleeps can be for large
durations, during which today there is no idle load balancing being done.
There was some discussion on this topic on lkml (last year), where two
main approaches were getting debated. One is to back off the idle load
balancing for bigger intervals and the second is a watchdog mechanism where
the busy cpu will trigger the load balance on an idle cpu.  Both of these
mechanisms have their drawbacks.

For the first mechanism, defining the interval will be tricky: if it is too
long, the response time will be high and we won't be able to respond to
sudden changes in the load; if it is too small, we won't be able to save
power.

The second mechanism requires changes to the busy load balancing (which will
then be doing more load balancing work while the current busy task on that cpu
is eagerly waiting for cpu cycles). Also, busy load balancing intervals are
quite different from idle load balancing intervals. As with the first
mechanism, we won't be able to respond quickly to changes in load. And
figuring out that a cpu is heavily loaded, and where that extra load needs to
be moved, is a somewhat difficult job, especially so in the case of hierarchical
scheduler domains.

The appended patch takes a third route, which nominates an owner among
the idle cpus to do the idle load balancing on behalf of the other
idle cpus. Once all the cpus are completely idle, we can stop
this idle load balancing too. Checks added in the fast path are minimized.
Whenever there are busy cpus in the system, there will be an owner (an idle
cpu) doing the system-wide idle load balancing. If we nominate this owner
carefully (like an idle core in a busy package), we can minimize the power
wasted as well.
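
The shared state this scheme implies looks something like the sketch
below (inferred from the notick.* references in the patch; the actual
definition is not visible in this excerpt):

static struct {
        cpumask_t cpu_mask;     /* cpus whose ticks are stopped */
        int load_balancer;      /* ilb owner cpu, or -1 if none */
} notick = {
        .load_balancer = -1,
};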

Some of the questions I have are: Will this single owner become a bottleneck?
Idle load balancing is now serialized among all the idle cpus. This perhaps
will add some delays in load movement to different idle cpus. IMO, these
delays will be small and tolerable. If this turns out to be a concern, we
can offload the actual load movement work to the idle cpu where the load
will finally run.

Any more optimizations we can do to start/stop_sched_tick() routines to track
this info more efficiently?

Comments and review feedback welcome. Minimal testing done on a couple of
i386 platforms. Perf testing yet to be done.

thanks,
suresh
---
Track the cpus for which ticks are stopped; one among these cpus will
be doing the idle load balancing on behalf of all the remaining cpus.
If the ticks are stopped for all the cpus in the system, idle load balancing
will stop at that moment, and it restarts as soon as there is a busy cpu in
the system.

TBD: Select the appropriate idle cpu for doing this idle load balancing,
such as an idle core in a busy package (one which has a busy core). Selecting
an idle thread as the owner when there are busy thread siblings is
not a good idea.
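
One possible shape for the nearest-cpu nomination mentioned in the TBD
comments (a hypothetical sketch, not part of this patch): walk the
nominating cpu's sched domains outward and take the first tickless cpu
found, since domains are ordered from smallest span to largest:

static int find_nearest_notick_cpu(int cpu)
{
        struct sched_domain *sd;

        for_each_domain(cpu, sd) {
                cpumask_t tmp;

                cpus_and(tmp, sd->span, notick.cpu_mask);
                if (!cpus_empty(tmp))
                        return first_cpu(tmp);
        }
        return NR_CPUS;         /* no tickless cpu found */
}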

We can also think of offloading the task movements from the idle load balancing
owner to the idle cpu on behalf of which this work is being done.

Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>
---

diff -pNru linux-2.6.19-mm1/include/linux/sched.h linux/include/linux/sched.h
--- linux-2.6.19-mm1/include/linux/sched.h  2006-12-12 06:39:22.000000000 -0800
+++ linux/include/linux/sched.h 2006-12-12 06:51:03.000000000 -0800
@@ -195,6 +195,14 @@ extern void sched_init_smp(void);
 extern void init_idle(struct task_struct *idle, int cpu);
 
 extern cpumask_t nohz_cpu_mask;
+#ifdef CONFIG_SMP
+extern int select_notick_load_balancer(int cpu);
+#else
+static inline int select_notick_load_balancer(int cpu)
+{
+   return 0;
+}
+#endif
 
 /*
  * Only dump TASK_* tasks. (-1 for all tasks)
diff -pNru linux-2.6.19-mm1/kernel/hrtimer.c linux/kernel/hrtimer.c
--- linux-2.6.19-mm1/kernel/hrtimer.c   2006-12-12 06:39:22.000000000 -0800
+++ linux/kernel/hrtimer.c  2006-12-12 06:51:03.000000000 -0800
@@ -600,6 +600,9 @@ void hrtimer_stop_sched_tick(void)
 * the scheduler tick in hrtimer_restart_sched_tick.
 */
if (!cpu_base->tick_stopped) {
+   if (select_notick_load_balancer(1))
+   goto end;
+
cpu_base->idle_tick = cpu_base->sched_timer.expires;
cpu_base->tick_stopped = 1;
cpu_base->idle_jiffies = last_jiffies;
@@ -616,6 +619,7 @@ void hrtimer_stop_sched_tick(void)
raise_softirq_irqoff(TIMER_SOFTIRQ);
}
 
+end:
local_irq_restore(flags);
 }
 
@@ -630,6 +634,8 @@ void hrtimer_restart_sched_tick(void)
unsigned long ticks;
ktime_t now, delta;
 
+   select_notick_load_balancer(0);
+
if (!cpu_base->hres_active || 
