Re: [PATCH] Add support for deferrable timers (respun-Mar28)

2007-03-29 Thread Andi Kleen

> Might also be useful to add an extra option to "top" to reduce the  
> polling frequency if the system is otherwise idle.  A fixed 30-sec  
> timer and a deferrable 1-sec timer or somesuch?

Hmm, I think the current implementation is per CPU. top really would
like to have one that applies to all CPUs though.

Thinking about it, for sane user space semantics it would probably need
a global implementation anyway.

Perhaps it could use the same infrastructure as RCU does to handle this?

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Add support for deferrable timers (respun-Mar28)

2007-03-29 Thread Kyle Moffett

On Mar 29, 2007, at 07:41:12, Andi Kleen wrote:
> > ondemand is the biggest offender and the patch below reduces the
> > number of interrupts by 50% or more (depending on HZ) on different
> > test systems here.
>
> Cool!
>
> > Yes. There are quite a few other timers inside kernel that can be
> > migrated. I will use timer_stats and track others and send in the
> > patches soon.
>
> Longer term it might make sense to even expose this as an option to
> user space. Maybe as a new timer in setitimer()? This might save
> power with "wiggling desktop applets" too.


Might also be useful to add an extra option to "top" to reduce the  
polling frequency if the system is otherwise idle.  A fixed 30-sec  
timer and a deferrable 1-sec timer or somesuch?
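
As a rough illustration of the two-timer idea, here is a small user-space model in C. It only models the refresh decision; it is not an existing top option or kernel interface. The names should_refresh() and count_refreshes(), and the assumption that "idle" is known as a simple boolean, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Periods taken from the suggestion above; the names are hypothetical. */
#define FIXED_PERIOD_SEC      30UL  /* always honoured, even when idle */
#define DEFERRABLE_PERIOD_SEC 1UL   /* honoured only while the system is busy */

/* Decide whether the display should refresh at time `now` (in seconds). */
static bool should_refresh(unsigned long now, unsigned long last_refresh,
                           bool system_idle)
{
    assert(now >= last_refresh);
    unsigned long elapsed = now - last_refresh;

    if (elapsed >= FIXED_PERIOD_SEC)
        return true;    /* the fixed timer fired */
    /* The deferrable timer only counts when the system is not idle. */
    return !system_idle && elapsed >= DEFERRABLE_PERIOD_SEC;
}

/* Count refreshes over `seconds` of uniformly idle or uniformly busy time. */
static unsigned int count_refreshes(unsigned long seconds, bool system_idle)
{
    unsigned long last = 0;
    unsigned int refreshes = 0;

    for (unsigned long now = 1; now <= seconds; now++) {
        if (should_refresh(now, last, system_idle)) {
            refreshes++;
            last = now;
        }
    }
    return refreshes;
}
```

Under these assumptions, count_refreshes(60, true) gives 2 refreshes for a minute of idle time, while count_refreshes(60, false) gives 60 for a busy minute.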


Cheers,
Kyle Moffett



Re: [PATCH] Add support for deferrable timers (respun-Mar28)

2007-03-29 Thread Andi Kleen

> ondemand is the biggest offender and the patch below reduces the number of
> interrupts by 50% or more (depending on HZ) on different test systems here.

Cool!
 
> Yes. There are quite a few other timers inside kernel that can be
> migrated. I will use timer_stats and track others and send in the patches
> soon.

Longer term it might make sense to even expose this as an option to user space.
Maybe as a new timer in setitimer()? This might save power with "wiggling
desktop applets" too.

-Andi



Re: [PATCH] Add support for deferrable timers (respun-Mar28)

2007-03-28 Thread Venki Pallipadi
On Wed, Mar 28, 2007 at 05:01:59PM -0700, Andrew Morton wrote:
> On Wed, 28 Mar 2007 16:00:21 -0700
> Venki Pallipadi <[EMAIL PROTECTED]> wrote:
> 
> > Please drop the patch you included yesterday and two incremental patches and
> > use the patch below.
> 
> As you saw, I went and turned it into an incremental patch again.  It makes
> it easier to see what changed, but harder to see the whole thing.
> 
> > Introduce a new flag for timers - deferrable:
> 
> OK, but there's nothing in-kernel which actually uses this.
> 
> It would be good to identify some timer users which can be switched over (as
> many as possible, really) so this thing actually gets some runtime testing.

ondemand is the biggest offender and the patch below reduces the number of
interrupts by 50% or more (depending on HZ) on different test systems here.

Yes. There are quite a few other timers inside kernel that can be
migrated. I will use timer_stats and track others and send in the patches
soon.

Thanks,
Venki


--

Add a new deferrable delayed work init. This can be used to schedule work
that is 'unimportant' while the CPU is idle and can be run later, when the
CPU eventually comes out of idle.

Use this init in the cpufreq ondemand governor.
 
Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

Index: new/drivers/cpufreq/cpufreq_ondemand.c
===
--- new.orig/drivers/cpufreq/cpufreq_ondemand.c 2007-03-28 10:03:21.0 -0800
+++ new/drivers/cpufreq/cpufreq_ondemand.c  2007-03-28 10:05:44.0 -0800
@@ -470,7 +470,7 @@
    dbs_info->enable = 1;
    ondemand_powersave_bias_init();
    dbs_info->sample_type = DBS_NORMAL_SAMPLE;
-   INIT_DELAYED_WORK(&dbs_info->work, do_dbs_timer);
+   INIT_DELAYED_WORK_DEFERRABLE(&dbs_info->work, do_dbs_timer);
    queue_delayed_work_on(dbs_info->cpu, kondemand_wq, &dbs_info->work,
                          delay);
 }
Index: new/include/linux/workqueue.h
===
--- new.orig/include/linux/workqueue.h  2007-03-28 10:03:21.0 -0800
+++ new/include/linux/workqueue.h   2007-03-28 10:05:44.0 -0800
@@ -89,6 +89,12 @@
        init_timer(&(_work)->timer); \
    } while (0)
 
+#define INIT_DELAYED_WORK_DEFERRABLE(_work, _func) \
+   do { \
+       INIT_WORK(&(_work)->work, (_func)); \
+       init_timer_deferrable(&(_work)->timer); \
+   } while (0)
+
 /**
  * work_pending - Find out whether a work item is currently pending
  * @work: The work item in question


Re: [PATCH] Add support for deferrable timers (respun-Mar28)

2007-03-28 Thread Andrew Morton
On Wed, 28 Mar 2007 16:00:21 -0700
Venki Pallipadi <[EMAIL PROTECTED]> wrote:

> Please drop the patch you included yesterday and two incremental patches and
> use the patch below.

As you saw, I went and turned it into an incremental patch again.  It makes
it easier to see what changed, but harder to see the whole thing.

> Introduce a new flag for timers - deferrable:

OK, but there's nothing in-kernel which actually uses this.

It would be good to identify some timer users which can be switched over (as
many as possible, really) so this thing actually gets some runtime testing.


Re: [PATCH] Add support for deferrable timers (respun-Mar28)

2007-03-28 Thread Venki Pallipadi

Andrew,

Please drop the patch you included yesterday and two incremental patches and
use the patch below.

This patch is yesterday's patch + your tidy cleanup +
minor changes based on comments from Oleg and Andi. It is a lot
cleaner (and smaller) than the earlier patches.

Thanks,
Venki


Introduce a new flag for timers - deferrable:
Deferrable timers work normally when the system is busy, but they will not
cause a CPU to come out of idle just to service them. Instead, such a timer
is serviced when the CPU eventually wakes up for a subsequent
non-deferrable timer.

The main advantage is avoiding unnecessary timer interrupts while the CPU is
idle. If the routine currently called by a timer can wait until the next
event without any issues, this new timer can be used to set up the timer
event for that routine. Together with dynticks, this allows CPUs to be lazy
and stay idle for extended periods by reducing unnecessary wakeups, thereby
reducing power consumption.
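
To make the intended behaviour concrete, here is a minimal user-space sketch in C of how an idle CPU might pick its next wakeup while ignoring deferrable timers. It is a simplified model, not the kernel code: struct sim_timer, next_wakeup() and due_timers() are hypothetical names, and a flat array stands in for the real timer wheel.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified timer record; not the kernel's struct timer_list. */
struct sim_timer {
    unsigned long expires;  /* absolute expiry time, in ticks */
    bool deferrable;        /* true: must not wake an idle CPU by itself */
};

/*
 * Return the tick at which an idle CPU has to wake up. Deferrable timers
 * are skipped. ULONG_MAX means "no wakeup needed for timers".
 */
static unsigned long next_wakeup(const struct sim_timer *timers, size_t n)
{
    unsigned long next = ULONG_MAX;

    assert(timers != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (timers[i].deferrable)
            continue;
        if (timers[i].expires < next)
            next = timers[i].expires;
    }
    return next;
}

/* Count all timers, deferrable or not, that are due once the CPU runs at `now`. */
static size_t due_timers(const struct sim_timer *timers, size_t n,
                         unsigned long now)
{
    size_t due = 0;

    assert(timers != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (timers[i].expires <= now)
            due++;
    }
    return due;
}
```

With timers expiring at 100 (deferrable), 180 (deferrable), 250 and 400, next_wakeup() returns 250, and at tick 250 three timers are due: the two deferred ones are serviced together with the timer that caused the wakeup.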

This patch:
Builds this new timer on top of the existing timer infrastructure. It uses
the last bit of the 'base' pointer in the timer_list structure to store the
deferrable flag. The __next_timer_interrupt() function skips over
deferrable timers when the CPU looks for the next timer event for which it
has to wake up.

This is exported through a new interface, init_timer_deferrable(), which can
be called in place of the regular init_timer().
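
The flag-in-pointer trick can be sketched in user space as follows. This is an illustration under stated assumptions, not the patch itself: the demo_* names are hypothetical, and the tagged value is kept in a uintptr_t rather than in a pointer-typed field as the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for tvec_base_t; the long member keeps bit 0 of its address clear. */
struct demo_base {
    long lock;
};

/* Hypothetical stand-in for struct timer_list. */
struct demo_timer {
    uintptr_t tagged_base;  /* base address with the deferrable flag in bit 0 */
};

#define DEMO_DEFERRABLE_FLAG ((uintptr_t)0x1)

static int demo_is_deferrable(const struct demo_timer *timer)
{
    return (timer->tagged_base & DEMO_DEFERRABLE_FLAG) != 0;
}

/* Strip the flag bit to recover the real base pointer. */
static struct demo_base *demo_get_base(const struct demo_timer *timer)
{
    return (struct demo_base *)(timer->tagged_base & ~DEMO_DEFERRABLE_FLAG);
}

static void demo_set_deferrable(struct demo_timer *timer)
{
    timer->tagged_base |= DEMO_DEFERRABLE_FLAG;
}

/* Point the timer at a new base (possibly NULL) while preserving the flag. */
static void demo_set_base(struct demo_timer *timer, struct demo_base *new_base)
{
    uintptr_t addr = (uintptr_t)new_base;

    assert((addr & DEMO_DEFERRABLE_FLAG) == 0);  /* relies on alignment */
    timer->tagged_base = addr | (timer->tagged_base & DEMO_DEFERRABLE_FLAG);
}
```

The point of the design is that no extra field is needed in the timer structure, at the cost of having to mask the pointer on every access and to preserve the bit whenever the base changes.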

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

Index: new/kernel/timer.c
===
--- new.orig/kernel/timer.c 2007-03-22 16:27:44.0 -0800
+++ new/kernel/timer.c  2007-03-28 10:05:38.0 -0800
@@ -74,7 +74,7 @@
    tvec_t tv3;
    tvec_t tv4;
    tvec_t tv5;
-} ____cacheline_aligned_in_smp;
+} ____cacheline_aligned;
 
 typedef struct tvec_t_base_s tvec_base_t;
 
@@ -82,6 +82,37 @@
 EXPORT_SYMBOL(boot_tvec_bases);
 static DEFINE_PER_CPU(tvec_base_t *, tvec_bases) = &boot_tvec_bases;
 
+/*
+ * Note that all tvec_bases is 2 byte aligned and lower bit of
+ * base in timer_list is guaranteed to be zero. Use the LSB for
+ * the new flag to indicate whether the timer is deferrable
+ */
+#define TBASE_DEFERRABLE_FLAG  (0x1)
+
+/* Functions below help us manage 'deferrable' flag */
+static inline unsigned int tbase_get_deferrable(tvec_base_t *base)
+{
+   return ((unsigned int)(unsigned long)base & TBASE_DEFERRABLE_FLAG);
+}
+
+static inline tvec_base_t *tbase_get_base(tvec_base_t *base)
+{
+   return ((tvec_base_t *)((unsigned long)base & ~TBASE_DEFERRABLE_FLAG));
+}
+
+static inline void timer_set_deferrable(struct timer_list *timer)
+{
+   timer->base = ((tvec_base_t *)((unsigned long)(timer->base) |
+  TBASE_DEFERRABLE_FLAG));
+}
+
+static inline void
+timer_set_base(struct timer_list *timer, tvec_base_t *new_base)
+{
+   timer->base = (tvec_base_t *)((unsigned long)(new_base) |
+ tbase_get_deferrable(timer->base));
+}
+
 /**
  * __round_jiffies - function to round jiffies to a full second
  * @j: the time in (absolute) jiffies that should be rounded
@@ -295,6 +326,13 @@
 }
 EXPORT_SYMBOL(init_timer);
 
+void fastcall init_timer_deferrable(struct timer_list *timer)
+{
+   init_timer(timer);
+   timer_set_deferrable(timer);
+}
+EXPORT_SYMBOL(init_timer_deferrable);
+
 static inline void detach_timer(struct timer_list *timer,
                                 int clear_pending)
 {
@@ -325,10 +363,11 @@
    tvec_base_t *base;
 
    for (;;) {
-       base = timer->base;
+       tvec_base_t *prelock_base = timer->base;
+       base = tbase_get_base(prelock_base);
        if (likely(base != NULL)) {
            spin_lock_irqsave(&base->lock, *flags);
-           if (likely(base == timer->base))
+           if (likely(prelock_base == timer->base))
                return base;
            /* The timer has migrated to another CPU */
            spin_unlock_irqrestore(&base->lock, *flags);
@@ -365,11 +404,11 @@
         */
        if (likely(base->running_timer != timer)) {
            /* See the comment in lock_timer_base() */
-           timer->base = NULL;
+           timer_set_base(timer, NULL);
            spin_unlock(&base->lock);
            base = new_base;
            spin_lock(&base->lock);
-           timer->base = base;
+           timer_set_base(timer, base);
        }
    }
 
@@ -397,7 +436,7 @@
    timer_stats_timer_set_start_info(timer);
    BUG_ON(timer_pending(timer) || !timer->function);
    spin_lock_irqsave(&base->lock, flags);
-   timer->base = base;
+   timer_set_base(timer, base);
    internal_add_timer(base, timer);
    spin_unlock_irqrestore(&base->lock, flags);
 }
@@ -548,7 
