Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-20 Thread Arjan van de Ven

On 11/20/2013 8:33 AM, Peter Zijlstra wrote:

On Wed, Nov 20, 2013 at 08:24:47AM -0800, Arjan van de Ven wrote:

On 11/20/2013 2:58 AM, Peter Zijlstra wrote:

So pretty silly actually; you cannot do a store (any store) in between
monitor and mwait.


you can
just not to the cacheline you are watching (or things that alias with that)


Ah indeed, the thread_info::status is in the same cacheline.



you told it to wake on any write to that cacheline
and then you write to it ;-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-20 Thread Peter Zijlstra
On Wed, Nov 20, 2013 at 08:24:47AM -0800, Arjan van de Ven wrote:
> On 11/20/2013 2:58 AM, Peter Zijlstra wrote:
> >So pretty silly actually; you cannot do a store (any store) in between
> >monitor and mwait.
> 
> you can
> just not to the cacheline you are watching (or things that alias with that)

Ah indeed, the thread_info::status is in the same cacheline.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-20 Thread Arjan van de Ven

On 11/20/2013 2:58 AM, Peter Zijlstra wrote:

On Wed, Nov 20, 2013 at 11:28:03AM +0100, Peter Zijlstra wrote:

On Tue, Nov 19, 2013 at 01:06:30PM -0800, Jacob Pan wrote:

I applied this patch on top of upstream kernel (801a760) and found out
my machine completely failed to enter idle when nothing is running.
turbostate shows 100% C0. ftrace shows kernel coming in and out of idle
frequently.

Both ACPI idle and intel_idle behaves the same way. I have to do the
following change to allow entering C-states again.



That doesn't make any sense; current_set_polling_and_test() returns the
same thing need_resched() does.

But you're right, intel_idle resides 100% in C0 and acpi_idle has 100%
C1 residency... most weird.


So pretty silly actually; you cannot do a store (any store) in between
monitor and mwait.


you can
just not to the cacheline you are watching (or things that alias with that)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-20 Thread Peter Zijlstra
On Wed, Nov 20, 2013 at 11:28:03AM +0100, Peter Zijlstra wrote:
> On Tue, Nov 19, 2013 at 01:06:30PM -0800, Jacob Pan wrote:
> > I applied this patch on top of upstream kernel (801a760) and found out
> > my machine completely failed to enter idle when nothing is running.
> > turbostate shows 100% C0. ftrace shows kernel coming in and out of idle
> > frequently.
> > 
> > Both ACPI idle and intel_idle behaves the same way. I have to do the
> > following change to allow entering C-states again.

> That doesn't make any sense; current_set_polling_and_test() returns the
> same thing need_resched() does.
> 
> But you're right, intel_idle resides 100% in C0 and acpi_idle has 100%
> C1 residency... most weird.

So pretty silly actually; you cannot do a store (any store) in between
monitor and mwait.

The below version seems to work properly again with both acpi_idle and
intel_idle.

Now to go make that preempt_disable_no_resched cleanup compile.. :-)

---
Subject: x86, acpi, idle: Restructure the mwait idle routines
From: Peter Zijlstra 
Date: Tue, 19 Nov 2013 12:31:53 +0100

People seem to delight in writing wrong and broken mwait idle routines;
collapse the lot.

This leaves mwait_play_dead() the sole remaining user of __mwait() and
new __mwait() users are probably doing it wrong.

Also remove __sti_mwait() as its unused.

Cc: ar...@linux.intel.com
Cc: jacob.jun@linux.intel.com
Cc: Mike Galbraith 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: h...@zytor.com
Cc: l...@kernel.org
Cc: shaohua...@intel.com
Cc: rui.zh...@intel.com
Acked-by: Rafael J. Wysocki 
Signed-off-by: Peter Zijlstra 
---
 arch/x86/include/asm/mwait.h   |   40 +
 arch/x86/include/asm/processor.h   |   23 -
 arch/x86/kernel/acpi/cstate.c  |   23 -
 drivers/acpi/acpi_pad.c|5 
 drivers/acpi/processor_idle.c  |   15 -
 drivers/idle/intel_idle.c  |8 ---
 drivers/thermal/intel_powerclamp.c |4 ---
 7 files changed, 43 insertions(+), 75 deletions(-)

--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_MWAIT_H
 #define _ASM_X86_MWAIT_H
 
+#include 
+
 #define MWAIT_SUBSTATE_MASK0xf
 #define MWAIT_CSTATE_MASK  0xf
 #define MWAIT_SUBSTATE_SIZE4
@@ -13,4 +15,42 @@
 
 #define MWAIT_ECX_INTERRUPT_BREAK  0x1
 
+static inline void __monitor(const void *eax, unsigned long ecx,
+unsigned long edx)
+{
+   /* "monitor %eax, %ecx, %edx;" */
+   asm volatile(".byte 0x0f, 0x01, 0xc8;"
+:: "a" (eax), "c" (ecx), "d"(edx));
+}
+
+static inline void __mwait(unsigned long eax, unsigned long ecx)
+{
+   /* "mwait %eax, %ecx;" */
+   asm volatile(".byte 0x0f, 0x01, 0xc9;"
+:: "a" (eax), "c" (ecx));
+}
+
+/*
+ * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
+ * which can obviate IPI to trigger checking of need_resched.
+ * We execute MONITOR against need_resched and enter optimized wait state
+ * through MWAIT. Whenever someone changes need_resched, we would be woken
+ * up from MWAIT (without an IPI).
+ *
+ * New with Core Duo processors, MWAIT can take some hints based on CPU
+ * capability.
+ */
+static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
+{
+   if (!current_set_polling_and_test()) {
+   if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
+   clflush((void *)_thread_info()->flags);
+
+   __monitor((void *)_thread_info()->flags, 0, 0);
+   if (!need_resched())
+   __mwait(eax, ecx);
+   }
+   __current_clr_polling();
+}
+
 #endif /* _ASM_X86_MWAIT_H */
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -700,29 +700,6 @@ static inline void sync_core(void)
 #endif
 }
 
-static inline void __monitor(const void *eax, unsigned long ecx,
-unsigned long edx)
-{
-   /* "monitor %eax, %ecx, %edx;" */
-   asm volatile(".byte 0x0f, 0x01, 0xc8;"
-:: "a" (eax), "c" (ecx), "d"(edx));
-}
-
-static inline void __mwait(unsigned long eax, unsigned long ecx)
-{
-   /* "mwait %eax, %ecx;" */
-   asm volatile(".byte 0x0f, 0x01, 0xc9;"
-:: "a" (eax), "c" (ecx));
-}
-
-static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
-{
-   trace_hardirqs_on();
-   /* "mwait %eax, %ecx;" */
-   asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
-:: "a" (eax), "c" (ecx));
-}
-
 extern void select_idle_routine(const struct cpuinfo_x86 *c);
 extern void init_amd_e400_c1e_mask(void);
 
--- a/arch/x86/kernel/acpi/cstate.c
+++ b/arch/x86/kernel/acpi/cstate.c
@@ -150,29 +150,6 @@ int acpi_processor_ffh_cstate_probe(unsi
 }
 EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);
 
-/*
- * This uses new MONITOR/MWAIT 

Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-20 Thread Peter Zijlstra
On Tue, Nov 19, 2013 at 01:06:30PM -0800, Jacob Pan wrote:
> On Tue, 19 Nov 2013 16:13:38 +0100
> Peter Zijlstra  wrote:
> 
> > On Tue, Nov 19, 2013 at 03:51:43PM +0100, Peter Zijlstra wrote:
> > > That said, that drive is completely wrecked. It uses
> > > preempt_enable_no_resched() wrong too, it has uncommented barriers..
> > > 
> > > Dude, wtf are you guys smoking?
> > 
> I applied this patch on top of upstream kernel (801a760) and found out
> my machine completely failed to enter idle when nothing is running.
> turbostate shows 100% C0. ftrace shows kernel coming in and out of idle
> frequently.
> 
> Both ACPI idle and intel_idle behaves the same way. I have to do the
> following change to allow entering C-states again.
> 
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index 80014da..b51d1e1 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -49,10 +49,8 @@ static inline void mwait_idle_with_hints(unsigned
> long eax, unsigned long ecx) clflush((void
> *)_thread_info()->flags); 
> __monitor((void *)_thread_info()->flags, 0, 0);
> -   if (!current_set_polling_and_test())
> +if (!need_resched())
> __mwait(eax, ecx);
> -
> -   __current_clr_polling();
>  }
>  
>  #endif /* _ASM_X86_MWAIT_H */

That doesn't make any sense; current_set_polling_and_test() returns the
same thing need_resched() does.

But you're right, intel_idle resides 100% in C0 and acpi_idle has 100%
C1 residency... most weird.

/me goes prod at it
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-20 Thread Peter Zijlstra
On Tue, Nov 19, 2013 at 01:06:30PM -0800, Jacob Pan wrote:
 On Tue, 19 Nov 2013 16:13:38 +0100
 Peter Zijlstra pet...@infradead.org wrote:
 
  On Tue, Nov 19, 2013 at 03:51:43PM +0100, Peter Zijlstra wrote:
   That said, that drive is completely wrecked. It uses
   preempt_enable_no_resched() wrong too, it has uncommented barriers..
   
   Dude, wtf are you guys smoking?
  
 I applied this patch on top of upstream kernel (801a760) and found out
 my machine completely failed to enter idle when nothing is running.
 turbostate shows 100% C0. ftrace shows kernel coming in and out of idle
 frequently.
 
 Both ACPI idle and intel_idle behaves the same way. I have to do the
 following change to allow entering C-states again.
 
 diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
 index 80014da..b51d1e1 100644
 --- a/arch/x86/include/asm/mwait.h
 +++ b/arch/x86/include/asm/mwait.h
 @@ -49,10 +49,8 @@ static inline void mwait_idle_with_hints(unsigned
 long eax, unsigned long ecx) clflush((void
 *)current_thread_info()-flags); 
 __monitor((void *)current_thread_info()-flags, 0, 0);
 -   if (!current_set_polling_and_test())
 +if (!need_resched())
 __mwait(eax, ecx);
 -
 -   __current_clr_polling();
  }
  
  #endif /* _ASM_X86_MWAIT_H */

That doesn't make any sense; current_set_polling_and_test() returns the
same thing need_resched() does.

But you're right, intel_idle resides 100% in C0 and acpi_idle has 100%
C1 residency... most weird.

/me goes prod at it
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-20 Thread Peter Zijlstra
On Wed, Nov 20, 2013 at 11:28:03AM +0100, Peter Zijlstra wrote:
 On Tue, Nov 19, 2013 at 01:06:30PM -0800, Jacob Pan wrote:
  I applied this patch on top of upstream kernel (801a760) and found out
  my machine completely failed to enter idle when nothing is running.
  turbostate shows 100% C0. ftrace shows kernel coming in and out of idle
  frequently.
  
  Both ACPI idle and intel_idle behaves the same way. I have to do the
  following change to allow entering C-states again.

 That doesn't make any sense; current_set_polling_and_test() returns the
 same thing need_resched() does.
 
 But you're right, intel_idle resides 100% in C0 and acpi_idle has 100%
 C1 residency... most weird.

So pretty silly actually; you cannot do a store (any store) in between
monitor and mwait.

The below version seems to work properly again with both acpi_idle and
intel_idle.

Now to go make that preempt_disable_no_resched cleanup compile.. :-)

---
Subject: x86, acpi, idle: Restructure the mwait idle routines
From: Peter Zijlstra pet...@infradead.org
Date: Tue, 19 Nov 2013 12:31:53 +0100

People seem to delight in writing wrong and broken mwait idle routines;
collapse the lot.

This leaves mwait_play_dead() the sole remaining user of __mwait() and
new __mwait() users are probably doing it wrong.

Also remove __sti_mwait() as its unused.

Cc: ar...@linux.intel.com
Cc: jacob.jun@linux.intel.com
Cc: Mike Galbraith bitbuc...@online.de
Cc: Ingo Molnar mi...@kernel.org
Cc: Thomas Gleixner t...@linutronix.de
Cc: h...@zytor.com
Cc: l...@kernel.org
Cc: shaohua...@intel.com
Cc: rui.zh...@intel.com
Acked-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
Signed-off-by: Peter Zijlstra pet...@infradead.org
---
 arch/x86/include/asm/mwait.h   |   40 +
 arch/x86/include/asm/processor.h   |   23 -
 arch/x86/kernel/acpi/cstate.c  |   23 -
 drivers/acpi/acpi_pad.c|5 
 drivers/acpi/processor_idle.c  |   15 -
 drivers/idle/intel_idle.c  |8 ---
 drivers/thermal/intel_powerclamp.c |4 ---
 7 files changed, 43 insertions(+), 75 deletions(-)

--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_MWAIT_H
 #define _ASM_X86_MWAIT_H
 
+#include linux/sched.h
+
 #define MWAIT_SUBSTATE_MASK0xf
 #define MWAIT_CSTATE_MASK  0xf
 #define MWAIT_SUBSTATE_SIZE4
@@ -13,4 +15,42 @@
 
 #define MWAIT_ECX_INTERRUPT_BREAK  0x1
 
+static inline void __monitor(const void *eax, unsigned long ecx,
+unsigned long edx)
+{
+   /* monitor %eax, %ecx, %edx; */
+   asm volatile(.byte 0x0f, 0x01, 0xc8;
+:: a (eax), c (ecx), d(edx));
+}
+
+static inline void __mwait(unsigned long eax, unsigned long ecx)
+{
+   /* mwait %eax, %ecx; */
+   asm volatile(.byte 0x0f, 0x01, 0xc9;
+:: a (eax), c (ecx));
+}
+
+/*
+ * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
+ * which can obviate IPI to trigger checking of need_resched.
+ * We execute MONITOR against need_resched and enter optimized wait state
+ * through MWAIT. Whenever someone changes need_resched, we would be woken
+ * up from MWAIT (without an IPI).
+ *
+ * New with Core Duo processors, MWAIT can take some hints based on CPU
+ * capability.
+ */
+static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
+{
+   if (!current_set_polling_and_test()) {
+   if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
+   clflush((void *)current_thread_info()-flags);
+
+   __monitor((void *)current_thread_info()-flags, 0, 0);
+   if (!need_resched())
+   __mwait(eax, ecx);
+   }
+   __current_clr_polling();
+}
+
 #endif /* _ASM_X86_MWAIT_H */
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -700,29 +700,6 @@ static inline void sync_core(void)
 #endif
 }
 
-static inline void __monitor(const void *eax, unsigned long ecx,
-unsigned long edx)
-{
-   /* monitor %eax, %ecx, %edx; */
-   asm volatile(.byte 0x0f, 0x01, 0xc8;
-:: a (eax), c (ecx), d(edx));
-}
-
-static inline void __mwait(unsigned long eax, unsigned long ecx)
-{
-   /* mwait %eax, %ecx; */
-   asm volatile(.byte 0x0f, 0x01, 0xc9;
-:: a (eax), c (ecx));
-}
-
-static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
-{
-   trace_hardirqs_on();
-   /* mwait %eax, %ecx; */
-   asm volatile(sti; .byte 0x0f, 0x01, 0xc9;
-:: a (eax), c (ecx));
-}
-
 extern void select_idle_routine(const struct cpuinfo_x86 *c);
 extern void init_amd_e400_c1e_mask(void);
 
--- a/arch/x86/kernel/acpi/cstate.c
+++ b/arch/x86/kernel/acpi/cstate.c
@@ -150,29 +150,6 @@ int acpi_processor_ffh_cstate_probe(unsi
 }
 

Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-20 Thread Arjan van de Ven

On 11/20/2013 2:58 AM, Peter Zijlstra wrote:

On Wed, Nov 20, 2013 at 11:28:03AM +0100, Peter Zijlstra wrote:

On Tue, Nov 19, 2013 at 01:06:30PM -0800, Jacob Pan wrote:

I applied this patch on top of upstream kernel (801a760) and found out
my machine completely failed to enter idle when nothing is running.
turbostate shows 100% C0. ftrace shows kernel coming in and out of idle
frequently.

Both ACPI idle and intel_idle behaves the same way. I have to do the
following change to allow entering C-states again.



That doesn't make any sense; current_set_polling_and_test() returns the
same thing need_resched() does.

But you're right, intel_idle resides 100% in C0 and acpi_idle has 100%
C1 residency... most weird.


So pretty silly actually; you cannot do a store (any store) in between
monitor and mwait.


you can
just not to the cacheline you are watching (or things that alias with that)

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-20 Thread Peter Zijlstra
On Wed, Nov 20, 2013 at 08:24:47AM -0800, Arjan van de Ven wrote:
 On 11/20/2013 2:58 AM, Peter Zijlstra wrote:
 So pretty silly actually; you cannot do a store (any store) in between
 monitor and mwait.
 
 you can
 just not to the cacheline you are watching (or things that alias with that)

Ah indeed, the thread_info::status is in the same cacheline.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-20 Thread Arjan van de Ven

On 11/20/2013 8:33 AM, Peter Zijlstra wrote:

On Wed, Nov 20, 2013 at 08:24:47AM -0800, Arjan van de Ven wrote:

On 11/20/2013 2:58 AM, Peter Zijlstra wrote:

So pretty silly actually; you cannot do a store (any store) in between
monitor and mwait.


you can
just not to the cacheline you are watching (or things that alias with that)


Ah indeed, the thread_info::status is in the same cacheline.



you told it to wake on any write to that cacheline
and then you write to it ;-)

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-19 Thread Jacob Pan
On Tue, 19 Nov 2013 16:13:38 +0100
Peter Zijlstra  wrote:

> On Tue, Nov 19, 2013 at 03:51:43PM +0100, Peter Zijlstra wrote:
> > That said, that drive is completely wrecked. It uses
> > preempt_enable_no_resched() wrong too, it has uncommented barriers..
> > 
> > Dude, wtf are you guys smoking?
> 
I applied this patch on top of upstream kernel (801a760) and found out
my machine completely failed to enter idle when nothing is running.
turbostate shows 100% C0. ftrace shows kernel coming in and out of idle
frequently.

Both ACPI idle and intel_idle behaves the same way. I have to do the
following change to allow entering C-states again.

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 80014da..b51d1e1 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -49,10 +49,8 @@ static inline void mwait_idle_with_hints(unsigned
long eax, unsigned long ecx) clflush((void
*)_thread_info()->flags); 
__monitor((void *)_thread_info()->flags, 0, 0);
-   if (!current_set_polling_and_test())
+if (!need_resched())
__mwait(eax, ecx);
-
-   __current_clr_polling();
 }
 
 #endif /* _ASM_X86_MWAIT_H */

Did i miss any other patches?

Jacob

> ---
> Subject: sched: Take away preempt_enable_no_resched() and friends
> from modules
> 
> There is no way in hell modules are going to play preemption tricks
> like this.
> 
> Cc: eliezer.ta...@linux.intel.com
> Cc: ar...@linux.intel.com
> Reviewed-by: Thomas Gleixner 
> Signed-off-by: Peter Zijlstra 
> ---
>  drivers/thermal/intel_powerclamp.c |  2 +-
>  include/linux/preempt.h|  8 +++-
>  include/net/busy_poll.h| 15 +++
>  3 files changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/thermal/intel_powerclamp.c
> b/drivers/thermal/intel_powerclamp.c index 8f181b3f842b..0a12ddc2eb4c
> 100644 --- a/drivers/thermal/intel_powerclamp.c
> +++ b/drivers/thermal/intel_powerclamp.c
> @@ -445,7 +445,7 @@ static int clamp_thread(void *arg)
>   atomic_inc(_wakeup_counter);
>   }
>   tick_nohz_idle_exit();
> - preempt_enable_no_resched();
> + preempt_enable();
>   }
>   del_timer_sync(_timer);
>   clear_bit(cpunr, cpu_clamping_mask);
> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> index a3d9dc8c2c00..3ed2b5335ab4 100644
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
> @@ -64,7 +64,7 @@ do { \
>  } while (0)
>  
>  #else
> -#define preempt_enable() preempt_enable_no_resched()
> +#define preempt_enable() sched_preempt_enable_no_resched()
>  #define preempt_check_resched() do { } while (0)
>  #endif
>  
> @@ -116,6 +116,12 @@ do { \
>  
>  #endif /* CONFIG_PREEMPT_COUNT */
>  
> +#ifdef MODULE
> +#undef preempt_enable_no_resched
> +#undef preempt_enable_no_resched_notrace
> +#undef preempt_check_resched
> +#endif
> +
>  #ifdef CONFIG_PREEMPT_NOTIFIERS
>  
>  struct preempt_notifier;
> diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
> index 829627d7b846..756827a86c2d 100644
> --- a/include/net/busy_poll.h
> +++ b/include/net/busy_poll.h
> @@ -42,27 +42,18 @@ static inline bool net_busy_loop_on(void)
>   return sysctl_net_busy_poll;
>  }
>  
> -/* a wrapper to make debug_smp_processor_id() happy
> - * we can use sched_clock() because we don't care much about
> precision
> - * we only care that the average is bounded
> - */
> -#ifdef CONFIG_DEBUG_PREEMPT
>  static inline u64 busy_loop_us_clock(void)
>  {
>   u64 rc;
>  
> + /* XXX with interrupts enabled sched_clock() can return
> utter garbage */ +
>   preempt_disable_notrace();
>   rc = sched_clock();
> - preempt_enable_no_resched_notrace();
> + preempt_enable_notrace();
>  
>   return rc >> 10;
>  }
> -#else /* CONFIG_DEBUG_PREEMPT */
> -static inline u64 busy_loop_us_clock(void)
> -{
> - return sched_clock() >> 10;
> -}
> -#endif /* CONFIG_DEBUG_PREEMPT */
>  
>  static inline unsigned long sk_busy_loop_end_time(struct sock *sk)
>  {
> 

[Jacob Pan]
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-19 Thread Peter Zijlstra
On Tue, Nov 19, 2013 at 03:51:43PM +0100, Peter Zijlstra wrote:
> That said, that drive is completely wrecked. It uses
> preempt_enable_no_resched() wrong too, it has uncommented barriers..
> 
> Dude, wtf are you guys smoking?

---
Subject: sched: Take away preempt_enable_no_resched() and friends from modules

There is no way in hell modules are going to play preemption tricks like
this.

Cc: eliezer.ta...@linux.intel.com
Cc: ar...@linux.intel.com
Reviewed-by: Thomas Gleixner 
Signed-off-by: Peter Zijlstra 
---
 drivers/thermal/intel_powerclamp.c |  2 +-
 include/linux/preempt.h|  8 +++-
 include/net/busy_poll.h| 15 +++
 3 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/thermal/intel_powerclamp.c 
b/drivers/thermal/intel_powerclamp.c
index 8f181b3f842b..0a12ddc2eb4c 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -445,7 +445,7 @@ static int clamp_thread(void *arg)
atomic_inc(_wakeup_counter);
}
tick_nohz_idle_exit();
-   preempt_enable_no_resched();
+   preempt_enable();
}
del_timer_sync(_timer);
clear_bit(cpunr, cpu_clamping_mask);
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index a3d9dc8c2c00..3ed2b5335ab4 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -64,7 +64,7 @@ do { \
 } while (0)
 
 #else
-#define preempt_enable() preempt_enable_no_resched()
+#define preempt_enable() sched_preempt_enable_no_resched()
 #define preempt_check_resched() do { } while (0)
 #endif
 
@@ -116,6 +116,12 @@ do { \
 
 #endif /* CONFIG_PREEMPT_COUNT */
 
+#ifdef MODULE
+#undef preempt_enable_no_resched
+#undef preempt_enable_no_resched_notrace
+#undef preempt_check_resched
+#endif
+
 #ifdef CONFIG_PREEMPT_NOTIFIERS
 
 struct preempt_notifier;
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 829627d7b846..756827a86c2d 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -42,27 +42,18 @@ static inline bool net_busy_loop_on(void)
return sysctl_net_busy_poll;
 }
 
-/* a wrapper to make debug_smp_processor_id() happy
- * we can use sched_clock() because we don't care much about precision
- * we only care that the average is bounded
- */
-#ifdef CONFIG_DEBUG_PREEMPT
 static inline u64 busy_loop_us_clock(void)
 {
u64 rc;
 
+   /* XXX with interrupts enabled sched_clock() can return utter garbage */
+
preempt_disable_notrace();
rc = sched_clock();
-   preempt_enable_no_resched_notrace();
+   preempt_enable_notrace();
 
return rc >> 10;
 }
-#else /* CONFIG_DEBUG_PREEMPT */
-static inline u64 busy_loop_us_clock(void)
-{
-   return sched_clock() >> 10;
-}
-#endif /* CONFIG_DEBUG_PREEMPT */
 
 static inline unsigned long sk_busy_loop_end_time(struct sock *sk)
 {

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-19 Thread Peter Zijlstra
On Tue, Nov 19, 2013 at 06:22:43AM -0800, Arjan van de Ven wrote:
> >diff --git a/drivers/thermal/intel_powerclamp.c 
> >b/drivers/thermal/intel_powerclamp.c
> >index 8f181b3f842b..e8275f2df9af 100644
> >--- a/drivers/thermal/intel_powerclamp.c
> >+++ b/drivers/thermal/intel_powerclamp.c
> >@@ -438,9 +438,7 @@ static int clamp_thread(void *arg)
> >  */
> > local_touch_nmi();
> > stop_critical_timings();
> >-__monitor((void *)_thread_info()->flags, 0, 0);
> >-cpu_relax(); /* allow HT sibling to run */
> >-__mwait(eax, ecx);
> >+mwait_idle_with_hints(eax, ecx);
> > start_critical_timings();
> > atomic_inc(_wakeup_counter);
> > }
> >
> 
> hmm I take it that mwait_idle_with_hints is the one that also checks 
> need_resched() ?
> if so... powerclamp may not want to use that (the whole point is to NOT give 
> the cpu
> to tasks!)

That said, that drive is completely wrecked. It uses
preempt_enable_no_resched() wrong too, it has uncommented barriers..

Dude, wtf are you guys smoking?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-19 Thread Peter Zijlstra
On Tue, Nov 19, 2013 at 06:22:43AM -0800, Arjan van de Ven wrote:
> >diff --git a/drivers/thermal/intel_powerclamp.c 
> >b/drivers/thermal/intel_powerclamp.c
> >index 8f181b3f842b..e8275f2df9af 100644
> >--- a/drivers/thermal/intel_powerclamp.c
> >+++ b/drivers/thermal/intel_powerclamp.c
> >@@ -438,9 +438,7 @@ static int clamp_thread(void *arg)
> >  */
> > local_touch_nmi();
> > stop_critical_timings();
> >-__monitor((void *)_thread_info()->flags, 0, 0);
> >-cpu_relax(); /* allow HT sibling to run */
> >-__mwait(eax, ecx);
> >+mwait_idle_with_hints(eax, ecx);
> > start_critical_timings();
> > atomic_inc(_wakeup_counter);
> > }
> >
> 
> hmm I take it that mwait_idle_with_hints is the one that also checks 
> need_resched() ?
> if so... powerclamp may not want to use that (the whole point is to NOT give 
> the cpu
> to tasks!)

Of course, you're very much not getting something that wins from
stop_machine or a fifo-99 task.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-19 Thread Arjan van de Ven

diff --git a/drivers/thermal/intel_powerclamp.c 
b/drivers/thermal/intel_powerclamp.c
index 8f181b3f842b..e8275f2df9af 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -438,9 +438,7 @@ static int clamp_thread(void *arg)
 */
local_touch_nmi();
stop_critical_timings();
-   __monitor((void *)_thread_info()->flags, 0, 0);
-   cpu_relax(); /* allow HT sibling to run */
-   __mwait(eax, ecx);
+   mwait_idle_with_hints(eax, ecx);
start_critical_timings();
atomic_inc(_wakeup_counter);
}



hmm I take it that mwait_idle_with_hints is the one that also checks 
need_resched() ?
if so... powerclamp may not want to use that (the whole point is to NOT give 
the cpu
to tasks!)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-19 Thread Mike Galbraith
On Tue, 2013-11-19 at 12:31 +0100, Peter Zijlstra wrote: 
> People seem to delight in writing wrong and broken mwait idle routines;
> collapse the lot.
> 
> This leaves mwait_play_dead() the sole remaining user of __mwait() and
> new __mwait() users are probably doing it wrong.
> 
> Also remove __sti_mwait() as its unused.
> 
> Signed-off-by: Peter Zijlstra 
> ---
> 
> Mike, does this cure your core2?

Nope.  Maybe an acpi/bios thingy on this box since diags that fired on
lappy during boot did not fire on desktop, and both are core2 booting
same kernel.  I kinda suspect I'll either be stuck with default_idle or
have to resurrect mwait_idle and carry it locally if I want the thing to
work well.  Guess I'll find out if/when I have time to squabble with it.

Meanwhile, desktop box works fine modulo benchmarks, lappy works fine
too, modulo max_cstate=1 scaring mwait_idle_with_hints away, which I
don't care about much.  Neither box is wonderful at rt testing where I
usually boot max_cstate=1, both just became a bit _less_ wonderful :)

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-19 Thread Rafael J. Wysocki
On Tuesday, November 19, 2013 12:31:53 PM Peter Zijlstra wrote:
> People seem to delight in writing wrong and broken mwait idle routines;
> collapse the lot.
> 
> This leaves mwait_play_dead() the sole remaining user of __mwait() and
> new __mwait() users are probably doing it wrong.
> 
> Also remove __sti_mwait() as its unused.
> 
> Signed-off-by: Peter Zijlstra 

For the ACPI part:

Acked-by: Rafael J. Wysocki 

> ---
> 
> Mike, does this cure your core2?
> 
>  arch/x86/include/asm/mwait.h   | 42 
> ++
>  arch/x86/include/asm/processor.h   | 23 -
>  arch/x86/kernel/acpi/cstate.c  | 23 -
>  drivers/acpi/acpi_pad.c|  5 +
>  drivers/acpi/processor_idle.c  | 15 --
>  drivers/idle/intel_idle.c  |  8 +---
>  drivers/thermal/intel_powerclamp.c |  4 +---
>  7 files changed, 45 insertions(+), 75 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index 2f366d0ac6b4..80014dade987 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -1,6 +1,8 @@
>  #ifndef _ASM_X86_MWAIT_H
>  #define _ASM_X86_MWAIT_H
>  
> +#include 
> +
>  #define MWAIT_SUBSTATE_MASK  0xf
>  #define MWAIT_CSTATE_MASK0xf
>  #define MWAIT_SUBSTATE_SIZE  4
> @@ -13,4 +15,44 @@
>  
>  #define MWAIT_ECX_INTERRUPT_BREAK0x1
>  
> +static inline void __monitor(const void *eax, unsigned long ecx,
> +  unsigned long edx)
> +{
> + /* "monitor %eax, %ecx, %edx;" */
> + asm volatile(".byte 0x0f, 0x01, 0xc8;"
> +  :: "a" (eax), "c" (ecx), "d"(edx));
> +}
> +
> +static inline void __mwait(unsigned long eax, unsigned long ecx)
> +{
> + /* "mwait %eax, %ecx;" */
> + asm volatile(".byte 0x0f, 0x01, 0xc9;"
> +  :: "a" (eax), "c" (ecx));
> +}
> +
> +/*
> + * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
> + * which can obviate IPI to trigger checking of need_resched.
> + * We execute MONITOR against need_resched and enter optimized wait state
> + * through MWAIT. Whenever someone changes need_resched, we would be woken
> + * up from MWAIT (without an IPI).
> + *
> + * New with Core Duo processors, MWAIT can take some hints based on CPU
> + * capability.
> + */
> +static inline void mwait_idle_with_hints(unsigned long eax, unsigned long 
> ecx)
> +{
> + if (need_resched())
> + return;
> +
> + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
> + clflush((void *)_thread_info()->flags);
> +
> + __monitor((void *)_thread_info()->flags, 0, 0);
> + if (!current_set_polling_and_test())
> + __mwait(eax, ecx);
> +
> + __current_clr_polling();
> +}
> +
>  #endif /* _ASM_X86_MWAIT_H */
> diff --git a/arch/x86/include/asm/processor.h 
> b/arch/x86/include/asm/processor.h
> index 7b034a4057f9..24821f5768bc 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -700,29 +700,6 @@ static inline void sync_core(void)
>  #endif
>  }
>  
> -static inline void __monitor(const void *eax, unsigned long ecx,
> -  unsigned long edx)
> -{
> - /* "monitor %eax, %ecx, %edx;" */
> - asm volatile(".byte 0x0f, 0x01, 0xc8;"
> -  :: "a" (eax), "c" (ecx), "d"(edx));
> -}
> -
> -static inline void __mwait(unsigned long eax, unsigned long ecx)
> -{
> - /* "mwait %eax, %ecx;" */
> - asm volatile(".byte 0x0f, 0x01, 0xc9;"
> -  :: "a" (eax), "c" (ecx));
> -}
> -
> -static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
> -{
> - trace_hardirqs_on();
> - /* "mwait %eax, %ecx;" */
> - asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
> -  :: "a" (eax), "c" (ecx));
> -}
> -
>  extern void select_idle_routine(const struct cpuinfo_x86 *c);
>  extern void init_amd_e400_c1e_mask(void);
>  
> diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
> index d2b7f27781bc..e69182fd01cf 100644
> --- a/arch/x86/kernel/acpi/cstate.c
> +++ b/arch/x86/kernel/acpi/cstate.c
> @@ -150,29 +150,6 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
>  }
>  EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);
>  
> -/*
> - * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
> - * which can obviate IPI to trigger checking of need_resched.
> - * We execute MONITOR against need_resched and enter optimized wait state
> - * through MWAIT. Whenever someone changes need_resched, we would be woken
> - * up from MWAIT (without an IPI).
> - *
> - * New with Core Duo processors, MWAIT can take some hints based on CPU
> - * capability.
> - */
> -void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
> -{
> - if (!need_resched()) {
> - if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
> - clflush((void *)_thread_info()->flags);
> -
> - 

Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-19 Thread Rafael J. Wysocki
On Tuesday, November 19, 2013 12:31:53 PM Peter Zijlstra wrote:
 People seem to delight in writing wrong and broken mwait idle routines;
 collapse the lot.
 
 This leaves mwait_play_dead() the sole remaining user of __mwait() and
 new __mwait() users are probably doing it wrong.
 
 Also remove __sti_mwait() as its unused.
 
 Signed-off-by: Peter Zijlstra pet...@infradead.org

For the ACPI part:

Acked-by: Rafael J. Wysocki rafael.j.wyso...@intel.com

 ---
 
 Mike, does this cure your core2?
 
  arch/x86/include/asm/mwait.h   | 42 
 ++
  arch/x86/include/asm/processor.h   | 23 -
  arch/x86/kernel/acpi/cstate.c  | 23 -
  drivers/acpi/acpi_pad.c|  5 +
  drivers/acpi/processor_idle.c  | 15 --
  drivers/idle/intel_idle.c  |  8 +---
  drivers/thermal/intel_powerclamp.c |  4 +---
  7 files changed, 45 insertions(+), 75 deletions(-)
 
 diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
 index 2f366d0ac6b4..80014dade987 100644
 --- a/arch/x86/include/asm/mwait.h
 +++ b/arch/x86/include/asm/mwait.h
 @@ -1,6 +1,8 @@
  #ifndef _ASM_X86_MWAIT_H
  #define _ASM_X86_MWAIT_H
  
 +#include linux/sched.h
 +
  #define MWAIT_SUBSTATE_MASK  0xf
  #define MWAIT_CSTATE_MASK0xf
  #define MWAIT_SUBSTATE_SIZE  4
 @@ -13,4 +15,44 @@
  
  #define MWAIT_ECX_INTERRUPT_BREAK0x1
  
 +static inline void __monitor(const void *eax, unsigned long ecx,
 +  unsigned long edx)
 +{
 + /* monitor %eax, %ecx, %edx; */
 + asm volatile(.byte 0x0f, 0x01, 0xc8;
 +  :: a (eax), c (ecx), d(edx));
 +}
 +
 +static inline void __mwait(unsigned long eax, unsigned long ecx)
 +{
 + /* mwait %eax, %ecx; */
 + asm volatile(.byte 0x0f, 0x01, 0xc9;
 +  :: a (eax), c (ecx));
 +}
 +
 +/*
 + * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
 + * which can obviate IPI to trigger checking of need_resched.
 + * We execute MONITOR against need_resched and enter optimized wait state
 + * through MWAIT. Whenever someone changes need_resched, we would be woken
 + * up from MWAIT (without an IPI).
 + *
 + * New with Core Duo processors, MWAIT can take some hints based on CPU
 + * capability.
 + */
 +static inline void mwait_idle_with_hints(unsigned long eax, unsigned long 
 ecx)
 +{
 + if (need_resched())
 + return;
 +
 + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
 + clflush((void *)current_thread_info()-flags);
 +
 + __monitor((void *)current_thread_info()-flags, 0, 0);
 + if (!current_set_polling_and_test())
 + __mwait(eax, ecx);
 +
 + __current_clr_polling();
 +}
 +
  #endif /* _ASM_X86_MWAIT_H */
 diff --git a/arch/x86/include/asm/processor.h 
 b/arch/x86/include/asm/processor.h
 index 7b034a4057f9..24821f5768bc 100644
 --- a/arch/x86/include/asm/processor.h
 +++ b/arch/x86/include/asm/processor.h
 @@ -700,29 +700,6 @@ static inline void sync_core(void)
  #endif
  }
  
 -static inline void __monitor(const void *eax, unsigned long ecx,
 -  unsigned long edx)
 -{
 - /* monitor %eax, %ecx, %edx; */
 - asm volatile(.byte 0x0f, 0x01, 0xc8;
 -  :: a (eax), c (ecx), d(edx));
 -}
 -
 -static inline void __mwait(unsigned long eax, unsigned long ecx)
 -{
 - /* mwait %eax, %ecx; */
 - asm volatile(.byte 0x0f, 0x01, 0xc9;
 -  :: a (eax), c (ecx));
 -}
 -
 -static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 -{
 - trace_hardirqs_on();
 - /* mwait %eax, %ecx; */
 - asm volatile(sti; .byte 0x0f, 0x01, 0xc9;
 -  :: a (eax), c (ecx));
 -}
 -
  extern void select_idle_routine(const struct cpuinfo_x86 *c);
  extern void init_amd_e400_c1e_mask(void);
  
 diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
 index d2b7f27781bc..e69182fd01cf 100644
 --- a/arch/x86/kernel/acpi/cstate.c
 +++ b/arch/x86/kernel/acpi/cstate.c
 @@ -150,29 +150,6 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
  }
  EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);
  
 -/*
 - * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
 - * which can obviate IPI to trigger checking of need_resched.
 - * We execute MONITOR against need_resched and enter optimized wait state
 - * through MWAIT. Whenever someone changes need_resched, we would be woken
 - * up from MWAIT (without an IPI).
 - *
 - * New with Core Duo processors, MWAIT can take some hints based on CPU
 - * capability.
 - */
 -void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
 -{
 - if (!need_resched()) {
 - if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
 - clflush((void *)current_thread_info()-flags);
 -
 - __monitor((void *)current_thread_info()-flags, 0, 0);
 - smp_mb();
 - if 

Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-19 Thread Mike Galbraith
On Tue, 2013-11-19 at 12:31 +0100, Peter Zijlstra wrote: 
 People seem to delight in writing wrong and broken mwait idle routines;
 collapse the lot.
 
 This leaves mwait_play_dead() the sole remaining user of __mwait() and
 new __mwait() users are probably doing it wrong.
 
 Also remove __sti_mwait() as its unused.
 
 Signed-off-by: Peter Zijlstra pet...@infradead.org
 ---
 
 Mike, does this cure your core2?

Nope.  Maybe an acpi/bios thingy on this box since diags that fired on
lappy during boot did not fire on desktop, and both are core2 booting
same kernel.  I kinda suspect I'll either be stuck with default_idle or
have to resurrect mwait_idle and carry it locally if I want the thing to
work well.  Guess I'll find out if/when I have time to squabble with it.

Meanwhile, desktop box works fine modulo benchmarks, lappy works fine
too, modulo max_cstate=1 scaring mwait_idle_with_hints away, which I
don't care about much.  Neither box is wonderful at rt testing where I
usually boot max_cstate=1, both just became a bit _less_ wonderful :)

-Mike

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-19 Thread Arjan van de Ven

diff --git a/drivers/thermal/intel_powerclamp.c 
b/drivers/thermal/intel_powerclamp.c
index 8f181b3f842b..e8275f2df9af 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -438,9 +438,7 @@ static int clamp_thread(void *arg)
 */
local_touch_nmi();
stop_critical_timings();
-   __monitor((void *)current_thread_info()-flags, 0, 0);
-   cpu_relax(); /* allow HT sibling to run */
-   __mwait(eax, ecx);
+   mwait_idle_with_hints(eax, ecx);
start_critical_timings();
atomic_inc(idle_wakeup_counter);
}



hmm I take it that mwait_idle_with_hints is the one that also checks 
need_resched() ?
if so... powerclamp may not want to use that (the whole point is to NOT give 
the cpu
to tasks!)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-19 Thread Peter Zijlstra
On Tue, Nov 19, 2013 at 06:22:43AM -0800, Arjan van de Ven wrote:
 diff --git a/drivers/thermal/intel_powerclamp.c 
 b/drivers/thermal/intel_powerclamp.c
 index 8f181b3f842b..e8275f2df9af 100644
 --- a/drivers/thermal/intel_powerclamp.c
 +++ b/drivers/thermal/intel_powerclamp.c
 @@ -438,9 +438,7 @@ static int clamp_thread(void *arg)
   */
  local_touch_nmi();
  stop_critical_timings();
 -__monitor((void *)current_thread_info()-flags, 0, 0);
 -cpu_relax(); /* allow HT sibling to run */
 -__mwait(eax, ecx);
 +mwait_idle_with_hints(eax, ecx);
  start_critical_timings();
  atomic_inc(idle_wakeup_counter);
  }
 
 
 hmm I take it that mwait_idle_with_hints is the one that also checks 
 need_resched() ?
 if so... powerclamp may not want to use that (the whole point is to NOT give 
 the cpu
 to tasks!)

Of course, you're very much not getting something that wins from
stop_machine or a fifo-99 task.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-19 Thread Peter Zijlstra
On Tue, Nov 19, 2013 at 06:22:43AM -0800, Arjan van de Ven wrote:
 diff --git a/drivers/thermal/intel_powerclamp.c 
 b/drivers/thermal/intel_powerclamp.c
 index 8f181b3f842b..e8275f2df9af 100644
 --- a/drivers/thermal/intel_powerclamp.c
 +++ b/drivers/thermal/intel_powerclamp.c
 @@ -438,9 +438,7 @@ static int clamp_thread(void *arg)
   */
  local_touch_nmi();
  stop_critical_timings();
 -__monitor((void *)current_thread_info()-flags, 0, 0);
 -cpu_relax(); /* allow HT sibling to run */
 -__mwait(eax, ecx);
 +mwait_idle_with_hints(eax, ecx);
  start_critical_timings();
  atomic_inc(idle_wakeup_counter);
  }
 
 
 hmm I take it that mwait_idle_with_hints is the one that also checks 
 need_resched() ?
 if so... powerclamp may not want to use that (the whole point is to NOT give 
 the cpu
 to tasks!)

That said, that drive is completely wrecked. It uses
preempt_enable_no_resched() wrong too, it has uncommented barriers..

Dude, wtf are you guys smoking?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-19 Thread Peter Zijlstra
On Tue, Nov 19, 2013 at 03:51:43PM +0100, Peter Zijlstra wrote:
 That said, that drive is completely wrecked. It uses
 preempt_enable_no_resched() wrong too, it has uncommented barriers..
 
 Dude, wtf are you guys smoking?

---
Subject: sched: Take away preempt_enable_no_resched() and friends from modules

There is no way in hell modules are going to play preemption tricks like
this.

Cc: eliezer.ta...@linux.intel.com
Cc: ar...@linux.intel.com
Reviewed-by: Thomas Gleixner t...@linutronix.de
Signed-off-by: Peter Zijlstra pet...@infradead.org
---
 drivers/thermal/intel_powerclamp.c |  2 +-
 include/linux/preempt.h|  8 +++-
 include/net/busy_poll.h| 15 +++
 3 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/thermal/intel_powerclamp.c 
b/drivers/thermal/intel_powerclamp.c
index 8f181b3f842b..0a12ddc2eb4c 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -445,7 +445,7 @@ static int clamp_thread(void *arg)
atomic_inc(idle_wakeup_counter);
}
tick_nohz_idle_exit();
-   preempt_enable_no_resched();
+   preempt_enable();
}
del_timer_sync(wakeup_timer);
clear_bit(cpunr, cpu_clamping_mask);
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index a3d9dc8c2c00..3ed2b5335ab4 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -64,7 +64,7 @@ do { \
 } while (0)
 
 #else
-#define preempt_enable() preempt_enable_no_resched()
+#define preempt_enable() sched_preempt_enable_no_resched()
 #define preempt_check_resched() do { } while (0)
 #endif
 
@@ -116,6 +116,12 @@ do { \
 
 #endif /* CONFIG_PREEMPT_COUNT */
 
+#ifdef MODULE
+#undef preempt_enable_no_resched
+#undef preempt_enable_no_resched_notrace
+#undef preempt_check_resched
+#endif
+
 #ifdef CONFIG_PREEMPT_NOTIFIERS
 
 struct preempt_notifier;
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 829627d7b846..756827a86c2d 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -42,27 +42,18 @@ static inline bool net_busy_loop_on(void)
return sysctl_net_busy_poll;
 }
 
-/* a wrapper to make debug_smp_processor_id() happy
- * we can use sched_clock() because we don't care much about precision
- * we only care that the average is bounded
- */
-#ifdef CONFIG_DEBUG_PREEMPT
 static inline u64 busy_loop_us_clock(void)
 {
u64 rc;
 
+   /* XXX with interrupts enabled sched_clock() can return utter garbage */
+
preempt_disable_notrace();
rc = sched_clock();
-   preempt_enable_no_resched_notrace();
+   preempt_enable_notrace();
 
return rc  10;
 }
-#else /* CONFIG_DEBUG_PREEMPT */
-static inline u64 busy_loop_us_clock(void)
-{
-   return sched_clock()  10;
-}
-#endif /* CONFIG_DEBUG_PREEMPT */
 
 static inline unsigned long sk_busy_loop_end_time(struct sock *sk)
 {

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines

2013-11-19 Thread Jacob Pan
On Tue, 19 Nov 2013 16:13:38 +0100
Peter Zijlstra pet...@infradead.org wrote:

 On Tue, Nov 19, 2013 at 03:51:43PM +0100, Peter Zijlstra wrote:
  That said, that drive is completely wrecked. It uses
  preempt_enable_no_resched() wrong too, it has uncommented barriers..
  
  Dude, wtf are you guys smoking?
 
I applied this patch on top of upstream kernel (801a760) and found out
my machine completely failed to enter idle when nothing is running.
turbostate shows 100% C0. ftrace shows kernel coming in and out of idle
frequently.

Both ACPI idle and intel_idle behaves the same way. I have to do the
following change to allow entering C-states again.

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 80014da..b51d1e1 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -49,10 +49,8 @@ static inline void mwait_idle_with_hints(unsigned
long eax, unsigned long ecx) clflush((void
*)current_thread_info()-flags); 
__monitor((void *)current_thread_info()-flags, 0, 0);
-   if (!current_set_polling_and_test())
+if (!need_resched())
__mwait(eax, ecx);
-
-   __current_clr_polling();
 }
 
 #endif /* _ASM_X86_MWAIT_H */

Did i miss any other patches?

Jacob

 ---
 Subject: sched: Take away preempt_enable_no_resched() and friends
 from modules
 
 There is no way in hell modules are going to play preemption tricks
 like this.
 
 Cc: eliezer.ta...@linux.intel.com
 Cc: ar...@linux.intel.com
 Reviewed-by: Thomas Gleixner t...@linutronix.de
 Signed-off-by: Peter Zijlstra pet...@infradead.org
 ---
  drivers/thermal/intel_powerclamp.c |  2 +-
  include/linux/preempt.h|  8 +++-
  include/net/busy_poll.h| 15 +++
  3 files changed, 11 insertions(+), 14 deletions(-)
 
 diff --git a/drivers/thermal/intel_powerclamp.c
 b/drivers/thermal/intel_powerclamp.c index 8f181b3f842b..0a12ddc2eb4c
 100644 --- a/drivers/thermal/intel_powerclamp.c
 +++ b/drivers/thermal/intel_powerclamp.c
 @@ -445,7 +445,7 @@ static int clamp_thread(void *arg)
   atomic_inc(idle_wakeup_counter);
   }
   tick_nohz_idle_exit();
 - preempt_enable_no_resched();
 + preempt_enable();
   }
   del_timer_sync(wakeup_timer);
   clear_bit(cpunr, cpu_clamping_mask);
 diff --git a/include/linux/preempt.h b/include/linux/preempt.h
 index a3d9dc8c2c00..3ed2b5335ab4 100644
 --- a/include/linux/preempt.h
 +++ b/include/linux/preempt.h
 @@ -64,7 +64,7 @@ do { \
  } while (0)
  
  #else
 -#define preempt_enable() preempt_enable_no_resched()
 +#define preempt_enable() sched_preempt_enable_no_resched()
  #define preempt_check_resched() do { } while (0)
  #endif
  
 @@ -116,6 +116,12 @@ do { \
  
  #endif /* CONFIG_PREEMPT_COUNT */
  
 +#ifdef MODULE
 +#undef preempt_enable_no_resched
 +#undef preempt_enable_no_resched_notrace
 +#undef preempt_check_resched
 +#endif
 +
  #ifdef CONFIG_PREEMPT_NOTIFIERS
  
  struct preempt_notifier;
 diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
 index 829627d7b846..756827a86c2d 100644
 --- a/include/net/busy_poll.h
 +++ b/include/net/busy_poll.h
 @@ -42,27 +42,18 @@ static inline bool net_busy_loop_on(void)
   return sysctl_net_busy_poll;
  }
  
 -/* a wrapper to make debug_smp_processor_id() happy
 - * we can use sched_clock() because we don't care much about
 precision
 - * we only care that the average is bounded
 - */
 -#ifdef CONFIG_DEBUG_PREEMPT
  static inline u64 busy_loop_us_clock(void)
  {
   u64 rc;
  
 + /* XXX with interrupts enabled sched_clock() can return
 utter garbage */ +
   preempt_disable_notrace();
   rc = sched_clock();
 - preempt_enable_no_resched_notrace();
 + preempt_enable_notrace();
  
   return rc  10;
  }
 -#else /* CONFIG_DEBUG_PREEMPT */
 -static inline u64 busy_loop_us_clock(void)
 -{
 - return sched_clock()  10;
 -}
 -#endif /* CONFIG_DEBUG_PREEMPT */
  
  static inline unsigned long sk_busy_loop_end_time(struct sock *sk)
  {
 

[Jacob Pan]
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/