Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On 11/20/2013 8:33 AM, Peter Zijlstra wrote: On Wed, Nov 20, 2013 at 08:24:47AM -0800, Arjan van de Ven wrote: On 11/20/2013 2:58 AM, Peter Zijlstra wrote: So pretty silly actually; you cannot do a store (any store) in between monitor and mwait. you can just not to the cacheline you are watching (or things that alias with that) Ah indeed, the thread_info::status is in the same cacheline. you told it to wake on any write to that cacheline and then you write to it ;-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On Wed, Nov 20, 2013 at 08:24:47AM -0800, Arjan van de Ven wrote: > On 11/20/2013 2:58 AM, Peter Zijlstra wrote: > >So pretty silly actually; you cannot do a store (any store) in between > >monitor and mwait. > > you can > just not to the cacheline you are watching (or things that alias with that) Ah indeed, the thread_info::status is in the same cacheline. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On 11/20/2013 2:58 AM, Peter Zijlstra wrote: On Wed, Nov 20, 2013 at 11:28:03AM +0100, Peter Zijlstra wrote: On Tue, Nov 19, 2013 at 01:06:30PM -0800, Jacob Pan wrote: I applied this patch on top of upstream kernel (801a760) and found out my machine completely failed to enter idle when nothing is running. turbostate shows 100% C0. ftrace shows kernel coming in and out of idle frequently. Both ACPI idle and intel_idle behaves the same way. I have to do the following change to allow entering C-states again. That doesn't make any sense; current_set_polling_and_test() returns the same thing need_resched() does. But you're right, intel_idle resides 100% in C0 and acpi_idle has 100% C1 residency... most weird. So pretty silly actually; you cannot do a store (any store) in between monitor and mwait. you can just not to the cacheline you are watching (or things that alias with that) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On Wed, Nov 20, 2013 at 11:28:03AM +0100, Peter Zijlstra wrote: > On Tue, Nov 19, 2013 at 01:06:30PM -0800, Jacob Pan wrote: > > I applied this patch on top of upstream kernel (801a760) and found out > > my machine completely failed to enter idle when nothing is running. > > turbostate shows 100% C0. ftrace shows kernel coming in and out of idle > > frequently. > > > > Both ACPI idle and intel_idle behaves the same way. I have to do the > > following change to allow entering C-states again. > That doesn't make any sense; current_set_polling_and_test() returns the > same thing need_resched() does. > > But you're right, intel_idle resides 100% in C0 and acpi_idle has 100% > C1 residency... most weird. So pretty silly actually; you cannot do a store (any store) in between monitor and mwait. The below version seems to work properly again with both acpi_idle and intel_idle. Now to go make that preempt_disable_no_resched cleanup compile.. :-) --- Subject: x86, acpi, idle: Restructure the mwait idle routines From: Peter Zijlstra Date: Tue, 19 Nov 2013 12:31:53 +0100 People seem to delight in writing wrong and broken mwait idle routines; collapse the lot. This leaves mwait_play_dead() the sole remaining user of __mwait() and new __mwait() users are probably doing it wrong. Also remove __sti_mwait() as its unused. Cc: ar...@linux.intel.com Cc: jacob.jun@linux.intel.com Cc: Mike Galbraith Cc: Ingo Molnar Cc: Thomas Gleixner Cc: h...@zytor.com Cc: l...@kernel.org Cc: shaohua...@intel.com Cc: rui.zh...@intel.com Acked-by: Rafael J. Wysocki Signed-off-by: Peter Zijlstra --- arch/x86/include/asm/mwait.h | 40 + arch/x86/include/asm/processor.h | 23 - arch/x86/kernel/acpi/cstate.c | 23 - drivers/acpi/acpi_pad.c|5 drivers/acpi/processor_idle.c | 15 - drivers/idle/intel_idle.c |8 --- drivers/thermal/intel_powerclamp.c |4 --- 7 files changed, 43 insertions(+), 75 deletions(-) --- a/arch/x86/include/asm/mwait.h +++ b/arch/x86/include/asm/mwait.h @@ -1,6 +1,8 @@ #ifndef _ASM_X86_MWAIT_H #define _ASM_X86_MWAIT_H +#include + #define MWAIT_SUBSTATE_MASK0xf #define MWAIT_CSTATE_MASK 0xf #define MWAIT_SUBSTATE_SIZE4 @@ -13,4 +15,42 @@ #define MWAIT_ECX_INTERRUPT_BREAK 0x1 +static inline void __monitor(const void *eax, unsigned long ecx, +unsigned long edx) +{ + /* "monitor %eax, %ecx, %edx;" */ + asm volatile(".byte 0x0f, 0x01, 0xc8;" +:: "a" (eax), "c" (ecx), "d"(edx)); +} + +static inline void __mwait(unsigned long eax, unsigned long ecx) +{ + /* "mwait %eax, %ecx;" */ + asm volatile(".byte 0x0f, 0x01, 0xc9;" +:: "a" (eax), "c" (ecx)); +} + +/* + * This uses new MONITOR/MWAIT instructions on P4 processors with PNI, + * which can obviate IPI to trigger checking of need_resched. + * We execute MONITOR against need_resched and enter optimized wait state + * through MWAIT. Whenever someone changes need_resched, we would be woken + * up from MWAIT (without an IPI). + * + * New with Core Duo processors, MWAIT can take some hints based on CPU + * capability. + */ +static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx) +{ + if (!current_set_polling_and_test()) { + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) + clflush((void *)_thread_info()->flags); + + __monitor((void *)_thread_info()->flags, 0, 0); + if (!need_resched()) + __mwait(eax, ecx); + } + __current_clr_polling(); +} + #endif /* _ASM_X86_MWAIT_H */ --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -700,29 +700,6 @@ static inline void sync_core(void) #endif } -static inline void __monitor(const void *eax, unsigned long ecx, -unsigned long edx) -{ - /* "monitor %eax, %ecx, %edx;" */ - asm volatile(".byte 0x0f, 0x01, 0xc8;" -:: "a" (eax), "c" (ecx), "d"(edx)); -} - -static inline void __mwait(unsigned long eax, unsigned long ecx) -{ - /* "mwait %eax, %ecx;" */ - asm volatile(".byte 0x0f, 0x01, 0xc9;" -:: "a" (eax), "c" (ecx)); -} - -static inline void __sti_mwait(unsigned long eax, unsigned long ecx) -{ - trace_hardirqs_on(); - /* "mwait %eax, %ecx;" */ - asm volatile("sti; .byte 0x0f, 0x01, 0xc9;" -:: "a" (eax), "c" (ecx)); -} - extern void select_idle_routine(const struct cpuinfo_x86 *c); extern void init_amd_e400_c1e_mask(void); --- a/arch/x86/kernel/acpi/cstate.c +++ b/arch/x86/kernel/acpi/cstate.c @@ -150,29 +150,6 @@ int acpi_processor_ffh_cstate_probe(unsi } EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe); -/* - * This uses new MONITOR/MWAIT
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On Tue, Nov 19, 2013 at 01:06:30PM -0800, Jacob Pan wrote: > On Tue, 19 Nov 2013 16:13:38 +0100 > Peter Zijlstra wrote: > > > On Tue, Nov 19, 2013 at 03:51:43PM +0100, Peter Zijlstra wrote: > > > That said, that drive is completely wrecked. It uses > > > preempt_enable_no_resched() wrong too, it has uncommented barriers.. > > > > > > Dude, wtf are you guys smoking? > > > I applied this patch on top of upstream kernel (801a760) and found out > my machine completely failed to enter idle when nothing is running. > turbostate shows 100% C0. ftrace shows kernel coming in and out of idle > frequently. > > Both ACPI idle and intel_idle behaves the same way. I have to do the > following change to allow entering C-states again. > > diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h > index 80014da..b51d1e1 100644 > --- a/arch/x86/include/asm/mwait.h > +++ b/arch/x86/include/asm/mwait.h > @@ -49,10 +49,8 @@ static inline void mwait_idle_with_hints(unsigned > long eax, unsigned long ecx) clflush((void > *)_thread_info()->flags); > __monitor((void *)_thread_info()->flags, 0, 0); > - if (!current_set_polling_and_test()) > +if (!need_resched()) > __mwait(eax, ecx); > - > - __current_clr_polling(); > } > > #endif /* _ASM_X86_MWAIT_H */ That doesn't make any sense; current_set_polling_and_test() returns the same thing need_resched() does. But you're right, intel_idle resides 100% in C0 and acpi_idle has 100% C1 residency... most weird. /me goes prod at it -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On Tue, Nov 19, 2013 at 01:06:30PM -0800, Jacob Pan wrote: On Tue, 19 Nov 2013 16:13:38 +0100 Peter Zijlstra pet...@infradead.org wrote: On Tue, Nov 19, 2013 at 03:51:43PM +0100, Peter Zijlstra wrote: That said, that drive is completely wrecked. It uses preempt_enable_no_resched() wrong too, it has uncommented barriers.. Dude, wtf are you guys smoking? I applied this patch on top of upstream kernel (801a760) and found out my machine completely failed to enter idle when nothing is running. turbostate shows 100% C0. ftrace shows kernel coming in and out of idle frequently. Both ACPI idle and intel_idle behaves the same way. I have to do the following change to allow entering C-states again. diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h index 80014da..b51d1e1 100644 --- a/arch/x86/include/asm/mwait.h +++ b/arch/x86/include/asm/mwait.h @@ -49,10 +49,8 @@ static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx) clflush((void *)current_thread_info()-flags); __monitor((void *)current_thread_info()-flags, 0, 0); - if (!current_set_polling_and_test()) +if (!need_resched()) __mwait(eax, ecx); - - __current_clr_polling(); } #endif /* _ASM_X86_MWAIT_H */ That doesn't make any sense; current_set_polling_and_test() returns the same thing need_resched() does. But you're right, intel_idle resides 100% in C0 and acpi_idle has 100% C1 residency... most weird. /me goes prod at it -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On Wed, Nov 20, 2013 at 11:28:03AM +0100, Peter Zijlstra wrote: On Tue, Nov 19, 2013 at 01:06:30PM -0800, Jacob Pan wrote: I applied this patch on top of upstream kernel (801a760) and found out my machine completely failed to enter idle when nothing is running. turbostate shows 100% C0. ftrace shows kernel coming in and out of idle frequently. Both ACPI idle and intel_idle behaves the same way. I have to do the following change to allow entering C-states again. That doesn't make any sense; current_set_polling_and_test() returns the same thing need_resched() does. But you're right, intel_idle resides 100% in C0 and acpi_idle has 100% C1 residency... most weird. So pretty silly actually; you cannot do a store (any store) in between monitor and mwait. The below version seems to work properly again with both acpi_idle and intel_idle. Now to go make that preempt_disable_no_resched cleanup compile.. :-) --- Subject: x86, acpi, idle: Restructure the mwait idle routines From: Peter Zijlstra pet...@infradead.org Date: Tue, 19 Nov 2013 12:31:53 +0100 People seem to delight in writing wrong and broken mwait idle routines; collapse the lot. This leaves mwait_play_dead() the sole remaining user of __mwait() and new __mwait() users are probably doing it wrong. Also remove __sti_mwait() as its unused. Cc: ar...@linux.intel.com Cc: jacob.jun@linux.intel.com Cc: Mike Galbraith bitbuc...@online.de Cc: Ingo Molnar mi...@kernel.org Cc: Thomas Gleixner t...@linutronix.de Cc: h...@zytor.com Cc: l...@kernel.org Cc: shaohua...@intel.com Cc: rui.zh...@intel.com Acked-by: Rafael J. Wysocki rafael.j.wyso...@intel.com Signed-off-by: Peter Zijlstra pet...@infradead.org --- arch/x86/include/asm/mwait.h | 40 + arch/x86/include/asm/processor.h | 23 - arch/x86/kernel/acpi/cstate.c | 23 - drivers/acpi/acpi_pad.c|5 drivers/acpi/processor_idle.c | 15 - drivers/idle/intel_idle.c |8 --- drivers/thermal/intel_powerclamp.c |4 --- 7 files changed, 43 insertions(+), 75 deletions(-) --- a/arch/x86/include/asm/mwait.h +++ b/arch/x86/include/asm/mwait.h @@ -1,6 +1,8 @@ #ifndef _ASM_X86_MWAIT_H #define _ASM_X86_MWAIT_H +#include linux/sched.h + #define MWAIT_SUBSTATE_MASK0xf #define MWAIT_CSTATE_MASK 0xf #define MWAIT_SUBSTATE_SIZE4 @@ -13,4 +15,42 @@ #define MWAIT_ECX_INTERRUPT_BREAK 0x1 +static inline void __monitor(const void *eax, unsigned long ecx, +unsigned long edx) +{ + /* monitor %eax, %ecx, %edx; */ + asm volatile(.byte 0x0f, 0x01, 0xc8; +:: a (eax), c (ecx), d(edx)); +} + +static inline void __mwait(unsigned long eax, unsigned long ecx) +{ + /* mwait %eax, %ecx; */ + asm volatile(.byte 0x0f, 0x01, 0xc9; +:: a (eax), c (ecx)); +} + +/* + * This uses new MONITOR/MWAIT instructions on P4 processors with PNI, + * which can obviate IPI to trigger checking of need_resched. + * We execute MONITOR against need_resched and enter optimized wait state + * through MWAIT. Whenever someone changes need_resched, we would be woken + * up from MWAIT (without an IPI). + * + * New with Core Duo processors, MWAIT can take some hints based on CPU + * capability. + */ +static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx) +{ + if (!current_set_polling_and_test()) { + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) + clflush((void *)current_thread_info()-flags); + + __monitor((void *)current_thread_info()-flags, 0, 0); + if (!need_resched()) + __mwait(eax, ecx); + } + __current_clr_polling(); +} + #endif /* _ASM_X86_MWAIT_H */ --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -700,29 +700,6 @@ static inline void sync_core(void) #endif } -static inline void __monitor(const void *eax, unsigned long ecx, -unsigned long edx) -{ - /* monitor %eax, %ecx, %edx; */ - asm volatile(.byte 0x0f, 0x01, 0xc8; -:: a (eax), c (ecx), d(edx)); -} - -static inline void __mwait(unsigned long eax, unsigned long ecx) -{ - /* mwait %eax, %ecx; */ - asm volatile(.byte 0x0f, 0x01, 0xc9; -:: a (eax), c (ecx)); -} - -static inline void __sti_mwait(unsigned long eax, unsigned long ecx) -{ - trace_hardirqs_on(); - /* mwait %eax, %ecx; */ - asm volatile(sti; .byte 0x0f, 0x01, 0xc9; -:: a (eax), c (ecx)); -} - extern void select_idle_routine(const struct cpuinfo_x86 *c); extern void init_amd_e400_c1e_mask(void); --- a/arch/x86/kernel/acpi/cstate.c +++ b/arch/x86/kernel/acpi/cstate.c @@ -150,29 +150,6 @@ int acpi_processor_ffh_cstate_probe(unsi }
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On 11/20/2013 2:58 AM, Peter Zijlstra wrote: On Wed, Nov 20, 2013 at 11:28:03AM +0100, Peter Zijlstra wrote: On Tue, Nov 19, 2013 at 01:06:30PM -0800, Jacob Pan wrote: I applied this patch on top of upstream kernel (801a760) and found out my machine completely failed to enter idle when nothing is running. turbostate shows 100% C0. ftrace shows kernel coming in and out of idle frequently. Both ACPI idle and intel_idle behaves the same way. I have to do the following change to allow entering C-states again. That doesn't make any sense; current_set_polling_and_test() returns the same thing need_resched() does. But you're right, intel_idle resides 100% in C0 and acpi_idle has 100% C1 residency... most weird. So pretty silly actually; you cannot do a store (any store) in between monitor and mwait. you can just not to the cacheline you are watching (or things that alias with that) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On Wed, Nov 20, 2013 at 08:24:47AM -0800, Arjan van de Ven wrote: On 11/20/2013 2:58 AM, Peter Zijlstra wrote: So pretty silly actually; you cannot do a store (any store) in between monitor and mwait. you can just not to the cacheline you are watching (or things that alias with that) Ah indeed, the thread_info::status is in the same cacheline. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On 11/20/2013 8:33 AM, Peter Zijlstra wrote: On Wed, Nov 20, 2013 at 08:24:47AM -0800, Arjan van de Ven wrote: On 11/20/2013 2:58 AM, Peter Zijlstra wrote: So pretty silly actually; you cannot do a store (any store) in between monitor and mwait. you can just not to the cacheline you are watching (or things that alias with that) Ah indeed, the thread_info::status is in the same cacheline. you told it to wake on any write to that cacheline and then you write to it ;-) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On Tue, 19 Nov 2013 16:13:38 +0100 Peter Zijlstra wrote: > On Tue, Nov 19, 2013 at 03:51:43PM +0100, Peter Zijlstra wrote: > > That said, that drive is completely wrecked. It uses > > preempt_enable_no_resched() wrong too, it has uncommented barriers.. > > > > Dude, wtf are you guys smoking? > I applied this patch on top of upstream kernel (801a760) and found out my machine completely failed to enter idle when nothing is running. turbostate shows 100% C0. ftrace shows kernel coming in and out of idle frequently. Both ACPI idle and intel_idle behaves the same way. I have to do the following change to allow entering C-states again. diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h index 80014da..b51d1e1 100644 --- a/arch/x86/include/asm/mwait.h +++ b/arch/x86/include/asm/mwait.h @@ -49,10 +49,8 @@ static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx) clflush((void *)_thread_info()->flags); __monitor((void *)_thread_info()->flags, 0, 0); - if (!current_set_polling_and_test()) +if (!need_resched()) __mwait(eax, ecx); - - __current_clr_polling(); } #endif /* _ASM_X86_MWAIT_H */ Did i miss any other patches? Jacob > --- > Subject: sched: Take away preempt_enable_no_resched() and friends > from modules > > There is no way in hell modules are going to play preemption tricks > like this. > > Cc: eliezer.ta...@linux.intel.com > Cc: ar...@linux.intel.com > Reviewed-by: Thomas Gleixner > Signed-off-by: Peter Zijlstra > --- > drivers/thermal/intel_powerclamp.c | 2 +- > include/linux/preempt.h| 8 +++- > include/net/busy_poll.h| 15 +++ > 3 files changed, 11 insertions(+), 14 deletions(-) > > diff --git a/drivers/thermal/intel_powerclamp.c > b/drivers/thermal/intel_powerclamp.c index 8f181b3f842b..0a12ddc2eb4c > 100644 --- a/drivers/thermal/intel_powerclamp.c > +++ b/drivers/thermal/intel_powerclamp.c > @@ -445,7 +445,7 @@ static int clamp_thread(void *arg) > atomic_inc(_wakeup_counter); > } > tick_nohz_idle_exit(); > - preempt_enable_no_resched(); > + preempt_enable(); > } > del_timer_sync(_timer); > clear_bit(cpunr, cpu_clamping_mask); > diff --git a/include/linux/preempt.h b/include/linux/preempt.h > index a3d9dc8c2c00..3ed2b5335ab4 100644 > --- a/include/linux/preempt.h > +++ b/include/linux/preempt.h > @@ -64,7 +64,7 @@ do { \ > } while (0) > > #else > -#define preempt_enable() preempt_enable_no_resched() > +#define preempt_enable() sched_preempt_enable_no_resched() > #define preempt_check_resched() do { } while (0) > #endif > > @@ -116,6 +116,12 @@ do { \ > > #endif /* CONFIG_PREEMPT_COUNT */ > > +#ifdef MODULE > +#undef preempt_enable_no_resched > +#undef preempt_enable_no_resched_notrace > +#undef preempt_check_resched > +#endif > + > #ifdef CONFIG_PREEMPT_NOTIFIERS > > struct preempt_notifier; > diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h > index 829627d7b846..756827a86c2d 100644 > --- a/include/net/busy_poll.h > +++ b/include/net/busy_poll.h > @@ -42,27 +42,18 @@ static inline bool net_busy_loop_on(void) > return sysctl_net_busy_poll; > } > > -/* a wrapper to make debug_smp_processor_id() happy > - * we can use sched_clock() because we don't care much about > precision > - * we only care that the average is bounded > - */ > -#ifdef CONFIG_DEBUG_PREEMPT > static inline u64 busy_loop_us_clock(void) > { > u64 rc; > > + /* XXX with interrupts enabled sched_clock() can return > utter garbage */ + > preempt_disable_notrace(); > rc = sched_clock(); > - preempt_enable_no_resched_notrace(); > + preempt_enable_notrace(); > > return rc >> 10; > } > -#else /* CONFIG_DEBUG_PREEMPT */ > -static inline u64 busy_loop_us_clock(void) > -{ > - return sched_clock() >> 10; > -} > -#endif /* CONFIG_DEBUG_PREEMPT */ > > static inline unsigned long sk_busy_loop_end_time(struct sock *sk) > { > [Jacob Pan] -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On Tue, Nov 19, 2013 at 03:51:43PM +0100, Peter Zijlstra wrote: > That said, that drive is completely wrecked. It uses > preempt_enable_no_resched() wrong too, it has uncommented barriers.. > > Dude, wtf are you guys smoking? --- Subject: sched: Take away preempt_enable_no_resched() and friends from modules There is no way in hell modules are going to play preemption tricks like this. Cc: eliezer.ta...@linux.intel.com Cc: ar...@linux.intel.com Reviewed-by: Thomas Gleixner Signed-off-by: Peter Zijlstra --- drivers/thermal/intel_powerclamp.c | 2 +- include/linux/preempt.h| 8 +++- include/net/busy_poll.h| 15 +++ 3 files changed, 11 insertions(+), 14 deletions(-) diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c index 8f181b3f842b..0a12ddc2eb4c 100644 --- a/drivers/thermal/intel_powerclamp.c +++ b/drivers/thermal/intel_powerclamp.c @@ -445,7 +445,7 @@ static int clamp_thread(void *arg) atomic_inc(_wakeup_counter); } tick_nohz_idle_exit(); - preempt_enable_no_resched(); + preempt_enable(); } del_timer_sync(_timer); clear_bit(cpunr, cpu_clamping_mask); diff --git a/include/linux/preempt.h b/include/linux/preempt.h index a3d9dc8c2c00..3ed2b5335ab4 100644 --- a/include/linux/preempt.h +++ b/include/linux/preempt.h @@ -64,7 +64,7 @@ do { \ } while (0) #else -#define preempt_enable() preempt_enable_no_resched() +#define preempt_enable() sched_preempt_enable_no_resched() #define preempt_check_resched() do { } while (0) #endif @@ -116,6 +116,12 @@ do { \ #endif /* CONFIG_PREEMPT_COUNT */ +#ifdef MODULE +#undef preempt_enable_no_resched +#undef preempt_enable_no_resched_notrace +#undef preempt_check_resched +#endif + #ifdef CONFIG_PREEMPT_NOTIFIERS struct preempt_notifier; diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h index 829627d7b846..756827a86c2d 100644 --- a/include/net/busy_poll.h +++ b/include/net/busy_poll.h @@ -42,27 +42,18 @@ static inline bool net_busy_loop_on(void) return sysctl_net_busy_poll; } -/* a wrapper to make debug_smp_processor_id() happy - * we can use sched_clock() because we don't care much about precision - * we only care that the average is bounded - */ -#ifdef CONFIG_DEBUG_PREEMPT static inline u64 busy_loop_us_clock(void) { u64 rc; + /* XXX with interrupts enabled sched_clock() can return utter garbage */ + preempt_disable_notrace(); rc = sched_clock(); - preempt_enable_no_resched_notrace(); + preempt_enable_notrace(); return rc >> 10; } -#else /* CONFIG_DEBUG_PREEMPT */ -static inline u64 busy_loop_us_clock(void) -{ - return sched_clock() >> 10; -} -#endif /* CONFIG_DEBUG_PREEMPT */ static inline unsigned long sk_busy_loop_end_time(struct sock *sk) { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On Tue, Nov 19, 2013 at 06:22:43AM -0800, Arjan van de Ven wrote: > >diff --git a/drivers/thermal/intel_powerclamp.c > >b/drivers/thermal/intel_powerclamp.c > >index 8f181b3f842b..e8275f2df9af 100644 > >--- a/drivers/thermal/intel_powerclamp.c > >+++ b/drivers/thermal/intel_powerclamp.c > >@@ -438,9 +438,7 @@ static int clamp_thread(void *arg) > > */ > > local_touch_nmi(); > > stop_critical_timings(); > >-__monitor((void *)_thread_info()->flags, 0, 0); > >-cpu_relax(); /* allow HT sibling to run */ > >-__mwait(eax, ecx); > >+mwait_idle_with_hints(eax, ecx); > > start_critical_timings(); > > atomic_inc(_wakeup_counter); > > } > > > > hmm I take it that mwait_idle_with_hints is the one that also checks > need_resched() ? > if so... powerclamp may not want to use that (the whole point is to NOT give > the cpu > to tasks!) That said, that drive is completely wrecked. It uses preempt_enable_no_resched() wrong too, it has uncommented barriers.. Dude, wtf are you guys smoking? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On Tue, Nov 19, 2013 at 06:22:43AM -0800, Arjan van de Ven wrote: > >diff --git a/drivers/thermal/intel_powerclamp.c > >b/drivers/thermal/intel_powerclamp.c > >index 8f181b3f842b..e8275f2df9af 100644 > >--- a/drivers/thermal/intel_powerclamp.c > >+++ b/drivers/thermal/intel_powerclamp.c > >@@ -438,9 +438,7 @@ static int clamp_thread(void *arg) > > */ > > local_touch_nmi(); > > stop_critical_timings(); > >-__monitor((void *)_thread_info()->flags, 0, 0); > >-cpu_relax(); /* allow HT sibling to run */ > >-__mwait(eax, ecx); > >+mwait_idle_with_hints(eax, ecx); > > start_critical_timings(); > > atomic_inc(_wakeup_counter); > > } > > > > hmm I take it that mwait_idle_with_hints is the one that also checks > need_resched() ? > if so... powerclamp may not want to use that (the whole point is to NOT give > the cpu > to tasks!) Of course, you're very much not getting something that wins from stop_machine or a fifo-99 task. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c index 8f181b3f842b..e8275f2df9af 100644 --- a/drivers/thermal/intel_powerclamp.c +++ b/drivers/thermal/intel_powerclamp.c @@ -438,9 +438,7 @@ static int clamp_thread(void *arg) */ local_touch_nmi(); stop_critical_timings(); - __monitor((void *)_thread_info()->flags, 0, 0); - cpu_relax(); /* allow HT sibling to run */ - __mwait(eax, ecx); + mwait_idle_with_hints(eax, ecx); start_critical_timings(); atomic_inc(_wakeup_counter); } hmm I take it that mwait_idle_with_hints is the one that also checks need_resched() ? if so... powerclamp may not want to use that (the whole point is to NOT give the cpu to tasks!) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On Tue, 2013-11-19 at 12:31 +0100, Peter Zijlstra wrote: > People seem to delight in writing wrong and broken mwait idle routines; > collapse the lot. > > This leaves mwait_play_dead() the sole remaining user of __mwait() and > new __mwait() users are probably doing it wrong. > > Also remove __sti_mwait() as its unused. > > Signed-off-by: Peter Zijlstra > --- > > Mike, does this cure your core2? Nope. Maybe an acpi/bios thingy on this box since diags that fired on lappy during boot did not fire on desktop, and both are core2 booting same kernel. I kinda suspect I'll either be stuck with default_idle or have to resurrect mwait_idle and carry it locally if I want the thing to work well. Guess I'll find out if/when I have time to squabble with it. Meanwhile, desktop box works fine modulo benchmarks, lappy works fine too, modulo max_cstate=1 scaring mwait_idle_with_hints away, which I don't care about much. Neither box is wonderful at rt testing where I usually boot max_cstate=1, both just became a bit _less_ wonderful :) -Mike -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On Tuesday, November 19, 2013 12:31:53 PM Peter Zijlstra wrote: > People seem to delight in writing wrong and broken mwait idle routines; > collapse the lot. > > This leaves mwait_play_dead() the sole remaining user of __mwait() and > new __mwait() users are probably doing it wrong. > > Also remove __sti_mwait() as its unused. > > Signed-off-by: Peter Zijlstra For the ACPI part: Acked-by: Rafael J. Wysocki > --- > > Mike, does this cure your core2? > > arch/x86/include/asm/mwait.h | 42 > ++ > arch/x86/include/asm/processor.h | 23 - > arch/x86/kernel/acpi/cstate.c | 23 - > drivers/acpi/acpi_pad.c| 5 + > drivers/acpi/processor_idle.c | 15 -- > drivers/idle/intel_idle.c | 8 +--- > drivers/thermal/intel_powerclamp.c | 4 +--- > 7 files changed, 45 insertions(+), 75 deletions(-) > > diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h > index 2f366d0ac6b4..80014dade987 100644 > --- a/arch/x86/include/asm/mwait.h > +++ b/arch/x86/include/asm/mwait.h > @@ -1,6 +1,8 @@ > #ifndef _ASM_X86_MWAIT_H > #define _ASM_X86_MWAIT_H > > +#include > + > #define MWAIT_SUBSTATE_MASK 0xf > #define MWAIT_CSTATE_MASK0xf > #define MWAIT_SUBSTATE_SIZE 4 > @@ -13,4 +15,44 @@ > > #define MWAIT_ECX_INTERRUPT_BREAK0x1 > > +static inline void __monitor(const void *eax, unsigned long ecx, > + unsigned long edx) > +{ > + /* "monitor %eax, %ecx, %edx;" */ > + asm volatile(".byte 0x0f, 0x01, 0xc8;" > + :: "a" (eax), "c" (ecx), "d"(edx)); > +} > + > +static inline void __mwait(unsigned long eax, unsigned long ecx) > +{ > + /* "mwait %eax, %ecx;" */ > + asm volatile(".byte 0x0f, 0x01, 0xc9;" > + :: "a" (eax), "c" (ecx)); > +} > + > +/* > + * This uses new MONITOR/MWAIT instructions on P4 processors with PNI, > + * which can obviate IPI to trigger checking of need_resched. > + * We execute MONITOR against need_resched and enter optimized wait state > + * through MWAIT. Whenever someone changes need_resched, we would be woken > + * up from MWAIT (without an IPI). > + * > + * New with Core Duo processors, MWAIT can take some hints based on CPU > + * capability. > + */ > +static inline void mwait_idle_with_hints(unsigned long eax, unsigned long > ecx) > +{ > + if (need_resched()) > + return; > + > + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) > + clflush((void *)_thread_info()->flags); > + > + __monitor((void *)_thread_info()->flags, 0, 0); > + if (!current_set_polling_and_test()) > + __mwait(eax, ecx); > + > + __current_clr_polling(); > +} > + > #endif /* _ASM_X86_MWAIT_H */ > diff --git a/arch/x86/include/asm/processor.h > b/arch/x86/include/asm/processor.h > index 7b034a4057f9..24821f5768bc 100644 > --- a/arch/x86/include/asm/processor.h > +++ b/arch/x86/include/asm/processor.h > @@ -700,29 +700,6 @@ static inline void sync_core(void) > #endif > } > > -static inline void __monitor(const void *eax, unsigned long ecx, > - unsigned long edx) > -{ > - /* "monitor %eax, %ecx, %edx;" */ > - asm volatile(".byte 0x0f, 0x01, 0xc8;" > - :: "a" (eax), "c" (ecx), "d"(edx)); > -} > - > -static inline void __mwait(unsigned long eax, unsigned long ecx) > -{ > - /* "mwait %eax, %ecx;" */ > - asm volatile(".byte 0x0f, 0x01, 0xc9;" > - :: "a" (eax), "c" (ecx)); > -} > - > -static inline void __sti_mwait(unsigned long eax, unsigned long ecx) > -{ > - trace_hardirqs_on(); > - /* "mwait %eax, %ecx;" */ > - asm volatile("sti; .byte 0x0f, 0x01, 0xc9;" > - :: "a" (eax), "c" (ecx)); > -} > - > extern void select_idle_routine(const struct cpuinfo_x86 *c); > extern void init_amd_e400_c1e_mask(void); > > diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c > index d2b7f27781bc..e69182fd01cf 100644 > --- a/arch/x86/kernel/acpi/cstate.c > +++ b/arch/x86/kernel/acpi/cstate.c > @@ -150,29 +150,6 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu, > } > EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe); > > -/* > - * This uses new MONITOR/MWAIT instructions on P4 processors with PNI, > - * which can obviate IPI to trigger checking of need_resched. > - * We execute MONITOR against need_resched and enter optimized wait state > - * through MWAIT. Whenever someone changes need_resched, we would be woken > - * up from MWAIT (without an IPI). > - * > - * New with Core Duo processors, MWAIT can take some hints based on CPU > - * capability. > - */ > -void mwait_idle_with_hints(unsigned long ax, unsigned long cx) > -{ > - if (!need_resched()) { > - if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) > - clflush((void *)_thread_info()->flags); > - > -
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On Tuesday, November 19, 2013 12:31:53 PM Peter Zijlstra wrote: People seem to delight in writing wrong and broken mwait idle routines; collapse the lot. This leaves mwait_play_dead() the sole remaining user of __mwait() and new __mwait() users are probably doing it wrong. Also remove __sti_mwait() as its unused. Signed-off-by: Peter Zijlstra pet...@infradead.org For the ACPI part: Acked-by: Rafael J. Wysocki rafael.j.wyso...@intel.com --- Mike, does this cure your core2? arch/x86/include/asm/mwait.h | 42 ++ arch/x86/include/asm/processor.h | 23 - arch/x86/kernel/acpi/cstate.c | 23 - drivers/acpi/acpi_pad.c| 5 + drivers/acpi/processor_idle.c | 15 -- drivers/idle/intel_idle.c | 8 +--- drivers/thermal/intel_powerclamp.c | 4 +--- 7 files changed, 45 insertions(+), 75 deletions(-) diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h index 2f366d0ac6b4..80014dade987 100644 --- a/arch/x86/include/asm/mwait.h +++ b/arch/x86/include/asm/mwait.h @@ -1,6 +1,8 @@ #ifndef _ASM_X86_MWAIT_H #define _ASM_X86_MWAIT_H +#include linux/sched.h + #define MWAIT_SUBSTATE_MASK 0xf #define MWAIT_CSTATE_MASK0xf #define MWAIT_SUBSTATE_SIZE 4 @@ -13,4 +15,44 @@ #define MWAIT_ECX_INTERRUPT_BREAK0x1 +static inline void __monitor(const void *eax, unsigned long ecx, + unsigned long edx) +{ + /* monitor %eax, %ecx, %edx; */ + asm volatile(.byte 0x0f, 0x01, 0xc8; + :: a (eax), c (ecx), d(edx)); +} + +static inline void __mwait(unsigned long eax, unsigned long ecx) +{ + /* mwait %eax, %ecx; */ + asm volatile(.byte 0x0f, 0x01, 0xc9; + :: a (eax), c (ecx)); +} + +/* + * This uses new MONITOR/MWAIT instructions on P4 processors with PNI, + * which can obviate IPI to trigger checking of need_resched. + * We execute MONITOR against need_resched and enter optimized wait state + * through MWAIT. Whenever someone changes need_resched, we would be woken + * up from MWAIT (without an IPI). + * + * New with Core Duo processors, MWAIT can take some hints based on CPU + * capability. + */ +static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx) +{ + if (need_resched()) + return; + + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) + clflush((void *)current_thread_info()-flags); + + __monitor((void *)current_thread_info()-flags, 0, 0); + if (!current_set_polling_and_test()) + __mwait(eax, ecx); + + __current_clr_polling(); +} + #endif /* _ASM_X86_MWAIT_H */ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 7b034a4057f9..24821f5768bc 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -700,29 +700,6 @@ static inline void sync_core(void) #endif } -static inline void __monitor(const void *eax, unsigned long ecx, - unsigned long edx) -{ - /* monitor %eax, %ecx, %edx; */ - asm volatile(.byte 0x0f, 0x01, 0xc8; - :: a (eax), c (ecx), d(edx)); -} - -static inline void __mwait(unsigned long eax, unsigned long ecx) -{ - /* mwait %eax, %ecx; */ - asm volatile(.byte 0x0f, 0x01, 0xc9; - :: a (eax), c (ecx)); -} - -static inline void __sti_mwait(unsigned long eax, unsigned long ecx) -{ - trace_hardirqs_on(); - /* mwait %eax, %ecx; */ - asm volatile(sti; .byte 0x0f, 0x01, 0xc9; - :: a (eax), c (ecx)); -} - extern void select_idle_routine(const struct cpuinfo_x86 *c); extern void init_amd_e400_c1e_mask(void); diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c index d2b7f27781bc..e69182fd01cf 100644 --- a/arch/x86/kernel/acpi/cstate.c +++ b/arch/x86/kernel/acpi/cstate.c @@ -150,29 +150,6 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu, } EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe); -/* - * This uses new MONITOR/MWAIT instructions on P4 processors with PNI, - * which can obviate IPI to trigger checking of need_resched. - * We execute MONITOR against need_resched and enter optimized wait state - * through MWAIT. Whenever someone changes need_resched, we would be woken - * up from MWAIT (without an IPI). - * - * New with Core Duo processors, MWAIT can take some hints based on CPU - * capability. - */ -void mwait_idle_with_hints(unsigned long ax, unsigned long cx) -{ - if (!need_resched()) { - if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) - clflush((void *)current_thread_info()-flags); - - __monitor((void *)current_thread_info()-flags, 0, 0); - smp_mb(); - if
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On Tue, 2013-11-19 at 12:31 +0100, Peter Zijlstra wrote: People seem to delight in writing wrong and broken mwait idle routines; collapse the lot. This leaves mwait_play_dead() the sole remaining user of __mwait() and new __mwait() users are probably doing it wrong. Also remove __sti_mwait() as its unused. Signed-off-by: Peter Zijlstra pet...@infradead.org --- Mike, does this cure your core2? Nope. Maybe an acpi/bios thingy on this box since diags that fired on lappy during boot did not fire on desktop, and both are core2 booting same kernel. I kinda suspect I'll either be stuck with default_idle or have to resurrect mwait_idle and carry it locally if I want the thing to work well. Guess I'll find out if/when I have time to squabble with it. Meanwhile, desktop box works fine modulo benchmarks, lappy works fine too, modulo max_cstate=1 scaring mwait_idle_with_hints away, which I don't care about much. Neither box is wonderful at rt testing where I usually boot max_cstate=1, both just became a bit _less_ wonderful :) -Mike -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c index 8f181b3f842b..e8275f2df9af 100644 --- a/drivers/thermal/intel_powerclamp.c +++ b/drivers/thermal/intel_powerclamp.c @@ -438,9 +438,7 @@ static int clamp_thread(void *arg) */ local_touch_nmi(); stop_critical_timings(); - __monitor((void *)current_thread_info()-flags, 0, 0); - cpu_relax(); /* allow HT sibling to run */ - __mwait(eax, ecx); + mwait_idle_with_hints(eax, ecx); start_critical_timings(); atomic_inc(idle_wakeup_counter); } hmm I take it that mwait_idle_with_hints is the one that also checks need_resched() ? if so... powerclamp may not want to use that (the whole point is to NOT give the cpu to tasks!) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On Tue, Nov 19, 2013 at 06:22:43AM -0800, Arjan van de Ven wrote: diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c index 8f181b3f842b..e8275f2df9af 100644 --- a/drivers/thermal/intel_powerclamp.c +++ b/drivers/thermal/intel_powerclamp.c @@ -438,9 +438,7 @@ static int clamp_thread(void *arg) */ local_touch_nmi(); stop_critical_timings(); -__monitor((void *)current_thread_info()-flags, 0, 0); -cpu_relax(); /* allow HT sibling to run */ -__mwait(eax, ecx); +mwait_idle_with_hints(eax, ecx); start_critical_timings(); atomic_inc(idle_wakeup_counter); } hmm I take it that mwait_idle_with_hints is the one that also checks need_resched() ? if so... powerclamp may not want to use that (the whole point is to NOT give the cpu to tasks!) Of course, you're very much not getting something that wins from stop_machine or a fifo-99 task. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On Tue, Nov 19, 2013 at 06:22:43AM -0800, Arjan van de Ven wrote: diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c index 8f181b3f842b..e8275f2df9af 100644 --- a/drivers/thermal/intel_powerclamp.c +++ b/drivers/thermal/intel_powerclamp.c @@ -438,9 +438,7 @@ static int clamp_thread(void *arg) */ local_touch_nmi(); stop_critical_timings(); -__monitor((void *)current_thread_info()-flags, 0, 0); -cpu_relax(); /* allow HT sibling to run */ -__mwait(eax, ecx); +mwait_idle_with_hints(eax, ecx); start_critical_timings(); atomic_inc(idle_wakeup_counter); } hmm I take it that mwait_idle_with_hints is the one that also checks need_resched() ? if so... powerclamp may not want to use that (the whole point is to NOT give the cpu to tasks!) That said, that drive is completely wrecked. It uses preempt_enable_no_resched() wrong too, it has uncommented barriers.. Dude, wtf are you guys smoking? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On Tue, Nov 19, 2013 at 03:51:43PM +0100, Peter Zijlstra wrote: That said, that drive is completely wrecked. It uses preempt_enable_no_resched() wrong too, it has uncommented barriers.. Dude, wtf are you guys smoking? --- Subject: sched: Take away preempt_enable_no_resched() and friends from modules There is no way in hell modules are going to play preemption tricks like this. Cc: eliezer.ta...@linux.intel.com Cc: ar...@linux.intel.com Reviewed-by: Thomas Gleixner t...@linutronix.de Signed-off-by: Peter Zijlstra pet...@infradead.org --- drivers/thermal/intel_powerclamp.c | 2 +- include/linux/preempt.h| 8 +++- include/net/busy_poll.h| 15 +++ 3 files changed, 11 insertions(+), 14 deletions(-) diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c index 8f181b3f842b..0a12ddc2eb4c 100644 --- a/drivers/thermal/intel_powerclamp.c +++ b/drivers/thermal/intel_powerclamp.c @@ -445,7 +445,7 @@ static int clamp_thread(void *arg) atomic_inc(idle_wakeup_counter); } tick_nohz_idle_exit(); - preempt_enable_no_resched(); + preempt_enable(); } del_timer_sync(wakeup_timer); clear_bit(cpunr, cpu_clamping_mask); diff --git a/include/linux/preempt.h b/include/linux/preempt.h index a3d9dc8c2c00..3ed2b5335ab4 100644 --- a/include/linux/preempt.h +++ b/include/linux/preempt.h @@ -64,7 +64,7 @@ do { \ } while (0) #else -#define preempt_enable() preempt_enable_no_resched() +#define preempt_enable() sched_preempt_enable_no_resched() #define preempt_check_resched() do { } while (0) #endif @@ -116,6 +116,12 @@ do { \ #endif /* CONFIG_PREEMPT_COUNT */ +#ifdef MODULE +#undef preempt_enable_no_resched +#undef preempt_enable_no_resched_notrace +#undef preempt_check_resched +#endif + #ifdef CONFIG_PREEMPT_NOTIFIERS struct preempt_notifier; diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h index 829627d7b846..756827a86c2d 100644 --- a/include/net/busy_poll.h +++ b/include/net/busy_poll.h @@ -42,27 +42,18 @@ static inline bool net_busy_loop_on(void) return sysctl_net_busy_poll; } -/* a wrapper to make debug_smp_processor_id() happy - * we can use sched_clock() because we don't care much about precision - * we only care that the average is bounded - */ -#ifdef CONFIG_DEBUG_PREEMPT static inline u64 busy_loop_us_clock(void) { u64 rc; + /* XXX with interrupts enabled sched_clock() can return utter garbage */ + preempt_disable_notrace(); rc = sched_clock(); - preempt_enable_no_resched_notrace(); + preempt_enable_notrace(); return rc 10; } -#else /* CONFIG_DEBUG_PREEMPT */ -static inline u64 busy_loop_us_clock(void) -{ - return sched_clock() 10; -} -#endif /* CONFIG_DEBUG_PREEMPT */ static inline unsigned long sk_busy_loop_end_time(struct sock *sk) { -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
On Tue, 19 Nov 2013 16:13:38 +0100 Peter Zijlstra pet...@infradead.org wrote: On Tue, Nov 19, 2013 at 03:51:43PM +0100, Peter Zijlstra wrote: That said, that drive is completely wrecked. It uses preempt_enable_no_resched() wrong too, it has uncommented barriers.. Dude, wtf are you guys smoking? I applied this patch on top of upstream kernel (801a760) and found out my machine completely failed to enter idle when nothing is running. turbostate shows 100% C0. ftrace shows kernel coming in and out of idle frequently. Both ACPI idle and intel_idle behaves the same way. I have to do the following change to allow entering C-states again. diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h index 80014da..b51d1e1 100644 --- a/arch/x86/include/asm/mwait.h +++ b/arch/x86/include/asm/mwait.h @@ -49,10 +49,8 @@ static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx) clflush((void *)current_thread_info()-flags); __monitor((void *)current_thread_info()-flags, 0, 0); - if (!current_set_polling_and_test()) +if (!need_resched()) __mwait(eax, ecx); - - __current_clr_polling(); } #endif /* _ASM_X86_MWAIT_H */ Did i miss any other patches? Jacob --- Subject: sched: Take away preempt_enable_no_resched() and friends from modules There is no way in hell modules are going to play preemption tricks like this. Cc: eliezer.ta...@linux.intel.com Cc: ar...@linux.intel.com Reviewed-by: Thomas Gleixner t...@linutronix.de Signed-off-by: Peter Zijlstra pet...@infradead.org --- drivers/thermal/intel_powerclamp.c | 2 +- include/linux/preempt.h| 8 +++- include/net/busy_poll.h| 15 +++ 3 files changed, 11 insertions(+), 14 deletions(-) diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c index 8f181b3f842b..0a12ddc2eb4c 100644 --- a/drivers/thermal/intel_powerclamp.c +++ b/drivers/thermal/intel_powerclamp.c @@ -445,7 +445,7 @@ static int clamp_thread(void *arg) atomic_inc(idle_wakeup_counter); } tick_nohz_idle_exit(); - preempt_enable_no_resched(); + preempt_enable(); } del_timer_sync(wakeup_timer); clear_bit(cpunr, cpu_clamping_mask); diff --git a/include/linux/preempt.h b/include/linux/preempt.h index a3d9dc8c2c00..3ed2b5335ab4 100644 --- a/include/linux/preempt.h +++ b/include/linux/preempt.h @@ -64,7 +64,7 @@ do { \ } while (0) #else -#define preempt_enable() preempt_enable_no_resched() +#define preempt_enable() sched_preempt_enable_no_resched() #define preempt_check_resched() do { } while (0) #endif @@ -116,6 +116,12 @@ do { \ #endif /* CONFIG_PREEMPT_COUNT */ +#ifdef MODULE +#undef preempt_enable_no_resched +#undef preempt_enable_no_resched_notrace +#undef preempt_check_resched +#endif + #ifdef CONFIG_PREEMPT_NOTIFIERS struct preempt_notifier; diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h index 829627d7b846..756827a86c2d 100644 --- a/include/net/busy_poll.h +++ b/include/net/busy_poll.h @@ -42,27 +42,18 @@ static inline bool net_busy_loop_on(void) return sysctl_net_busy_poll; } -/* a wrapper to make debug_smp_processor_id() happy - * we can use sched_clock() because we don't care much about precision - * we only care that the average is bounded - */ -#ifdef CONFIG_DEBUG_PREEMPT static inline u64 busy_loop_us_clock(void) { u64 rc; + /* XXX with interrupts enabled sched_clock() can return utter garbage */ + preempt_disable_notrace(); rc = sched_clock(); - preempt_enable_no_resched_notrace(); + preempt_enable_notrace(); return rc 10; } -#else /* CONFIG_DEBUG_PREEMPT */ -static inline u64 busy_loop_us_clock(void) -{ - return sched_clock() 10; -} -#endif /* CONFIG_DEBUG_PREEMPT */ static inline unsigned long sk_busy_loop_end_time(struct sock *sk) { [Jacob Pan] -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/