Re: [patch] x86, apic: use tsc deadline for oneshot when available
Patch looks good.

Acked-by: Venkatesh Pallipadi

On Mon, Oct 22, 2012 at 2:37 PM, Suresh Siddha wrote:
>
> Thomas, You wanted to run some tests with this, right? Please give it a
> try and see if this is ok to be pushed to the -tip.
>
> thanks,
> suresh
> --8<--
> From: Suresh Siddha
> Subject: x86, apic: use tsc deadline for oneshot when available
>
> If the TSC deadline mode is supported, LAPIC timer one-shot mode can be
> implemented using the IA32_TSC_DEADLINE MSR. An interrupt will be generated
> when the TSC value equals or exceeds the value in the IA32_TSC_DEADLINE
> MSR.
>
> This enables us to skip the APIC calibration during boot. Also,
> in xapic mode, this enables us to skip the uncached apic access
> to re-arm the APIC timer.
>
> As this timer ticks at the high frequency TSC rate, we use the
> TSC_DIVISOR (32) to work with the 32-bit restrictions in the clockevent
> APIs to avoid 64-bit divides etc. (frequency is u32 and "unsigned long"
> in set_next_event(), max_delta limits the next event to 32-bit for
> 32-bit kernels).
>
> Signed-off-by: Suresh Siddha
> ---
>  Documentation/kernel-parameters.txt |  4 ++
>  arch/x86/include/asm/msr-index.h    |  2 +
>  arch/x86/kernel/apic/apic.c         | 66 ++-
>  3 files changed, 55 insertions(+), 17 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 9776f06..4aa9ca0 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -1304,6 +1304,10 @@ bytes respectively. Such letter suffixes can also be
>  			entirely omitted.
>
>  	lapic		[X86-32,APIC] Enable the local APIC even if BIOS
>  			disabled it.
>
> +	lapic=		[x86,APIC] "notscdeadline" Do not use TSC deadline
> +			value for LAPIC timer one-shot implementation. Default
> +			back to the programmable timer unit in the LAPIC.
> +
>  	lapic_timer_c2_ok	[X86,APIC] trust the local apic timer
>  			in C2 power state.
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 7f0edce..e400cdb 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -337,6 +337,8 @@
>  #define MSR_IA32_MISC_ENABLE_TURBO_DISABLE	(1ULL << 38)
>  #define MSR_IA32_MISC_ENABLE_IP_PREF_DISABLE	(1ULL << 39)
>
> +#define MSR_IA32_TSC_DEADLINE	0x06E0
> +
>  /* P4/Xeon+ specific */
>  #define MSR_IA32_MCG_EAX	0x0180
>  #define MSR_IA32_MCG_EBX	0x0181
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index b17416e..b0c49b1 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -90,21 +90,6 @@ EXPORT_EARLY_PER_CPU_SYMBOL(x86_bios_cpu_apicid);
>   */
>  DEFINE_EARLY_PER_CPU_READ_MOSTLY(int, x86_cpu_to_logical_apicid, BAD_APICID);
>
> -/*
> - * Knob to control our willingness to enable the local APIC.
> - *
> - * +1=force-enable
> - */
> -static int force_enable_local_apic __initdata;
> -/*
> - * APIC command line parameters
> - */
> -static int __init parse_lapic(char *arg)
> -{
> -	force_enable_local_apic = 1;
> -	return 0;
> -}
> -early_param("lapic", parse_lapic);
>  /* Local APIC was disabled by the BIOS and enabled by the kernel */
>  static int enabled_via_apicbase;
>
> @@ -133,6 +118,25 @@ static inline void imcr_apic_to_pic(void)
>  }
>  #endif
>
> +/*
> + * Knob to control our willingness to enable the local APIC.
> + *
> + * +1=force-enable
> + */
> +static int force_enable_local_apic __initdata;
> +/*
> + * APIC command line parameters
> + */
> +static int __init parse_lapic(char *arg)
> +{
> +	if (config_enabled(CONFIG_X86_32) && !arg)
> +		force_enable_local_apic = 1;
> +	else if (!strncmp(arg, "notscdeadline", 13))
> +		setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
> +	return 0;
> +}
> +early_param("lapic", parse_lapic);
> +
>  #ifdef CONFIG_X86_64
>  static int apic_calibrate_pmtmr __initdata;
>  static __init int setup_apicpmtimer(char *s)
> @@ -315,6 +319,7 @@ int lapic_get_maxlvt(void)
>
>  /* Clock divisor */
>  #define APIC_DIVISOR 16
> +#define TSC_DIVISOR	32
>
>  /*
>   * This function sets up the local APIC timer, with a timeout of
> @@ -333,6 +338,9 @@ static void __setup_APIC_LVTT(unsigned int clocks, int oneshot, int irqen)
>  	lvtt_value = LOCAL_TIMER_VECTOR;
>  	if (!oneshot)
>  		lvtt_value |= APIC_LVT_TIMER_PERIODIC;
> +	else if (boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER))
> +		lvtt_value |= APIC_LVT_TIMER_TSCDEADLINE;
> +
>  	if (!lapic_is_integrated())
>  		lvtt_value |= SET_APIC_TIMER_BASE(APIC_TIMER_BASE_DIV);
>
> @@ -341,6 +349,11 @@ static void __setup_APIC_LVTT(unsigned int clocks, int oneshot, int irqen)
>
>  	apic_write(APIC_LVTT, lvtt_value);
>
> +	if (lvtt_value
Re: 2.6.25-rc1 regression - suspend to ram
On Tue, Feb 12, 2008 at 12:10:54AM +0100, R. J. Wysocki wrote:
> On Monday, 11 of February 2008, Lukas Hejtmanek wrote:
> > Hello,
>
> Hi,
>
> > 2.6.25-rc1 takes a really long time to suspend (about 30-40 secs, it used
> > to be about 5 secs) and it takes a few minutes to resume. While resuming,
> > capslock toggles the capslock led, but with a few seconds of delay.
> >
> > 2.6.24-git15 was OK. 2.6.24 is OK.
> >
> > I have a Lenovo ThinkPad T61.
>
> If you have CONFIG_CPU_IDLE set, please try to boot with idle=poll and see if
> that helps.

I just sent this patch to fix a regression in acpi processor_idle.c on another thread. Can you try the patch below and check whether it helps?

Thanks,
Venki

An earlier patch (bc71bec91f9875ef825d12104acf3bf4ca215fa4) broke suspend/resume on many laptops. The problem was reported by Carlos R. Mafra and Calvin Walton, who bisected the issue to the above patch.

The problem was that the C2 and C3 code were calling acpi_idle_enter_c1 directly, with C2 or C3 as the state parameter, while suspend/resume was in progress. Patch bc71bec started making use of that state information, assuming it would always refer to the C1 state. This caused the suspend/resume problem, as we ended up using the C2/C3 state indirectly. Fix this by adding an acpi_idle_suspend check in enter_c1.
Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

Index: linux-2.6.25-rc1/drivers/acpi/processor_idle.c
===
--- linux-2.6.25-rc1.orig/drivers/acpi/processor_idle.c
+++ linux-2.6.25-rc1/drivers/acpi/processor_idle.c
@@ -1420,6 +1420,14 @@ static int acpi_idle_enter_c1(struct cpu
 		return 0;
 	local_irq_disable();
+
+	/* Do not access any ACPI IO ports in suspend path */
+	if (acpi_idle_suspend) {
+		acpi_safe_halt();
+		local_irq_enable();
+		return 0;
+	}
+
 	if (pr->flags.bm_check)
 		acpi_idle_update_bm_rld(pr, cx);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.25-rc1 regression] Suspend to RAM (bisected)
On Mon, Feb 11, 2008 at 12:06:50PM -0800, Venki Pallipadi wrote:
> On Mon, Feb 11, 2008 at 05:37:04PM -0200, Carlos R. Mafra wrote:
> > Pallipadi, Venkatesh wrote:
> > >
> > > Can you send me the output of acpidump and a full dmesg. Looks like
> > > it is a platform issue due to which we cannot use C1 mwait idle during
> > > suspend/resume, something similar to the issue we had with using the
> > > C2/C3 state during idle.
> >
> > Full dmesg and acpidump outputs are attached.
>
> The above acpidump doesn't have all the info, as some SSDT is loaded at
> run time. Can you get the output of
>
> # acpidump --addr 0x7F6D8709 --length 0x04B7
> # acpidump --addr 0x7F6D8BC0 --length 0x0092

Thanks for sending the dumps, Carlos. The patch below (on top of rc1) should fix the problem. Can you please check it?

Thanks,
Venki

An earlier patch (bc71bec91f9875ef825d12104acf3bf4ca215fa4) broke suspend/resume on many laptops. The problem was reported by Carlos R. Mafra and Calvin Walton, who bisected the issue to the above patch.

The problem was that the C2 and C3 code were calling acpi_idle_enter_c1 directly, with C2 or C3 as the state parameter, while suspend/resume was in progress. Patch bc71bec started making use of that state information, assuming it would always refer to the C1 state. This caused the suspend/resume problem, as we ended up using the C2/C3 state indirectly. Fix this by adding an acpi_idle_suspend check in enter_c1.
Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

Index: linux-2.6.25-rc1/drivers/acpi/processor_idle.c
===
--- linux-2.6.25-rc1.orig/drivers/acpi/processor_idle.c
+++ linux-2.6.25-rc1/drivers/acpi/processor_idle.c
@@ -1420,6 +1420,14 @@ static int acpi_idle_enter_c1(struct cpu
 		return 0;
 	local_irq_disable();
+
+	/* Do not access any ACPI IO ports in suspend path */
+	if (acpi_idle_suspend) {
+		acpi_safe_halt();
+		local_irq_enable();
+		return 0;
+	}
+
 	if (pr->flags.bm_check)
 		acpi_idle_update_bm_rld(pr, cx);
Re: [2.6.25-rc1 regression] Suspend to RAM (bisected)
On Mon, Feb 11, 2008 at 05:37:04PM -0200, Carlos R. Mafra wrote:
> Pallipadi, Venkatesh wrote:
> >
> > Can you send me the output of acpidump and a full dmesg. Looks like
> > it is a platform issue due to which we cannot use C1 mwait idle during
> > suspend/resume, something similar to the issue we had with using the
> > C2/C3 state during idle.
>
> Full dmesg and acpidump outputs are attached.

The above acpidump doesn't have all the info, as some SSDT is loaded at run time. Can you get the output of

# acpidump --addr 0x7F6D8709 --length 0x04B7
# acpidump --addr 0x7F6D8BC0 --length 0x0092

Thanks,
Venki
Re: [PATCH] x86: Simplify cpu_idle_wait
On Fri, Feb 08, 2008 at 11:28:48AM +0100, Andi Kleen wrote:
> > -	set_cpus_allowed(current, tmp);
> > +	smp_mb();
> > +	/* kick all the CPUs so that they exit out of pm_idle */
> > +	smp_call_function(do_nothing, NULL, 0, 0);
>
> I think the last argument (wait) needs to be 1 to make sure it is
> synchronous (for 32/64). Otherwise the patch looks great.

Yes. Below is the updated patch.

Earlier commit 40d6a146629b98d8e322b6f9332b182c7cbff3df added smp_call_function in cpu_idle_wait() to kick cpus that are in tickless idle. Looking at the cpu_idle_wait code at that time, it seemed over-engineered for a case which is rarely used (changing the idle handler). Below is a simplified version of cpu_idle_wait, which just makes a dummy smp_call_function to all cpus, to make them come out of the old idle handler and start using the new idle handler. It also eliminates the code in the idle loop that handled cpu_idle_wait.

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

Index: linux-2.6.25-rc/arch/x86/kernel/process_32.c
===
--- linux-2.6.25-rc.orig/arch/x86/kernel/process_32.c
+++ linux-2.6.25-rc/arch/x86/kernel/process_32.c
@@ -82,7 +82,6 @@ unsigned long thread_saved_pc(struct tas
  */
 void (*pm_idle)(void);
 EXPORT_SYMBOL(pm_idle);
-static DEFINE_PER_CPU(unsigned int, cpu_idle_state);
 
 void disable_hlt(void)
 {
@@ -181,9 +180,6 @@ void cpu_idle(void)
 		while (!need_resched()) {
 			void (*idle)(void);
 
-			if (__get_cpu_var(cpu_idle_state))
-				__get_cpu_var(cpu_idle_state) = 0;
-
 			check_pgt_cache();
 			rmb();
 			idle = pm_idle;
@@ -208,40 +204,19 @@ static void do_nothing(void *unused)
 {
 }
 
+/*
+ * cpu_idle_wait - Used to ensure that all the CPUs discard old value of
+ * pm_idle and update to new pm_idle value. Required while changing pm_idle
+ * handler on SMP systems.
+ *
+ * Caller must have changed pm_idle to the new value before the call. Old
+ * pm_idle value will not be used by any CPU after the return of this function.
+ */
 void cpu_idle_wait(void)
 {
-	unsigned int cpu, this_cpu = get_cpu();
-	cpumask_t map, tmp = current->cpus_allowed;
-
-	set_cpus_allowed(current, cpumask_of_cpu(this_cpu));
-	put_cpu();
-
-	cpus_clear(map);
-	for_each_online_cpu(cpu) {
-		per_cpu(cpu_idle_state, cpu) = 1;
-		cpu_set(cpu, map);
-	}
-
-	__get_cpu_var(cpu_idle_state) = 0;
-
-	wmb();
-	do {
-		ssleep(1);
-		for_each_online_cpu(cpu) {
-			if (cpu_isset(cpu, map) && !per_cpu(cpu_idle_state, cpu))
-				cpu_clear(cpu, map);
-		}
-		cpus_and(map, map, cpu_online_map);
-		/*
-		 * We waited 1 sec, if a CPU still did not call idle
-		 * it may be because it is in idle and not waking up
-		 * because it has nothing to do.
-		 * Give all the remaining CPUS a kick.
-		 */
-		smp_call_function_mask(map, do_nothing, 0, 0);
-	} while (!cpus_empty(map));
-
-	set_cpus_allowed(current, tmp);
+	smp_mb();
+	/* kick all the CPUs so that they exit out of pm_idle */
+	smp_call_function(do_nothing, NULL, 0, 1);
 }
 EXPORT_SYMBOL_GPL(cpu_idle_wait);
Index: linux-2.6.25-rc/arch/x86/kernel/process_64.c
===
--- linux-2.6.25-rc.orig/arch/x86/kernel/process_64.c
+++ linux-2.6.25-rc/arch/x86/kernel/process_64.c
@@ -64,7 +64,6 @@ EXPORT_SYMBOL(boot_option_idle_override)
  */
 void (*pm_idle)(void);
 EXPORT_SYMBOL(pm_idle);
-static DEFINE_PER_CPU(unsigned int, cpu_idle_state);
 
 static ATOMIC_NOTIFIER_HEAD(idle_notifier);
 
@@ -139,41 +138,19 @@ static void do_nothing(void *unused)
 {
 }
 
+/*
+ * cpu_idle_wait - Used to ensure that all the CPUs discard old value of
+ * pm_idle and update to new pm_idle value. Required while changing pm_idle
+ * handler on SMP systems.
+ *
+ * Caller must have changed pm_idle to the new value before the call. Old
+ * pm_idle value will not be used by any CPU after the return of this function.
+ */
 void cpu_idle_wait(void)
 {
-	unsigned int cpu, this_cpu = get_cpu();
-	cpumask_t map, tmp = current->cpus_allowed;
-
-	set_cpus_allowed(current, cpumask_of_cpu(this_cpu));
-	put_cpu();
-
-	cpus_clear(map);
-	for_each_online_cpu(cpu) {
-		per_cpu(cpu_idle_state, cpu) = 1;
-		cpu_set(cpu, map);
-	}
-
-	__get_cpu_var(cpu_idle_state) = 0;
-
-	wmb();
-	do {
-		ssleep(1);
-		for_each_online_cpu(cpu) {
-			if (cpu_isset(cpu, map) &&
-			    !per_cpu(cpu_idle_state, cpu))
[PATCH] x86: Simplify cpu_idle_wait
Earlier commit 40d6a146629b98d8e322b6f9332b182c7cbff3df added smp_call_function in cpu_idle_wait() to kick cpus that are in tickless idle. Looking at the cpu_idle_wait code at that time, it seemed over-engineered for a case which is rarely used (changing the idle handler). Below is a simplified version of cpu_idle_wait, which just makes a dummy smp_call_function to all cpus, to make them come out of the old idle handler and start using the new idle handler. It also eliminates the code in the idle loop that handled cpu_idle_wait.

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

Index: linux-2.6.25-rc/arch/x86/kernel/process_32.c
===
--- linux-2.6.25-rc.orig/arch/x86/kernel/process_32.c
+++ linux-2.6.25-rc/arch/x86/kernel/process_32.c
@@ -82,7 +82,6 @@ unsigned long thread_saved_pc(struct tas
  */
 void (*pm_idle)(void);
 EXPORT_SYMBOL(pm_idle);
-static DEFINE_PER_CPU(unsigned int, cpu_idle_state);
 
 void disable_hlt(void)
 {
@@ -181,9 +180,6 @@ void cpu_idle(void)
 		while (!need_resched()) {
 			void (*idle)(void);
 
-			if (__get_cpu_var(cpu_idle_state))
-				__get_cpu_var(cpu_idle_state) = 0;
-
 			check_pgt_cache();
 			rmb();
 			idle = pm_idle;
@@ -208,40 +204,19 @@ static void do_nothing(void *unused)
 {
 }
 
+/*
+ * cpu_idle_wait - Used to ensure that all the CPUs discard old value of
+ * pm_idle and update to new pm_idle value. Required while changing pm_idle
+ * handler on SMP systems.
+ *
+ * Caller must have changed pm_idle to the new value before the call. Old
+ * pm_idle value will not be used by any CPU after the return of this function.
+ */
 void cpu_idle_wait(void)
 {
-	unsigned int cpu, this_cpu = get_cpu();
-	cpumask_t map, tmp = current->cpus_allowed;
-
-	set_cpus_allowed(current, cpumask_of_cpu(this_cpu));
-	put_cpu();
-
-	cpus_clear(map);
-	for_each_online_cpu(cpu) {
-		per_cpu(cpu_idle_state, cpu) = 1;
-		cpu_set(cpu, map);
-	}
-
-	__get_cpu_var(cpu_idle_state) = 0;
-
-	wmb();
-	do {
-		ssleep(1);
-		for_each_online_cpu(cpu) {
-			if (cpu_isset(cpu, map) && !per_cpu(cpu_idle_state, cpu))
-				cpu_clear(cpu, map);
-		}
-		cpus_and(map, map, cpu_online_map);
-		/*
-		 * We waited 1 sec, if a CPU still did not call idle
-		 * it may be because it is in idle and not waking up
-		 * because it has nothing to do.
-		 * Give all the remaining CPUS a kick.
-		 */
-		smp_call_function_mask(map, do_nothing, 0, 0);
-	} while (!cpus_empty(map));
-
-	set_cpus_allowed(current, tmp);
+	smp_mb();
+	/* kick all the CPUs so that they exit out of pm_idle */
+	smp_call_function(do_nothing, NULL, 0, 0);
 }
 EXPORT_SYMBOL_GPL(cpu_idle_wait);
Index: linux-2.6.25-rc/arch/x86/kernel/process_64.c
===
--- linux-2.6.25-rc.orig/arch/x86/kernel/process_64.c
+++ linux-2.6.25-rc/arch/x86/kernel/process_64.c
@@ -64,7 +64,6 @@ EXPORT_SYMBOL(boot_option_idle_override)
  */
 void (*pm_idle)(void);
 EXPORT_SYMBOL(pm_idle);
-static DEFINE_PER_CPU(unsigned int, cpu_idle_state);
 
 static ATOMIC_NOTIFIER_HEAD(idle_notifier);
 
@@ -139,41 +138,19 @@ static void do_nothing(void *unused)
 {
 }
 
+/*
+ * cpu_idle_wait - Used to ensure that all the CPUs discard old value of
+ * pm_idle and update to new pm_idle value. Required while changing pm_idle
+ * handler on SMP systems.
+ *
+ * Caller must have changed pm_idle to the new value before the call. Old
+ * pm_idle value will not be used by any CPU after the return of this function.
+ */
 void cpu_idle_wait(void)
 {
-	unsigned int cpu, this_cpu = get_cpu();
-	cpumask_t map, tmp = current->cpus_allowed;
-
-	set_cpus_allowed(current, cpumask_of_cpu(this_cpu));
-	put_cpu();
-
-	cpus_clear(map);
-	for_each_online_cpu(cpu) {
-		per_cpu(cpu_idle_state, cpu) = 1;
-		cpu_set(cpu, map);
-	}
-
-	__get_cpu_var(cpu_idle_state) = 0;
-
-	wmb();
-	do {
-		ssleep(1);
-		for_each_online_cpu(cpu) {
-			if (cpu_isset(cpu, map) &&
-			    !per_cpu(cpu_idle_state, cpu))
-				cpu_clear(cpu, map);
-		}
-		cpus_and(map, map, cpu_online_map);
-		/*
-		 * We waited 1 sec, if a CPU still did not call idle
-		 * it may be because it is in idle and not waking up
-		 * because it has nothing to do.
-		 * Give all the remaining CPUS a kick.
-		 */
Re: 2.6.24-rc8-mm1
On Thu, Jan 17, 2008 at 11:40:32AM -0800, Andrew Morton wrote: > On Thu, 17 Jan 2008 11:22:19 -0800 "Pallipadi, Venkatesh" <[EMAIL PROTECTED]> > wrote: > > > > > The problem is > > >> modprobe:2584 conflicting cache attribute 5000-50001000 > > >> uncached<->default > > > > Some address range here is being mapped with conflicting types. > > Somewhere the range was mapped with default (write-back). Later > > pci_iomap() is mapping that region as uncacheable which is basically > > aliasing. PAT code detects the aliasing and fails the second uncacheable > > request which leads to the failure. > > It sounds to me like you need considerably more runtime debugging and > reporting support in that code. Ensure that it generates enough output > both during regular operation and during failures for you to be able to > diagnose things in a single iteration. > > We can always take it out later. > > Patch below makes the interesting printks from PAT non-DEBUG. Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Index: linux-2.6.git/arch/x86/mm/ioremap.c === --- linux-2.6.git.orig/arch/x86/mm/ioremap.c2008-01-17 03:18:59.0 -0800 +++ linux-2.6.git/arch/x86/mm/ioremap.c 2008-01-17 08:11:51.0 -0800 @@ -25,10 +25,13 @@ */ void __iomem *ioremap_wc(unsigned long phys_addr, unsigned long size) { - if (pat_wc_enabled) + if (pat_wc_enabled) { + printk(KERN_INFO "ioremap_wc: addr %lx, size %lx\n", + phys_addr, size); return __ioremap(phys_addr, size, _PAGE_WC); - else + } else { return ioremap_nocache(phys_addr, size); + } } EXPORT_SYMBOL(ioremap_wc); Index: linux-2.6.git/arch/x86/mm/ioremap_32.c === --- linux-2.6.git.orig/arch/x86/mm/ioremap_32.c 2008-01-17 03:18:59.0 -0800 +++ linux-2.6.git/arch/x86/mm/ioremap_32.c 2008-01-17 08:10:58.0 -0800 @@ -164,6 +164,8 @@ void __iomem *ioremap_nocache (unsigned long phys_addr, unsigned long size) { + printk(KERN_INFO "ioremap_nocache: addr %lx, size %lx\n", + phys_addr, size); return __ioremap(phys_addr, size, _PAGE_UC); } 
EXPORT_SYMBOL(ioremap_nocache); Index: linux-2.6.git/arch/x86/mm/ioremap_64.c === --- linux-2.6.git.orig/arch/x86/mm/ioremap_64.c 2008-01-17 03:18:59.0 -0800 +++ linux-2.6.git/arch/x86/mm/ioremap_64.c 2008-01-17 08:10:13.0 -0800 @@ -144,7 +144,7 @@ void __iomem *ioremap_nocache (unsigned long phys_addr, unsigned long size) { - printk(KERN_DEBUG "ioremap_nocache: addr %lx, size %lx\n", + printk(KERN_INFO "ioremap_nocache: addr %lx, size %lx\n", phys_addr, size); return __ioremap(phys_addr, size, _PAGE_UC); } Index: linux-2.6.git/arch/x86/mm/pat.c === --- linux-2.6.git.orig/arch/x86/mm/pat.c2008-01-17 03:18:59.0 -0800 +++ linux-2.6.git/arch/x86/mm/pat.c 2008-01-17 08:06:23.0 -0800 @@ -170,7 +170,7 @@ if (!fattr && attr != ml->attr) { printk( - KERN_DEBUG "%s:%d conflicting cache attribute %Lx-%Lx %s<->%s\n", + KERN_WARNING "%s:%d conflicting cache attribute %Lx-%Lx %s<->%s\n", current->comm, current->pid, start, end, cattr_name(attr), cattr_name(ml->attr)); @@ -205,7 +205,7 @@ list_for_each_entry(ml, _list, nd) { if (ml->start == start && ml->end == end) { if (ml->attr != attr) - printk(KERN_DEBUG + printk(KERN_WARNING "%s:%d conflicting cache attributes on free %Lx-%Lx %s<->%s\n", current->comm, current->pid, start, end, cattr_name(attr), cattr_name(ml->attr)); @@ -217,7 +217,7 @@ } spin_unlock(_lock); if (err) - printk(KERN_DEBUG "%s:%d freeing invalid mattr %Lx-%Lx %s\n", + printk(KERN_WARNING "%s:%d freeing invalid mattr %Lx-%Lx %s\n", current->comm, current->pid, start, end, cattr_name(attr)); return err; Index: linux-2.6.git/include/asm-x86/io_32.h === --- linux-2.6.git.orig/include/asm-x86/io_32.h 2008-01-17 06:28:06.0 -0800 +++ linux-2.6.git/include/asm-x86/io_32.h 2008-01-17 08:09:30.0 -0800 @@ -113,6 +113,8 @@ static inline void __iomem * ioremap(unsigned long offset, unsigned long size) { + printk(KERN_INFO "ioremap: addr %lx, size %lx\n", + offset, size);
Re: [patch 0/4] x86: PAT followup - Incremental changes and bug fixes
On Thu, Jan 17, 2008 at 11:52:43PM +0100, Andreas Herrmann3 wrote: > On Thu, Jan 17, 2008 at 11:15:05PM +0100, Ingo Molnar wrote: > > > > * Andreas Herrmann3 <[EMAIL PROTECTED]> wrote: > > > > > On Thu, Jan 17, 2008 at 10:42:09PM +0100, Ingo Molnar wrote: > > > > > > > > * Siddha, Suresh B <[EMAIL PROTECTED]> wrote: > > > > > > > > > On Thu, Jan 17, 2008 at 10:13:08PM +0100, Ingo Molnar wrote: > > > > > > but in general we must be robust enough in this case and just > > > > > > degrade > > > > > > any overlapping page to UC (and emit a warning perhaps) - instead > > > > > > of > > > > > > failing the ioremap and thus failing the driver (and the bootup). > > > > > > > > > > But then, this will cause an attribute conflict. Old one was > > > > > specifying WB in PAT (ioremap with noflags) and the new ioremap > > > > > specifies UC. > > > > > > > > we could fix up all aliases of that page as well and degrade them to UC? > > > > > > Yes, we must fix all aliases or reject the conflicting mapping. But > > > fixing all aliases might not be that easy. (I've just seen a panic > > > when using your patch ;-( > > > > yes, indeed my patch is bad if you have PAT enabled: conflicting cache > > attributes might be present. I'll go with your patch for now. > > I think the best is to just reject conflicting mappings. (Because now > I am too tired to think about a safe way how to change the aliases to the > most restrictive memory type. ;-) > > But then of course such boot-time problems like I've seen on my test > machines should be avoided somehow. > > Below is another potential fix for the problem here. Going through ACPI ioremap usages, we found one place where the mapping is cached for possible optimization reasons and not unmapped later. Patch below always unmaps the ioremap at this place in ACPICA. 
Thanks, Venki Index: linux-2.6.git/drivers/acpi/executer/exregion.c === --- linux-2.6.git.orig/drivers/acpi/executer/exregion.c 2008-01-17 03:18:39.0 -0800 +++ linux-2.6.git/drivers/acpi/executer/exregion.c 2008-01-17 07:34:33.0 -0800 @@ -48,6 +48,8 @@ #define _COMPONENT ACPI_EXECUTER ACPI_MODULE_NAME("exregion") +static int ioremap_cache; + /*** * * FUNCTION:acpi_ex_system_memory_space_handler @@ -249,6 +251,13 @@ break; } + if (!ioremap_cache) { + acpi_os_unmap_memory(mem_info->mapped_logical_address, +window_size); + mem_info->mapped_logical_address = 0; + mem_info->mapped_physical_address = 0; + mem_info->mapped_length = 0; + } return_ACPI_STATUS(status); } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [-mm Patch] uml: fix a building error
On Thu, Jan 17, 2008 at 04:14:37PM -0500, Jeff Dike wrote: > On Thu, Jan 17, 2008 at 11:38:53AM -0800, Pallipadi, Venkatesh wrote: > > Apart from unxlate, there is also ioremap_wc which is defined in the > > same way. > > And while we're on the subject, what's the deal with these, in > include/asm-x86/io.h? > > #define ioremap_wc ioremap_wc > #define unxlate_dev_mem_ptr unxlate_dev_mem_ptr > If archs want to override the defaults for these two functions, they define the above and then include asm-generic/iomap.h. Archs which don't want to implement anything in these new funcs just have to include asm-generic/iomap.h which has the proper stubs. So, a patch like the below is what is required here for all archs to include asm-generic iomap.h (without the other patch that defines null unxlate in asm specific header). Totally untested. Thanks, Venki Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Index: linux-2.6.git/include/asm-arm/io.h === --- linux-2.6.git.orig/include/asm-arm/io.h 2008-01-17 06:28:06.0 -0800 +++ linux-2.6.git/include/asm-arm/io.h 2008-01-17 06:39:13.0 -0800 @@ -27,6 +27,8 @@ #include <asm/byteorder.h> #include <asm/memory.h> +#include <asm-generic/iomap.h> + /* * ISA I/O bus memory addresses are 1:1 with the physical address. */ Index: linux-2.6.git/include/asm-avr32/io.h === --- linux-2.6.git.orig/include/asm-avr32/io.h 2008-01-17 06:28:06.0 -0800 +++ linux-2.6.git/include/asm-avr32/io.h2008-01-17 06:39:13.0 -0800 @@ -10,6 +10,8 @@ #include <asm/arch/io.h> +#include <asm-generic/iomap.h> + /* virt_to_phys will only work when address is in P1 or P2 */ static __inline__ unsigned long virt_to_phys(volatile void *address) { Index: linux-2.6.git/include/asm-blackfin/io.h === --- linux-2.6.git.orig/include/asm-blackfin/io.h2008-01-17 06:28:06.0 -0800 +++ linux-2.6.git/include/asm-blackfin/io.h 2008-01-17 06:39:13.0 -0800 @@ -8,6 +8,8 @@ #endif #include <linux/compiler.h> +#include <asm-generic/iomap.h> + /* * These are for ISA/PCI shared memory _only_ and should never be used * on any other type of memory, including Zorro memory. 
They are meant to Index: linux-2.6.git/include/asm-cris/io.h === --- linux-2.6.git.orig/include/asm-cris/io.h2008-01-17 06:28:06.0 -0800 +++ linux-2.6.git/include/asm-cris/io.h 2008-01-17 06:39:13.0 -0800 @@ -5,6 +5,8 @@ #include <asm/arch/io.h> #include <linux/kernel.h> +#include <asm-generic/iomap.h> + struct cris_io_operations { u32 (*read_mem)(void *addr, int size); Index: linux-2.6.git/include/asm-frv/io.h === --- linux-2.6.git.orig/include/asm-frv/io.h 2008-01-17 06:28:06.0 -0800 +++ linux-2.6.git/include/asm-frv/io.h 2008-01-17 06:39:13.0 -0800 @@ -23,6 +23,8 @@ #include <asm/mb-regs.h> #include <linux/delay.h> +#include <asm-generic/iomap.h> + /* * swap functions are sometimes needed to interface little-endian hardware */ Index: linux-2.6.git/include/asm-h8300/io.h === --- linux-2.6.git.orig/include/asm-h8300/io.h 2008-01-17 06:28:06.0 -0800 +++ linux-2.6.git/include/asm-h8300/io.h2008-01-17 06:39:13.0 -0800 @@ -13,6 +13,8 @@ #error UNKNOWN CPU TYPE #endif +#include <asm-generic/iomap.h> + /* * These are for ISA/PCI shared memory _only_ and should never be used Index: linux-2.6.git/include/asm-m32r/io.h === --- linux-2.6.git.orig/include/asm-m32r/io.h2008-01-17 06:28:06.0 -0800 +++ linux-2.6.git/include/asm-m32r/io.h 2008-01-17 06:39:13.0 -0800 @@ -5,6 +5,8 @@ #include <linux/compiler.h> #include <asm/page.h> /* __va */ +#include <asm-generic/iomap.h> + #ifdef __KERNEL__ #define IO_SPACE_LIMIT 0x Index: linux-2.6.git/include/asm-m68knommu/io.h === --- linux-2.6.git.orig/include/asm-m68knommu/io.h 2008-01-17 06:28:06.0 -0800 +++ linux-2.6.git/include/asm-m68knommu/io.h2008-01-17 06:39:13.0 -0800 @@ -1,6 +1,8 @@ #ifndef _M68KNOMMU_IO_H #define _M68KNOMMU_IO_H +#include <asm-generic/iomap.h> + #ifdef __KERNEL__ Index: linux-2.6.git/include/asm-ppc/io.h === --- linux-2.6.git.orig/include/asm-ppc/io.h 2008-01-17 06:28:06.0 -0800 +++ linux-2.6.git/include/asm-ppc/io.h 2008-01-17 06:39:13.0 -0800 @@ -10,6 +10,8 @@ #include <asm/synch.h> #include <asm/mmu.h> +#include <asm-generic/iomap.h> + #define SIO_CONFIG_RA 0x398 #define SIO_CONFIG_RD 0x399 Index: linux-2.6.git/include/asm-s390/io.h === --- linux-2.6.git.orig/include/asm-s390/io.h2008-01-17 06:28:06.0 -0800 +++ linux-2.6.git/include/asm-s390/io.h 2008-01-17 
06:39:13.0 -0800 @@ -15,6 +15,8 @@
Re: [patch 2/4] x86: PAT followup - Remove KERNPG_TABLE from pte entry
On Wed, Jan 16, 2008 at 10:14:00AM +0200, Mika Penttilä wrote: > [EMAIL PROTECTED] kirjoitti: > >KERNPG_TABLE was a bug in earlier patch. Remove it from pte. > >pte_val() check is redundant as this routine is called immediately after a > >ptepage is allocated afresh. > > > >Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> > >Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]> > > > >Index: linux-2.6.git/arch/x86/mm/init_64.c > >=== > >--- linux-2.6.git.orig/arch/x86/mm/init_64.c 2008-01-15 > >11:02:23.0 -0800 > >+++ linux-2.6.git/arch/x86/mm/init_64.c 2008-01-15 > >11:06:37.0 -0800 > >@@ -541,9 +541,6 @@ > > if (address >= end) > > break; > > > >-if (pte_val(*pte)) > >-continue; > >- > > /* Nothing to map. Map the null page */ > > if (!(address & (~PAGE_MASK)) && > > (address + PAGE_SIZE <= end) && > >@@ -561,9 +558,9 @@ > > } > > > > if (exec) > >-entry = _PAGE_NX|_KERNPG_TABLE|_PAGE_GLOBAL|address; > >+entry = _PAGE_NX|_PAGE_GLOBAL|address; > > else > >-entry = _KERNPG_TABLE|_PAGE_GLOBAL|address; > >+entry = _PAGE_GLOBAL|address; > > entry &= __supported_pte_mask; > > set_pte(pte, __pte(entry)); > > } > > > > > > Hmm then what's the point of mapping not present 4k pages for valid mem > here? > Ingo, Below incremental patch fixes this pte entry setting correctly. Thanks to Mika for catching this. 
Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Index: linux-2.6.git/arch/x86/mm/init_64.c === --- linux-2.6.git.orig/arch/x86/mm/init_64.c2008-01-16 03:38:32.0 -0800 +++ linux-2.6.git/arch/x86/mm/init_64.c 2008-01-16 03:51:34.0 -0800 @@ -515,9 +515,9 @@ } if (exec) - entry = _PAGE_NX|_PAGE_GLOBAL|address; + entry = __PAGE_KERNEL_EXEC | _PAGE_GLOBAL | address; else - entry = _PAGE_GLOBAL|address; + entry = __PAGE_KERNEL | _PAGE_GLOBAL | address; entry &= __supported_pte_mask; set_pte(pte, __pte(entry)); }
Re: [patch 0/4] x86: PAT followup - Incremental changes and bug fixes
On Wed, Jan 16, 2008 at 07:57:48PM +0100, Andreas Herrmann wrote: > Hi, > > I just want to report that the PAT support in x86/mm causes crashes > on two of my test machines. On both boxes the SATA detection does > not work when the PAT support is patched into the kernel. > > Symptoms are as follows -- best described by a diff between the > two boot.logs: > > # diff boot-failing.log boot-working.log > > -Linux version 2.6.24-rc8-ga9f7faa5 ([EMAIL PROTECTED]) (gcc version ... > +Linux version 2.6.24-rc8-g2ea3cf43 ([EMAIL PROTECTED]) (gcc version ... > ... > early_iounmap(82a0b000, 1000) > -early_ioremap(c000, 1000) => -02103394304 > -early_iounmap(82a0c000, 1000) This does not look to be the problem here. We just mapped some new low address due to possibly a different code path. But, seems to have worked fine. > early_iounmap(82808000, 1000) > ... > -ACPI: PCI interrupt for device :00:12.0 disabled > -sata_sil: probe of :00:12.0 failed with error -12 > +scsi0 : sata_sil > +scsi1 : sata_sil > +ata1: SATA max UDMA/100 mmio [EMAIL PROTECTED] tf 0xc0403080 irq 22 > ... > -AC'97 space ioremap problem > -ACPI: PCI interrupt for device :00:14.5 disabled > -ATI IXP AC97 controller: probe of :00:14.5 failed with error -5 This ioremap failing seems to be the real problem. This can be due to new tracking of ioremaps introduced by PAT patches. We do not allow conflicting ioremaps to same region. Probably that is happening in both Sound and sata initialization which results in driver init failing. Can you please try the debug patch below over latest x86/mm and boot kernel with debug boot option and send us the dmesg from the failure. That will give us better info about ioremaps. 
Thanks, Venki Index: linux-2.6.git/arch/x86/mm/ioremap_64.c === --- linux-2.6.git.orig/arch/x86/mm/ioremap_64.c 2008-01-16 03:38:32.0 -0800 +++ linux-2.6.git/arch/x86/mm/ioremap_64.c 2008-01-16 05:16:28.0 -0800 @@ -150,6 +150,8 @@ void __iomem *ioremap_nocache (unsigned long phys_addr, unsigned long size) { + printk(KERN_DEBUG "ioremap_nocache: addr %lx, size %lx\n", + phys_addr, size); return __ioremap(phys_addr, size, _PAGE_UC); } EXPORT_SYMBOL(ioremap_nocache); Index: linux-2.6.git/include/asm-x86/io_64.h === --- linux-2.6.git.orig/include/asm-x86/io_64.h 2008-01-16 03:38:32.0 -0800 +++ linux-2.6.git/include/asm-x86/io_64.h 2008-01-16 05:16:57.0 -0800 @@ -154,6 +154,8 @@ static inline void __iomem * ioremap (unsigned long offset, unsigned long size) { + printk(KERN_DEBUG "ioremap: addr %lx, size %lx\n", + offset, size); return __ioremap(offset, size, 0); }
Re: Folding _PAGE_PWT into _PAGE_PCD (was Re: unify pagetable accessors patch causes double fault II)
On Tue, Jan 15, 2008 at 09:16:50AM -0800, Jeremy Fitzhardinge wrote:
> Ingo Molnar wrote:
> >-#define _PAGE_PRESENT	(_AC(1, UL)<<_PAGE_BIT_PRESENT)
> >-#define _PAGE_RW	(_AC(1, UL)<<_PAGE_BIT_RW)
> >-#define _PAGE_USER	(_AC(1, UL)<<_PAGE_BIT_USER)
> >-#define _PAGE_PWT	(_AC(1, UL)<<_PAGE_BIT_PWT)
> >-#define _PAGE_PCD	((_AC(1, UL)<<_PAGE_BIT_PCD) | _PAGE_PWT)
>
> BTW, I just noticed that _PAGE_PWT has been folded into _PAGE_PCD. This
> seems like a really bad idea to me, since it breaks the rule that
> _PAGE_X == 1 << _PAGE_BIT_X. I can't think of a specific place where
> this would cause problems, but this kind of non-uniformity always ends
> up biting someone in the arse.
>
> I think having a specific _PAGE_NOCACHE which combines these bits is a
> better approach.
>
>	J

How about the patch below? It defines a new _PAGE_UC. One concern is
drivers continuing to use _PAGE_PCD and getting wrong attributes. Maybe
we need to rename _PAGE_PCD to catch those errors as well?

Thanks,
Venki

Do not fold the PCD and PWT bits into _PAGE_PCD. Instead, introduce a new
_PAGE_UC which defines uncached mappings and use it in place of _PAGE_PCD.
Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

Index: linux-2.6.git/arch/x86/mm/ioremap_32.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/ioremap_32.c	2008-01-15 03:29:38.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/ioremap_32.c	2008-01-15 04:42:59.000000000 -0800
@@ -173,7 +173,7 @@
 void __iomem *ioremap_nocache (unsigned long phys_addr, unsigned long size)
 {
-	return __ioremap(phys_addr, size, _PAGE_PCD);
+	return __ioremap(phys_addr, size, _PAGE_UC);
 }
 EXPORT_SYMBOL(ioremap_nocache);
Index: linux-2.6.git/arch/x86/mm/ioremap_64.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/ioremap_64.c	2008-01-15 03:29:38.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/ioremap_64.c	2008-01-15 04:43:07.000000000 -0800
@@ -150,7 +150,7 @@
 void __iomem *ioremap_nocache (unsigned long phys_addr, unsigned long size)
 {
-	return __ioremap(phys_addr, size, _PAGE_PCD);
+	return __ioremap(phys_addr, size, _PAGE_UC);
 }
 EXPORT_SYMBOL(ioremap_nocache);
Index: linux-2.6.git/arch/x86/mm/pat.c
===================================================================
--- linux-2.6.git.orig/arch/x86/mm/pat.c	2008-01-15 03:29:38.000000000 -0800
+++ linux-2.6.git/arch/x86/mm/pat.c	2008-01-15 05:01:43.000000000 -0800
@@ -64,7 +64,7 @@
 	if (smp_processor_id() && !pat_wc_enabled)
 		return;
 
-	/* Set PWT+PCD to Write-Combining. All other bits stay the same */
+	/* Set PCD to Write-Combining. All other bits stay the same */
 	/* PTE encoding used in Linux:
 	      PAT
 	      |PCD
 	      ||PWT
 	      |||
 	      000 WB	default
 	      010 WC	_PAGE_WC
-	      011 UC	_PAGE_PCD
+	      011 UC	_PAGE_UC
 	   PAT bit unused */
 	pat = PAT(0,WB) | PAT(1,WT) | PAT(2,WC) | PAT(3,UC) |
 	      PAT(4,WB) | PAT(5,WT) | PAT(6,WC) | PAT(7,UC);
@@ -97,7 +97,7 @@
 {
 	switch (flags & _PAGE_CACHE_MASK) {
 	case _PAGE_WC:	return "write combining";
-	case _PAGE_PCD:	return "uncached";
+	case _PAGE_UC:	return "uncached";
 	case 0:		return "default";
 	default:	return "broken";
 	}
@@ -144,7 +144,7 @@
 	if (!fattr)
 		return -EINVAL;
 	else
-		*fattr = _PAGE_PCD;
+		*fattr = _PAGE_UC;
 	}
 
 	return 0;
@@ -227,13 +227,13 @@
 	unsigned long flags;
 	unsigned long want_flags = 0;
 
 	if (file->f_flags & O_SYNC)
-		want_flags = _PAGE_PCD;
+		want_flags = _PAGE_UC;
 
 #ifdef CONFIG_X86_32
 	/*
 	 * On the PPro and successors, the MTRRs are used to set
 	 * memory types for physical addresses outside main memory,
-	 * so blindly setting PCD or PWT on those pages is wrong.
+	 * so blindly setting UC or PWT on those pages is wrong.
	 * For Pentiums and earlier, the surround logic should disable
	 * caching for the high addresses through the KEN pin, but
	 * we maintain the tradition of paranoia in this code.
@@ -244,7 +244,7 @@
 	    test_bit(X86_FEATURE_CYRIX_ARR, boot_cpu_data.x86_capability) ||
 	    test_bit(X86_FEATURE_CENTAUR_MCR, boot_cpu_data.x86_capability)) &&
 	    offset >= __pa(high_memory))
-		want_flags = _PAGE_PCD;
+		want_flags = _PAGE_UC;
 #endif
 
 	/* ignore error because we can't handle it here */
Index: linux-2.6.git/arch/x86/pci/i386.c
===================================================================
---
Re: + restore-missing-sysfs-max_cstate-attr.patch added to -mm tree
Reintroduce run time configurable max_cstate for !CPU_IDLE case.

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

Index: linux-2.6.24-rc/drivers/acpi/processor_idle.c
===================================================================
--- linux-2.6.24-rc.orig/drivers/acpi/processor_idle.c
+++ linux-2.6.24-rc/drivers/acpi/processor_idle.c
@@ -76,7 +76,11 @@ static void (*pm_idle_save) (void) __rea
 #define PM_TIMER_TICKS_TO_US(p)	(((p) * 1000)/(PM_TIMER_FREQUENCY/1000))
 
 static unsigned int max_cstate __read_mostly = ACPI_PROCESSOR_MAX_POWER;
+#ifdef CONFIG_CPU_IDLE
 module_param(max_cstate, uint, 0000);
+#else
+module_param(max_cstate, uint, 0644);
+#endif
 static unsigned int nocst __read_mostly;
 module_param(nocst, uint, 0000);
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
On Wed, Dec 19, 2007 at 08:32:55PM +0100, Ingo Molnar wrote:
>
> * Venki Pallipadi <[EMAIL PROTECTED]> wrote:
>
> > Avoid TLB flush IPIs during C3 states by voluntary leave_mm() before
> > entering C3.
> >
> > The performance impact of TLB flush on C3 should not be significant
> > with respect to C3 wakeup latency. Also, CPUs tend to flush TLB in
> > hardware while in C3 anyways.
> >
> > On an 8 logical CPU system, running make -j2, the number of tlbflush
> > IPIs goes down from 40 per second to ~ 0. Total number of interrupts
> > during the run of this workload was ~1200 per second, which makes it
> > ~3% savings in wakeups.
> >
> > There was no measurable performance or power impact however.
>
> thanks, applied to x86.git. Nice and elegant patch!
>
> Btw., since the TLB flush state machine is really subtle and fragile,
> could you try to run the following mmap stresstest i wrote some time
> ago:
>
>	http://redhat.com/~mingo/threaded-mmap-stresstest/
>
> for a couple of hours. It runs nr_cpus threads which then do a "random
> crazy mix" of mappings/unmappings/remappings of a 800 MB memory window.
> The more sockets/cores, the crazier the TLB races get ;-)

Ingo,

I ran this stress test on two systems (8 cores and 2 cores) for over 4
hours without any issues. There was more than 20% C3 time during the run,
so this C3 tlbflush path must have been stressed well.

And sorry about the patch not working on UP config. That was a silly
oversight on my part.

Thanks,
Venki
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
On Wed, Dec 19, 2007 at 11:48:14AM -0800, H. Peter Anvin wrote:
> Ingo Molnar wrote:
> >
> > i dont think it's required for C3 to even turn off any portion of the
> > CPU - if an interrupt arrives after the C3 sequence is initiated but
> > just before dirty cachelines have been flushed then the CPU can just
> > return without touching anything (such as the TLB) - right? So i dont
> > think there's any implicit guarantee of TLB flushing (nor should there
> > be), but in practice, a good C3 sequence would (statistically) turn off
> > large portions of the CPU and hence the TLB as well.
>
> I think C3 guarantees that the cache contents stay intact, and thus it
> might make sense in some technology to preserve the TLB as well (being a
> kind of cache.)
>
> Otherwise, what you say here of course is absolutely correct.

C3 does not guarantee all cache contents. In fact, at least on Intel, L1
will almost always be flushed. Newer, more power efficient CPUs do
dynamic cache sizing [1].

C3 just guarantees that the caches are coherent. That is, if they are
intact, then DMA will keep the cache consistent.

Thanks,
Venki

[1] - http://download.intel.com/products/processor/core2duo/mobile_prod_brief.pdf
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
On Wed, Dec 19, 2007 at 08:40:32PM +0100, Ingo Molnar wrote:
>
> * H. Peter Anvin <[EMAIL PROTECTED]> wrote:
>
> > Ingo Molnar wrote:
> >> * Venki Pallipadi <[EMAIL PROTECTED]> wrote:
> >>
> >>> Avoid TLB flush IPIs during C3 states by voluntary leave_mm() before
> >>> entering C3.
> >>>
> >>> The performance impact of TLB flush on C3 should not be significant with
> >>> respect to C3 wakeup latency. Also, CPUs tend to flush TLB in hardware
> >>> while in C3 anyways.
> >
> > Are there any CPUs around which *don't* flush the TLB across C3? (I
> > guess it's not guaranteed by the spec, though, and as TLBs grow larger
> > there might be incentive to keep them online.)
>
> i dont think it's required for C3 to even turn off any portion of the
> CPU - if an interrupt arrives after the C3 sequence is initiated but
> just before dirty cachelines have been flushed then the CPU can just
> return without touching anything (such as the TLB) - right? So i dont
> think there's any implicit guarantee of TLB flushing (nor should there
> be), but in practice, a good C3 sequence would (statistically) turn off
> large portions of the CPU and hence the TLB as well.

Yes. There are cases where hardware/BIOS can do C-state changes behind
the OS, with things like being in C1 for a while and then going to C2/C3
after a while, etc. In such cases, there will be times when TLBs are not
really flushed in hardware. But ideally, if C3 results in deep idle,
TLBs would be turned off. And in cases where we wake up earlier than
expected, C-state policy should identify that and choose a lower C-state
next time around.

I also tried one variation of this, where I only do the flush if there
is more than one CPU sharing the mm. But that did not help the test case
I was using (which is probably the worst case). What I would see is:

Process runs on CPU x and mm is not shared
Goes idle (C3) waiting on something
Wakes up on CPU y which will now start sharing mm and would send flush IPI anyway

Thanks,
Venki
[PATCH] x86: Voluntary leave_mm before entering ACPI C3
Avoid TLB flush IPIs during C3 states by voluntary leave_mm() before
entering C3.

The performance impact of TLB flush on C3 should not be significant with
respect to C3 wakeup latency. Also, CPUs tend to flush TLB in hardware
while in C3 anyways.

On an 8 logical CPU system, running make -j2, the number of tlbflush IPIs
goes down from 40 per second to ~ 0. Total number of interrupts during
the run of this workload was ~1200 per second, which makes it ~3% savings
in wakeups.

There was no measurable performance or power impact however.

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

Index: linux-2.6.24-rc/arch/x86/kernel/smp_64.c
===================================================================
--- linux-2.6.24-rc.orig/arch/x86/kernel/smp_64.c
+++ linux-2.6.24-rc/arch/x86/kernel/smp_64.c
@@ -70,7 +70,7 @@ static DEFINE_PER_CPU(union smp_flush_st
  * We cannot call mmdrop() because we are in interrupt context,
  * instead update mm->cpu_vm_mask.
  */
-static inline void leave_mm(int cpu)
+void leave_mm(int cpu)
 {
 	if (read_pda(mmu_state) == TLBSTATE_OK)
 		BUG();
Index: linux-2.6.24-rc/include/asm-x86/acpi_32.h
===================================================================
--- linux-2.6.24-rc.orig/include/asm-x86/acpi_32.h
+++ linux-2.6.24-rc/include/asm-x86/acpi_32.h
@@ -31,6 +31,7 @@
 #include <acpi/pdc_intel.h>
 
 #include <asm/system.h>		/* defines cmpxchg */
+#include <asm/mmu.h>
 
 #define COMPILER_DEPENDENT_INT64   long long
 #define COMPILER_DEPENDENT_UINT64  unsigned long long
@@ -138,6 +139,8 @@ static inline void disable_acpi(void) {
 
 #define ARCH_HAS_POWER_INIT	1
 
+#define acpi_unlazy_tlb(x)	leave_mm(x)
+
 #endif /*__KERNEL__*/
 
 #endif /*_ASM_ACPI_H*/
Index: linux-2.6.24-rc/include/asm-x86/acpi_64.h
===================================================================
--- linux-2.6.24-rc.orig/include/asm-x86/acpi_64.h
+++ linux-2.6.24-rc/include/asm-x86/acpi_64.h
@@ -30,6 +30,7 @@
 #include <acpi/pdc_intel.h>
 
 #include <asm/numa.h>
+#include <asm/mmu.h>
 
 #define COMPILER_DEPENDENT_INT64   long long
 #define COMPILER_DEPENDENT_UINT64  unsigned long long
@@ -148,6 +149,8 @@ static inline void acpi_fake_nodes(const
 }
 #endif
 
+#define acpi_unlazy_tlb(x)	leave_mm(x)
+
 #endif /*__KERNEL__*/
 
 #endif /*_ASM_ACPI_H*/
Index: linux-2.6.24-rc/drivers/acpi/processor_idle.c
===================================================================
--- linux-2.6.24-rc.orig/drivers/acpi/processor_idle.c
+++ linux-2.6.24-rc/drivers/acpi/processor_idle.c
@@ -530,6 +530,7 @@ static void acpi_processor_idle(void)
 		break;
 
 	case ACPI_STATE_C3:
+		acpi_unlazy_tlb(smp_processor_id());
 		/*
 		 * disable bus master
 		 * bm_check implies we need ARB_DIS
@@ -1485,6 +1486,7 @@ static int acpi_idle_enter_bm(struct cpu
 		return 0;
 	}
 
+	acpi_unlazy_tlb(smp_processor_id());
 	/*
 	 * Must be done before busmaster disable as we might need to
 	 * access HPET !
Index: linux-2.6.24-rc/include/asm-ia64/acpi.h
===================================================================
--- linux-2.6.24-rc.orig/include/asm-ia64/acpi.h
+++ linux-2.6.24-rc/include/asm-ia64/acpi.h
@@ -126,6 +126,8 @@ extern int __devinitdata pxm_to_nid_map[
 extern int __initdata nid_to_pxm_map[MAX_NUMNODES];
 #endif
 
+#define acpi_unlazy_tlb(x)
+
 #endif /*__KERNEL__*/
 
 #endif /*_ASM_ACPI_H*/
Index: linux-2.6.24-rc/include/asm-x86/mmu.h
===================================================================
--- linux-2.6.24-rc.orig/include/asm-x86/mmu.h
+++ linux-2.6.24-rc/include/asm-x86/mmu.h
@@ -20,4 +20,6 @@ typedef struct {
 	void *vdso;
 } mm_context_t;
 
+void leave_mm(int cpu);
+
 #endif /* _ASM_X86_MMU_H */
Index: linux-2.6.24-rc/arch/x86/kernel/smp_32.c
===================================================================
--- linux-2.6.24-rc.orig/arch/x86/kernel/smp_32.c
+++ linux-2.6.24-rc/arch/x86/kernel/smp_32.c
@@ -256,7 +256,7 @@ static DEFINE_SPINLOCK(tlbstate_lock);
  * We need to reload %cr3 since the page tables may be going
  * away from under us..
  */
-void leave_mm(unsigned long cpu)
+void leave_mm(int cpu)
 {
 	if (per_cpu(cpu_tlbstate, cpu).state == TLBSTATE_OK)
 		BUG();
Index: linux-2.6.24-rc/include/asm-x86/mmu_context_32.h
===================================================================
--- linux-2.6.24-rc.orig/include/asm-x86/mmu_context_32.h
+++ linux-2.6.24-rc/include/asm-x86/mmu_context_32.h
@@ -32,8 +32,6 @@ static inline void enter_lazy_tlb(struct
 #endif
 }
 
-void leave_mm(unsigned long cpu);
-
 static inline void switch_mm(struct mm_struct *prev,
 			     struct mm_struct *next,
 			     struct task_struct *tsk)
Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
On Fri, Dec 14, 2007 at 01:42:12AM +0100, Andi Kleen wrote:
> > +void __cpuinit pat_init(void)
> > +{
> > +	/* Set PWT+PCD to Write-Combining. All other bits stay the same */
> > +	if (cpu_has_pat) {
>
> All the old CPUs (PPro etc.) with known PAT bugs need to clear this flag
> now in their CPU init functions. It is fine to be aggressive there
> because these old systems have lived so long without PAT they can do
> so forever. So perhaps it's best to just white list it only for newer
> CPUs on the Intel side at least.

Yes. Enabling this only on relatively newer CPUs is safer. Will do that
in the next iteration of the patches.

> Another problem is that there are some popular modules (ATI, Nvidia for
> once) who reprogram the PAT registers on their own, likely different.
> Need some way to detect that case I guess, otherwise lots of users will
> see strange malfunctions. Maybe recheck after module load?

Yes. We can check that at load time. But they can still do bad things at
runtime, like say when 3D gets enabled etc??

> > +	      |||
> > +	      000 WB		default
> > +	      010 UC_MINUS	_PAGE_PCD
> > +	      011 WC		_PAGE_WC
> > +	   PAT bit unused */
> > +	pat = PAT(0,WB) | PAT(1,WT) | PAT(2,UC_MINUS) | PAT(3,WC) |
> > +	      PAT(4,WB) | PAT(5,WT) | PAT(6,UC_MINUS) | PAT(7,WC);
> > +	rdmsrl(MSR_IA32_CR_PAT, boot_pat_state);
> > +	wrmsrl(MSR_IA32_CR_PAT, pat);
> > +	__flush_tlb_all();
> > +	asm volatile("wbinvd");
>
> Have you double checked this is the full procedure from the manual? iirc
> there were some steps missing.

Checking the manual for this. You are right, we had missed some steps
here. Actually, the manual says that on MP, the PAT MSR on all CPUs must
be consistent (even when they are not really using it in their page
tables). So, this will change the init and shutdown parts significantly
and there may be some challenges with CPU offline and KEXEC. We will
redo this part in the next iteration.

Thanks,
Venki
Re: [RFC PATCH 02/12] PAT 64b: Basic PAT implementation
On Fri, Dec 14, 2007 at 01:42:12AM +0100, Andi Kleen wrote: +void __cpuinit pat_init(void) +{ + /* Set PWT+PCD to Write-Combining. All other bits stay the same */ + if (cpu_has_pat) { All the old CPUs (PPro etc.) with known PAT bugs need to clear this flag now in their CPU init functions. It is fine to be aggressive there because these old systems have lived so long without PAT they can do so forever. So perhaps it's best to just white list it only for newer CPUs on the Intel side at least. Yes. Enabling this only on relatively newer CPUs is safer. Will do that in next iteration of the patches. Another problem is that there are some popular modules (ATI, Nvidia for once) who reprogram the PAT registers on their own, likely different. Need some way to detect that case I guess, otherwise lots of users will see strange malfunctions. Maybe recheck after module load? Yes. We can check that at load time. But they can still do bad things at runt ime, like say when 3D gets enabled etc?? + ||| + 000 WB default + 010 UC_MINUS _PAGE_PCD + 011 WC _PAGE_WC + PAT bit unused */ + pat = PAT(0,WB) | PAT(1,WT) | PAT(2,UC_MINUS) | PAT(3,WC) | + PAT(4,WB) | PAT(5,WT) | PAT(6,UC_MINUS) | PAT(7,WC); + rdmsrl(MSR_IA32_CR_PAT, boot_pat_state); + wrmsrl(MSR_IA32_CR_PAT, pat); + __flush_tlb_all(); + asm volatile(wbinvd); Have you double checked this is the full procedure from the manual? iirc there were some steps missing. Checking the manual for this. You are right, we had missed some steps here. Actually, manual says on MP, PAT MSR on all CPUs must be consistent (even when they are not really using it in their page tables. So, this will change the init and shutdown parts significantly and there may be some challenges with CPU offline and KEXEC. We will redo this part in next iteration. 
Thanks, Venki
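The PAT(i, type) composition quoted in the patch packs one memory-type byte per PAT entry into the 64-bit IA32_CR_PAT value. A minimal userspace sketch of that packing, assuming the SDM memory-type encodings (UC=0, WC=1, WT=4, WP=5, WB=6, UC-=7), which the mail itself does not spell out:

```c
#include <assert.h>
#include <stdint.h>

/* Memory-type encodings, assumed from the Intel SDM (not in the mail) */
enum pat_type { PAT_UC = 0, PAT_WC = 1, PAT_WT = 4, PAT_WP = 5,
                PAT_WB = 6, PAT_UC_MINUS = 7 };

/* PAT(i, t): place type t into PAT entry i (one byte per entry) */
#define PAT(i, t) ((uint64_t)(t) << ((i) * 8))

static uint64_t pat_msr_value(void)
{
    /* Mirrors the layout proposed in the patch: entries 4-7 repeat 0-3 */
    return PAT(0, PAT_WB) | PAT(1, PAT_WT) | PAT(2, PAT_UC_MINUS) |
           PAT(3, PAT_WC) | PAT(4, PAT_WB) | PAT(5, PAT_WT) |
           PAT(6, PAT_UC_MINUS) | PAT(7, PAT_WC);
}
```

With the entries above, the composed MSR value comes out to 0x0107040601070406, one type byte per entry from LSB to MSB.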
Re: 2.6.24-rc1 and 2.6.24.rc2 hangs while running udev on my laptop
On Fri, Nov 09, 2007 at 10:10:43AM -0800, Pallipadi, Venkatesh wrote: > > > >-Original Message- > >From: Andrew Morton [mailto:[EMAIL PROTECTED] > >Sent: Friday, November 09, 2007 2:03 AM > >To: SANGOI DINO LEONARDO > >Cc: linux-kernel@vger.kernel.org; Rafael J. Wysocki; Brown, > >Len; Pallipadi, Venkatesh; [EMAIL PROTECTED] > >Subject: Re: 2.6.24-rc1 and 2.6.24.rc2 hangs while running > >udev on my laptop > > > > > >(cc's added) > > > >On Fri, 9 Nov 2007 09:47:02 +0100 SANGOI DINO LEONARDO > ><[EMAIL PROTECTED]> wrote: > > > >> Hi, > >> > >> My laptop (an HP nx6125) doesn't boot with kernels 2.6.24-rc1 and > >> 2.6.24.rc2. > >> It works fine with 2.6.23 and older. > >> > >> I seen this bug first while running fedora rawhide, so you > >can find hardware > >> > >> info and boot logs at > >https://bugzilla.redhat.com/show_bug.cgi?id=312201. > >> > >> I did a git bisect, and got this: > >> > >> $ git bisect bad > >> 4f86d3a8e297205780cca027e974fd5f81064780 is first bad commit > >> commit 4f86d3a8e297205780cca027e974fd5f81064780 > >> Author: Len Brown <[EMAIL PROTECTED]> > >> Date: Wed Oct 3 18:58:00 2007 -0400 > >> > >> cpuidle: consolidate 2.6.22 cpuidle branch into one patch > >> [SNIP full commit log] > >> > > > > >> > >> Config is taken from Fedora kernel. CONFIG_CPU_IDLE is set > >to y (tell me if > >> full config is needed). > >> > >> If I use 'nolapic' parameter, kernel 2.6.24-rc1 boots fine. > >> Setting CONFIG_CPU_IDLE=n also gives me a working kernel. > >> > >> Ask me if more info is needed (please CC me). > >> > >> Thanks, > >> > >> Dino > > Dino, Can you try the patch below over rc2 and see whether it fixes the problem. Looking at the code, it should fix the problem. If it does not, can you send me the output of acpidump from your system. That will help to look further into this. You can get acpidump from latest pmtools package here. 
www.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/

Thanks,
Venki

Test patch for the bug report at https://bugzilla.redhat.com/show_bug.cgi?id=312201

Signed-off-by: Venki Pallipadi <[EMAIL PROTECTED]>

Index: linux-2.6.24-rc/drivers/acpi/processor_idle.c
===
--- linux-2.6.24-rc.orig/drivers/acpi/processor_idle.c
+++ linux-2.6.24-rc/drivers/acpi/processor_idle.c
@@ -1502,23 +1502,28 @@ static int acpi_idle_enter_bm(struct cpu
 	} else {
 		acpi_idle_update_bm_rld(pr, cx);
 
-		spin_lock(&c3_lock);
-		c3_cpu_count++;
-		/* Disable bus master arbitration when all CPUs are in C3 */
-		if (c3_cpu_count == num_online_cpus())
-			acpi_set_register(ACPI_BITREG_ARB_DISABLE, 1);
-		spin_unlock(&c3_lock);
+		if (pr->flags.bm_check && pr->flags.bm_control) {
+			spin_lock(&c3_lock);
+			c3_cpu_count++;
+			/* Disable bus master arbitration when all CPUs are in C3 */
+			if (c3_cpu_count == num_online_cpus())
+				acpi_set_register(ACPI_BITREG_ARB_DISABLE, 1);
+			spin_unlock(&c3_lock);
+		} else if (!pr->flags.bm_check) {
+			ACPI_FLUSH_CPU_CACHE();
+		}
 
 		t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
 		acpi_idle_do_entry(cx);
 		t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);
 
-		spin_lock(&c3_lock);
 		/* Re-enable bus master arbitration */
-		if (c3_cpu_count == num_online_cpus())
+		if (pr->flags.bm_check && pr->flags.bm_control) {
+			spin_lock(&c3_lock);
 			acpi_set_register(ACPI_BITREG_ARB_DISABLE, 0);
-		c3_cpu_count--;
-		spin_unlock(&c3_lock);
+			c3_cpu_count--;
+			spin_unlock(&c3_lock);
+		}
 	}
 
 #if defined (CONFIG_GENERIC_TIME) && defined (CONFIG_X86_TSC)
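The rewritten C3 entry path touches the global c3_cpu_count only when both bm_check and bm_control are set, and disables bus-master arbitration only when the last online CPU enters C3. A toy single-threaded model of that counting logic (names hypothetical, locking omitted):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the c3_cpu_count logic in the patch (names hypothetical) */
static int c3_cpu_count;
static bool arb_disabled;
static int num_online = 4;

static void enter_c3(bool bm_check, bool bm_control)
{
    if (bm_check && bm_control) {
        c3_cpu_count++;
        if (c3_cpu_count == num_online)
            arb_disabled = true;    /* last CPU into C3 */
    }
    /* the !bm_check path flushes caches instead (ACPI_FLUSH_CPU_CACHE) */
}

static void exit_c3(bool bm_check, bool bm_control)
{
    if (bm_check && bm_control) {
        arb_disabled = false;       /* re-enable on first CPU out */
        c3_cpu_count--;
    }
}
```

Note the asymmetry the patch preserves: arbitration is disabled only when the count reaches the number of online CPUs, but re-enabled as soon as any CPU leaves C3.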
[PATCH] Track accurate idle time with tick_sched.idle_sleeptime
Current idle time in kstat is based on jiffies and is coarse grained. tick_sched.idle_sleeptime makes some attempt to keep track of idle time in a fine grained manner. But it does not fully handle the time spent in interrupts. Make tick_sched.idle_sleeptime accurate with respect to time spent on handling interrupts, and also add tick_sched.idle_lastupdate, which keeps track of the last time idle_sleeptime was updated. These statistics will be crucial for the cpufreq-ondemand governor, which can then shed some of the conservative guard band it uses today while setting the frequency. The ondemand changes that use the exact idle time are coming soon.

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

Index: linux-2.6.22/kernel/time/tick-sched.c
===
--- linux-2.6.22.orig/kernel/time/tick-sched.c
+++ linux-2.6.22/kernel/time/tick-sched.c
@@ -141,6 +141,43 @@ void tick_nohz_update_jiffies(void)
 	local_irq_restore(flags);
 }
 
+void tick_nohz_stop_idle(int cpu)
+{
+	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+
+	if (ts->idle_active) {
+		ktime_t now, delta;
+		now = ktime_get();
+		delta = ktime_sub(now, ts->idle_entrytime);
+		ts->idle_lastupdate = now;
+		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
+		ts->idle_active = 0;
+	}
+}
+
+static void tick_nohz_start_idle(int cpu)
+{
+	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+	ktime_t now, delta;
+
+	now = ktime_get();
+	if (ts->idle_active) {
+		delta = ktime_sub(now, ts->idle_entrytime);
+		ts->idle_lastupdate = now;
+		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
+	}
+	ts->idle_entrytime = now;
+	ts->idle_active = 1;
+}
+
+u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
+{
+	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+
+	*last_update_time = ktime_to_us(ts->idle_lastupdate);
+	return ktime_to_us(ts->idle_sleeptime);
+}
+
 /**
  * tick_nohz_stop_sched_tick - stop the idle tick from the idle task
  *
@@ -152,13 +189,15 @@ void tick_nohz_stop_sched_tick(void)
 {
 	unsigned long seq, last_jiffies, next_jiffies, delta_jiffies, flags;
 	struct tick_sched *ts;
-	ktime_t last_update, expires, now, delta;
+	ktime_t last_update, expires, now;
 	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
 	int cpu;
 
 	local_irq_save(flags);
 
 	cpu = smp_processor_id();
+	tick_nohz_start_idle(cpu);
+
 	ts = &per_cpu(tick_cpu_sched, cpu);
 
 	if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))
@@ -178,19 +217,7 @@ void tick_nohz_stop_sched_tick(void)
 		}
 	}
 
-	now = ktime_get();
-	/*
-	 * When called from irq_exit we need to account the idle sleep time
-	 * correctly.
-	 */
-	if (ts->tick_stopped) {
-		delta = ktime_sub(now, ts->idle_entrytime);
-		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
-	}
-
-	ts->idle_entrytime = now;
 	ts->idle_calls++;
-
 	/* Read jiffies and the time when jiffies were updated last */
 	do {
 		seq = read_seqbegin(&xtime_lock);
@@ -320,23 +347,22 @@ void tick_nohz_restart_sched_tick(void)
 	int cpu = smp_processor_id();
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
 	unsigned long ticks;
-	ktime_t now, delta;
+	ktime_t now;
+
+	local_irq_disable();
+	tick_nohz_stop_idle(cpu);
 
-	if (!ts->tick_stopped)
+	if (!ts->tick_stopped) {
+		local_irq_enable();
 		return;
+	}
 
 	/* Update jiffies first */
-	now = ktime_get();
-
-	local_irq_disable();
 	select_nohz_load_balancer(0);
+	now = ktime_get();
 	tick_do_update_jiffies64(now);
 	cpu_clear(cpu, nohz_cpu_mask);
 
-	/* Account the idle time */
-	delta = ktime_sub(now, ts->idle_entrytime);
-	ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
-
 	/*
 	 * We stopped the tick in idle. Update process times would miss the
 	 * time we slept as update_process_times does only a 1 tick
 
Index: linux-2.6.22/include/linux/tick.h
===
--- linux-2.6.22.orig/include/linux/tick.h
+++ linux-2.6.22/include/linux/tick.h
@@ -51,8 +51,10 @@ struct tick_sched {
 	unsigned long			idle_jiffies;
 	unsigned long			idle_calls;
 	unsigned long			idle_sleeps;
+	int				idle_active;
 	ktime_t				idle_entrytime;
 	ktime_t				idle_sleeptime;
+	ktime_t				idle_lastupdate;
 	ktime_t
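The bookkeeping introduced by the patch can be modeled in userspace C with ktime_t reduced to a plain nanosecond counter. Calling start while already idle (the irq_exit re-entry case) folds the elapsed span into idle_sleeptime before restarting the clock, which is exactly the interrupt-time accounting the patch fixes:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of the idle_sleeptime bookkeeping; ktime_t -> uint64 ns */
struct ts_model {
    int idle_active;
    uint64_t idle_entrytime, idle_sleeptime, idle_lastupdate;
};

static void start_idle(struct ts_model *ts, uint64_t now)
{
    if (ts->idle_active) {              /* re-entered from irq_exit */
        ts->idle_sleeptime += now - ts->idle_entrytime;
        ts->idle_lastupdate = now;
    }
    ts->idle_entrytime = now;
    ts->idle_active = 1;
}

static void stop_idle(struct ts_model *ts, uint64_t now)
{
    if (ts->idle_active) {
        ts->idle_sleeptime += now - ts->idle_entrytime;
        ts->idle_lastupdate = now;
        ts->idle_active = 0;
    }
}
```

A consumer like get_cpu_idle_time_us() would then just read idle_sleeptime and idle_lastupdate, with no extra computation at query time.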
Re: Cpu-Hotplug and Real-Time
On Tue, Aug 07, 2007 at 07:13:36PM +0400, Oleg Nesterov wrote: > On 08/07, Gautham R Shenoy wrote: > > > > After some debugging, I saw that the hang occurred because > > the high prio process was stuck in a loop doing yield() inside > > wait_task_inactive(). Description follows: > > > > Say a high-prio task (A) does a kthread_create(B), > > followed by a kthread_bind(B, cpu1). At this moment, > > only cpu0 is online. > > > > Now, immediately after being created, B would > > do a > > complete(&create->started) [kernel/kthread.c: kthread()], > > before scheduling itself out. > > > > This complete() will wake up kthreadd, which had spawned B. > > It is possible that during the wakeup, kthreadd might preempt B. > > Thus, B is still on the runqueue, and has not yet called schedule(). > > > > kthreadd will in turn do a > > complete(&create->done); [kernel/kthread.c: create_kthread()] > > which will wake up the thread which had called kthread_create(). > > In our case it's task A, which will run immediately, since its priority > > is higher. > > > > A will now call kthread_bind(B, cpu1). > > kthread_bind() calls wait_task_inactive(B) to ensure that > > B has scheduled itself out. > > > > B is still on the runqueue, so A calls yield() in wait_task_inactive(). > > But since A is the task with the highest prio, the scheduler schedules it > > back again. > > > > Thus B never gets to run to schedule itself out. > > A loops waiting for B to schedule out, leading to a system hang.
> > As for kthread_bind(), I think wait_task_inactive+set_task_cpu is just > an optimization, and easy to "fix": > > --- kernel/kthread.c 2007-07-28 16:58:17.0 +0400 > +++ /proc/self/fd/0 2007-08-07 18:56:54.248073547 +0400 > @@ -166,10 +166,7 @@ void kthread_bind(struct task_struct *k, > WARN_ON(1); > return; > } > - /* Must have done schedule() in kthread() before we set_task_cpu */ > - wait_task_inactive(k); > - set_task_cpu(k, cpu); > - k->cpus_allowed = cpumask_of_cpu(cpu); > + set_cpus_allowed(current, cpumask_of_cpu(cpu)); > } > EXPORT_SYMBOL(kthread_bind); > Not sure whether set_cpus_allowed() will work here. Looks like it needs the CPU to be online during the call, and in the kthread_bind() case the CPU may be offline. Thanks, Venki
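The livelock described above is easy to reproduce in a toy strict-priority scheduler: while the highest-priority task keeps yielding in wait_task_inactive(), the scheduler keeps picking it again, so the target never runs. An illustrative model only, not the kernel's actual scheduler:

```c
#include <assert.h>

/* Toy strict-priority runqueue with two runnable tasks */
struct task { int prio; int ran; };

static struct task *pick_next(struct task *a, struct task *b)
{
    return (a->prio >= b->prio) ? a : b;    /* higher priority always wins */
}

/* Models A spinning in wait_task_inactive(): yield, get rescheduled, repeat.
 * Returns 1 if the target ever ran, 0 if it starved. */
static int spin_wait_with_yield(struct task *waiter, struct task *target,
                                int max_yields)
{
    int i;
    for (i = 0; i < max_yields; i++) {
        struct task *next = pick_next(waiter, target);
        next->ran++;
        if (next == target)
            return 1;   /* target got to schedule itself out */
    }
    return 0;           /* livelock: target never ran */
}
```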
Re: Time Problems with 2.6.23-rc1-gf695baf2
On Tue, Jul 31, 2007 at 05:38:08PM +0200, Eric Sesterhenn / Snakebyte wrote: > * Pallipadi, Venkatesh ([EMAIL PROTECTED]) wrote: > > This means things should work fine with processor.max_cstate=2 boot > > option > > as well. Can you please double check that. > > yes, system boots fine with this kernel parameter > > > Also, please send in the acpidump from your system. > > here we go, if you need some parameters to acpidump, just say so. > Eric, Can you check the test patch below (over latest git) and let me know whether it resolves the issue. Thanks, Venki Enable C3 without bm control only for CST based C3. Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Index: linux-2.6/drivers/acpi/processor_idle.c === --- linux-2.6.orig/drivers/acpi/processor_idle.c 2007-07-31 04:29:26.0 -0700 +++ linux-2.6/drivers/acpi/processor_idle.c 2007-07-31 04:52:50.0 -0700 @@ -969,11 +969,17 @@ } if (pr->flags.bm_check) { - /* bus mastering control is necessary */ if (!pr->flags.bm_control) { - /* In this case we enter C3 without bus mastering */ - ACPI_DEBUG_PRINT((ACPI_DB_INFO, - "C3 support without bus mastering control\n")); + if (pr->flags.has_cst != 1) { + /* bus mastering control is necessary */ + ACPI_DEBUG_PRINT((ACPI_DB_INFO, + "C3 support requires BM control\n")); + return; + } else { + /* Here we enter C3 without bus mastering */ + ACPI_DEBUG_PRINT((ACPI_DB_INFO, + "C3 support without BM control\n")); + } } } else { /*
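The test patch reduces the C3 admission logic to a small decision table: when bm_check is set but bm_control is not, C3 is allowed only if the C-state came from _CST. Modeled as a pure function (userspace sketch, names hypothetical; paths the patch leaves alone return true here):

```c
#include <assert.h>
#include <stdbool.h>

/* Decision table from the test patch: is C3 usable on this CPU? (model) */
static bool c3_usable(bool bm_check, bool bm_control, bool has_cst)
{
    if (bm_check && !bm_control)
        return has_cst;     /* C3 without BM control only for _CST-based C3 */
    return true;            /* other combinations unchanged by the patch */
}
```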
[PATCH 7/7] ICH Force HPET: Add ICH7_0 pciid to quirk list
Add another PCI ID for ICH7 force hpet. Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Index: linux-2.6.21/arch/i386/kernel/quirks.c === --- linux-2.6.21.orig/arch/i386/kernel/quirks.c +++ linux-2.6.21/arch/i386/kernel/quirks.c @@ -149,6 +149,8 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_I ich_force_enable_hpet); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH6_1, ich_force_enable_hpet); +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH7_0, + ich_force_enable_hpet); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH7_1, ich_force_enable_hpet); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH7_31,
[PATCH 4/7] ICH Force HPET: Late initialization of hpet after quirk
Enable HPET later during boot, after the force detect in PCI quirks. Also add a call to repeat the force enabling at resume time. Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> --- arch/i386/kernel/hpet.c | 50 +++- include/asm-i386/hpet.h |1 2 files changed, 46 insertions(+), 5 deletions(-) Index: linux-2.6.22-rc5/include/asm-i386/hpet.h === --- linux-2.6.22-rc5.orig/include/asm-i386/hpet.h 2007-06-17 08:52:10.0 +0200 +++ linux-2.6.22-rc5/include/asm-i386/hpet.h2007-06-17 08:52:10.0 +0200 @@ -64,6 +64,7 @@ /* hpet memory map physical address */ extern unsigned long hpet_address; +extern unsigned long force_hpet_address; extern int is_hpet_enabled(void); extern int hpet_enable(void); Index: linux-2.6.22-rc5/arch/i386/kernel/hpet.c === --- linux-2.6.22-rc5.orig/arch/i386/kernel/hpet.c 2007-06-17 08:52:10.0 +0200 +++ linux-2.6.22-rc5/arch/i386/kernel/hpet.c2007-06-17 08:52:10.0 +0200 @@ -25,6 +25,8 @@ extern struct clock_event_device *global */ unsigned long hpet_address; +static void __iomem * hpet_virt_address; + #ifdef CONFIG_X86_64 #include @@ -34,19 +36,22 @@ static inline void hpet_set_mapping(void { set_fixmap_nocache(FIX_HPET_BASE, hpet_address); __set_fixmap(VSYSCALL_HPET, hpet_address, PAGE_KERNEL_VSYSCALL_NOCACHE); + hpet_virt_address = (void __iomem *)fix_to_virt(FIX_HPET_BASE); + } static inline void __iomem *hpet_get_virt_address(void) { - return (void __iomem *)fix_to_virt(FIX_HPET_BASE); + return hpet_virt_address; } -static inline void hpet_clear_mapping(void) { } +static inline void hpet_clear_mapping(void) +{ + hpet_virt_address = NULL; +} #else -static void __iomem * hpet_virt_address; - static inline unsigned long hpet_readl(unsigned long a) { return readl(hpet_virt_address + a); @@ -173,6 +178,7 @@ static struct clock_event_device hpet_cl .set_next_event = hpet_legacy_next_event, .shift = 32, .irq= 0, + .rating = 50, }; static void hpet_start_counter(void) @@ -187,6 +193,17 @@ static void hpet_start_counter(void) hpet_writel(cfg, 
HPET_CFG); } +static void hpet_resume_device(void) +{ + ich_force_hpet_resume(); +} + +static void hpet_restart_counter(void) +{ + hpet_resume_device(); + hpet_start_counter(); +} + static void hpet_enable_legacy_int(void) { unsigned long cfg = hpet_readl(HPET_CFG); @@ -308,7 +325,7 @@ static struct clocksource clocksource_hp .mask = HPET_MASK, .shift = HPET_SHIFT, .flags = CLOCK_SOURCE_IS_CONTINUOUS, - .resume = hpet_start_counter, + .resume = hpet_restart_counter, #ifdef CONFIG_X86_64 .vread = vread_hpet, #endif @@ -372,6 +389,9 @@ int __init hpet_enable(void) { unsigned long id; + if (hpet_get_virt_address()) + return 0; + if (!is_hpet_capable()) return 0; @@ -416,6 +436,26 @@ out_nohpet: } +static int __init hpet_late_init(void) +{ + if (boot_hpet_disable) + return -ENODEV; + + if (!hpet_address) { + if (!force_hpet_address) + return -ENODEV; + + hpet_address = force_hpet_address; + hpet_enable(); + if (!hpet_get_virt_address()) + return -ENODEV; + } + + return 0; +} +fs_initcall(hpet_late_init); + + #ifdef CONFIG_HPET_EMULATE_RTC /* HPET in LegacyReplacement Mode eats up RTC interrupt line. When, HPET
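hpet_late_init() adopts the force-detected address only when no HPET address was found during normal boot and the PCI quirk produced one. The decision flow as a pure function (model only; the literal -19 stands in for -ENODEV, and the hpet_enable() side effects are elided):

```c
#include <assert.h>

#define MODEL_ENODEV (-19)   /* stand-in for -ENODEV in this sketch */

/* Decision logic of hpet_late_init() from the patch, as a pure function */
static int late_init_decision(int boot_hpet_disable,
                              unsigned long hpet_address,
                              unsigned long force_hpet_address)
{
    if (boot_hpet_disable)
        return MODEL_ENODEV;
    if (!hpet_address) {
        if (!force_hpet_address)
            return MODEL_ENODEV;    /* nothing detected, nothing forced */
        /* would now adopt force_hpet_address and call hpet_enable() */
    }
    return 0;
}
```

Because the real function is registered with fs_initcall(), it runs after the PCI fixups have had a chance to populate force_hpet_address, which is the point of the "late" initialization.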
[PATCH 5/7] ICH Force HPET: ICH5 quirk to force detect enable
force_enable hpet for ICH5. Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> --- arch/i386/kernel/hpet.c |2 arch/i386/kernel/quirks.c | 101 +- include/asm-i386/hpet.h |2 include/linux/pci_ids.h |1 4 files changed, 103 insertions(+), 3 deletions(-) Index: linux-2.6.22-rc5/arch/i386/kernel/quirks.c === --- linux-2.6.22-rc5.orig/arch/i386/kernel/quirks.c 2007-06-17 08:52:10.0 +0200 +++ linux-2.6.22-rc5/arch/i386/kernel/quirks.c 2007-06-17 08:52:10.0 +0200 @@ -54,9 +54,15 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_IN #if defined(CONFIG_HPET_TIMER) unsigned long force_hpet_address; +static enum { + NONE_FORCE_HPET_RESUME, + OLD_ICH_FORCE_HPET_RESUME, + ICH_FORCE_HPET_RESUME +} force_hpet_resume_type; + static void __iomem *rcba_base; -void ich_force_hpet_resume(void) +static void ich_force_hpet_resume(void) { u32 val; @@ -133,6 +139,7 @@ static void ich_force_enable_hpet(struct iounmap(rcba_base); printk(KERN_DEBUG "Failed to force enable HPET\n"); } else { + force_hpet_resume_type = ICH_FORCE_HPET_RESUME; printk(KERN_DEBUG "Force enabled HPET at base address 0x%lx\n", force_hpet_address); } @@ -148,4 +155,96 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_I ich_force_enable_hpet); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH8_1, ich_force_enable_hpet); + + +static struct pci_dev *cached_dev; + +static void old_ich_force_hpet_resume(void) +{ + u32 val, gen_cntl; + + if (!force_hpet_address || !cached_dev) + return; + + pci_read_config_dword(cached_dev, 0xD0, &gen_cntl); + gen_cntl &= (~(0x7 << 15)); + gen_cntl |= (0x4 << 15); + + pci_write_config_dword(cached_dev, 0xD0, gen_cntl); + pci_read_config_dword(cached_dev, 0xD0, &gen_cntl); + val = gen_cntl >> 15; + val &= 0x7; + if (val == 0x4) + printk(KERN_DEBUG "Force enabled HPET at resume\n"); + else + BUG(); +} + +static void old_ich_force_enable_hpet(struct pci_dev *dev) +{ + u32 val, gen_cntl; + + if (hpet_address || force_hpet_address) + return; + + pci_read_config_dword(dev, 0xD0, &gen_cntl); + /* +* Bit 17 is the HPET enable bit. +* Bits 16:15 control the HPET base address. +*/ + val = gen_cntl >> 15; + val &= 0x7; + if (val & 0x4) { + val &= 0x3; + force_hpet_address = 0xFED00000 | (val << 12); + printk(KERN_DEBUG "HPET at base address 0x%lx\n", + force_hpet_address); + cached_dev = dev; + return; + } + + /* +* HPET is disabled. Try enabling at 0xFED00000 and check +* whether it sticks. +*/ + gen_cntl &= (~(0x7 << 15)); + gen_cntl |= (0x4 << 15); + pci_write_config_dword(dev, 0xD0, gen_cntl); + + pci_read_config_dword(dev, 0xD0, &gen_cntl); + + val = gen_cntl >> 15; + val &= 0x7; + if (val & 0x4) { + /* HPET is enabled in HPTC. Just not reported by BIOS */ + val &= 0x3; + force_hpet_address = 0xFED00000 | (val << 12); + printk(KERN_DEBUG "Force enabled HPET at base address 0x%lx\n", + force_hpet_address); + force_hpet_resume_type = OLD_ICH_FORCE_HPET_RESUME; + return; + } + + printk(KERN_DEBUG "Failed to force enable HPET\n"); +} + +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801EB_0, + old_ich_force_enable_hpet); +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801EB_12, + old_ich_force_enable_hpet); + +void force_hpet_resume(void) +{ + switch (force_hpet_resume_type) { + case ICH_FORCE_HPET_RESUME: + return ich_force_hpet_resume(); + + case OLD_ICH_FORCE_HPET_RESUME: + return old_ich_force_hpet_resume(); + + default: + break; + } +} + #endif Index: linux-2.6.22-rc5/arch/i386/kernel/hpet.c === --- linux-2.6.22-rc5.orig/arch/i386/kernel/hpet.c 2007-06-17 08:52:10.0 +0200 +++ linux-2.6.22-rc5/arch/i386/kernel/hpet.c 2007-06-17 08:52:10.0 +0200 @@ -195,7 +195,7 @@ static void hpet_start_counter(void) static void hpet_resume_device(void) { - ich_force_hpet_resume(); + force_hpet_resume(); } static void hpet_restart_counter(void) Index: linux-2.6.22-rc5/include/asm-i386/hpet.h
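The ICH5 quirk decodes the GEN_CNTL register at config offset 0xD0: bit 17 is the HPET enable, and bits 16:15 select the base page starting at 0xFED00000 in 4 KB steps. A userspace sketch of just that decoding, following the layout described in the patch comments:

```c
#include <assert.h>
#include <stdint.h>

/* Decode the HPET base address from GEN_CNTL (model of the quirk logic).
 * Bit 17 = HPET enable; bits 16:15 = base select. Returns 0 if disabled. */
static uint32_t hpet_base_from_gen_cntl(uint32_t gen_cntl)
{
    uint32_t val = (gen_cntl >> 15) & 0x7;  /* bits 17:15 */
    if (!(val & 0x4))
        return 0;                           /* HPET disabled in HPTC */
    return 0xFED00000u | ((val & 0x3) << 12);
}
```

The quirk's "force" path writes the enable pattern (0x4 << 15) back to GEN_CNTL and re-reads it to check whether the setting sticks before trusting the decoded address.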
[PATCH 6/7] ICH Force HPET: ICH5 fix a bug with suspend/resume
A bug fix in the ICH5 HPET force-detect code which caused resumes to fail: cached_dev must be recorded in the force-enable path, not only in the already-enabled path. Thanks to Udo A. Steinberg for reporting the problem.

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>
---
 arch/i386/kernel/quirks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.22-rc5/arch/i386/kernel/quirks.c
===
--- linux-2.6.22-rc5.orig/arch/i386/kernel/quirks.c	2007-06-17 08:52:10.0 +0200
+++ linux-2.6.22-rc5/arch/i386/kernel/quirks.c	2007-06-17 08:52:10.0 +0200
@@ -199,7 +199,6 @@ static void old_ich_force_enable_hpet(st
 		force_hpet_address = 0xFED00000 | (val << 12);
 		printk(KERN_DEBUG "HPET at base address 0x%lx\n",
 			force_hpet_address);
-		cached_dev = dev;
 		return;
 	}
 
@@ -221,6 +220,7 @@ static void old_ich_force_enable_hpet(st
 		force_hpet_address = 0xFED00000 | (val << 12);
 		printk(KERN_DEBUG "Force enabled HPET at base address 0x%lx\n",
 			force_hpet_address);
+		cached_dev = dev;
 		force_hpet_resume_type = OLD_ICH_FORCE_HPET_RESUME;
 		return;
 	}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
[PATCH 3/7] ICH Force HPET: ICH7 or later quirk to force detect enable
Force detect and/or enable HPET on ICH chipsets. This patch just handles the detection part; following patches use this information. Adds a function to repeat the force enabling at resume time.

Using HPET this way, instead of PIT, increases the time CPUs can reside in C-state when the system is totally idle. On my test system with Core 2 Duo, average C-state residency goes up from ~20 ms to ~80 ms.

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>
---
 arch/i386/kernel/quirks.c | 101 ++
 include/asm-i386/hpet.h   |   2
 2 files changed, 103 insertions(+)

Index: linux-2.6.22-rc5/arch/i386/kernel/quirks.c
===
--- linux-2.6.22-rc5.orig/arch/i386/kernel/quirks.c	2007-06-17 08:51:58.0 +0200
+++ linux-2.6.22-rc5/arch/i386/kernel/quirks.c	2007-06-17 08:52:10.0 +0200
@@ -4,6 +4,8 @@
 #include <linux/pci.h>
 #include <linux/irq.h>
 
+#include <asm/hpet.h>
+
 #if defined(CONFIG_X86_IO_APIC) && defined(CONFIG_SMP) && defined(CONFIG_PCI)
 
 static void __devinit quirk_intel_irqbalance(struct pci_dev *dev)
@@ -48,3 +50,102 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_IN
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_E7525_MCH, quirk_intel_irqbalance);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_E7520_MCH, quirk_intel_irqbalance);
 #endif
+
+#if defined(CONFIG_HPET_TIMER)
+unsigned long force_hpet_address;
+
+static void __iomem *rcba_base;
+
+void ich_force_hpet_resume(void)
+{
+	u32 val;
+
+	if (!force_hpet_address)
+		return;
+
+	if (rcba_base == NULL)
+		BUG();
+
+	/* read the Function Disable register, dword mode only */
+	val = readl(rcba_base + 0x3404);
+	if (!(val & 0x80)) {
+		/* HPET disabled in HPTC. Trying to enable */
+		writel(val | 0x80, rcba_base + 0x3404);
+	}
+
+	val = readl(rcba_base + 0x3404);
+	if (!(val & 0x80))
+		BUG();
+	else
+		printk(KERN_DEBUG "Force enabled HPET at resume\n");
+
+	return;
+}
+
+static void ich_force_enable_hpet(struct pci_dev *dev)
+{
+	u32 val, rcba;
+	int err = 0;
+
+	if (hpet_address || force_hpet_address)
+		return;
+
+	pci_read_config_dword(dev, 0xF0, &rcba);
+	rcba &= 0xFFFFC000;
+	if (rcba == 0) {
+		printk(KERN_DEBUG "RCBA disabled. Cannot force enable HPET\n");
+		return;
+	}
+
+	/* use bits 31:14, 16 kB aligned */
+	rcba_base = ioremap_nocache(rcba, 0x4000);
+	if (rcba_base == NULL) {
+		printk(KERN_DEBUG "ioremap failed. Cannot force enable HPET\n");
+		return;
+	}
+
+	/* read the Function Disable register, dword mode only */
+	val = readl(rcba_base + 0x3404);
+
+	if (val & 0x80) {
+		/* HPET is enabled in HPTC. Just not reported by BIOS */
+		val = val & 0x3;
+		force_hpet_address = 0xFED00000 | (val << 12);
+		printk(KERN_DEBUG "Force enabled HPET at base address 0x%lx\n",
+			force_hpet_address);
+		iounmap(rcba_base);
+		return;
+	}
+
+	/* HPET disabled in HPTC. Trying to enable */
+	writel(val | 0x80, rcba_base + 0x3404);
+
+	val = readl(rcba_base + 0x3404);
+	if (!(val & 0x80)) {
+		err = 1;
+	} else {
+		val = val & 0x3;
+		force_hpet_address = 0xFED00000 | (val << 12);
+	}
+
+	if (err) {
+		force_hpet_address = 0;
+		iounmap(rcba_base);
+		printk(KERN_DEBUG "Failed to force enable HPET\n");
+	} else {
+		printk(KERN_DEBUG "Force enabled HPET at base address 0x%lx\n",
+			force_hpet_address);
+	}
+}
+
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ESB2_0,
+			 ich_force_enable_hpet);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH6_1,
+			 ich_force_enable_hpet);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH7_1,
+			 ich_force_enable_hpet);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH7_31,
+			 ich_force_enable_hpet);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH8_1,
+			 ich_force_enable_hpet);
+#endif

Index: linux-2.6.22-rc5/include/asm-i386/hpet.h
===
--- linux-2.6.22-rc5.orig/include/asm-i386/hpet.h	2007-06-17 08:52:09.0 +0200
+++ linux-2.6.22-rc5/include/asm-i386/hpet.h	2007-06-17 08:52:10.0 +0200
@@ -72,6 +72,8 @@ extern int hpet_enable(void):
 #include <asm/vsyscall.h>
 #endif
 
+void ich_force_hpet_resume(void);
+
 #ifdef CONFIG_HPET_EMULATE_RTC
[PATCH 2/7] ICH Force HPET: Restructure hpet generic clock code
Restructure and rename legacy replacement mode HPET timer support. Just code structural changes; there should be zero functional change.

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>
---
Index: linux-2.6.22-rc5/arch/i386/kernel/hpet.c
===
--- linux-2.6.22-rc5.orig/arch/i386/kernel/hpet.c	2007-06-17 08:52:09.0 +0200
+++ linux-2.6.22-rc5/arch/i386/kernel/hpet.c	2007-06-17 08:52:10.0 +0200
@@ -158,9 +158,9 @@ static void hpet_reserve_platform_timers
  */
 static unsigned long hpet_period;
 
-static void hpet_set_mode(enum clock_event_mode mode,
+static void hpet_legacy_set_mode(enum clock_event_mode mode,
 			  struct clock_event_device *evt);
-static int hpet_next_event(unsigned long delta,
+static int hpet_legacy_next_event(unsigned long delta,
 			   struct clock_event_device *evt);
 
 /*
@@ -169,8 +169,8 @@ static int hpet_next_event(unsigned long
 static struct clock_event_device hpet_clockevent = {
 	.name		= "hpet",
 	.features	= CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT,
-	.set_mode	= hpet_set_mode,
-	.set_next_event = hpet_next_event,
+	.set_mode	= hpet_legacy_set_mode,
+	.set_next_event = hpet_legacy_next_event,
 	.shift		= 32,
 	.irq		= 0,
 };
@@ -187,7 +187,7 @@ static void hpet_start_counter(void)
 	hpet_writel(cfg, HPET_CFG);
 }
 
-static void hpet_enable_int(void)
+static void hpet_enable_legacy_int(void)
 {
 	unsigned long cfg = hpet_readl(HPET_CFG);
 
@@ -196,7 +196,39 @@ static void hpet_enable_int(void)
 	hpet_legacy_int_enabled = 1;
 }
 
-static void hpet_set_mode(enum clock_event_mode mode,
+static void hpet_legacy_clockevent_register(void)
+{
+	uint64_t hpet_freq;
+
+	/* Start HPET legacy interrupts */
+	hpet_enable_legacy_int();
+
+	/*
+	 * The period is a femto seconds value. We need to calculate the
+	 * scaled math multiplication factor for nanosecond to hpet tick
+	 * conversion.
+	 */
+	hpet_freq = 1000000000000000ULL;
+	do_div(hpet_freq, hpet_period);
+	hpet_clockevent.mult = div_sc((unsigned long) hpet_freq,
+				      NSEC_PER_SEC, 32);
+	/* Calculate the min / max delta */
+	hpet_clockevent.max_delta_ns = clockevent_delta2ns(0x7FFFFFFF,
+							   &hpet_clockevent);
+	hpet_clockevent.min_delta_ns = clockevent_delta2ns(0x30,
+							   &hpet_clockevent);
+
+	/*
+	 * Start hpet with the boot cpu mask and make it
+	 * global after the IO_APIC has been initialized.
+	 */
+	hpet_clockevent.cpumask = cpumask_of_cpu(smp_processor_id());
+	clockevents_register_device(&hpet_clockevent);
+	global_clock_event = &hpet_clockevent;
+	printk(KERN_DEBUG "hpet clockevent registered\n");
+}
+
+static void hpet_legacy_set_mode(enum clock_event_mode mode,
 			  struct clock_event_device *evt)
 {
 	unsigned long cfg, cmp, now;
@@ -237,12 +269,12 @@ static void hpet_set_mode(enum clock_eve
 		break;
 
 	case CLOCK_EVT_MODE_RESUME:
-		hpet_enable_int();
+		hpet_enable_legacy_int();
 		break;
 	}
 }
 
-static int hpet_next_event(unsigned long delta,
+static int hpet_legacy_next_event(unsigned long delta,
 			   struct clock_event_device *evt)
 {
 	unsigned long cnt;
@@ -282,58 +314,11 @@ static struct clocksource clocksource_hp
 #endif
 };
 
-/*
- * Try to setup the HPET timer
- */
-int __init hpet_enable(void)
+static int hpet_clocksource_register(void)
 {
-	unsigned long id;
-	uint64_t hpet_freq;
 	u64 tmp, start, now;
 	cycle_t t1;
 
-	if (!is_hpet_capable())
-		return 0;
-
-	hpet_set_mapping();
-
-	/*
-	 * Read the period and check for a sane value:
-	 */
-	hpet_period = hpet_readl(HPET_PERIOD);
-	if (hpet_period < HPET_MIN_PERIOD || hpet_period > HPET_MAX_PERIOD)
-		goto out_nohpet;
-
-	/*
-	 * The period is a femto seconds value. We need to calculate the
-	 * scaled math multiplication factor for nanosecond to hpet tick
-	 * conversion.
-	 */
-	hpet_freq = 1000000000000000ULL;
-	do_div(hpet_freq, hpet_period);
-	hpet_clockevent.mult = div_sc((unsigned long) hpet_freq,
-				      NSEC_PER_SEC, 32);
-	/* Calculate the min / max delta */
-	hpet_clockevent.max_delta_ns = clockevent_delta2ns(0x7FFFFFFF,
-							   &hpet_clockevent);
-	hpet_clockevent.min_delta_ns = clockevent_delta2ns(0x30,
-
[PATCH 1/7] ICH Force HPET: Make generic time capable of switching broadcast timer
Auto-detect the presence of HPET on ICH5 or newer platforms and enable HPET for broadcast timer. This gives a bigger upper limit for the tickless time tick and improves power consumption in comparison to PIT as broadcast timer.

This patch: Change the broadcast timer, if a timer with higher rating becomes available.

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>
---
Applies over linux-2.6.22-rc4-mm2 + tglx's patch-2.6.22-rc4-mm2-hrt4 patch.

The patchset has been baking for a while along with patch-2.6.22-rc*-hrt* without breaking anything, and reducing the number of timer interrupts with tickless on various platforms.

 kernel/time/tick-broadcast.c | 13 ++---
 kernel/time/tick-common.c    |  4 ++--
 2 files changed, 8 insertions(+), 9 deletions(-)

Index: linux-2.6.22-rc5/kernel/time/tick-common.c
===
--- linux-2.6.22-rc5.orig/kernel/time/tick-common.c	2007-06-17 08:52:07.0 +0200
+++ linux-2.6.22-rc5/kernel/time/tick-common.c	2007-06-17 08:52:10.0 +0200
@@ -200,7 +200,7 @@ static int tick_check_new_device(struct
 	cpu = smp_processor_id();
 	if (!cpu_isset(cpu, newdev->cpumask))
-		goto out;
+		goto out_bc;
 
 	td = &per_cpu(tick_cpu_device, cpu);
 	curdev = td->evtdev;
 
@@ -265,7 +265,7 @@ out_bc:
 	 */
 	if (tick_check_broadcast_device(newdev))
 		ret = NOTIFY_STOP;
-out:
+
 	spin_unlock_irqrestore(&tick_device_lock, flags);
 
 	return ret;

Index: linux-2.6.22-rc5/kernel/time/tick-broadcast.c
===
--- linux-2.6.22-rc5.orig/kernel/time/tick-broadcast.c	2007-06-17 08:52:07.0 +0200
+++ linux-2.6.22-rc5/kernel/time/tick-broadcast.c	2007-06-17 08:52:10.0 +0200
@@ -64,8 +64,9 @@ static void tick_broadcast_start_periodi
  */
 int tick_check_broadcast_device(struct clock_event_device *dev)
 {
-	if (tick_broadcast_device.evtdev ||
-	    (dev->features & CLOCK_EVT_FEAT_C3STOP))
+	if ((tick_broadcast_device.evtdev &&
+	     tick_broadcast_device.evtdev->rating >= dev->rating) ||
+	    (dev->features & CLOCK_EVT_FEAT_C3STOP))
 		return 0;
 
 	clockevents_exchange_device(NULL, dev);
@@ -519,11 +520,9 @@ static void tick_broadcast_clear_oneshot
  */
 void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 {
-	if (bc->mode != CLOCK_EVT_MODE_ONESHOT) {
-		bc->event_handler = tick_handle_oneshot_broadcast;
-		clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
-		bc->next_event.tv64 = KTIME_MAX;
-	}
+	bc->event_handler = tick_handle_oneshot_broadcast;
+	clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
+	bc->next_event.tv64 = KTIME_MAX;
 }
 
 /*
[PATCH 4/7] ICH Force HPET: Late initialization of hpet after quirk
Enable HPET later during boot, after the force detect in PCI quirks. Also add a call to repeat the force enabling at resume time.

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>
---
 arch/i386/kernel/hpet.c | 50 +++-
 include/asm-i386/hpet.h |  1
 2 files changed, 46 insertions(+), 5 deletions(-)

Index: linux-2.6.22-rc5/include/asm-i386/hpet.h
===
--- linux-2.6.22-rc5.orig/include/asm-i386/hpet.h	2007-06-17 08:52:10.0 +0200
+++ linux-2.6.22-rc5/include/asm-i386/hpet.h	2007-06-17 08:52:10.0 +0200
@@ -64,6 +64,7 @@
 /* hpet memory map physical address */
 extern unsigned long hpet_address;
+extern unsigned long force_hpet_address;
 
 extern int is_hpet_enabled(void);
 extern int hpet_enable(void);

Index: linux-2.6.22-rc5/arch/i386/kernel/hpet.c
===
--- linux-2.6.22-rc5.orig/arch/i386/kernel/hpet.c	2007-06-17 08:52:10.0 +0200
+++ linux-2.6.22-rc5/arch/i386/kernel/hpet.c	2007-06-17 08:52:10.0 +0200
@@ -25,6 +25,8 @@ extern struct clock_event_device *global
  */
 unsigned long hpet_address;
 
+static void __iomem *hpet_virt_address;
+
 #ifdef CONFIG_X86_64
 
 #include <asm/pgtable.h>
@@ -34,19 +36,22 @@ static inline void hpet_set_mapping(void
 {
 	set_fixmap_nocache(FIX_HPET_BASE, hpet_address);
 	__set_fixmap(VSYSCALL_HPET, hpet_address, PAGE_KERNEL_VSYSCALL_NOCACHE);
+	hpet_virt_address = (void __iomem *)fix_to_virt(FIX_HPET_BASE);
+
 }
 
 static inline void __iomem *hpet_get_virt_address(void)
 {
-	return (void __iomem *)fix_to_virt(FIX_HPET_BASE);
+	return hpet_virt_address;
 }
 
-static inline void hpet_clear_mapping(void) { }
+static inline void hpet_clear_mapping(void)
+{
+	hpet_virt_address = NULL;
+}
 
 #else
 
-static void __iomem * hpet_virt_address;
-
 static inline unsigned long hpet_readl(unsigned long a)
 {
 	return readl(hpet_virt_address + a);
@@ -173,6 +178,7 @@ static struct clock_event_device hpet_cl
 	.set_next_event = hpet_legacy_next_event,
 	.shift		= 32,
 	.irq		= 0,
+	.rating		= 50,
 };
 
 static void hpet_start_counter(void)
@@ -187,6 +193,17 @@ static void hpet_start_counter(void)
 	hpet_writel(cfg, HPET_CFG);
 }
 
+static void hpet_resume_device(void)
+{
+	ich_force_hpet_resume();
+}
+
+static void hpet_restart_counter(void)
+{
+	hpet_resume_device();
+	hpet_start_counter();
+}
+
 static void hpet_enable_legacy_int(void)
 {
 	unsigned long cfg = hpet_readl(HPET_CFG);
@@ -308,7 +325,7 @@ static struct clocksource clocksource_hp
 	.mask		= HPET_MASK,
 	.shift		= HPET_SHIFT,
 	.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
-	.resume		= hpet_start_counter,
+	.resume		= hpet_restart_counter,
 #ifdef CONFIG_X86_64
 	.vread		= vread_hpet,
 #endif
@@ -372,6 +389,9 @@ int __init hpet_enable(void)
 {
 	unsigned long id;
 
+	if (hpet_get_virt_address())
+		return 0;
+
 	if (!is_hpet_capable())
 		return 0;
 
@@ -416,6 +436,26 @@ out_nohpet:
 }
 
+static int __init hpet_late_init(void)
+{
+	if (boot_hpet_disable)
+		return -ENODEV;
+
+	if (!hpet_address) {
+		if (!force_hpet_address)
+			return -ENODEV;
+
+		hpet_address = force_hpet_address;
+		hpet_enable();
+		if (!hpet_get_virt_address())
+			return -ENODEV;
+	}
+
+	return 0;
+}
+fs_initcall(hpet_late_init);
+
+
 #ifdef CONFIG_HPET_EMULATE_RTC
 
 /*
  * HPET in LegacyReplacement Mode eats up RTC interrupt line. When, HPET
[PATCH 7/7] ICH Force HPET: Add ICH7_0 pciid to quirk list
Add another PCI ID for ICH7 force hpet.

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

Index: linux-2.6.21/arch/i386/kernel/quirks.c
===
--- linux-2.6.21.orig/arch/i386/kernel/quirks.c
+++ linux-2.6.21/arch/i386/kernel/quirks.c
@@ -149,6 +149,8 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_I
 			 ich_force_enable_hpet);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH6_1,
 			 ich_force_enable_hpet);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH7_0,
+			 ich_force_enable_hpet);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH7_1,
 			 ich_force_enable_hpet);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH7_31,
[PATCH 8/8] cpuidle: first round of documentation updates
Documentation changes based on Pavel's feedback. Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Index: linux-2.6.22-rc-mm/Documentation/cpuidle/sysfs.txt === --- linux-2.6.22-rc-mm.orig/Documentation/cpuidle/sysfs.txt 2007-06-06 11:33:25.0 -0700 +++ linux-2.6.22-rc-mm/Documentation/cpuidle/sysfs.txt 2007-06-06 11:35:37.0 -0700 @@ -4,14 +4,22 @@ cpuidle sysfs -System global cpuidle information are under +System global cpuidle related information and tunables are under /sys/devices/system/cpu/cpuidle The current interfaces in this directory has self-explanatory names: +* current_driver_ro +* current_governor_ro + +With cpuidle_sysfs_switch boot option (meant for developer testing) +following objects are visible instead. * available_drivers * available_governors * current_driver * current_governor +In this case user can switch the driver, governor at run time by writing +onto current_driver and current_governor. + Per logical CPU specific cpuidle information are under /sys/devices/system/cpu/cpuX/cpuidle @@ -19,9 +27,9 @@ Under this percpu directory, there is a directory for each idle state supported by the driver, which in turn has -* latency -* power -* time -* usage +* latency : Latency to exit out of this idle state (in microseconds) +* power : Power consumed while in this idle state (in milliwatts) +* time : Total time spent in this idle state (in microseconds) +* usage : Number of times this state was entered (count) Index: linux-2.6.22-rc-mm/Documentation/cpuidle/governor.txt === --- linux-2.6.22-rc-mm.orig/Documentation/cpuidle/governor.txt 2007-06-06 11:33:25.0 -0700 +++ linux-2.6.22-rc-mm/Documentation/cpuidle/governor.txt 2007-06-06 11:33:34.0 -0700 @@ -11,12 +11,16 @@ cpuidle governor is policy routine that decides what idle state to enter at any given time. cpuidle core uses different callbacks to governor while handling idle entry. 
-* select_state callback where governor can determine next idle state to enter -* prepare_idle callback is called before entering an idle state -* scan callback is called after a driver forces redetection of the states +* select_state() callback where governor can determine next idle state to enter +* prepare_idle() callback is called before entering an idle state +* scan() callback is called after a driver forces redetection of the states More than one governor can be registered at the same time and -user can switch between drivers using /sysfs interface. +user can switch between drivers using /sysfs interface (when supported). + +More than one governor part is supported for developers to easily experiment +with different governors. By default, most optimal governor based on your +kernel configuration and platform will be selected by cpuidle. Interfaces: int cpuidle_register_governor(struct cpuidle_governor *gov); Index: linux-2.6.22-rc-mm/Documentation/cpuidle/core.txt === --- linux-2.6.22-rc-mm.orig/Documentation/cpuidle/core.txt 2007-06-06 11:33:25.0 -0700 +++ linux-2.6.22-rc-mm/Documentation/cpuidle/core.txt 2007-06-06 11:33:34.0 -0700 @@ -12,6 +12,6 @@ standardized infrastructure to support independent development of governors and drivers. -cpuidle resides under /drivers/cpuidle. +cpuidle resides under drivers/cpuidle. Index: linux-2.6.22-rc-mm/Documentation/cpuidle/driver.txt === --- linux-2.6.22-rc-mm.orig/Documentation/cpuidle/driver.txt2007-06-06 11:33:25.0 -0700 +++ linux-2.6.22-rc-mm/Documentation/cpuidle/driver.txt 2007-06-06 11:33:34.0 -0700 @@ -7,16 +7,21 @@ -cpuidle driver supports capability detection for a particular system. The -init and exit routines will be called for each online CPU, with a percpu -cpuidle_driver object and driver should fill in cpuidle_states inside -cpuidle_driver depending on the CPU capability. +cpuidle driver hooks into the cpuidle infrastructure and does the +architecture/platform dependent part of CPU idle states. 
Driver +provides the platform idle state detection capability and also +has mechanisms in place to support actual entry-exit into a CPU idle state. + +cpuidle driver supports capability detection for a platform using the +init and exit routines. They will be called for each online CPU, with a +percpu cpuidle_driver object and driver should fill in cpuidle_states +inside cpuidle_driver depending on the CPU capability. Driver can handle dynamic state changes (like battery<->AC), by calling force_redetect interface. It is possible to have more than one driver registered at the same time and -user can switch between drivers using /sysfs interface. +user can switch between drivers using /sysfs interface (when supported).
[PATCH 7/8] cpuidle: add rating to the governors and pick the one with highest rating by default
Introduce a governor rating scheme to pick the right governor by default. Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Index: linux-2.6.22-rc-mm/include/linux/cpuidle.h === --- linux-2.6.22-rc-mm.orig/include/linux/cpuidle.h 2007-06-05 17:00:09.0 -0700 +++ linux-2.6.22-rc-mm/include/linux/cpuidle.h 2007-06-05 17:01:08.0 -0700 @@ -159,6 +159,7 @@ struct cpuidle_governor { char name[CPUIDLE_NAME_LEN]; struct list_head governor_list; + unsigned int rating; int (*init)(struct cpuidle_device *dev); void (*exit)(struct cpuidle_device *dev); Index: linux-2.6.22-rc-mm/drivers/cpuidle/governors/menu.c === --- linux-2.6.22-rc-mm.orig/drivers/cpuidle/governors/menu.c 2007-06-05 15:46:34.0 -0700 +++ linux-2.6.22-rc-mm/drivers/cpuidle/governors/menu.c 2007-06-05 17:04:32.0 -0700 @@ -153,6 +153,7 @@ struct cpuidle_governor menu_governor = { .name = "menu", + .rating = 20, .scan = menu_scan_device, .select = menu_select, .reflect = menu_reflect, Index: linux-2.6.22-rc-mm/drivers/cpuidle/governor.c === --- linux-2.6.22-rc-mm.orig/drivers/cpuidle/governor.c 2007-06-01 16:25:49.0 -0700 +++ linux-2.6.22-rc-mm/drivers/cpuidle/governor.c 2007-06-05 17:15:05.0 -0700 @@ -131,7 +131,8 @@ if (__cpuidle_find_governor(gov->name) == NULL) { ret = 0; list_add_tail(&gov->governor_list, &cpuidle_governors); - if (!cpuidle_curr_governor) + if (!cpuidle_curr_governor || + cpuidle_curr_governor->rating < gov->rating) cpuidle_switch_governor(gov); } mutex_unlock(&cpuidle_lock); @@ -142,6 +143,29 @@ EXPORT_SYMBOL_GPL(cpuidle_register_governor); /** + * cpuidle_replace_governor - find a replacement governor + * @exclude_rating: the rating that will be skipped while looking for + * new governor.
+ */ +struct cpuidle_governor *cpuidle_replace_governor(int exclude_rating) +{ + struct cpuidle_governor *gov; + struct cpuidle_governor *ret_gov = NULL; + unsigned int max_rating = 0; + + list_for_each_entry(gov, &cpuidle_governors, governor_list) { + if (gov->rating == exclude_rating) + continue; + if (gov->rating > max_rating) { + max_rating = gov->rating; + ret_gov = gov; + } + } + + return ret_gov; +} + +/** * cpuidle_unregister_governor - unregisters a governor + @gov: the governor */ @@ -151,8 +175,11 @@ return; mutex_lock(&cpuidle_lock); - if (gov == cpuidle_curr_governor) - cpuidle_switch_governor(NULL); + if (gov == cpuidle_curr_governor) { + struct cpuidle_governor *new_gov; + new_gov = cpuidle_replace_governor(gov->rating); + cpuidle_switch_governor(new_gov); + } list_del(&gov->governor_list); mutex_unlock(&cpuidle_lock); } Index: linux-2.6.22-rc-mm/drivers/cpuidle/governors/ladder.c === --- linux-2.6.22-rc-mm.orig/drivers/cpuidle/governors/ladder.c 2007-06-01 16:25:49.0 -0700 +++ linux-2.6.22-rc-mm/drivers/cpuidle/governors/ladder.c 2007-06-05 17:03:37.0 -0700 @@ -199,6 +199,7 @@ static struct cpuidle_governor ladder_governor = { .name = "ladder", + .rating = 10, .init = ladder_init_device, .exit = ladder_exit_device, .scan = ladder_scan_device,
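The cpuidle_replace_governor() logic in the patch above is the heart of the rating scheme. As a stand-alone illustration, here is the same selection rule in plain C, with an array standing in for the kernel's linked list; the struct and function names below are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cpuidle_governor: only the two
 * fields the selection logic actually needs. */
struct governor {
    const char *name;
    unsigned int rating;
};

/* Mirror of the patch's cpuidle_replace_governor(): scan registered
 * governors, skip the one being unregistered (identified by rating),
 * and return the highest-rated survivor, or NULL if none remain. */
static struct governor *replace_governor(struct governor *govs, size_t n,
                                         unsigned int exclude_rating)
{
    struct governor *ret = NULL;
    unsigned int max_rating = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (govs[i].rating == exclude_rating)
            continue;
        if (govs[i].rating > max_rating) {
            max_rating = govs[i].rating;
            ret = &govs[i];
        }
    }
    return ret;
}
```

Note that excluding by rating rather than by pointer means two governors that happened to share the unregistered governor's rating would both be skipped; the patch presumably accepts that because ratings are expected to be distinct (ladder = 10, menu = 20).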
[PATCH 6/8] cpuidle: make cpuidle sysfs driver/governor switch off by default
Make the default cpuidle sysfs show current_governor and current_driver in read-only mode. The more elaborate available_governors and available_drivers, with a writable current_governor and current_driver interface, only appear with the "cpuidle_sysfs_switch" boot parameter. Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Index: linux-2.6.22-rc-mm/drivers/cpuidle/cpuidle.c === --- linux-2.6.22-rc-mm.orig/drivers/cpuidle/cpuidle.c 2007-06-05 17:52:32.0 -0700 +++ linux-2.6.22-rc-mm/drivers/cpuidle/cpuidle.c 2007-06-06 10:57:41.0 -0700 @@ -25,7 +25,6 @@ LIST_HEAD(cpuidle_detected_devices); static void (*pm_idle_old)(void); - /** * cpuidle_idle_call - the main idle loop * Index: linux-2.6.22-rc-mm/drivers/cpuidle/sysfs.c === --- linux-2.6.22-rc-mm.orig/drivers/cpuidle/sysfs.c 2007-06-05 17:52:56.0 -0700 +++ linux-2.6.22-rc-mm/drivers/cpuidle/sysfs.c 2007-06-06 11:29:50.0 -0700 @@ -13,6 +13,14 @@ #include "cpuidle.h" +static unsigned int sysfs_switch; +static int __init cpuidle_sysfs_setup(char *unused) +{ + sysfs_switch = 1; + return 1; +} +__setup("cpuidle_sysfs_switch", cpuidle_sysfs_setup); + static ssize_t show_available_drivers(struct sys_device *dev, char *buf) { ssize_t i = 0; @@ -127,6 +135,15 @@ return count; } +static SYSDEV_ATTR(current_driver_ro, 0444, show_current_driver, NULL); +static SYSDEV_ATTR(current_governor_ro, 0444, show_current_governor, NULL); + +static struct attribute *cpuclass_default_attrs[] = { + &attr_current_driver_ro.attr, + &attr_current_governor_ro.attr, + NULL +}; + static SYSDEV_ATTR(available_drivers, 0444, show_available_drivers, NULL); static SYSDEV_ATTR(available_governors, 0444, show_available_governors, NULL); static SYSDEV_ATTR(current_driver, 0644, show_current_driver, @@ -134,7 +151,7 @@ static SYSDEV_ATTR(current_governor, 0644, show_current_governor, store_current_governor); -static struct attribute *cpuclass_default_attrs[] = { +static struct attribute *cpuclass_switch_attrs[] = { &attr_available_drivers.attr, &attr_available_governors.attr,
&attr_current_driver.attr, @@ -152,6 +169,9 @@ */ int cpuidle_add_class_sysfs(struct sysdev_class *cls) { + if (sysfs_switch) + cpuclass_attr_group.attrs = cpuclass_switch_attrs; + return sysfs_create_group(&cls->kset.kobj, &cpuclass_attr_group); }
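Outside the kernel, the cpuidle_sysfs_switch mechanism above boils down to choosing one of two attribute lists at registration time. The sketch below (plain C, hypothetical names, strings in place of sysfs attribute objects) shows the same pattern: a boot-time flag selects which list gets registered, defaulting to the read-only pair:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Default, read-only view: mirrors current_driver_ro / current_governor_ro. */
static const char *default_attrs[] = {
    "current_driver_ro", "current_governor_ro", NULL
};

/* Developer view enabled by the boot option: writable current_* plus
 * the available_* listings. */
static const char *switch_attrs[] = {
    "available_drivers", "available_governors",
    "current_driver", "current_governor", NULL
};

/* Mirrors cpuidle_add_class_sysfs(): pick the attribute list once,
 * before the group is created. */
static const char **pick_attrs(int sysfs_switch)
{
    return sysfs_switch ? switch_attrs : default_attrs;
}
```

The design point the patch makes is that the swap happens once, before sysfs registration, so no runtime check is needed on each attribute access.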
[PATCH 5/8] cpuidle: menu governor change the early break condition
Change the C-state early break out algorithm in menu governor. We only look at early breakouts that result in wakeups shorter than the idle state's target_residency. If such a breakout is frequent enough, eliminate the particular idle state up to a timeout period. Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Index: linux-2.6.22-rc-mm/drivers/cpuidle/governors/menu.c === --- linux-2.6.22-rc-mm.orig/drivers/cpuidle/governors/menu.c 2007-06-05 09:39:27.0 -0700 +++ linux-2.6.22-rc-mm/drivers/cpuidle/governors/menu.c 2007-06-05 15:46:34.0 -0700 @@ -14,19 +14,20 @@ #include <linux/hrtimer.h> #include <linux/tick.h> -#define BM_HOLDOFF 20000 /* 20 ms */ +#define BM_HOLDOFF 20000 /* 20 ms */ +#define DEMOTION_THRESHOLD 5 +#define DEMOTION_TIMEOUT_MULTIPLIER 1000 struct menu_device { int last_state_idx; - int deepest_bm_state; - int break_last_us; - int break_elapsed_us; + int deepest_break_state; + struct timespec break_expire_time_ts; + int break_last_cnt; + int deepest_bm_state; int bm_elapsed_us; int bm_holdoff_us; - - unsigned long idle_jiffies; }; static DEFINE_PER_CPU(struct menu_device, menu_devices); @@ -45,7 +46,6 @@ /* determine the expected residency time */ expected_us = (s32) ktime_to_ns(tick_nohz_get_sleep_length()) / 1000; - expected_us = min(expected_us, data->break_last_us); /* determine the maximum state compatible with current BM status */ if (cpuidle_get_bm_activity()) @@ -53,17 +53,33 @@ if (data->bm_elapsed_us <= data->bm_holdoff_us) max_state = data->deepest_bm_state + 1; + /* determine the maximum state compatible with recent idle breaks */ + if (data->deepest_break_state >= 0) { + struct timespec now; + ktime_get_ts(&now); + if (timespec_compare(&data->break_expire_time_ts, &now) > 0) { + max_state = min(max_state, + data->deepest_break_state + 1); + } else { + data->deepest_break_state = -1; + } + } + /* find the deepest idle state that satisfies our constraints */ for (i = 1; i < max_state; i++) { struct cpuidle_state *s = &dev->states[i]; + if (s->target_residency > expected_us) break; + if
(s->exit_latency > system_latency_constraint()) break; } + if (data->last_state_idx != i - 1) + data->break_last_cnt = 0; + data->last_state_idx = i - 1; - data->idle_jiffies = tick_nohz_get_idle_jiffies(); return i - 1; } @@ -91,14 +107,27 @@ measured_us = USEC_PER_SEC / HZ; data->bm_elapsed_us += measured_us; - data->break_elapsed_us += measured_us; + + if (data->last_state_idx == 0) + return; /* -* Did something other than the timer interrupt cause the break event? +* Did something other than the timer interrupt +* cause an early break event? */ - if (tick_nohz_get_idle_jiffies() == data->idle_jiffies) { - data->break_last_us = data->break_elapsed_us; - data->break_elapsed_us = 0; + if (unlikely(measured_us < target->target_residency)) { + if (data->break_last_cnt > DEMOTION_THRESHOLD) { + data->deepest_break_state = data->last_state_idx - 1; + ktime_get_ts(&data->break_expire_time_ts); + timespec_add_ns(&data->break_expire_time_ts, + target->target_residency * + DEMOTION_TIMEOUT_MULTIPLIER); + } else { + data->break_last_cnt++; + } + } else { + if (data->break_last_cnt > 0) + data->break_last_cnt--; } } @@ -112,10 +141,9 @@ int i; data->last_state_idx = 0; - data->break_last_us = 0; - data->break_elapsed_us = 0; data->bm_elapsed_us = 0; data->bm_holdoff_us = BM_HOLDOFF; + data->deepest_break_state = -1; for (i = 1; i < dev->state_count; i++) if (dev->states[i].flags & CPUIDLE_FLAG_CHECK_BM)
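The demotion bookkeeping in the reflect path above can be hard to follow inside a diff. This simplified C sketch (illustrative names, counter only, no timespec-based timeout) models just the rule the patch adds: wakeups shorter than the chosen state's target_residency accumulate a count, crossing DEMOTION_THRESHOLD demotes the state, and on-time wakeups decay the count again:

```c
#include <assert.h>

#define DEMOTION_THRESHOLD 5

/* Hypothetical stand-in for the relevant menu_device fields. */
struct break_tracker {
    int break_cnt;   /* consecutive-ish early-break evidence */
    int demoted;     /* set once the threshold is crossed */
};

/* One wakeup observation: measured residency vs. the state's target.
 * Mirrors the patch's structure (increment until the threshold is
 * exceeded, then demote; decay on on-time wakeups). */
static void record_wakeup(struct break_tracker *t, int measured_us,
                          int target_residency_us)
{
    if (measured_us < target_residency_us) {
        if (t->break_cnt > DEMOTION_THRESHOLD)
            t->demoted = 1;     /* patch: pick a shallower deepest state */
        else
            t->break_cnt++;
    } else if (t->break_cnt > 0) {
        t->break_cnt--;
    }
}
```

Note the asymmetry: demotion requires the count to exceed the threshold, while any single sufficiently long sleep immediately starts walking the count back down, so sporadic early breaks never demote a state.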
[PATCH 4/8] cpuidle: fix the uninitialized variable in sysfs routine
Fix the uninitialized usage of ret. Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Index: linux-2.6.22-rc-mm/drivers/cpuidle/sysfs.c === --- linux-2.6.22-rc-mm.orig/drivers/cpuidle/sysfs.c 2007-06-04 15:44:17.0 -0700 +++ linux-2.6.22-rc-mm/drivers/cpuidle/sysfs.c 2007-06-04 15:46:49.0 -0700 @@ -301,7 +301,7 @@ */ int cpuidle_add_driver_sysfs(struct cpuidle_device *device) { - int i, ret; + int i, ret = -ENOMEM; struct cpuidle_state_kobj *kobj; /* state statistics */
[PATCH 3/8] cpuidle: reenable /proc/acpi/ power interface for the time being
Keep /proc/acpi/processor/CPU*/power around for a while as powertop depends on it. It will be marked deprecated and removed in the future. powertop can use cpuidle interfaces instead. Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Index: linux-2.6.22-rc-mm/drivers/acpi/processor_idle.c === --- linux-2.6.22-rc-mm.orig/drivers/acpi/processor_idle.c 2007-06-01 16:17:40.0 -0700 +++ linux-2.6.22-rc-mm/drivers/acpi/processor_idle.c 2007-06-01 17:20:57.0 -0700 @@ -792,7 +792,7 @@ * @t1: the start time * @t2: the end time */ -static inline u32 ticks_elapsed(u32 t1, u32 t2) +static inline u32 ticks_elapsed_in_us(u32 t1, u32 t2) { if (t2 >= t1) return PM_TIMER_TICKS_TO_US(t2 - t1); @@ -802,6 +802,16 @@ return PM_TIMER_TICKS_TO_US((0xFFFFFFFF - t1) + t2); } +static inline u32 ticks_elapsed(u32 t1, u32 t2) +{ + if (t2 >= t1) + return (t2 - t1); + else if (!(acpi_gbl_FADT.flags & ACPI_FADT_32BIT_TIMER)) + return (((0x00FFFFFF - t1) + t2) & 0x00FFFFFF); + else + return ((0xFFFFFFFF - t1) + t2); +} + /** * acpi_idle_update_bm_rld - updates the BM_RLD bit depending on target state * @pr: the processor @@ -925,7 +935,8 @@ cx->usage++; acpi_state_timer_broadcast(pr, cx, 0); - return ticks_elapsed(t1, t2); + cx->time += ticks_elapsed(t1, t2); + return ticks_elapsed_in_us(t1, t2); } static int c3_cpu_count; @@ -1009,7 +1020,8 @@ cx->usage++; acpi_state_timer_broadcast(pr, cx, 0); - return ticks_elapsed(t1, t2); + cx->time += ticks_elapsed(t1, t2); + return ticks_elapsed_in_us(t1, t2); } /**
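The new ticks_elapsed() above has to unwrap the ACPI PM timer, which is either a 24-bit or a 32-bit free-running counter depending on the FADT flag. A user-space sketch of the same wraparound arithmetic (a plain flag stands in for the ACPI_FADT_32BIT_TIMER test):

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed PM-timer ticks between two reads t1 (start) and t2 (end),
 * mirroring the patch: if the counter wrapped (t2 < t1), unwrap it
 * using the counter width. timer_is_32bit stands in for the
 * acpi_gbl_FADT.flags & ACPI_FADT_32BIT_TIMER check. */
static uint32_t pm_ticks_elapsed(uint32_t t1, uint32_t t2, int timer_is_32bit)
{
    if (t2 >= t1)
        return t2 - t1;
    else if (!timer_is_32bit)
        /* 24-bit counter: mask the unwrapped difference to 24 bits */
        return ((0x00FFFFFF - t1) + t2) & 0x00FFFFFF;
    else
        return (0xFFFFFFFF - t1) + t2;
}
```

Splitting this from ticks_elapsed_in_us() lets the patch account cx->time in raw ticks (avoiding the PM_TIMER_TICKS_TO_US conversion and its rounding on every idle exit) while still returning microseconds to the caller.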
[PATCH 2/8] cpuidle: menu governor and hrtimer compile fix
Compile fix for menu governor. Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Index: linux-2.6.22-rc-mm/drivers/cpuidle/governors/menu.c === --- linux-2.6.22-rc-mm.orig/drivers/cpuidle/governors/menu.c 2007-06-01 16:25:49.0 -0700 +++ linux-2.6.22-rc-mm/drivers/cpuidle/governors/menu.c 2007-06-05 17:52:33.0 -0700 @@ -11,8 +11,8 @@ #include <linux/latency.h> #include <linux/time.h> #include <linux/ktime.h> -#include <linux/tick.h> #include <linux/hrtimer.h> +#include <linux/tick.h> #define BM_HOLDOFF 20000 /* 20 ms */
[PATCH 1/8] cpuidle: acpi_set_cstate_limit compile fix
Len, Following are a bunch of small changes to cpuidle trying to prepare it for mainline. Some of the changes are just the compile time errors/warnings and you probably already have them in acpi-test. Should apply cleanly to latest acpi-test. Please include in acpi-test. Thanks, Venki This patch: cpuidle compile fix related to acpi_set_cstate_limit(). Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Index: linux-2.6.22-rc-mm/drivers/acpi/osl.c === --- linux-2.6.22-rc-mm.orig/drivers/acpi/osl.c 2007-06-01 16:17:40.0 -0700 +++ linux-2.6.22-rc-mm/drivers/acpi/osl.c 2007-06-01 16:21:43.0 -0700 @@ -1030,6 +1030,7 @@ if (acpi_do_set_cstate_limit) acpi_do_set_cstate_limit(); } +EXPORT_SYMBOL(acpi_set_cstate_limit); /* * Acquire a spinlock.
Driver +provides the platform idle state detection capability and also +has mechanisms in place to support actual entry-exit into a CPU idle state. + +cpuidle driver supports capability detection for a platform using the +init and exit routines. They will be called for each online CPU, with a +percpu cpuidle_driver object and driver should fill in cpuidle_states +inside cpuidle_driver depending on the CPU capability. Driver can handle dynamic state changes (like battery-AC), by calling force_redetect interface. It is possible to have more than one driver registered at the same time and -user can switch between drivers using /sysfs interface. +user can switch between
Re: Intel's response Linux/MTRR/8GB Memory Support / Why doesn't the kernel realize the BIOS has problems and re-map appropriately?
On Fri, Jun 01, 2007 at 02:41:57PM -0700, Jesse Barnes wrote: > On Friday, June 1, 2007 2:19:43 Andi Kleen wrote: > > And normally the MTRRs win, don't they (if I remember the table correctly) > > So if the MTRR says UC and PAT disagrees it might not actually help > > I just checked, yes the MTRRs win for UC types. But it sounds like the cases > we're talking about are actually situations where there's no MTRR coverage, > so the default type is used. The manual doesn't specifically call out how > memory using the default type interacts with PAT, but it may well be that it > stays uncached if the default type is uncached. Again that argues for fixing > the MTRR mapping problem in some way. > I feel having a silent/transparent workaround is not a good idea. With that, chances are the BIOS bug will go unnoticed (an error message in dmesg may not get noticed either). Probably we should just panic at boot with a detailed message about the e820/MTRR discrepancy (which can be logged as a BUG to the BIOS provider) and suggest a temporary workaround of "mem=___". Thanks, Venki
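The discrepancy Venki proposes detecting at boot reduces to a single comparison once the two end addresses are known. A minimal userspace sketch, assuming both addresses are given as inputs (a real kernel would derive them from the e820 map and the variable-range MTRR registers; the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Flag a BIOS misconfiguration: e820 reports usable RAM beyond the
 * highest physical address covered by a write-back (cacheable) MTRR
 * range, so the tail of memory would run uncached. */
static int mtrr_e820_mismatch(uint64_t e820_ram_end, uint64_t mtrr_wb_end)
{
        return e820_ram_end > mtrr_wb_end;
}
```

In the 8GB-RAM case being discussed, e820 would report RAM up to 8GB while the MTRRs only cover the first 4GB write-back, so the check fires and the kernel could panic with the suggested "mem=" workaround message.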
[PATCH] Add a flag to indicate deferrable timers in /proc/timer_stats
Add a flag in /proc/timer_stats to indicate deferrable timers. This will let developers/users differentiate between the types of timers in /proc/timer_stats. Deferrable timers and normal timers will appear in /proc/timer_stats as below. 10D, 1 swapper queue_delayed_work_on (delayed_work_timer_fn) 10, 1 swapper queue_delayed_work_on (delayed_work_timer_fn) Also, the version of timer_stats changes from v0.1 to v0.2. Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Index: linux-2.6.22-rc-mm/include/linux/hrtimer.h === --- linux-2.6.22-rc-mm.orig/include/linux/hrtimer.h 2007-05-24 17:04:10.0 -0700 +++ linux-2.6.22-rc-mm/include/linux/hrtimer.h 2007-05-30 15:02:49.0 -0700 @@ -329,12 +329,13 @@ #ifdef CONFIG_TIMER_STATS extern void timer_stats_update_stats(void *timer, pid_t pid, void *startf, -void *timerf, char * comm); +void *timerf, char * comm, +unsigned int timer_flag); static inline void timer_stats_account_hrtimer(struct hrtimer *timer) { timer_stats_update_stats(timer, timer->start_pid, timer->start_site, -timer->function, timer->start_comm); +timer->function, timer->start_comm, 0); } extern void __timer_stats_hrtimer_set_start_info(struct hrtimer *timer, Index: linux-2.6.22-rc-mm/kernel/timer.c === --- linux-2.6.22-rc-mm.orig/kernel/timer.c 2007-05-24 17:04:10.0 -0700 +++ linux-2.6.22-rc-mm/kernel/timer.c 2007-05-30 15:19:15.0 -0700 @@ -305,6 +305,20 @@ memcpy(timer->start_comm, current->comm, TASK_COMM_LEN); timer->start_pid = current->pid; } + +static void timer_stats_account_timer(struct timer_list *timer) +{ + unsigned int flag = 0; + + if (unlikely(tbase_get_deferrable(timer->base))) + flag |= TIMER_STATS_FLAG_DEFERRABLE; + + timer_stats_update_stats(timer, timer->start_pid, timer->start_site, +timer->function, timer->start_comm, flag); +} + +#else +static void timer_stats_account_timer(struct timer_list *timer) {} #endif /** Index: linux-2.6.22-rc-mm/kernel/time/timer_stats.c === --- linux-2.6.22-rc-mm.orig/kernel/time/timer_stats.c 2007-05-24 17:04:10.0
-0700 +++ linux-2.6.22-rc-mm/kernel/time/timer_stats.c2007-05-30 15:36:55.0 -0700 @@ -68,6 +68,7 @@ * Number of timeout events: */ unsigned long count; + unsigned inttimer_flag; /* * We save the command-line string to preserve @@ -227,7 +228,8 @@ * incremented. Otherwise the timer is registered in a free slot. */ void timer_stats_update_stats(void *timer, pid_t pid, void *startf, - void *timerf, char * comm) + void *timerf, char * comm, + unsigned int timer_flag) { /* * It doesnt matter which lock we take: @@ -240,6 +242,7 @@ input.start_func = startf; input.expire_func = timerf; input.pid = pid; + input.timer_flag = timer_flag; spin_lock_irqsave(lock, flags); if (!active) @@ -286,7 +289,7 @@ period = ktime_to_timespec(time); ms = period.tv_nsec / 100; - seq_puts(m, "Timer Stats Version: v0.1\n"); + seq_puts(m, "Timer Stats Version: v0.2\n"); seq_printf(m, "Sample period: %ld.%03ld s\n", period.tv_sec, ms); if (atomic_read(_count)) seq_printf(m, "Overflow: %d entries\n", @@ -294,8 +297,13 @@ for (i = 0; i < nr_entries; i++) { entry = entries + i; - seq_printf(m, "%4lu, %5d %-16s ", + if (entry->timer_flag & TIMER_STATS_FLAG_DEFERRABLE) { + seq_printf(m, "%4luD, %5d %-16s ", entry->count, entry->pid, entry->comm); + } else { + seq_printf(m, " %4lu, %5d %-16s ", + entry->count, entry->pid, entry->comm); + } print_name_offset(m, (unsigned long)entry->start_func); seq_puts(m, " ("); Index: linux-2.6.22-rc-mm/include/linux/timer.h === --- linux-2.6.22-rc-mm.orig/include/linux/timer.h 2007-05-24 17:04:10.0 -0700 +++ linux-2.6.22-rc-mm/include/linux/timer.h2007-05-30 15:19:23.0 -0700 @@ -85,16 +85,13 @@ */ #ifdef CONFIG_TIMER_STATS +#define TIMER_STATS_FLAG_DEFERRABLE0x1 + extern void init_timer_stats(void); extern void timer_stats_update_stats(void *timer, pid_t pid, void
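The v0.2 format change in the patch above boils down to one branch in the per-entry printf: deferrable timers get a "D" suffix on the event count. A small userspace sketch of that formatting, writing into a buffer instead of a seq_file (field widths copied from the hunk; the helper name is illustrative):

```c
#include <stdio.h>

#define TIMER_STATS_FLAG_DEFERRABLE 0x1

/* Format one /proc/timer_stats entry prefix. Deferrable timers print
 * "%4luD, " while normal timers print " %4lu, " so the columns stay
 * aligned and the D is easy to spot. */
static void format_entry(char *buf, size_t len, unsigned long count,
                         int pid, const char *comm, unsigned int flag)
{
        if (flag & TIMER_STATS_FLAG_DEFERRABLE)
                snprintf(buf, len, "%4luD, %5d %-16s ", count, pid, comm);
        else
                snprintf(buf, len, " %4lu, %5d %-16s ", count, pid, comm);
}
```

For count 10, pid 1, comm "swapper" this yields "  10D, ..." for a deferrable timer and "   10, ..." for a normal one, matching the sample lines in the changelog.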
Re: [PATCH 3/4] Make net watchdog timers 1 sec jiffy aligned
On Wed, May 30, 2007 at 01:30:39PM -0700, Stephen Hemminger wrote: > On Wed, 30 May 2007 12:55:51 -0700 (PDT) > David Miller <[EMAIL PROTECTED]> wrote: > > > From: Patrick McHardy <[EMAIL PROTECTED]> > > Date: Wed, 30 May 2007 20:42:32 +0200 > > > > > Stephen Hemminger wrote: > > > >>>Index: linux-2.6.22-rc-mm/net/sched/sch_generic.c > > > >>>=== > > > >>>--- linux-2.6.22-rc-mm.orig/net/sched/sch_generic.c2007-05-24 > > > >>>11:16:03.0 -0700 > > > >>>+++ linux-2.6.22-rc-mm/net/sched/sch_generic.c 2007-05-25 > > > >>>15:10:02.0 -0700 > > > >>>@@ -224,7 +224,8 @@ > > > >>> if (dev->tx_timeout) { > > > >>> if (dev->watchdog_timeo <= 0) > > > >>> dev->watchdog_timeo = 5*HZ; > > > >>>- if (!mod_timer(>watchdog_timer, jiffies + > > > >>>dev->watchdog_timeo)) > > > >>>+ if (!mod_timer(>watchdog_timer, > > > >>>+ round_jiffies(jiffies + > > > >>>dev->watchdog_timeo))) > > > >>> dev_hold(dev); > > > >>> } > > > >>> } > > > >> > > > >>Please cc netdev on net patches. > > > >> > > > >>Again, I worry that if people set the watchdog timeout to, say, 0.1 > > > >>seconds > > > >>then they will get one second, which is grossly different. > > > >> > > > >>And if they were to set it to 1.5 seconds, they'd get 2.0 which is > > > >>pretty > > > >>significant, too. > > > > > > > > > > > > Alternatively, we could change to a timer that is pushed forward after > > > > each > > > > TX, maybe using hrtimer and hrtimer_forward(). That way the timer would > > > > never run in normal case. > > > > > > > > > It seems wasteful to add per-packet overhead for tx timeouts, which > > > should be an exception. Do drivers really care about the exact > > > timeout value? Compared to a packet transmission time its incredibly > > > long anyways .. > > > > I agree, this change is absolutely rediculious and is just a blind > > cookie-cutter change made without consideration of what the code is > > doing and what it's requirements are. 
> > > > what about the obvious compromise: > > --- a/net/sched/sch_generic.c 2007-05-30 11:42:18.0 -0700 > +++ b/net/sched/sch_generic.c 2007-05-30 13:29:34.0 -0700 > @@ -203,7 +203,11 @@ static void dev_watchdog(unsigned long a > dev->name); > dev->tx_timeout(dev); > } > - if (!mod_timer(&dev->watchdog_timer, > round_jiffies(jiffies + dev->watchdog_timeo))) > + > + if (!mod_timer(&dev->watchdog_timer, > +dev->watchdog_timeo > 2 * HZ > +? round_jiffies(jiffies + > dev->watchdog_timeo) > +: jiffies + dev->watchdog_timeo)) > dev_hold(dev); > } > } > > If this does not work: another option is to use a 'deferrable timer' here, which will be called at the same time as before when the CPU is busy; on an idle CPU it will be delayed until the CPU comes out of idle due to other events. Thanks, Venki
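Stephen's compromise — align only timeouts longer than two seconds, leave short ones exact — can be sketched in userspace C. The rounding helper below is a deliberate simplification: the kernel's round_jiffies() rounds to the nearest second and adds a per-CPU offset, whereas this sketch always rounds up to the next second boundary.

```c
#define HZ 1000UL   /* assumed tick rate for the sketch */

/* Round an absolute jiffies value up to a whole-second boundary. */
static unsigned long round_to_second(unsigned long j)
{
        unsigned long rem = j % HZ;

        return rem ? j + HZ - rem : j;
}

/* The proposed compromise: timeouts above 2s get second-aligned so
 * they can batch with other wakeups; sub-2s timeouts stay exact so a
 * 0.1s or 1.5s watchdog is not grossly distorted. */
static unsigned long watchdog_expiry(unsigned long now, unsigned long timeo)
{
        return timeo > 2 * HZ ? round_to_second(now + timeo)
                              : now + timeo;
}
```

So a 5-second watchdog set at jiffies=3 would fire at the 6-second boundary, while a 1-second watchdog keeps its exact expiry of now+HZ.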
Re: [PATCH 3/4] Make net watchdog timers 1 sec jiffy aligned
On Wed, May 30, 2007 at 12:55:51PM -0700, David Miller wrote: > From: Patrick McHardy <[EMAIL PROTECTED]> > Date: Wed, 30 May 2007 20:42:32 +0200 > > > Stephen Hemminger wrote: > > >>>Index: linux-2.6.22-rc-mm/net/sched/sch_generic.c > > >>>=== > > >>>--- linux-2.6.22-rc-mm.orig/net/sched/sch_generic.c 2007-05-24 > > >>>11:16:03.0 -0700 > > >>>+++ linux-2.6.22-rc-mm/net/sched/sch_generic.c 2007-05-25 > > >>>15:10:02.0 -0700 > > >>>@@ -224,7 +224,8 @@ > > >>> if (dev->tx_timeout) { > > >>> if (dev->watchdog_timeo <= 0) > > >>> dev->watchdog_timeo = 5*HZ; > > >>>-if (!mod_timer(&dev->watchdog_timer, jiffies + > > >>>dev->watchdog_timeo)) > > >>>+if (!mod_timer(&dev->watchdog_timer, > > >>>+ round_jiffies(jiffies + > > >>>dev->watchdog_timeo))) > > >>> dev_hold(dev); > > >>> } > > >>> } > > >> > > >>Please cc netdev on net patches. > > >> > > >>Again, I worry that if people set the watchdog timeout to, say, 0.1 > > >>seconds > > >>then they will get one second, which is grossly different. > > >> > > >>And if they were to set it to 1.5 seconds, they'd get 2.0 which is > > >>pretty > > >>significant, too. > > > > > > > > > Alternatively, we could change to a timer that is pushed forward after > > > each > > > TX, maybe using hrtimer and hrtimer_forward(). That way the timer would > > > never run in normal case. > > > > > > It seems wasteful to add per-packet overhead for tx timeouts, which > > should be an exception. Do drivers really care about the exact > > timeout value? Compared to a packet transmission time its incredibly > > long anyways .. > > I agree, this change is absolutely rediculious and is just a blind > cookie-cutter change made without consideration of what the code is > doing and what it's requirements are. I hope I could at least highlight the issue here despite the cookie-cutter patch. On a totally idle system I have something like 85 wakeups every 5 seconds, which I am trying to reduce (to reduce power consumption and increase battery life).
And 1 interrupt out of 85 happens to be the netdev watchdog timer. Thanks, Venki
Re: [PATCH 3/4] Make net watchdog timers 1 sec jiffy aligned
On Wed, May 30, 2007 at 08:42:32PM +0200, Patrick McHardy wrote: > Stephen Hemminger wrote: > >>>Index: linux-2.6.22-rc-mm/net/sched/sch_generic.c > >>>=== > >>>--- linux-2.6.22-rc-mm.orig/net/sched/sch_generic.c2007-05-24 > >>>11:16:03.0 -0700 > >>>+++ linux-2.6.22-rc-mm/net/sched/sch_generic.c 2007-05-25 > >>>15:10:02.0 -0700 > >>>@@ -224,7 +224,8 @@ > >>> if (dev->tx_timeout) { > >>> if (dev->watchdog_timeo <= 0) > >>> dev->watchdog_timeo = 5*HZ; > >>>- if (!mod_timer(&dev->watchdog_timer, jiffies + > >>>dev->watchdog_timeo)) > >>>+ if (!mod_timer(&dev->watchdog_timer, > >>>+ round_jiffies(jiffies + dev->watchdog_timeo))) > >>> dev_hold(dev); > >>> } > >>> } > >> > >>Please cc netdev on net patches. > >> > >>Again, I worry that if people set the watchdog timeout to, say, 0.1 seconds > >>then they will get one second, which is grossly different. > >> > >>And if they were to set it to 1.5 seconds, they'd get 2.0 which is pretty > >>significant, too. > > > > > > Alternatively, we could change to a timer that is pushed forward after > > each > > TX, maybe using hrtimer and hrtimer_forward(). That way the timer would > > never run in normal case. > > > It seems wasteful to add per-packet overhead for tx timeouts, which > should be an exception. Do drivers really care about the exact > timeout value? Compared to a packet transmission time its incredibly > long anyways .. I agree. Doing a mod_timer or hrtimer_forward to push forward may add to the complexity, depending on how often TX happens. Are the drivers really worried about exact timeouts here? Can we use rounding for the timers that are more than a second, at least? Thanks, Venki
Re: [PATCH] Display Intel Dynamic Acceleration feature in /proc/cpuinfo
On Thu, May 24, 2007 at 05:04:13PM -0700, H. Peter Anvin wrote: > > If they grow slowly from the bottom, I guess we could simply allocate > space in the vector byte by byte instead. Either way, it means more > work whenever anything has to change. > hpa, The patch below adds a new word for feature bits that will be used for all Intel features that may be spread around in CPUID leaves like 0x6, 0xA, etc. I added the "ida" bit first into this word. I will send an incremental patch to move the ARCH_PERFMON bit and any other feature bits in these leaves subsequently. The patch is against the newsetup git tree. Please apply. Thanks, Venki Use a new CPU feature word to cover all Intel features that are spread around in different CPUID leaves like 0x5, 0x6 and 0xA. Make this feature detection code common across i386 and x86_64. Display the Intel Dynamic Acceleration feature in /proc/cpuinfo. This feature will be enabled automatically by the current acpi-cpufreq driver. Refer to the Intel Software Developer's Manual for more details about the feature.
Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Index: linux-2.6/include/asm-i386/cpufeature.h === --- linux-2.6.orig/include/asm-i386/cpufeature.h2007-05-29 07:30:28.0 -0700 +++ linux-2.6/include/asm-i386/cpufeature.h 2007-05-29 10:21:17.0 -0700 @@ -12,7 +12,7 @@ #endif #include -#define NCAPINTS 7 /* N 32-bit words worth of info */ +#define NCAPINTS 8 /* N 32-bit words worth of info */ /* Intel-defined CPU features, CPUID level 0x0001 (edx), word 0 */ #define X86_FEATURE_FPU(0*32+ 0) /* Onboard FPU */ @@ -109,6 +109,9 @@ #define X86_FEATURE_LAHF_LM(6*32+ 0) /* LAHF/SAHF in long mode */ #define X86_FEATURE_CMP_LEGACY (6*32+ 1) /* If yes HyperThreading not valid */ +/* More extended Intel flags: From various new CPUID levels like 0x6, 0xA etc */ +#define X86_FEATURE_IDA(7*32+ 0) /* Intel Dynamic Acceleration */ + #define cpu_has(c, bit) \ (__builtin_constant_p(bit) && \ ( (((bit)>>5)==0 && (1UL<<((bit)&31) & REQUIRED_MASK0)) || \ @@ -117,7 +120,8 @@ (((bit)>>5)==3 && (1UL<<((bit)&31) & REQUIRED_MASK3)) || \ (((bit)>>5)==4 && (1UL<<((bit)&31) & REQUIRED_MASK4)) || \ (((bit)>>5)==5 && (1UL<<((bit)&31) & REQUIRED_MASK5)) || \ - (((bit)>>5)==6 && (1UL<<((bit)&31) & REQUIRED_MASK6)) ) \ + (((bit)>>5)==6 && (1UL<<((bit)&31) & REQUIRED_MASK6)) || \ + (((bit)>>5)==7 && (1UL<<((bit)&31) & REQUIRED_MASK7)) ) \ ? 
1 : \ test_bit(bit, (c)->x86_capability)) #define boot_cpu_has(bit) cpu_has(&boot_cpu_data, bit) Index: linux-2.6/arch/i386/kernel/cpu/proc.c === --- linux-2.6.orig/arch/i386/kernel/cpu/proc.c 2007-05-29 07:30:20.0 -0700 +++ linux-2.6/arch/i386/kernel/cpu/proc.c 2007-05-29 08:20:51.0 -0700 @@ -65,6 +65,12 @@ "osvw", "ibs", NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, + + /* Intel-defined (#3) */ + "ida", NULL, NULL, NULL, NULL, NULL, NULL, NULL, + NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, + NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, + NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, }; static const char * const x86_power_flags[] = { "ts", /* temperature sensor */ Index: linux-2.6/arch/x86_64/kernel/setup.c === --- linux-2.6.orig/arch/x86_64/kernel/setup.c 2007-05-29 07:30:21.0 -0700 +++ linux-2.6/arch/x86_64/kernel/setup.c2007-05-29 09:20:01.0 -0700 @@ -699,6 +699,7 @@ /* Cache sizes */ unsigned n; + init_additional_intel_features(c); init_intel_cacheinfo(c); if (c->cpuid_level > 9 ) { unsigned eax = cpuid_eax(10); @@ -973,6 +974,12 @@ "osvw", "ibs", NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, + + /* Intel-defined (#3) */ + "ida", NULL, NULL, NULL, NULL, NULL, NULL, NULL, + NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, + NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, + NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, }; static char *x86_power_flags[] = { "ts", /* temperature sensor */ Index: linux-2.6/include/asm-i386/required-features.h
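The word/bit encoding behind cpu_has() can be illustrated with a small userspace model: a feature number of the form word*32+bit selects one bit of one 32-bit capability word, so X86_FEATURE_IDA = 7*32+0 is bit 0 of the new eighth word. The test_feature() helper below is a hypothetical stand-in for the kernel's macro, not the macro itself.

```c
#define NCAPINTS 8                      /* eight 32-bit capability words */
#define X86_FEATURE_IDA (7 * 32 + 0)    /* Intel Dynamic Acceleration */

/* Mirror the bit math used by cpu_has(): feature >> 5 picks the word,
 * feature & 31 picks the bit inside it. */
static int test_feature(const unsigned int caps[NCAPINTS], int feature)
{
        return (caps[feature >> 5] >> (feature & 31)) & 1;
}
```

A CPU whose detection code sets bit 0 of word 7 would then report "ida" in /proc/cpuinfo via the corresponding string table entry.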
Re: [PATCH 1/4] Make usb-autosuspend timer 1 sec jiffy aligned
On Tue, May 29, 2007 at 11:22:30AM -0700, Randy Dunlap wrote: > On Tue, 29 May 2007 10:58:21 -0700 Venki Pallipadi wrote: > > > > > > > Below are a bunch of random timers, that were active on my system, > > that can better be round_jiffies() aligned. > > and these 4 patches help with (a) power usage, or (b) cache > usage/niceness, or (c) other (be specific)... > Yes. They are all related to power savings with tickless kernel. A 5 sec timer account for 0.2 unnecessary wakeups per sec (powertop numbers). All these patches together account for somewhere between 0.5-1 wakeup per second saving. That means my wakeups per second comes down from ~18 per second to ~17 per second. On my dual core laptop, CPUs will have more than 3% increase in average C3 residency (actual powertop number went from ~104mS to ~108mS long term C3 residency). The actual AC power numbers were not consisitent enough to be reported here. But, all these small changes will add up in terms of power savings and battery life. Thanks, Venki - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/4] Make mce polling timers 1 sec jiffy aligned
round_jiffies() for i386 and x86-64 non-critical/corrected MCE polling. Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Index: linux-2.6.22-rc-mm/arch/x86_64/kernel/mce.c === --- linux-2.6.22-rc-mm.orig/arch/x86_64/kernel/mce.c2007-05-24 11:15:57.0 -0700 +++ linux-2.6.22-rc-mm/arch/x86_64/kernel/mce.c 2007-05-25 17:29:21.0 -0700 @@ -366,7 +366,8 @@ next_interval = min(next_interval*2, check_interval*HZ); } - schedule_delayed_work(_work, next_interval); + schedule_delayed_work(_work, + round_jiffies_relative(next_interval)); } @@ -374,7 +375,8 @@ { next_interval = check_interval * HZ; if (next_interval) - schedule_delayed_work(_work, next_interval); + schedule_delayed_work(_work, + round_jiffies_relative(next_interval)); return 0; } __initcall(periodic_mcheck_init); @@ -618,7 +620,8 @@ on_each_cpu(mce_init, NULL, 1, 1); next_interval = check_interval * HZ; if (next_interval) - schedule_delayed_work(_work, next_interval); + schedule_delayed_work(_work, + round_jiffies_relative(next_interval)); } static struct sysdev_class mce_sysclass = { Index: linux-2.6.22-rc-mm/arch/i386/kernel/cpu/mcheck/non-fatal.c === --- linux-2.6.22-rc-mm.orig/arch/i386/kernel/cpu/mcheck/non-fatal.c 2007-04-25 20:08:32.0 -0700 +++ linux-2.6.22-rc-mm/arch/i386/kernel/cpu/mcheck/non-fatal.c 2007-05-25 17:27:49.0 -0700 @@ -57,7 +57,7 @@ static void mce_work_fn(struct work_struct *work) { on_each_cpu(mce_checkregs, NULL, 1, 1); - schedule_delayed_work(_work, MCE_RATE); + schedule_delayed_work(_work, round_jiffies_relative(MCE_RATE)); } static int __init init_nonfatal_mce_checker(void) @@ -82,7 +82,7 @@ /* * Check for non-fatal errors every MCE_RATE s */ - schedule_delayed_work(_work, MCE_RATE); + schedule_delayed_work(_work, round_jiffies_relative(MCE_RATE)); printk(KERN_INFO "Machine check exception polling timer started.\n"); return 0; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at 
http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/4] Make net watchdog timers 1 sec jiffy aligned
round_jiffies for net dev watchdog timer. Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Index: linux-2.6.22-rc-mm/net/sched/sch_generic.c === --- linux-2.6.22-rc-mm.orig/net/sched/sch_generic.c 2007-05-24 11:16:03.0 -0700 +++ linux-2.6.22-rc-mm/net/sched/sch_generic.c 2007-05-25 15:10:02.0 -0700 @@ -224,7 +224,8 @@ if (dev->tx_timeout) { if (dev->watchdog_timeo <= 0) dev->watchdog_timeo = 5*HZ; - if (!mod_timer(>watchdog_timer, jiffies + dev->watchdog_timeo)) + if (!mod_timer(>watchdog_timer, + round_jiffies(jiffies + dev->watchdog_timeo))) dev_hold(dev); } } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/4] Make page-writeback timers 1 sec jiffy aligned
timer round_jiffies in page-writeback. Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Index: linux-2.6.22-rc-mm/mm/page-writeback.c === --- linux-2.6.22-rc-mm.orig/mm/page-writeback.c 2007-05-25 10:49:11.0 -0700 +++ linux-2.6.22-rc-mm/mm/page-writeback.c 2007-05-25 10:49:29.0 -0700 @@ -469,7 +469,7 @@ if (time_before(next_jif, jiffies + HZ)) next_jif = jiffies + HZ; if (dirty_writeback_interval) - mod_timer(_timer, next_jif); + mod_timer(_timer, round_jiffies(next_jif)); } /* @@ -481,7 +481,7 @@ proc_dointvec_userhz_jiffies(table, write, file, buffer, length, ppos); if (dirty_writeback_interval) { mod_timer(_timer, - jiffies + dirty_writeback_interval); + round_jiffies(jiffies + dirty_writeback_interval)); } else { del_timer(_timer); } @@ -491,7 +491,8 @@ static void wb_timer_fn(unsigned long unused) { if (pdflush_operation(wb_kupdate, 0) < 0) - mod_timer(_timer, jiffies + HZ); /* delay 1 second */ + mod_timer(_timer, round_jiffies(jiffies + HZ)); + /* delay 1 second */ } static void laptop_flush(unsigned long unused) @@ -511,7 +512,7 @@ */ void laptop_io_completion(void) { - mod_timer(_mode_wb_timer, jiffies + laptop_mode); + mod_timer(_mode_wb_timer, round_jiffies(jiffies + laptop_mode)); } /* @@ -582,7 +583,7 @@ */ void __init page_writeback_init(void) { - mod_timer(_timer, jiffies + dirty_writeback_interval); + mod_timer(_timer, round_jiffies(jiffies + dirty_writeback_interval)); writeback_set_ratelimit(); register_cpu_notifier(_nb); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/4] Make usb-autosuspend timer 1 sec jiffy aligned
Below are a bunch of random timers, that were active on my system, that can better be round_jiffies() aligned. I guess we need a audit of all timer usages atleast in kernel-core. This patch: Make usb autosuspend timers 1sec jiffy aligned. Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> Index: linux-2.6.22-rc-mm/drivers/usb/core/driver.c === --- linux-2.6.22-rc-mm.orig/drivers/usb/core/driver.c 2007-05-24 11:16:00.0 -0700 +++ linux-2.6.22-rc-mm/drivers/usb/core/driver.c2007-05-25 10:00:50.0 -0700 @@ -974,7 +974,7 @@ * or for the past. */ queue_delayed_work(ksuspend_usb_wq, >autosuspend, - suspend_time - jiffies); + round_jiffies_relative(suspend_time - jiffies)); } return -EAGAIN; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/4] Make usb-autosuspend timer 1 sec jiffy aligned
Below are a bunch of random timers, that were active on my system, that can better be round_jiffies() aligned. I guess we need a audit of all timer usages atleast in kernel-core. This patch: Make usb autosuspend timers 1sec jiffy aligned. Signed-off-by: Venkatesh Pallipadi [EMAIL PROTECTED] Index: linux-2.6.22-rc-mm/drivers/usb/core/driver.c === --- linux-2.6.22-rc-mm.orig/drivers/usb/core/driver.c 2007-05-24 11:16:00.0 -0700 +++ linux-2.6.22-rc-mm/drivers/usb/core/driver.c2007-05-25 10:00:50.0 -0700 @@ -974,7 +974,7 @@ * or for the past. */ queue_delayed_work(ksuspend_usb_wq, udev-autosuspend, - suspend_time - jiffies); + round_jiffies_relative(suspend_time - jiffies)); } return -EAGAIN; } - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/4] Make page-writeback timers 1 sec jiffy aligned
timer round_jiffies in page-writeback. Signed-off-by: Venkatesh Pallipadi [EMAIL PROTECTED] Index: linux-2.6.22-rc-mm/mm/page-writeback.c === --- linux-2.6.22-rc-mm.orig/mm/page-writeback.c 2007-05-25 10:49:11.0 -0700 +++ linux-2.6.22-rc-mm/mm/page-writeback.c 2007-05-25 10:49:29.0 -0700 @@ -469,7 +469,7 @@ if (time_before(next_jif, jiffies + HZ)) next_jif = jiffies + HZ; if (dirty_writeback_interval) - mod_timer(wb_timer, next_jif); + mod_timer(wb_timer, round_jiffies(next_jif)); } /* @@ -481,7 +481,7 @@ proc_dointvec_userhz_jiffies(table, write, file, buffer, length, ppos); if (dirty_writeback_interval) { mod_timer(wb_timer, - jiffies + dirty_writeback_interval); + round_jiffies(jiffies + dirty_writeback_interval)); } else { del_timer(wb_timer); } @@ -491,7 +491,8 @@ static void wb_timer_fn(unsigned long unused) { if (pdflush_operation(wb_kupdate, 0) 0) - mod_timer(wb_timer, jiffies + HZ); /* delay 1 second */ + mod_timer(wb_timer, round_jiffies(jiffies + HZ)); + /* delay 1 second */ } static void laptop_flush(unsigned long unused) @@ -511,7 +512,7 @@ */ void laptop_io_completion(void) { - mod_timer(laptop_mode_wb_timer, jiffies + laptop_mode); + mod_timer(laptop_mode_wb_timer, round_jiffies(jiffies + laptop_mode)); } /* @@ -582,7 +583,7 @@ */ void __init page_writeback_init(void) { - mod_timer(wb_timer, jiffies + dirty_writeback_interval); + mod_timer(wb_timer, round_jiffies(jiffies + dirty_writeback_interval)); writeback_set_ratelimit(); register_cpu_notifier(ratelimit_nb); } - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/4] Make net watchdog timers 1 sec jiffy aligned
round_jiffies for net dev watchdog timer. Signed-off-by: Venkatesh Pallipadi [EMAIL PROTECTED] Index: linux-2.6.22-rc-mm/net/sched/sch_generic.c === --- linux-2.6.22-rc-mm.orig/net/sched/sch_generic.c 2007-05-24 11:16:03.0 -0700 +++ linux-2.6.22-rc-mm/net/sched/sch_generic.c 2007-05-25 15:10:02.0 -0700 @@ -224,7 +224,8 @@ if (dev-tx_timeout) { if (dev-watchdog_timeo = 0) dev-watchdog_timeo = 5*HZ; - if (!mod_timer(dev-watchdog_timer, jiffies + dev-watchdog_timeo)) + if (!mod_timer(dev-watchdog_timer, + round_jiffies(jiffies + dev-watchdog_timeo))) dev_hold(dev); } } - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/4] Make mce polling timers 1 sec jiffy aligned
round_jiffies() for i386 and x86-64 non-critical/corrected MCE polling. Signed-off-by: Venkatesh Pallipadi [EMAIL PROTECTED] Index: linux-2.6.22-rc-mm/arch/x86_64/kernel/mce.c === --- linux-2.6.22-rc-mm.orig/arch/x86_64/kernel/mce.c2007-05-24 11:15:57.0 -0700 +++ linux-2.6.22-rc-mm/arch/x86_64/kernel/mce.c 2007-05-25 17:29:21.0 -0700 @@ -366,7 +366,8 @@ next_interval = min(next_interval*2, check_interval*HZ); } - schedule_delayed_work(mcheck_work, next_interval); + schedule_delayed_work(mcheck_work, + round_jiffies_relative(next_interval)); } @@ -374,7 +375,8 @@ { next_interval = check_interval * HZ; if (next_interval) - schedule_delayed_work(mcheck_work, next_interval); + schedule_delayed_work(mcheck_work, + round_jiffies_relative(next_interval)); return 0; } __initcall(periodic_mcheck_init); @@ -618,7 +620,8 @@ on_each_cpu(mce_init, NULL, 1, 1); next_interval = check_interval * HZ; if (next_interval) - schedule_delayed_work(mcheck_work, next_interval); + schedule_delayed_work(mcheck_work, + round_jiffies_relative(next_interval)); } static struct sysdev_class mce_sysclass = { Index: linux-2.6.22-rc-mm/arch/i386/kernel/cpu/mcheck/non-fatal.c === --- linux-2.6.22-rc-mm.orig/arch/i386/kernel/cpu/mcheck/non-fatal.c 2007-04-25 20:08:32.0 -0700 +++ linux-2.6.22-rc-mm/arch/i386/kernel/cpu/mcheck/non-fatal.c 2007-05-25 17:27:49.0 -0700 @@ -57,7 +57,7 @@ static void mce_work_fn(struct work_struct *work) { on_each_cpu(mce_checkregs, NULL, 1, 1); - schedule_delayed_work(mce_work, MCE_RATE); + schedule_delayed_work(mce_work, round_jiffies_relative(MCE_RATE)); } static int __init init_nonfatal_mce_checker(void) @@ -82,7 +82,7 @@ /* * Check for non-fatal errors every MCE_RATE s */ - schedule_delayed_work(mce_work, MCE_RATE); + schedule_delayed_work(mce_work, round_jiffies_relative(MCE_RATE)); printk(KERN_INFO Machine check exception polling timer started.\n); return 0; } - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL 
PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] Make usb-autosuspend timer 1 sec jiffy aligned
On Tue, May 29, 2007 at 11:22:30AM -0700, Randy Dunlap wrote: On Tue, 29 May 2007 10:58:21 -0700 Venki Pallipadi wrote: Below are a bunch of random timers, that were active on my system, that can better be round_jiffies() aligned. and these 4 patches help with (a) power usage, or (b) cache usage/niceness, or (c) other (be specific)... Yes. They are all related to power savings with tickless kernel. A 5 sec timer account for 0.2 unnecessary wakeups per sec (powertop numbers). All these patches together account for somewhere between 0.5-1 wakeup per second saving. That means my wakeups per second comes down from ~18 per second to ~17 per second. On my dual core laptop, CPUs will have more than 3% increase in average C3 residency (actual powertop number went from ~104mS to ~108mS long term C3 residency). The actual AC power numbers were not consisitent enough to be reported here. But, all these small changes will add up in terms of power savings and battery life. Thanks, Venki - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Display Intel Dynamic Acceleration feature in /proc/cpuinfo
On Thu, May 24, 2007 at 05:04:13PM -0700, H. Peter Anvin wrote: If they grow slowly from the bottom, I guess we could simply allocate space in the vector byte by byte instead. Either way, it means more work whenever anything has to change. hpa, Below patch adds a new word for feature bits that willb eused for all Intel features that may be spread around in CPUID leafs like 0x6, 0xA, etc. I added ida bit first into this word. I will send an incremental patch to move ARCH_PERFMON bit and any other feature bits in these leaf subsequently. The patch is against newsetup git tree. Please apply. Thanks, Venki Use a new CPU feature word to cover all Intel features that are spread around in different CPUID leafs like 0x5, 0x6 and 0xA. Make this feature detection code common across i386 and x86_64. Display Intel Dynamic Acceleration feature in /proc/cpuinfo. This feature will be enabled automatically by current acpi-cpufreq driver. Refer to Intel Software Developer's Manual for more details about the feature. 
Signed-off-by: Venkatesh Pallipadi [EMAIL PROTECTED] Index: linux-2.6/include/asm-i386/cpufeature.h === --- linux-2.6.orig/include/asm-i386/cpufeature.h2007-05-29 07:30:28.0 -0700 +++ linux-2.6/include/asm-i386/cpufeature.h 2007-05-29 10:21:17.0 -0700 @@ -12,7 +12,7 @@ #endif #include asm/required-features.h -#define NCAPINTS 7 /* N 32-bit words worth of info */ +#define NCAPINTS 8 /* N 32-bit words worth of info */ /* Intel-defined CPU features, CPUID level 0x0001 (edx), word 0 */ #define X86_FEATURE_FPU(0*32+ 0) /* Onboard FPU */ @@ -109,6 +109,9 @@ #define X86_FEATURE_LAHF_LM(6*32+ 0) /* LAHF/SAHF in long mode */ #define X86_FEATURE_CMP_LEGACY (6*32+ 1) /* If yes HyperThreading not valid */ +/* More extended Intel flags: From various new CPUID levels like 0x6, 0xA etc */ +#define X86_FEATURE_IDA(7*32+ 0) /* Intel Dynamic Acceleration */ + #define cpu_has(c, bit) \ (__builtin_constant_p(bit)\ ( (((bit)5)==0 (1UL((bit)31) REQUIRED_MASK0)) || \ @@ -117,7 +120,8 @@ (((bit)5)==3 (1UL((bit)31) REQUIRED_MASK3)) || \ (((bit)5)==4 (1UL((bit)31) REQUIRED_MASK4)) || \ (((bit)5)==5 (1UL((bit)31) REQUIRED_MASK5)) || \ - (((bit)5)==6 (1UL((bit)31) REQUIRED_MASK6)) ) \ + (((bit)5)==6 (1UL((bit)31) REQUIRED_MASK6)) || \ + (((bit)5)==7 (1UL((bit)31) REQUIRED_MASK7)) ) \ ? 
1 : \ test_bit(bit, (c)-x86_capability)) #define boot_cpu_has(bit) cpu_has(boot_cpu_data, bit) Index: linux-2.6/arch/i386/kernel/cpu/proc.c === --- linux-2.6.orig/arch/i386/kernel/cpu/proc.c 2007-05-29 07:30:20.0 -0700 +++ linux-2.6/arch/i386/kernel/cpu/proc.c 2007-05-29 08:20:51.0 -0700 @@ -65,6 +65,12 @@ osvw, ibs, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, + + /* Intel-defined (#3) */ + ida, NULL, NULL, NULL, NULL, NULL, NULL, NULL, + NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, + NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, + NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, }; static const char * const x86_power_flags[] = { ts, /* temperature sensor */ Index: linux-2.6/arch/x86_64/kernel/setup.c === --- linux-2.6.orig/arch/x86_64/kernel/setup.c 2007-05-29 07:30:21.0 -0700 +++ linux-2.6/arch/x86_64/kernel/setup.c2007-05-29 09:20:01.0 -0700 @@ -699,6 +699,7 @@ /* Cache sizes */ unsigned n; + init_additional_intel_features(c); init_intel_cacheinfo(c); if (c-cpuid_level 9 ) { unsigned eax = cpuid_eax(10); @@ -973,6 +974,12 @@ osvw, ibs, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, + + /* Intel-defined (#3) */ + ida, NULL, NULL, NULL, NULL, NULL, NULL, NULL, + NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, + NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, + NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, }; static char *x86_power_flags[] = { ts, /* temperature sensor */ Index: linux-2.6/include/asm-i386/required-features.h ===
Re: [PATCH] Display Intel Dynamic Acceleration feature in /proc/cpuinfo
On Thu, May 24, 2007 at 03:02:23PM -0700, Andrew Morton wrote: > On Wed, 23 May 2007 15:46:37 -0700 > Venki Pallipadi <[EMAIL PROTECTED]> wrote: > > > Display Intel Dynamic Acceleration feature in /proc/cpuinfo. This feature > > will be enabled automatically by current acpi-cpufreq driver and cpufreq. > > > > Refer to Intel Software Developer's Manual for more details about the > > feature. > > > > Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> > > > > Index: linux-2.6.22-rc-mm/arch/i386/kernel/cpu/proc.c > > === > > --- linux-2.6.22-rc-mm.orig/arch/i386/kernel/cpu/proc.c > > +++ linux-2.6.22-rc-mm/arch/i386/kernel/cpu/proc.c > > @@ -41,7 +41,7 @@ static int show_cpuinfo(struct seq_file > > "cxmmx", "k6_mtrr", "cyrix_arr", "centaur_mcr", > > NULL, NULL, NULL, NULL, > > "constant_tsc", "up", NULL, NULL, NULL, NULL, NULL, NULL, > > - NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, > > + "ida", NULL, NULL, NULL, NULL, NULL, NULL, NULL, > > NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, > > > > /* Intel-defined (#2) */ > > Index: linux-2.6.22-rc-mm/arch/x86_64/kernel/setup.c > > === > > --- linux-2.6.22-rc-mm.orig/arch/x86_64/kernel/setup.c > > +++ linux-2.6.22-rc-mm/arch/x86_64/kernel/setup.c > > @@ -949,7 +949,7 @@ static int show_cpuinfo(struct seq_file > > /* Other (Linux-defined) */ > > "cxmmx", NULL, "cyrix_arr", "centaur_mcr", NULL, > > "constant_tsc", NULL, NULL, > > - "up", NULL, NULL, NULL, NULL, NULL, NULL, NULL, > > + "up", NULL, NULL, NULL, "ida", NULL, NULL, NULL, > > NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, > > NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, > > Ho hum. This clashes with hpa's git-newsetup tree, which goes for a great > tromp through the cpuinfo implementation. > Hmm.. 
Will move feature detection to setup routines and will also refresh the patch against latest mm and resend it Thanks, Venki - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Display Intel Dynamic Acceleration feature in /proc/cpuinfo
On Thu, May 24, 2007 at 11:25:27PM +0200, Andi Kleen wrote: > On Thursday 24 May 2007 23:13:37 Venki Pallipadi wrote: > > On Thu, May 24, 2007 at 11:08:38PM +0200, Andi Kleen wrote: > > > > > > I think it's generally a good idea to push cpuinfo flags in earliest > > > as possible; just make sure we actually use the final name (so that we > > > don't get > > > into a pni->sse3 mess again) > > > > > > > ida is official name as in the Software Developer's Manual now. So, should > > not be a issue unless marketing folks change their mind in future :-) > > Well they did sometimes in the past. > > But actually reading the patch: it seems weird to detect the flag > in acpi-cpufreq and essentially change /proc/cpuinfo when a > module is loaded. Why not in the intel setup function? And why is it > not in the standard CPUID 1 features mask anyways? > I can do it in intel setup function. But, the feature may not be activated unless the driver is loaded. Going by the hardware capability point of view, we can do it in setup function. The feature appears in CPUID 6 (called Power Management Leaf) instead of regular CPUID 1 features. Thanks, Venki - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Display Intel Dynamic Acceleration feature in /proc/cpuinfo
On Thu, May 24, 2007 at 11:08:38PM +0200, Andi Kleen wrote: > > I think it's generally a good idea to push cpuinfo flags in earliest > as possible; just make sure we actually use the final name (so that we don't > get > into a pni->sse3 mess again) > ida is official name as in the Software Developer's Manual now. So, should not be a issue unless marketing folks change their mind in future :-) Thanks, Venki - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Display Intel Dynamic Acceleration feature in /proc/cpuinfo
On Thu, May 24, 2007 at 05:01:04PM -0400, Dave Jones wrote: > On Thu, May 24, 2007 at 01:55:13PM -0700, Andrew Morton wrote: > > On Wed, 23 May 2007 15:46:37 -0700 > > Venki Pallipadi <[EMAIL PROTECTED]> wrote: > > > > > Display Intel Dynamic Acceleration feature in /proc/cpuinfo. This feature > > > will be enabled automatically by current acpi-cpufreq driver and cpufreq. > > > > So you're saying that the cpufreq code in Linus's tree aleady supports IDA? > > If so, this is a 2.6.22 patch, isn't it? > > From my limited understanding[*], ida is the "We're single threaded, > disable the 2nd core, and clock the first core faster" magic. > It doesn't need code-changes, as its all done in hardware afaik. IDA state will appear as a new highest freq P-state (P0) and when software requests that frequency, hardware can provide a higher frequency than that oppurtunistically and transparently. The current cpufreq code will detect this new state and enter that state when CPU is busy. > > identifying & exporting the flags on earlier kernels should be harmless, > but not really 'mustfix'. > Agree with Dave that it is not a mustfix. As the patch is pretty harmless would be nice to have in 2.6.22. Thanks, Venki - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/