Re: [PATCH v2] cpufreq, intel_pstate, Fix intel_pstate powersave min_perf_pct value
On Thursday, October 15, 2015 07:34:15 AM Prarit Bhargava wrote: > Rafael, as requested ... > > [v2]: rebased on > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git > bleeding-edge > > ^^^ prolly doesn't need to be in the commit log > > P. > > ---8<--- > > On systems that initialize the intel_pstate driver with the performance > governor, and then switch to the powersave governor will not transition to > lower cpu frequencies until /sys/devices/system/cpu/intel_pstate/min_perf_pct > is set to a low value. > > The behavior of governor switching changed after commit a04759924e25 > ("[cpufreq] intel_pstate: honor user space min_perf_pct override on > resume"). The commit introduced tracking of performance percentage > changes via sysfs in order to restore userspace changes during > suspend/resume. The problem occurs because the global values of the newly > introduced max_sysfs_pct and min_sysfs_pct are not lowered on the governor > change and this causes the powersave governor to inherit the performance > governor's settings. > > A simple change would have been to reset max_sysfs_pct to 100 and > min_sysfs_pct to 0 on a governor change, which fixes the problem with > governor switching. However, since we cannot break userspace[1] the fix > is now to give each governor its own limits storage area so that governor > specific changes are tracked. > > I successfully tested this by booting with both the performance governor > and the powersave governor by default, and switching between the two > governors (while monitoring /sys/devices/system/cpu/intel_pstate/ values, > and looking at the output of cpupower frequency-info). Suspend/Resume > testing was performed by Doug Smythies. > > [1] Systems which suspend/resume using the unmaintained pm-utils package > will always transition to the performance governor before the suspend and > after the resume. This means a system using the powersave governor will > go from powersave to performance, then suspend/resume, performance to > powersave. The simple change during governor changes would have been > overwritten when the governor changed before and after the suspend/resume. > I have submitted https://bugzilla.redhat.com/show_bug.cgi?id=1271225 > against Fedora to remove the 94cpufreq file that causes the problem. It > should be noted that pm-utils is obsoleted with newer versions of systemd. > > Cc: Kristen Carlson Accardi > Cc: "Rafael J. Wysocki" > Cc: Viresh Kumar > Cc: linux...@vger.kernel.org > Cc: Doug Smythies > Signed-off-by: Prarit Bhargava Applied, thanks! Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] cpufreq, intel_pstate, Fix intel_pstate powersave min_perf_pct value
On Thursday, October 15, 2015 07:34:15 AM Prarit Bhargava wrote: > Rafael, as requested ... > > [v2]: rebased on > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git > bleeding-edge > > ^^^ prolly doesn't need to be in the commit log > > P. > > ---8<--- > > On systems that initialize the intel_pstate driver with the performance > governor, and then switch to the powersave governor will not transition to > lower cpu frequencies until /sys/devices/system/cpu/intel_pstate/min_perf_pct > is set to a low value. > > The behavior of governor switching changed after commit a04759924e25 > ("[cpufreq] intel_pstate: honor user space min_perf_pct override on > resume"). The commit introduced tracking of performance percentage > changes via sysfs in order to restore userspace changes during > suspend/resume. The problem occurs because the global values of the newly > introduced max_sysfs_pct and min_sysfs_pct are not lowered on the governor > change and this causes the powersave governor to inherit the performance > governor's settings. > > A simple change would have been to reset max_sysfs_pct to 100 and > min_sysfs_pct to 0 on a governor change, which fixes the problem with > governor switching. However, since we cannot break userspace[1] the fix > is now to give each governor its own limits storage area so that governor > specific changes are tracked. > > I successfully tested this by booting with both the performance governor > and the powersave governor by default, and switching between the two > governors (while monitoring /sys/devices/system/cpu/intel_pstate/ values, > and looking at the output of cpupower frequency-info). Suspend/Resume > testing was performed by Doug Smythies. > > [1] Systems which suspend/resume using the unmaintained pm-utils package > will always transition to the performance governor before the suspend and > after the resume. This means a system using the powersave governor will > go from powersave to performance, then suspend/resume, performance to > powersave. The simple change during governor changes would have been > overwritten when the governor changed before and after the suspend/resume. > I have submitted https://bugzilla.redhat.com/show_bug.cgi?id=1271225 > against Fedora to remove the 94cpufreq file that causes the problem. It > should be noted that pm-utils is obsoleted with newer versions of systemd. > > Cc: Kristen Carlson Accardi> Cc: "Rafael J. Wysocki" > Cc: Viresh Kumar > Cc: linux...@vger.kernel.org > Cc: Doug Smythies > Signed-off-by: Prarit Bhargava Applied, thanks! Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2] cpufreq, intel_pstate, Fix intel_pstate powersave min_perf_pct value
Rafael, as requested ... [v2]: rebased on git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git bleeding-edge ^^^ prolly doesn't need to be in the commit log P. ---8<--- On systems that initialize the intel_pstate driver with the performance governor, and then switch to the powersave governor will not transition to lower cpu frequencies until /sys/devices/system/cpu/intel_pstate/min_perf_pct is set to a low value. The behavior of governor switching changed after commit a04759924e25 ("[cpufreq] intel_pstate: honor user space min_perf_pct override on resume"). The commit introduced tracking of performance percentage changes via sysfs in order to restore userspace changes during suspend/resume. The problem occurs because the global values of the newly introduced max_sysfs_pct and min_sysfs_pct are not lowered on the governor change and this causes the powersave governor to inherit the performance governor's settings. A simple change would have been to reset max_sysfs_pct to 100 and min_sysfs_pct to 0 on a governor change, which fixes the problem with governor switching. However, since we cannot break userspace[1] the fix is now to give each governor its own limits storage area so that governor specific changes are tracked. I successfully tested this by booting with both the performance governor and the powersave governor by default, and switching between the two governors (while monitoring /sys/devices/system/cpu/intel_pstate/ values, and looking at the output of cpupower frequency-info). Suspend/Resume testing was performed by Doug Smythies. [1] Systems which suspend/resume using the unmaintained pm-utils package will always transition to the performance governor before the suspend and after the resume. This means a system using the powersave governor will go from powersave to performance, then suspend/resume, performance to powersave. The simple change during governor changes would have been overwritten when the governor changed before and after the suspend/resume. I have submitted https://bugzilla.redhat.com/show_bug.cgi?id=1271225 against Fedora to remove the 94cpufreq file that causes the problem. It should be noted that pm-utils is obsoleted with newer versions of systemd. Cc: Kristen Carlson Accardi Cc: "Rafael J. Wysocki" Cc: Viresh Kumar Cc: linux...@vger.kernel.org Cc: Doug Smythies Signed-off-by: Prarit Bhargava --- drivers/cpufreq/intel_pstate.c | 142 - 1 file changed, 85 insertions(+), 57 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index c568226..cde38c8 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -167,7 +167,20 @@ struct perf_limits { int min_perf_ctl; }; -static struct perf_limits limits = { +static struct perf_limits performance_limits = { + .no_turbo = 0, + .turbo_disabled = 0, + .max_perf_pct = 100, + .max_perf = int_tofp(1), + .min_perf_pct = 100, + .min_perf = int_tofp(1), + .max_policy_pct = 100, + .max_sysfs_pct = 100, + .min_policy_pct = 0, + .min_sysfs_pct = 0, +}; + +static struct perf_limits powersave_limits = { .no_turbo = 0, .turbo_disabled = 0, .max_perf_pct = 100, @@ -182,6 +195,12 @@ static struct perf_limits limits = { .min_perf_ctl = 0, }; +#ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE +static struct perf_limits *limits = _limits; +#else +static struct perf_limits *limits = _limits; +#endif + #if IS_ENABLED(CONFIG_ACPI) /* * The max target pstate ratio is a 8 bit value in both PLATFORM_INFO MSR and @@ -256,7 +275,7 @@ static int intel_pstate_init_perf_limits(struct cpufreq_policy *policy) if (turbo_pss_ctl <= cpu->pstate.max_pstate && turbo_pss_ctl > cpu->pstate.min_pstate) { pr_debug("intel_pstate: no turbo range exists in _PSS\n"); - limits.no_turbo = limits.turbo_disabled = 1; + limits->no_turbo = limits->turbo_disabled = 1; cpu->pstate.turbo_pstate = cpu->pstate.max_pstate; turbo_absent = true; } @@ -415,7 +434,7 @@ static inline void update_turbo_state(void) cpu = all_cpu_data[0]; rdmsrl(MSR_IA32_MISC_ENABLE, misc_en); - limits.turbo_disabled = + limits->turbo_disabled = (misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE || cpu->pstate.max_pstate == cpu->pstate.turbo_pstate); } @@ -434,14 +453,14 @@ static void intel_pstate_hwp_set(void) for_each_online_cpu(cpu) { rdmsrl_on_cpu(cpu, MSR_HWP_REQUEST, ); - adj_range = limits.min_perf_pct * range / 100; + adj_range = limits->min_perf_pct * range / 100; min = hw_min + adj_range; value &= ~HWP_MIN_PERF(~0L); value |= HWP_MIN_PERF(min); - adj_range = limits.max_perf_pct * range /
[PATCH v2] cpufreq, intel_pstate, Fix intel_pstate powersave min_perf_pct value
Rafael, as requested ... [v2]: rebased on git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git bleeding-edge ^^^ prolly doesn't need to be in the commit log P. ---8<--- On systems that initialize the intel_pstate driver with the performance governor, and then switch to the powersave governor will not transition to lower cpu frequencies until /sys/devices/system/cpu/intel_pstate/min_perf_pct is set to a low value. The behavior of governor switching changed after commit a04759924e25 ("[cpufreq] intel_pstate: honor user space min_perf_pct override on resume"). The commit introduced tracking of performance percentage changes via sysfs in order to restore userspace changes during suspend/resume. The problem occurs because the global values of the newly introduced max_sysfs_pct and min_sysfs_pct are not lowered on the governor change and this causes the powersave governor to inherit the performance governor's settings. A simple change would have been to reset max_sysfs_pct to 100 and min_sysfs_pct to 0 on a governor change, which fixes the problem with governor switching. However, since we cannot break userspace[1] the fix is now to give each governor its own limits storage area so that governor specific changes are tracked. I successfully tested this by booting with both the performance governor and the powersave governor by default, and switching between the two governors (while monitoring /sys/devices/system/cpu/intel_pstate/ values, and looking at the output of cpupower frequency-info). Suspend/Resume testing was performed by Doug Smythies. [1] Systems which suspend/resume using the unmaintained pm-utils package will always transition to the performance governor before the suspend and after the resume. This means a system using the powersave governor will go from powersave to performance, then suspend/resume, performance to powersave. The simple change during governor changes would have been overwritten when the governor changed before and after the suspend/resume. I have submitted https://bugzilla.redhat.com/show_bug.cgi?id=1271225 against Fedora to remove the 94cpufreq file that causes the problem. It should be noted that pm-utils is obsoleted with newer versions of systemd. Cc: Kristen Carlson AccardiCc: "Rafael J. Wysocki" Cc: Viresh Kumar Cc: linux...@vger.kernel.org Cc: Doug Smythies Signed-off-by: Prarit Bhargava --- drivers/cpufreq/intel_pstate.c | 142 - 1 file changed, 85 insertions(+), 57 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index c568226..cde38c8 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -167,7 +167,20 @@ struct perf_limits { int min_perf_ctl; }; -static struct perf_limits limits = { +static struct perf_limits performance_limits = { + .no_turbo = 0, + .turbo_disabled = 0, + .max_perf_pct = 100, + .max_perf = int_tofp(1), + .min_perf_pct = 100, + .min_perf = int_tofp(1), + .max_policy_pct = 100, + .max_sysfs_pct = 100, + .min_policy_pct = 0, + .min_sysfs_pct = 0, +}; + +static struct perf_limits powersave_limits = { .no_turbo = 0, .turbo_disabled = 0, .max_perf_pct = 100, @@ -182,6 +195,12 @@ static struct perf_limits limits = { .min_perf_ctl = 0, }; +#ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE +static struct perf_limits *limits = _limits; +#else +static struct perf_limits *limits = _limits; +#endif + #if IS_ENABLED(CONFIG_ACPI) /* * The max target pstate ratio is a 8 bit value in both PLATFORM_INFO MSR and @@ -256,7 +275,7 @@ static int intel_pstate_init_perf_limits(struct cpufreq_policy *policy) if (turbo_pss_ctl <= cpu->pstate.max_pstate && turbo_pss_ctl > cpu->pstate.min_pstate) { pr_debug("intel_pstate: no turbo range exists in _PSS\n"); - limits.no_turbo = limits.turbo_disabled = 1; + limits->no_turbo = limits->turbo_disabled = 1; cpu->pstate.turbo_pstate = cpu->pstate.max_pstate; turbo_absent = true; } @@ -415,7 +434,7 @@ static inline void update_turbo_state(void) cpu = all_cpu_data[0]; rdmsrl(MSR_IA32_MISC_ENABLE, misc_en); - limits.turbo_disabled = + limits->turbo_disabled = (misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE || cpu->pstate.max_pstate == cpu->pstate.turbo_pstate); } @@ -434,14 +453,14 @@ static void intel_pstate_hwp_set(void) for_each_online_cpu(cpu) { rdmsrl_on_cpu(cpu, MSR_HWP_REQUEST, ); - adj_range = limits.min_perf_pct * range / 100; + adj_range = limits->min_perf_pct * range / 100; min = hw_min + adj_range; value &=