Re: [PATCH v2] cpufreq, intel_pstate, Fix intel_pstate powersave min_perf_pct value

2015-10-19 Thread Rafael J. Wysocki
On Thursday, October 15, 2015 07:34:15 AM Prarit Bhargava wrote:
> Rafael, as requested ...
> 
> [v2]: rebased on
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git 
> bleeding-edge
> 
> ^^^ prolly doesn't need to be in the commit log
> 
> P.
> 
> ---8<---
> 
> On systems that initialize the intel_pstate driver with the performance
> governor, and then switch to the powersave governor will not transition to
> lower cpu frequencies until /sys/devices/system/cpu/intel_pstate/min_perf_pct
> is set to a low value.
> 
> The behavior of governor switching changed after commit a04759924e25
> ("[cpufreq] intel_pstate: honor user space min_perf_pct override on
>  resume").  The commit introduced tracking of performance percentage
> changes via sysfs in order to restore userspace changes during
> suspend/resume.  The problem occurs because the global values of the newly
> introduced max_sysfs_pct and min_sysfs_pct are not lowered on the governor
> change and this causes the powersave governor to inherit the performance
> governor's settings.
> 
> A simple change would have been to reset max_sysfs_pct to 100 and
> min_sysfs_pct to 0 on a governor change, which fixes the problem with
> governor switching.  However, since we cannot break userspace[1] the fix
> is now to give each governor its own limits storage area so that governor
> specific changes are tracked.
> 
> I successfully tested this by booting with both the performance governor
> and the powersave governor by default, and switching between the two
> governors (while monitoring /sys/devices/system/cpu/intel_pstate/ values,
> and looking at the output of cpupower frequency-info).  Suspend/Resume
> testing was performed by Doug Smythies.
> 
> [1] Systems which suspend/resume using the unmaintained pm-utils package
> will always transition to the performance governor before the suspend and
> after the resume.  This means a system using the powersave governor will
> go from powersave to performance, then suspend/resume, performance to
> powersave.  The simple change during governor changes would have been
> overwritten when the governor changed before and after the suspend/resume.
> I have submitted https://bugzilla.redhat.com/show_bug.cgi?id=1271225
> against Fedora to remove the 94cpufreq file that causes the problem.  It
> should be noted that pm-utils is obsoleted with newer versions of systemd.
> 
> Cc: Kristen Carlson Accardi 
> Cc: "Rafael J. Wysocki" 
> Cc: Viresh Kumar 
> Cc: linux...@vger.kernel.org
> Cc: Doug Smythies 
> Signed-off-by: Prarit Bhargava 

Applied, thanks!

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] cpufreq, intel_pstate, Fix intel_pstate powersave min_perf_pct value

2015-10-19 Thread Rafael J. Wysocki
On Thursday, October 15, 2015 07:34:15 AM Prarit Bhargava wrote:
> Rafael, as requested ...
> 
> [v2]: rebased on
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git 
> bleeding-edge
> 
> ^^^ prolly doesn't need to be in the commit log
> 
> P.
> 
> ---8<---
> 
> On systems that initialize the intel_pstate driver with the performance
> governor, and then switch to the powersave governor will not transition to
> lower cpu frequencies until /sys/devices/system/cpu/intel_pstate/min_perf_pct
> is set to a low value.
> 
> The behavior of governor switching changed after commit a04759924e25
> ("[cpufreq] intel_pstate: honor user space min_perf_pct override on
>  resume").  The commit introduced tracking of performance percentage
> changes via sysfs in order to restore userspace changes during
> suspend/resume.  The problem occurs because the global values of the newly
> introduced max_sysfs_pct and min_sysfs_pct are not lowered on the governor
> change and this causes the powersave governor to inherit the performance
> governor's settings.
> 
> A simple change would have been to reset max_sysfs_pct to 100 and
> min_sysfs_pct to 0 on a governor change, which fixes the problem with
> governor switching.  However, since we cannot break userspace[1] the fix
> is now to give each governor its own limits storage area so that governor
> specific changes are tracked.
> 
> I successfully tested this by booting with both the performance governor
> and the powersave governor by default, and switching between the two
> governors (while monitoring /sys/devices/system/cpu/intel_pstate/ values,
> and looking at the output of cpupower frequency-info).  Suspend/Resume
> testing was performed by Doug Smythies.
> 
> [1] Systems which suspend/resume using the unmaintained pm-utils package
> will always transition to the performance governor before the suspend and
> after the resume.  This means a system using the powersave governor will
> go from powersave to performance, then suspend/resume, performance to
> powersave.  The simple change during governor changes would have been
> overwritten when the governor changed before and after the suspend/resume.
> I have submitted https://bugzilla.redhat.com/show_bug.cgi?id=1271225
> against Fedora to remove the 94cpufreq file that causes the problem.  It
> should be noted that pm-utils is obsoleted with newer versions of systemd.
> 
> Cc: Kristen Carlson Accardi 
> Cc: "Rafael J. Wysocki" 
> Cc: Viresh Kumar 
> Cc: linux...@vger.kernel.org
> Cc: Doug Smythies 
> Signed-off-by: Prarit Bhargava 

Applied, thanks!

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2] cpufreq, intel_pstate, Fix intel_pstate powersave min_perf_pct value

2015-10-15 Thread Prarit Bhargava
Rafael, as requested ...

[v2]: rebased on
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git bleeding-edge

^^^ prolly doesn't need to be in the commit log

P.

---8<---

On systems that initialize the intel_pstate driver with the performance
governor, and then switch to the powersave governor will not transition to
lower cpu frequencies until /sys/devices/system/cpu/intel_pstate/min_perf_pct
is set to a low value.

The behavior of governor switching changed after commit a04759924e25
("[cpufreq] intel_pstate: honor user space min_perf_pct override on
 resume").  The commit introduced tracking of performance percentage
changes via sysfs in order to restore userspace changes during
suspend/resume.  The problem occurs because the global values of the newly
introduced max_sysfs_pct and min_sysfs_pct are not lowered on the governor
change and this causes the powersave governor to inherit the performance
governor's settings.

A simple change would have been to reset max_sysfs_pct to 100 and
min_sysfs_pct to 0 on a governor change, which fixes the problem with
governor switching.  However, since we cannot break userspace[1] the fix
is now to give each governor its own limits storage area so that governor
specific changes are tracked.

I successfully tested this by booting with both the performance governor
and the powersave governor by default, and switching between the two
governors (while monitoring /sys/devices/system/cpu/intel_pstate/ values,
and looking at the output of cpupower frequency-info).  Suspend/Resume
testing was performed by Doug Smythies.

[1] Systems which suspend/resume using the unmaintained pm-utils package
will always transition to the performance governor before the suspend and
after the resume.  This means a system using the powersave governor will
go from powersave to performance, then suspend/resume, performance to
powersave.  The simple change during governor changes would have been
overwritten when the governor changed before and after the suspend/resume.
I have submitted https://bugzilla.redhat.com/show_bug.cgi?id=1271225
against Fedora to remove the 94cpufreq file that causes the problem.  It
should be noted that pm-utils is obsoleted with newer versions of systemd.

Cc: Kristen Carlson Accardi 
Cc: "Rafael J. Wysocki" 
Cc: Viresh Kumar 
Cc: linux...@vger.kernel.org
Cc: Doug Smythies 
Signed-off-by: Prarit Bhargava 
---
 drivers/cpufreq/intel_pstate.c | 142 -
 1 file changed, 85 insertions(+), 57 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index c568226..cde38c8 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -167,7 +167,20 @@ struct perf_limits {
int min_perf_ctl;
 };
 
-static struct perf_limits limits = {
+static struct perf_limits performance_limits = {
+   .no_turbo = 0,
+   .turbo_disabled = 0,
+   .max_perf_pct = 100,
+   .max_perf = int_tofp(1),
+   .min_perf_pct = 100,
+   .min_perf = int_tofp(1),
+   .max_policy_pct = 100,
+   .max_sysfs_pct = 100,
+   .min_policy_pct = 0,
+   .min_sysfs_pct = 0,
+};
+
+static struct perf_limits powersave_limits = {
.no_turbo = 0,
.turbo_disabled = 0,
.max_perf_pct = 100,
@@ -182,6 +195,12 @@ static struct perf_limits limits = {
.min_perf_ctl = 0,
 };
 
+#ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE
+static struct perf_limits *limits = _limits;
+#else
+static struct perf_limits *limits = _limits;
+#endif
+
 #if IS_ENABLED(CONFIG_ACPI)
 /*
  * The max target pstate ratio is a 8 bit value in both PLATFORM_INFO MSR and
@@ -256,7 +275,7 @@ static int intel_pstate_init_perf_limits(struct 
cpufreq_policy *policy)
if (turbo_pss_ctl <= cpu->pstate.max_pstate &&
turbo_pss_ctl > cpu->pstate.min_pstate) {
pr_debug("intel_pstate: no turbo range exists in _PSS\n");
-   limits.no_turbo = limits.turbo_disabled = 1;
+   limits->no_turbo = limits->turbo_disabled = 1;
cpu->pstate.turbo_pstate = cpu->pstate.max_pstate;
turbo_absent = true;
}
@@ -415,7 +434,7 @@ static inline void update_turbo_state(void)
 
cpu = all_cpu_data[0];
rdmsrl(MSR_IA32_MISC_ENABLE, misc_en);
-   limits.turbo_disabled =
+   limits->turbo_disabled =
(misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE ||
 cpu->pstate.max_pstate == cpu->pstate.turbo_pstate);
 }
@@ -434,14 +453,14 @@ static void intel_pstate_hwp_set(void)
 
for_each_online_cpu(cpu) {
rdmsrl_on_cpu(cpu, MSR_HWP_REQUEST, );
-   adj_range = limits.min_perf_pct * range / 100;
+   adj_range = limits->min_perf_pct * range / 100;
min = hw_min + adj_range;
value &= ~HWP_MIN_PERF(~0L);
value |= HWP_MIN_PERF(min);
 
-   adj_range = limits.max_perf_pct * range / 

[PATCH v2] cpufreq, intel_pstate, Fix intel_pstate powersave min_perf_pct value

2015-10-15 Thread Prarit Bhargava
Rafael, as requested ...

[v2]: rebased on
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git bleeding-edge

^^^ prolly doesn't need to be in the commit log

P.

---8<---

On systems that initialize the intel_pstate driver with the performance
governor, and then switch to the powersave governor will not transition to
lower cpu frequencies until /sys/devices/system/cpu/intel_pstate/min_perf_pct
is set to a low value.

The behavior of governor switching changed after commit a04759924e25
("[cpufreq] intel_pstate: honor user space min_perf_pct override on
 resume").  The commit introduced tracking of performance percentage
changes via sysfs in order to restore userspace changes during
suspend/resume.  The problem occurs because the global values of the newly
introduced max_sysfs_pct and min_sysfs_pct are not lowered on the governor
change and this causes the powersave governor to inherit the performance
governor's settings.

A simple change would have been to reset max_sysfs_pct to 100 and
min_sysfs_pct to 0 on a governor change, which fixes the problem with
governor switching.  However, since we cannot break userspace[1] the fix
is now to give each governor its own limits storage area so that governor
specific changes are tracked.

I successfully tested this by booting with both the performance governor
and the powersave governor by default, and switching between the two
governors (while monitoring /sys/devices/system/cpu/intel_pstate/ values,
and looking at the output of cpupower frequency-info).  Suspend/Resume
testing was performed by Doug Smythies.

[1] Systems which suspend/resume using the unmaintained pm-utils package
will always transition to the performance governor before the suspend and
after the resume.  This means a system using the powersave governor will
go from powersave to performance, then suspend/resume, performance to
powersave.  The simple change during governor changes would have been
overwritten when the governor changed before and after the suspend/resume.
I have submitted https://bugzilla.redhat.com/show_bug.cgi?id=1271225
against Fedora to remove the 94cpufreq file that causes the problem.  It
should be noted that pm-utils is obsoleted with newer versions of systemd.

Cc: Kristen Carlson Accardi 
Cc: "Rafael J. Wysocki" 
Cc: Viresh Kumar 
Cc: linux...@vger.kernel.org
Cc: Doug Smythies 
Signed-off-by: Prarit Bhargava 
---
 drivers/cpufreq/intel_pstate.c | 142 -
 1 file changed, 85 insertions(+), 57 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index c568226..cde38c8 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -167,7 +167,20 @@ struct perf_limits {
int min_perf_ctl;
 };
 
-static struct perf_limits limits = {
+static struct perf_limits performance_limits = {
+   .no_turbo = 0,
+   .turbo_disabled = 0,
+   .max_perf_pct = 100,
+   .max_perf = int_tofp(1),
+   .min_perf_pct = 100,
+   .min_perf = int_tofp(1),
+   .max_policy_pct = 100,
+   .max_sysfs_pct = 100,
+   .min_policy_pct = 0,
+   .min_sysfs_pct = 0,
+};
+
+static struct perf_limits powersave_limits = {
.no_turbo = 0,
.turbo_disabled = 0,
.max_perf_pct = 100,
@@ -182,6 +195,12 @@ static struct perf_limits limits = {
.min_perf_ctl = 0,
 };
 
+#ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE
+static struct perf_limits *limits = _limits;
+#else
+static struct perf_limits *limits = _limits;
+#endif
+
 #if IS_ENABLED(CONFIG_ACPI)
 /*
  * The max target pstate ratio is a 8 bit value in both PLATFORM_INFO MSR and
@@ -256,7 +275,7 @@ static int intel_pstate_init_perf_limits(struct 
cpufreq_policy *policy)
if (turbo_pss_ctl <= cpu->pstate.max_pstate &&
turbo_pss_ctl > cpu->pstate.min_pstate) {
pr_debug("intel_pstate: no turbo range exists in _PSS\n");
-   limits.no_turbo = limits.turbo_disabled = 1;
+   limits->no_turbo = limits->turbo_disabled = 1;
cpu->pstate.turbo_pstate = cpu->pstate.max_pstate;
turbo_absent = true;
}
@@ -415,7 +434,7 @@ static inline void update_turbo_state(void)
 
cpu = all_cpu_data[0];
rdmsrl(MSR_IA32_MISC_ENABLE, misc_en);
-   limits.turbo_disabled =
+   limits->turbo_disabled =
(misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE ||
 cpu->pstate.max_pstate == cpu->pstate.turbo_pstate);
 }
@@ -434,14 +453,14 @@ static void intel_pstate_hwp_set(void)
 
for_each_online_cpu(cpu) {
rdmsrl_on_cpu(cpu, MSR_HWP_REQUEST, );
-   adj_range = limits.min_perf_pct * range / 100;
+   adj_range = limits->min_perf_pct * range / 100;
min = hw_min + adj_range;
value &=