I'd like to move this work forward, so here is a PSARC one-pager.
Any input is really appreciated!
Thanks,
-Aubrey
======== system-pm-policy_onepager_v1.txt
=================================
Template Version: @(#)onepager.txt 1.35 07/11/07 SMI
1. Introduction
1.1. Project/Component Working Name:
system-pm-policy keyword
1.2. Name of Document Author/Supplier:
Author: Aubrey Li <[email protected]>
1.3. Date of This Document:
April 28, 2010
2. Project Summary
2.1. Project Description:
Solaris support for the system-pm-policy keyword in power.conf(4).
A mechanism is needed to set a system-wide power/performance bias.
2.2. Risks and Assumptions:
Very few customers will use this keyword. Most customers will want
the balanced power/performance policy to remain the default.
4. Technical Description:
4.1. Details:
pmconfig(1M) parses /etc/power.conf. If the system-pm-policy keyword
is present in power.conf(4), pmconfig passes the user-preferred policy
to the kernel through pm_ioctl() with the PM_SET_SYSTEM_POLICY
command. pm_ioctl() then calls pm_set_system_policy(), which sets the
global policy variable and calls the power-manageable modules to pass
the policy down.
Currently pm_set_system_policy() only sets the CPU power management
policy; memory and other device power management policies could be
added in the future. CPU pm policy setting is machine specific. A CPU
has several power management features, such as C-states, P-states,
and energy performance bias. Every CPU pm feature that wants to
inherit the system-pm-policy registers its callback function on a
list. When pmconfig passes the policy to the kernel, the kernel walks
the list and calls each callback, thereby applying the user-preferred
policy to each module.
/etc/power.conf may have [system-pm-policy <value>]
        |
        v
     pmconfig
        |
        v
pm_ioctl(PM_SET_SYSTEM_POLICY, policy)
        |
        v
pm_set_system_policy(policy)
        |
        ----> CPU pm policy callback
        |         |
        |         ----> registered CPU pm feature 1 callback (ENERGY_PERF_BIAS)
        |         |
        |         ----> ...
        |
        ----> Memory pm policy callback (future)
        |
        ----> ...
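The registration and list walk described above can be sketched in
userspace C. This is a minimal sketch, not the webrev code: the
SYS_PM_POLICY_* values, pm_policy_register(), the fixed-size array and
epb_policy_cb() are all hypothetical, and a real kernel version would
use its own list (the thread mentions the existing callb mechanism)
with proper locking. Only pm_set_system_policy() is named in the
one-pager.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding of the policy values named in power.conf(4). */
typedef enum {
	SYS_PM_POLICY_DEFAULT = 0,
	SYS_PM_POLICY_PERF_BIAS,
	SYS_PM_POLICY_BALANCED,
	SYS_PM_POLICY_POWER_BIAS
} sys_pm_policy_t;

/* Callback type a pm feature registers (hypothetical). */
typedef void (*pm_policy_cb_t)(sys_pm_policy_t policy, void *arg);

#define	PM_POLICY_MAX_CB	8

static struct {
	pm_policy_cb_t	cb;
	void		*arg;
} pm_policy_cbs[PM_POLICY_MAX_CB];
static int pm_policy_ncbs;

/* Global policy variable; balanced keeps the out-of-box behavior. */
static sys_pm_policy_t pm_system_policy = SYS_PM_POLICY_BALANCED;

/* Register a feature callback; returns 0 on success, -1 if the list is full. */
int
pm_policy_register(pm_policy_cb_t cb, void *arg)
{
	assert(cb != NULL);
	if (pm_policy_ncbs >= PM_POLICY_MAX_CB)
		return (-1);
	pm_policy_cbs[pm_policy_ncbs].cb = cb;
	pm_policy_cbs[pm_policy_ncbs].arg = arg;
	pm_policy_ncbs++;
	return (0);
}

/*
 * Userspace stand-in for pm_set_system_policy(): "default" is folded
 * into balanced, the global is updated, then every registered feature
 * callback is invoked with the new policy.
 */
void
pm_set_system_policy(sys_pm_policy_t policy)
{
	int i;

	if (policy == SYS_PM_POLICY_DEFAULT)
		policy = SYS_PM_POLICY_BALANCED;
	pm_system_policy = policy;
	for (i = 0; i < pm_policy_ncbs; i++)
		pm_policy_cbs[i].cb(policy, pm_policy_cbs[i].arg);
}

/* Example feature: just records the policy it was handed. */
static sys_pm_policy_t epb_last_policy = SYS_PM_POLICY_DEFAULT;

static void
epb_policy_cb(sys_pm_policy_t policy, void *arg)
{
	(void) arg;
	epb_last_policy = policy;	/* a real feature would program hardware */
}
```

A feature registers once at attach time, for example
pm_policy_register(epb_policy_cb, NULL), and is then called again every
time pmconfig pushes a new policy.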
The balanced power/performance policy is set by default, which keeps
the current out-of-box behavior unchanged. A system with extreme
performance requirements can select performance bias to disable the
power management features. If a laptop runs on battery, or a lightly
utilized system prefers power savings over performance,
system-pm-policy can be set to power bias to save more power; this
could mean always running at the lowest CPU clock and always entering
the deepest idle state.
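One way to read the paragraph above is as a table from policy to CPU
feature settings. The sketch below assumes a hypothetical
cpu_pm_settings_t with four illustrative knobs; the real set of
features, and how each CPU module honors the policy, are machine
specific and are not defined by this one-pager.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
	SYS_PM_POLICY_DEFAULT = 0,
	SYS_PM_POLICY_PERF_BIAS,
	SYS_PM_POLICY_BALANCED,
	SYS_PM_POLICY_POWER_BIAS
} sys_pm_policy_t;

/* Hypothetical per-CPU knobs that a policy could drive. */
typedef struct {
	bool pstates_enabled;	/* false: stay at the highest clock, as with cpupm disabled */
	bool deep_idle_enabled;	/* allow C-states deeper than C1 */
	bool pin_lowest_pstate;	/* always run at the lowest clock */
	bool pin_deepest_cstate; /* always enter the deepest idle state */
} cpu_pm_settings_t;

cpu_pm_settings_t
cpu_pm_settings_for(sys_pm_policy_t policy)
{
	/* Balanced (and default): today's out-of-box behavior. */
	cpu_pm_settings_t s = { true, true, false, false };

	switch (policy) {
	case SYS_PM_POLICY_PERF_BIAS:
		/* Disable power management features for maximum performance. */
		s.pstates_enabled = false;
		s.deep_idle_enabled = false;
		break;
	case SYS_PM_POLICY_POWER_BIAS:
		/* Lowest clock and deepest idle state, per the text above. */
		s.pin_lowest_pstate = true;
		s.pin_deepest_cstate = true;
		break;
	case SYS_PM_POLICY_DEFAULT:
	case SYS_PM_POLICY_BALANCED:
	default:
		break;
	}
	assert(!(s.pin_lowest_pstate && !s.pstates_enabled));
	return (s);
}
```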
Different power-manageable devices can inherit the system-wide policy
completely, or they can maintain their own pm policy, but the
system-wide policy must carry the largest weight in their own
mechanism.
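The "largest weight" rule can be illustrated as a weighted blend. This
sketch assumes a hypothetical 0..100 bias scale (0 = maximum
performance, 100 = maximum energy savings) and a percentage weight for
the system policy; effective_bias() and the scale are illustrative, not
part of the proposal.

```c
#include <assert.h>

/* Hypothetical bias scale: 0 = max performance, 100 = max energy savings. */
#define	BIAS_PERF	0
#define	BIAS_BALANCED	50
#define	BIAS_POWER	100

/*
 * Blend the system-wide bias with a device's own bias. sys_weight is a
 * percentage and must exceed 50 so the system policy always dominates;
 * a device that inherits the system policy completely uses 100.
 */
int
effective_bias(int sys_bias, int local_bias, int sys_weight)
{
	assert(sys_weight > 50 && sys_weight <= 100);
	assert(sys_bias >= 0 && sys_bias <= 100);
	assert(local_bias >= 0 && local_bias <= 100);
	return ((sys_bias * sys_weight + local_bias * (100 - sys_weight)) / 100);
}
```

With a 75% system weight, a power-bias system and a performance-minded
device yield 75, still on the power-saving side.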
4.2. Bug/RFE Number(s): xxxxxxx
4.5. Interfaces:
This project imports the following existing interfaces and exports
the one listed below. Interface stability will be "Committed".
Import:
power.conf(4) (PSARC/1992/202)
pmconfig(1M)
Export:
system-pm-policy    power.conf(4) keyword
A system-pm-policy entry can be added to power.conf(4) to set the
system-wide power policy. If the entry is absent, or present and set
to default, the balanced policy is used, which keeps the current
behavior unchanged. The other values tune the policy toward power
bias or performance bias.
power.conf(4) man page addition:
A system-pm-policy entry may be used to set the system-wide power
policy. The format of the entry is:
system-pm-policy <policy>
Acceptable policy values are:
    default      Balanced power/performance policy.
    perf-bias    The system drives toward maximum performance at any
                 energy cost.
    balanced     Balanced performance versus power and energy.
    power-bias   Maximum energy efficiency.
    absent       If the system-pm-policy keyword is absent from
                 power.conf(4), the behavior is the same as default.
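A pmconfig-side parser for the entry format above might look like the
following. This is a hypothetical sketch: parse_policy_line(),
policy_from_name(), the return codes and the 32-byte value limit are
illustrative and are not taken from pmconfig's source. The "absent"
case is handled by initializing the policy to default before scanning
the file.

```c
#include <assert.h>
#include <string.h>

typedef enum {
	SYS_PM_POLICY_INVALID = -1,
	SYS_PM_POLICY_DEFAULT = 0,
	SYS_PM_POLICY_PERF_BIAS,
	SYS_PM_POLICY_BALANCED,
	SYS_PM_POLICY_POWER_BIAS
} sys_pm_policy_t;

static const struct {
	const char	*name;
	sys_pm_policy_t	policy;
} policy_names[] = {
	{ "default",	SYS_PM_POLICY_DEFAULT },
	{ "perf-bias",	SYS_PM_POLICY_PERF_BIAS },
	{ "balanced",	SYS_PM_POLICY_BALANCED },
	{ "power-bias",	SYS_PM_POLICY_POWER_BIAS },
};

/* Map a value string to its policy; SYS_PM_POLICY_INVALID if unknown. */
sys_pm_policy_t
policy_from_name(const char *value)
{
	size_t i;

	for (i = 0; i < sizeof (policy_names) / sizeof (policy_names[0]); i++) {
		if (strcmp(value, policy_names[i].name) == 0)
			return (policy_names[i].policy);
	}
	return (SYS_PM_POLICY_INVALID);
}

/*
 * Parse one power.conf line. Returns 1 and sets *out for a valid
 * system-pm-policy entry, 0 if the line is some other entry, and -1 if
 * the keyword is present but the value is missing or unknown.
 */
int
parse_policy_line(const char *line, sys_pm_policy_t *out)
{
	static const char kw[] = "system-pm-policy";
	const char *p;
	char value[32];
	size_t len;
	sys_pm_policy_t pol;

	assert(line != NULL && out != NULL);
	p = line + strspn(line, " \t");
	if (strncmp(p, kw, sizeof (kw) - 1) != 0)
		return (0);
	p += sizeof (kw) - 1;
	if (*p != '\0' && *p != '\n' && *p != ' ' && *p != '\t')
		return (0);	/* a longer keyword that merely starts with kw */
	p += strspn(p, " \t");
	len = strcspn(p, " \t\n");
	if (len == 0 || len >= sizeof (value))
		return (-1);	/* missing or oversized value */
	memcpy(value, p, len);
	value[len] = '\0';
	if ((pol = policy_from_name(value)) == SYS_PM_POLICY_INVALID)
		return (-1);
	*out = pol;
	return (1);
}
```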
4.6. Doc Impact:
power.conf man page. See above.
4.7. Admin/Config Impact:
System administrators can use this option to match different
power/performance requirements.
4.8. HA Impact: None.
4.9. I18N/L10N Impact: No.
4.10. Packaging & Delivery:
This change will be delivered as part of the Deep C-State RFE.
These changes will be made at the same time:
kernel package
power.conf package
pmconfig package
4.11. Security Impact: None.
4.12. Dependencies: power.conf(4), pmconfig(1M)
6. Resources and Schedule:
6.1. Projected Availability: April 2010
6.4. Product Approval Committee requested information:
6.4.1. Consolidation C-team Name:
ON
6.5. ARC review type: FastTrack
6.6. ARC Exposure: open
7. Prototype Availability:
7.1. Prototype Availability:
Prototype available on OpenSolaris in April 2010.
===================================================================================
Li, Aubrey wrote:
Hi Bill,
Here I made a change to propose system-wide policy support.
http://cr.opensolaris.org/~aubrey/sys_pm_policy_v1/
The user profile from /etc/power.conf is still passed to the kernel
through pm_ioctl(), which then calls pm_set_system_policy().
Currently only the CPU pm policy is set there; if memory or other
devices need a bias as well, they can also be added to that function.
The CPU pm policy implementation has a minor change against the last
webrev: the mcpu_pm_policy pointer has been moved from machcpu to the
mcpu_pm_mach_state structure, as you suggested.
Any comments and suggestions are highly appreciated.
Thanks,
-Aubrey
Li, Aubrey wrote:
It looks like memory PM needs such a bias as well, so I'd like to
change the proposal to use the keyword "sys-pm-policy" instead. The
mechanism will use the existing callb implementation to pass the user
policy from /etc/power.conf to the kernel, then walk the list of
registered modules and call each module's hook function to set its
pm policy individually.
I'm not sure whether other device drivers need, or would be happy
with, this proposal. It would be great if device driver developers
could share some thoughts here.
Thanks,
-Aubrey
Julia.Harper wrote:
I assume that this knob (profile), when turned all the way down, would
basically put the system into "power savings" mode, where the set of
power states is restricted.
That is, no matter how long the utilization level demands more power,
the highest power states (for the CPUs, memory, whatever) will never
be entered. We should probably use terminology that makes this clear.
-- jdh
Liu, Jiang wrote:
I prefer the solution of introducing a global power profile for all
devices. Currently we need such a profile for CPUPM. In the future,
when supporting memory power management, we may need a similar
profile for memory PM, and users won't like two variables/profiles
for the same objective.
Li, Aubrey wrote:
Bill Holler wrote:
Hi,
I forgot to mention that cpu_pm_policy is just a policy.
There is no guarantee it maps to a specific MSR or hardware
implementation.
Yes, I would like to propose a new option for CPU power management
policy. This policy is a CPU bias between performance and power;
future CPU power management enhancement work can be based on it.
- The default policy should keep the current "out of the box"
  behavior unchanged; we'll try to save more power without hurting
  performance.
- More power management features are coming on future processors,
  like ENERGY_PERFORMANCE_BIAS. We can register these new features
  under the policy framework and offer the user a knob to change
  these settings on the fly.
- Laptop users who want longer battery life, less heat and less fan
  noise may want the system to work at an edge setting. For example,
  currently the CPU can run at the highest clock if cpupm is
  disabled, but there is no choice to make the CPU always run at the
  lowest clock. Similarly, always entering the deepest C-state is
  another way to save more power. What's more, a power-aware
  dispatcher could pick CPUs and dispatch threads more flexibly if
  there were a policy indicator.
- Some users don't care about power. Yes, we already have options
  that let them set ENERGY_PERFORMANCE_BIAS to performance bias,
  disable C-states/P-states, and so on. But it's friendlier to the
  user to change just one option.
Here, the policy only focuses on the CPU. If you think we should have
a policy for memory, for devices, or a system-wide policy, let's do
it. cpu_pm_policy can be one part of a system-wide policy.
If nobody has thoughts on it, I'll continue preparing a PSARC case to
add the cpu_pm_policy keyword.
For example, Solaris could dynamically set the
ENERGY_PERFORMANCE_BIAS register to different values depending on
things such as system load,
Yes, such settings can be changed dynamically if we see the benefit.
the priority of the application being scheduled, a power policy of
the application,
Making threads power aware would need another set of interfaces, I
think. For example, cmt_balance() could choose a different processor
group according to the perf/power bias of the thread.
or power policy of the zone.
Zone policy is an interesting topic. Different zones could have
different CPU resources or share the global CPU resources; different
zones could have different power policies, or they could inherit the
global cpu_pm_policy setting. There can be many virtual containers,
but the hardware resource is unique. I think this can be enhanced in
zone management, which will not be covered in my proposal. :)
Thanks,
-Aubrey
Regards,
Bill
On 03/03/10 16:21, Bill Holler wrote:
+1.
Hi Aubrey,
I also think it is time to move forward with this proposal.
Generally we want the system to work best "out of the box" with no
tuning. On the other hand, vendors will keep improving products with
new features, and there will always be some specific applications
where custom settings may be better. I feel this proposal supports
innovation and application-specific customization in line with the
OpenSolaris community goals.
This proposal applies to all types of CPUs. It uses "cpu_pm_policy"
instead of, for example, mentioning a specific CPU's MSR. ;-) This
proposal will be useful with other CPUs if/when they have hardware
mechanisms for tuning power/performance.
In the ARC case we want to mention that there could be a policy
conflict between this component setting and a system-power-policy,
external power capping, etc. Generally we want users to use the
default or a higher-level policy such as the system power policy.
Unfortunately the system power policy may not be fine-grained or
diverse enough for some applications to specify CPU power policy. In
that case cpu_pm_policy will be useful. My thought is: the user must
really know what they want if they specify a component policy such
as cpu_pm_policy instead of just using the system power policy. For
that reason I feel cpu_pm_policy should override the
system-power-policy at the cpupm level.
Power capping is different. Power capping is an external policy. It
is currently "owned" by the SP, external to the OS. Power capping
should override a local cpu_pm_policy.
Implementation comments:
IMHO the mcpu_pm_policy pointer should be in the mcpu_pm_mach_state
structure instead of in machcpu.
We may want to allow the user to specify a number instead of just
Perf, Balanced, Power, Default?
Regards,
Bill
On 02/20/10 18:43, Li, Aubrey wrote:
Hi Bill,
I think it's time to continue this proposal, since b134 is closed
and the build is no longer restricted. The power/perf bias setting
is a starting point for future power-related work; I'll prepare a
PSARC case for the new option if this is acceptable. No is also a
good answer, with a good reason.
Thanks,
-Aubrey
Bill.Holler wrote:
Hi,
This proposal is for a mechanism to set the new MSR
IA32_ENERGY_PERF_BIAS_MSR. This is a new hardware feature. The MSR
affects overall power/performance. It gives a hint to the processor
and package about the desired power/performance characteristics. It
is related to P-states and C-states (and may affect these features),
but it can have other socket/system-level effects as well.
The programmer's guides do not go into detail about what the other
effects can be. :-(
The perf and power impact of this MSR is model specific.
It can throttle turbo on WSM and will probably let the hardware make
more decisions in the future. For example, when a short interrupt
storm is detected, it can demote a CC6 request to CC3.
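The encoding below follows what the Intel SDM documents for
IA32_ENERGY_PERF_BIAS (MSR 0x1B0): a 4-bit hint in bits 3:0, where 0
prefers maximum performance and 15 prefers maximum energy savings,
with support reported by CPUID leaf 06H, ECX bit 3. The mapping from
policy to hint (in particular the mid value 6 for balanced), the
helper names, and preserving bits 63:4 by read-modify-write are
assumptions of this sketch; the actual rdmsr/wrmsr is kernel-only and
omitted.

```c
#include <assert.h>
#include <stdint.h>

#define	MSR_IA32_ENERGY_PERF_BIAS	0x1B0u	/* per Intel SDM Vol. 3 */
#define	EPB_HINT_MASK			0xFull	/* bits 3:0 */

/* Hint per policy; only the endpoints 0 and 15 have documented meaning. */
#define	EPB_PERF	0
#define	EPB_BALANCED	6	/* assumed mid-range value */
#define	EPB_POWER	15

typedef enum { POL_PERF_BIAS, POL_BALANCED, POL_POWER_BIAS } pol_t;

/* CPUID.06H:ECX bit 3 reports EPB support. */
int
epb_supported(uint32_t cpuid6_ecx)
{
	return ((cpuid6_ecx >> 3) & 1);
}

unsigned
epb_hint_for(pol_t p)
{
	switch (p) {
	case POL_PERF_BIAS:
		return (EPB_PERF);
	case POL_POWER_BIAS:
		return (EPB_POWER);
	case POL_BALANCED:
	default:
		return (EPB_BALANCED);
	}
}

/* New MSR value: replace the hint field, keep the reserved bits 63:4. */
uint64_t
epb_msr_value(uint64_t old, pol_t p)
{
	unsigned hint = epb_hint_for(p);

	assert(hint <= 15);
	return ((old & ~EPB_HINT_MASK) | ((uint64_t)hint & EPB_HINT_MASK));
}
```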
On 11/05/09 05:15, minskey guo wrote:
Jedy Wang wrote:
Hi Li,
As far as I know, gnome-power-manager has removed support for
changing the governor, which I think is the same as a profile. I
remember someone wrote a blog explaining the reason, but I cannot
find it now. I wonder what makes us still need to implement this
feature.
In the Linux world, there is the ondemand governor in the kernel. It
sets the CPU frequency according to the CPU's current load. So some
people consider that everybody should use that governor and let CPUs
finish their jobs ASAP and then enter C-states for power savings.
Compared to P-states, C-states do save more power. That's why GNOME
removed it.
This is also model specific and depends on whether frequency,
voltage and power are linear. That's true on the latest processors
but not on earlier ones.
I'm not sure why GNOME removed it, but it does not seem a good idea
to me. Some users want max perf and others want longer battery life.
Yes, a good P-state + C-state implementation is not easy to tune for
more power savings. Running in lower P-states when a CPU is busy
burns more power due to shorter time in deeper C-states. Entering
deeper C-states too aggressively also burns more power (on both an
idle and a busy system) due to unnecessary wakeup latency. ;-)
Without knowing the details, it seems likely that the
gnome-power-manager governor setting was removed because setting it
made worse decisions than runtime prediction.
Solaris currently has mechanisms to turn P-state and deeper C-state
support on/off.
A requirement is that the Energy Perf Bias MSR can be set on systems
not running a GUI. We would like to support a possible future GNOME
interface to set this MSR if/when it exists. The proposal provides a
mechanism that works on systems without GNOME.
Right, most servers do not run GNOME. I don't expect GNOME support,
but it would be great if it happens. :-)
IMHO, we should use this global CPU power policy setting instead of
"cpupm" and "cpu-deep-idle"; it is friendlier to the user. Users
just want more perf or more power; I don't think they care whether
the system supports P/C-states at the same time. "cpupm" is
confusing because it covers only P-states; we named it "cpupm"
before we had deep idle support. Actually cpu-deep-idle is also part
of CPU power management. :)
But some people don't care about power savings compared to other
factors. For example, if you are plagued by CPU fan noise and want it
quieter, you can lower the CPU frequency, which results in less heat,
and then the fan can be stopped.
Personally, I vote +1 for this project if I could vote, but I don't
like the names "perf-bias" etc. :)
Besides, can somebody tell me where IA32_ENERGY_PERF_BIAS_MSR comes
from? Is it part of the IPS feature?
Intel's Software Developer's Manual Volume 2A describes CPUID
detection of IA32_ENERGY_PERF_BIAS_MSR, and Volume 3A describes the
MSR.
http://www.intel.com/products/processor/manuals/
Sorry, I do not know what IPS stands for?
Cough, cough, IPS is not a released feature and should not be
discussed here. ;p
Thanks,
-Aubrey
Regards,
Bill
-minskey
I remember we already support 2 profiles through gnome-power-manager
on Solaris. What's the difference between them?
I do not understand the exact meaning of perf-bias, balanced and
power-bias either. Doesn't perf-bias mean the CPU frequency will
always be at the highest level?
Regards,
Jedy
On Wed, 2009-11-04 at 08:47 +0800, Li, Aubrey wrote:
Hi,
When we enabled the Intel energy performance bias feature, we found
that a power profile implementation is necessary. Here I did a draft
of a CPU-level power policy.
http://cr.opensolaris.org/~aubrey/cpu_power_policy_v1/
The proposal adds a new keyword, "cpu-power-policy", to
/etc/power.conf, with 4 options:
1) perf-bias
2) balanced
3) power-bias
4) default, the same as perf-bias.
The user input in /etc/power.conf is passed as the preferred policy
to the kernel through ioctl. Then pm_ioctl calls the callback to
walk a CPU power policy list. Every CPU pm feature that wants to be
adjusted by this option, and is verified to be supported, registers
its callback function on the list, so that it can be called and
adjusted by pmconfig.
---------------------------------------------------------
/etc/power.conf
      |
pm_ioctl(cpu_power_policy, policy)
      |
cpu_power_policy_callb(policy)
      |
      ----> registered pm feature callback 1 (ENERGY_PERF_BIAS)
      |
      ----> registered pm feature callback 2
      ...
---------------------------------------------------------
Currently, only the energy_perf_bias feature is registered, because
my intention is to support adjusting the energy_perf_bias MSR without
a reboot. I guess we can probably add P/T/C-state support later. When
we add P/T/C-state support, my quick thought is that this option will
override the "cpupm" and "cpu-deep-idle" settings.
Any comments and suggestions are welcome.
Thanks,
-Aubrey
_______________________________________________
pm-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pm-discuss
_______________________________________________
tesla-dev mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/tesla-dev
Liu Jiang (Gerry)
OpenSolaris, OTC, SSG, Intel
--
---------------------
Julia Harper, [email protected]