Hi Bill,

Thanks for bringing this forward to the power architects.

Bill Holler wrote:
>
>Hi Aubrey,
>
>+1. This looks great to me. We need it for many projects to allow
>the admin to specify that the system should run more energy-efficiently.
>
>We discussed this with the power architects, and they agree it looks
>good.
>One suggestion was to change "power" to "energy". For example
>they would like to see "power-bias" name changed to "energy-bias",
>and this should be called something like "system energy policy".
>That would help identify that this knob is for system *energy*
>efficiency.

This sounds good. I'll change it in the new version of the onepager.
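
For illustration only, here is a rough sketch of how the renamed keyword
values could map onto a policy enum. Every name in it (sys_energy_policy_t,
parse_energy_policy, the SYS_POLICY_* constants) is hypothetical and not taken
from the webrev. It also assumes the old "power-bias" spelling stays accepted
as an alias for "energy-bias", which has not been decided.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical policy values; the names in the real change may differ. */
typedef enum {
	SYS_POLICY_DEFAULT = 0,
	SYS_POLICY_PERF_BIAS,
	SYS_POLICY_BALANCED,
	SYS_POLICY_ENERGY_BIAS,
	SYS_POLICY_INVALID
} sys_energy_policy_t;

/* Map a keyword value from the config file onto a policy constant. */
sys_energy_policy_t
parse_energy_policy(const char *value)
{
	static const struct {
		const char *name;
		sys_energy_policy_t policy;
	} table[] = {
		{ "default", SYS_POLICY_DEFAULT },
		{ "perf-bias", SYS_POLICY_PERF_BIAS },
		{ "balanced", SYS_POLICY_BALANCED },
		{ "energy-bias", SYS_POLICY_ENERGY_BIAS },
		/* Assumption: keep the old spelling as an alias. */
		{ "power-bias", SYS_POLICY_ENERGY_BIAS },
	};
	size_t i;

	/* An absent keyword behaves like "default". */
	if (value == NULL)
		return (SYS_POLICY_DEFAULT);

	for (i = 0; i < sizeof (table) / sizeof (table[0]); i++) {
		if (strcmp(value, table[i].name) == 0)
			return (table[i].policy);
	}
	return (SYS_POLICY_INVALID);
}
```

Returning a distinct "invalid" value lets the caller reject a misspelled
keyword instead of silently falling back to the default.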

>
>As a side note, saving energy is different from saving power.
>Power Capping (an externally imposed policy) takes effect when
>power has an increased cost or increased marginal cost. For example:
>1. the power grid fails or is over-budgeted, 2. the server room has
>exceeded cooling capacity. Power Capping is different and has
>higher precedence than this system-level energy efficiency policy.

Definitely. I'll add this note to the PSARC file as well.
I'll post a new onepager that follows the new SMF implementation of the
tunable power option.
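
To make the flow in the quoted onepager concrete, here is a small user-space
sketch of the register-then-walk idea: each pm feature registers a callback,
and setting the system policy walks the list. Only the name
pm_set_system_policy comes from the onepager; its signature here, and every
other name (pm_policy_t, pm_policy_client_t, pm_policy_register,
energy_perf_bias_cb), is an assumption for illustration. The real code is
expected to use the kernel callb framework rather than a hand-rolled list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy values. */
typedef enum {
	PM_POLICY_PERF,
	PM_POLICY_BALANCED,
	PM_POLICY_ENERGY
} pm_policy_t;

typedef void (*pm_policy_cb_t)(pm_policy_t policy, void *arg);

/* One registered pm feature (caller-owned storage, singly linked). */
typedef struct pm_policy_client {
	pm_policy_cb_t cb;
	void *arg;
	struct pm_policy_client *next;
} pm_policy_client_t;

static pm_policy_client_t *pm_policy_clients;
static pm_policy_t pm_system_policy = PM_POLICY_BALANCED;

/* A pm feature that wants to inherit the system policy registers here. */
void
pm_policy_register(pm_policy_client_t *client, pm_policy_cb_t cb, void *arg)
{
	assert(client != NULL && cb != NULL);
	client->cb = cb;
	client->arg = arg;
	client->next = pm_policy_clients;
	pm_policy_clients = client;
}

/* Record the global policy, then pass it down to every registered feature. */
int
pm_set_system_policy(pm_policy_t policy)
{
	pm_policy_client_t *c;
	int notified = 0;

	pm_system_policy = policy;
	for (c = pm_policy_clients; c != NULL; c = c->next) {
		c->cb(policy, c->arg);
		notified++;
	}
	return (notified);
}

pm_policy_t
pm_get_system_policy(void)
{
	return (pm_system_policy);
}

/* Stand-in for a feature callback; it only records the policy it was given. */
void
energy_perf_bias_cb(pm_policy_t policy, void *arg)
{
	*(pm_policy_t *)arg = policy;
}
```

A real callback would program the hardware (for example the energy/perf bias
hint) instead of storing the value, and the list would need locking.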

Thanks,
-Aubrey

>
>I think I covered the points the pm architects were concerned with?
>Sarito or Julia etc can comment if I left something out. :-)
>
>Regards,
>Bill
>
>
>On 04/01/10 01:43, Li, Aubrey wrote:
>> Randy Fishel wrote:
>>
>>>  This might be a bit contentious, as there not only is effort to
>>> migrate the configuration to SMF, there is a consideration to define
>>> something similar to system-pm-policy.  On the other hand, there also
>>> is lacking architecture and there doesn't seem to be much momentum in
>>> providing it.
>>>
>>>  I am also leaving for vacation on Friday morning.  I will take a
>>> printout with me in hopes of maybe reviewing it over the next week.
>>> It may also give others the opportunity to see how this might fit into
>>> the "new" architecture.
>>>
>>>  Cheers!
>>>
>>>       ---- Randy
>>>
>>
>> This was intended as cpu-pm-policy, a mechanism that gives the user a
>> knob to tune, on the fly, the pm policy introduced by the Intel
>> Energy_Perf_Bias feature. Currently Energy_Perf_Bias is set to
>> performance bias by default, which means the power control unit in the
>> processor will drive the processor to peak performance at any energy
>> cost. This feature can, for example, throttle the turbo performance
>> boost by setting an MSR to power bias. The trend in silicon design is to
>> do more and more in hardware: in the near future, package/core C-state
>> auto promotion or demotion, QPI link state, DRAM refresh, etc. will all
>> accept the hint from this feature.
>>
>> Besides this, for the CPU we don't have an option to let the processor
>> always run at the lowest frequency, or always enter the deepest
>> supported idle state when idle. The CMT_COALESCE dispatching policy is
>> disabled in the kernel because it hurts peak performance. But this
>> policy helps to group the utilization onto one package, or even one
>> core, where possible. If we could group the utilization onto one package
>> when the system is mostly idle, the other packages could sleep longer
>> and deeper, and hence save more energy. These cases should provide
>> momentum for prolonging battery life, or for saving energy on servers
>> outside rush hour.
>>
>> Besides the CPU, memory and other devices are in the same situation. In
>> the current kernel, the memory power management driver FIPE has a default
>> policy setting:
>>                 fipe_pm_policy = FIPE_PM_POLICY_BALANCE
>> From the source, I think the FIPE_PM_POLICY_POWERSAVE policy could save
>> more power. Sooner or later, DDR3 could have the same requirement if we
>> implement power management for it.
>>
>> Recently, while doing a power characterization analysis, I found that the
>> USB EHCI driver is not friendly to idle power. The EHCI driver keeps
>> polling and makes the host controller issue DMA read and write operations
>> when there are no USB-related operations, or even when no USB device is
>> connected. This problem limits the package C-state and creates a big gap
>> between Solaris and other OSes. This might not depend on the power/perf
>> profile, but a profile could make the solution easy.
>>
>> I believe there are a few other cases I have missed that would give more
>> momentum to introducing a user profile for power/performance bias, :)
>>
>> Thanks,
>> -Aubrey
>>
>>> On Thu, 1 Apr 2010, Li, Aubrey wrote:
>>>
>>>
>>>> Just want to move this work forward. Here is a PSARC onepager; any input
>>>> is really appreciated!
>>>>
>>>> Thanks,
>>>> -Aubrey
>>>>
>>>> ======== system-pm-policy_onepager_v1.txt =================================
>>>
>>>> Template Version: @(#)onepager.txt 1.35 07/11/07 SMI
>>>>
>>>> 1. Introduction
>>>>    1.1. Project/Component Working Name:
>>>>         system-pm-policy keyword
>>>>
>>>>    1.2. Name of Document Author/Supplier:
>>>>         Author: Aubrey Li <[email protected]>
>>>>
>>>>    1.3. Date of This Document:
>>>>         April 28 , 2010
>>>>
>>>> 2. Project Summary
>>>>    2.1. Project Description:
>>>>         Solaris support for the system-pm-policy keyword in power.conf(4).
>>>>         A mechanism is desired to set a system-wide power/performance bias.
>>>>    2.2. Risks and Assumptions:
>>>>         Very few customers will use this keyword. Most customers will want
>>>>         the balanced power/performance policy to be the default.
>>>>
>>>> 4. Technical Description:
>>>>     4.1. Details:
>>>>
>>>>         pmconfig(1M) parses /etc/power.conf. If the system-pm-policy keyword
>>>>         is in power.conf(4), pmconfig passes the user's preferred policy to
>>>>         the kernel through pm_ioctl() with the PM_SET_SYSTEM_POLICY command.
>>>>         pm_ioctl() then calls pm_set_system_policy() to set the global policy
>>>>         variable and calls the power-manageable modules to pass the policy
>>>>         down.
>>>>
>>>>         Currently pm_set_system_policy() only sets the CPU power management
>>>>         policy; it could set the power management policy of memory and other
>>>>         devices in the future. The CPU pm policy setting is machine specific.
>>>>
>>>>         The CPU has several power management features, such as C-states,
>>>>         P-states, and energy performance bias. Every CPU pm feature that
>>>>         wants to inherit the system-pm-policy registers its callback
>>>>         function on a list. When pmconfig passes the policy to the kernel,
>>>>         the kernel walks the list and calls each callback function, and
>>>>         hence sets the user's preferred policy in the different modules.
>>>>
>>>>         /etc/power.conf may have [system-pm-policy <value>]
>>>>           |
>>>>           v
>>>>         pmconfig
>>>>           |
>>>>           v
>>>>         pm_ioctl(PM_SET_SYSTEM_POLICY, policy)
>>>>           |
>>>>           v
>>>>         pm_set_system_policy(policy)
>>>>           |
>>>>           ----> CPU pm policy callback
>>>>           |     |
>>>>           |     ----> registered CPU pm feature 1 callback(ENERGY_PERF_BIAS)
>>>>           |     |
>>>>           |     ----> ...
>>>>           |
>>>>           ----> Memory pm policy callback in future
>>>>           |
>>>>           ----> ...
>>>>
>>>>
>>>>         The balanced power/performance policy will be set by default, which
>>>>         keeps the current out-of-box setting unchanged. A system with
>>>>         extreme performance requirements could disable the power management
>>>>         features with the performance bias setting. If a laptop runs on
>>>>         battery, or a system at low utilization prefers power savings over
>>>>         performance, system-pm-policy could be set to power bias to save
>>>>         more power; this could lead to the lowest CPU clock and always the
>>>>         deepest idle state.
>>>>
>>>>         Different power-manageable devices could inherit the system-wide
>>>>         policy completely, or they could maintain a specific pm policy of
>>>>         their own, but the system-wide policy must carry the largest weight
>>>>         in their own mechanism.
>>>>
>>>>
>>>>     4.2. Bug/RFE Number(s): xxxxxxx
>>>>
>>>>     4.5. Interfaces:
>>>>         This project will import these existing interfaces.
>>>>         Interface stability will be "committed".
>>>>
>>>>         Import:
>>>>                 power.conf(4) (PSARC/1992/202)
>>>>                 pmconfig(1M)
>>>>
>>>>         Export:
>>>>                 system-pm-policy
>>>>
>>>>         system-pm-policy keyword.
>>>>         A system-pm-policy entry can be added to power.conf(4) to set the
>>>>         system-wide power policy. If this entry is present and set to
>>>>         default, or if it is not present, the default balanced policy is
>>>>         used, which keeps the current behavior unchanged. The other options
>>>>         tune the policy to power bias or performance bias.
>>>>
>>>>         power.conf(4) man page addition:
>>>>
>>>>         A system-pm-policy entry may be used to set the system-wide power
>>>>         policy. The format of the system-pm-policy entry is
>>>>         "system-pm-policy policy".
>>>>
>>>>      Acceptable policy values are:
>>>>
>>>>      default    Balanced power/performance policy.
>>>>
>>>>      perf-bias  The system drives to maximum performance at any energy cost.
>>>>
>>>>      balanced   Balanced performance vs. power and energy.
>>>>
>>>>      power-bias Maximum energy efficiency.
>>>>
>>>>      absent     If the system-pm-policy keyword is absent from power.conf(4),
>>>>                 the behavior is the same as the default case.
>>>>
>>>>     4.6. Doc Impact:
>>>>         power.conf man page.  See above.
>>>>
>>>>     4.7. Admin/Config Impact:
>>>>         Administrators can use this option to match the system to
>>>>         different power/performance requirements.
>>>>
>>>>     4.8. HA Impact: None.
>>>>
>>>>     4.9. I18N/L10N Impact: No.
>>>>
>>>>     4.10. Packaging & Delivery:
>>>>         This change will be delivered as part of the Deep C-State RFE.
>>>>         These changes will be made at the same time:
>>>>                 kernel package
>>>>                 power.conf package
>>>>                 pmconfig package
>>>>
>>>>     4.11. Security Impact: None.
>>>>
>>>>     4.12. Dependencies: power.conf, pmconfig(1M)
>>>>
>>>> 6. Resources and Schedule:
>>>>    6.1. Projected Availability: April 2010
>>>>
>>>>    6.4. Product Approval Committee requested information:
>>>>         6.4.1. Consolidation C-team Name:
>>>>                 ON
>>>>    6.5. ARC review type: FastTrack
>>>>    6.6. ARC Exposure:   open
>>>>
>>>> 7. Prototype Availability:
>>>>    7.1. Prototype Availability:
>>>>         Prototype available on OpenSolaris in April 2010.
>>>>
>>>>
>>>> ===================================================================================
>>>
>>>> Li, Aubrey wrote:
>>>>
>>>>> Hi Bill,
>>>>>
>>>>> Here I made a change to propose system-wide policy support.
>>>>> http://cr.opensolaris.org/~aubrey/sys_pm_policy_v1/
>>>>> The user profile from /etc/power.conf is still passed to the kernel
>>>>> through pm_ioctl, which then calls pm_set_system_policy(). Currently
>>>>> only the cpu pm policy is set there; if memory or other devices need a
>>>>> bias as well, they can also be added to that function.
>>>>> The cpu pm policy implementation has a minor change against the last
>>>>> webrev: the mcpu_pm_policy pointer has been moved from machcpu to the
>>>>> mcpu_pm_mach_state structure, according to your suggestion.
>>>>>
>>>>> Any comments and suggestions are highly appreciated.
>>>>>
>>>>> Thanks,
>>>>> -Aubrey
>>>>>
>>>>> Li, Aubrey wrote:
>>>>>
>>>>>> It looks like memory PM needs such a bias as well, so I'd like to
>>>>>> change the proposal to use the keyword "sys-pm-policy" instead. The
>>>>>> mechanism will use the existing callb implementation to pass the user
>>>>>> policy from /etc/power.conf to the kernel, and walk the list of
>>>>>> registered modules, calling each module's hook function to set the pm
>>>>>> policy individually.
>>>>>>
>>>>>> I'm not sure whether any other device driver needs, or would be happy
>>>>>> with, this proposal. It would be great if device driver developers
>>>>>> could share some thoughts here.
>>>>>>
>>>>>> Thanks,
>>>>>> -Aubrey
>>>>>>
>>>>>> Julia.Harper wrote:
>>>>>>
>>>>>>> I assume that this knob (profile) when turned way down would basically
>>>>>>> put the system into "power savings" mode -- where the set of power
>>>>>>> states is restricted.  That is, no matter how long the utilization
>>>>>>> level demands more power, the highest power states (for the cpus,
>>>>>>> memory, whatever) will never be entered.  We should probably use
>>>>>>> terminology that makes this clear.
>>>>>>>
>>>>>>> -- jdh
>>>>>>>
>>>>>>>
>>>>>>> Liu, Jiang wrote:
>>>>>>>
>>>>>>>> I prefer the solution of introducing a global power profile for all
>>>>>>>> devices. Currently we need such a profile for CPUPM. In the future,
>>>>>>>> when supporting memory power management, we may need a similar
>>>>>>>> profile for memory PM, and users won't like two variables/profiles
>>>>>>>> for the same objective.
>>>>>>>>
>>>>>>>> Li, Aubrey <> wrote:
>>>>>>>>
>>>>>>>>> Bill Holler wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I forgot to mention that cpu_pm_policy is just a policy.
>>>>>>>>>> There is no guaranty it maps to a specific MSR or hardware
>>>>>>>>>> implementation.
>>>>>>>>>>
>>>>>>>>> Yes, I would like to propose a new option for CPU power management
>>>>>>>>> policy. This policy is a CPU bias between performance and power, and
>>>>>>>>> future CPU power management enhancement work can be based on it.
>>>>>>>>> - The default policy should keep the current "out of the box"
>>>>>>>>>   behavior unchanged; we'll try to save more power without hurting
>>>>>>>>>   performance.
>>>>>>>>> - More power management features are coming on future processors,
>>>>>>>>>   like ENERGY_PERFORMANCE_BIAS. We can register these new features
>>>>>>>>>   under the policy framework and offer the user a knob to change
>>>>>>>>>   these settings on the fly.
>>>>>>>>> - Laptop users who want longer battery life, less heat, and less fan
>>>>>>>>>   noise may want the system to work in some edge situations. For
>>>>>>>>>   example, currently the CPU can run at the highest clock if cpupm is
>>>>>>>>>   disabled, but there is no way to make the CPU always run at the
>>>>>>>>>   lowest clock. Similarly, always entering the deepest C-state is
>>>>>>>>>   another way to save more power. What's more, a power-aware
>>>>>>>>>   dispatcher could be more flexible in picking a CPU and dispatching
>>>>>>>>>   threads if there is a policy indicator.
>>>>>>>>> - Some users don't care about power. Yes, we already have options
>>>>>>>>>   that let them set ENERGY_PERFORMANCE_BIAS to performance bias, turn
>>>>>>>>>   off C-states/P-states, and so on. But it's friendlier to the user
>>>>>>>>>   to change just one option.
>>>>>>>>> Here, the policy only focuses on the CPU. If you think we should have
>>>>>>>>> a policy for memory, for devices, or a system-wide policy, let's do
>>>>>>>>> this. cpu_pm_policy can be one part of a system-wide policy.
>>>>>>>>> If nobody has further thoughts on it, I'll continue to prepare a PSARC
>>>>>>>>> file to add the cpu_pm_policy keyword.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> For example Solaris could be dynamically setting the
>>>>>>>>>> ENERGY_PERFORMANCE_BIAS register to different settings depending
>>>>>>>>>> on things such as system-load,
>>>>>>>>>>
>>>>>>>>> Yes, settings such as these can be changed dynamically if we see
>>>>>>>>> the benefit.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> the priority of the application being scheduled, a power policy
>>>>>>>>>> of the application,
>>>>>>>>>>
>>>>>>>>> Making threads power aware would need another set of interfaces, I
>>>>>>>>> think. For example, cmt_balance() could choose a different
>>>>>>>>> processor group according to the perf/power bias of the thread.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> or power policy of the zone.
>>>>>>>>>>
>>>>>>>>> Zone policy is an interesting topic. Different zones could have
>>>>>>>>> different CPU resources or share the global CPU resources, and
>>>>>>>>> different zones could have different power policies or inherit
>>>>>>>>> the global cpu_pm_policy setting. There can be many virtual
>>>>>>>>> containers, but the hardware resource is unique. I think this can
>>>>>>>>> be addressed in zone management, which will not be covered in my
>>>>>>>>> proposal, :)
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> -Aubrey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Bill
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 03/03/10 16:21, Bill Holler wrote:
>>>>>>>>>>
>>>>>>>>>>> +1.
>>>>>>>>>>>
>>>>>>>>>>> Hi Aubrey,
>>>>>>>>>>>
>>>>>>>>>>> I also think it is time to move forward with this proposal.
>>>>>>>>>>> Generally we want the system to work best "out of the box"
>>>>>>>>>>> with no tuning.  On the other hand, vendors will keep improving
>>>>>>>>>>> products with new features, and there will always be some
>>>>>>>>>>> specific applications where custom settings may be better.  I
>>>>>>>>>>> feel this proposal supports innovation and application-specific
>>>>>>>>>>> customization in line with the OpenSolaris community goals.
>>>>>>>>>>>
>>>>>>>>>>> This proposal applies to all types of CPUs.  It uses
>>>>>>>>>>> "cpu_pm_policy" instead of, for example, mentioning a specific
>>>>>>>>>>> CPU's MSR.  ;-)  This proposal will be useful with other CPUs
>>>>>>>>>>> if/when they have hardware mechanisms for tuning power /
>>>>>>>>>>> performance.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> In the ARC case we want to mention that there could be a policy
>>>>>>>>>>> conflict between this component setting and a
>>>>>>>>>>> system-power-policy, external Power Capping, etc. Generally we
>>>>>>>>>>> want users to use the default or a higher-level policy such as
>>>>>>>>>>> the system power policy. Unfortunately the system power policy
>>>>>>>>>>> may not be fine-grained or diverse enough for some applications
>>>>>>>>>>> to specify cpu power policy. In that case cpu_pm_policy will be
>>>>>>>>>>> useful.  My thought is: the user must really know what they
>>>>>>>>>>> want if they specify a component policy such as cpu_pm_policy
>>>>>>>>>>> instead of just using the system power policy.  For that reason
>>>>>>>>>>> I feel cpu_pm_policy should override the system-power-policy at
>>>>>>>>>>> the cpupm level.
>>>>>>>>>>>
>>>>>>>>>>> Power Capping is different.  Power Capping is an external
>>>>>>>>>>> policy.  It is currently "owned" by the SP external to the OS.
>>>>>>>>>>> Power Capping should override a local cpu_pm_policy.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Implementation comments:
>>>>>>>>>>> IMHO the mcpu_pm_policy pointer should be in the
>>>>>>>>>>> mcpu_pm_mach_state structure instead of in the machcpu.
>>>>>>>>>>> We may want to allow the user to specify a number instead of
>>>>>>>>>>> just Perf, Balanced, Power, Default?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Bill
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 02/20/10 18:43, Li, Aubrey wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Bill,
>>>>>>>>>>>>
>>>>>>>>>>>> I think it's time to continue this proposal, since b134 is
>>>>>>>>>>>> closed and the build is no longer restricted. The power/perf
>>>>>>>>>>>> bias setting is a starting point for future power-related work.
>>>>>>>>>>>> I'll prepare a PSARC file for the new option if this is
>>>>>>>>>>>> acceptable. No is also a good answer, with good reason.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> -Aubrey
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Bill.Holler Wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This proposal is for a mechanism to set the new MSR
>>>>>>>>>>>>>> IA32_ENERGY_PERF_BIAS_MSR.   This is a new hardware
>>>>>>>>>>>>>> feature.  The MSR affects overall power/performance.
>>>>>>>>>>>>>> It gives a hint to the processor & package for desired
>>>>>>>>>>>>>> power/performance characteristics.  It is related to
>>>>>>>>>>>>>> p-states and c-states (and may affect these features), but
>>>>>>>>>>>>>> this feature can have other socket/system-level effects as
>>>>>>>>>>>>>> well. The programmers guides do not go into details about
>>>>>>>>>>>>>> what the other effects can be.  :-(
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> The perf and power impact of this MSR is model specific.
>>>>>>>>>>>>> It's able to throttle turbo on WSM, and will probably help
>>>>>>>>>>>>> hardware make more decisions in the future. For example, when
>>>>>>>>>>>>> a short interrupt storm is detected, it can demote a CC6
>>>>>>>>>>>>> request to CC3.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 11/05/09 05:15, minskey guo wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jedy Wang wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Li,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As far as I know, gnome-power-manager has removed support
>>>>>>>>>>>>>>>> for changing the governor, which I think is the same as a
>>>>>>>>>>>>>>>> profile. I remember someone wrote a blog explaining the
>>>>>>>>>>>>>>>> reason, but I cannot find it now. I wonder what makes us
>>>>>>>>>>>>>>>> still need to implement this feature.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In the Linux world, there is an ondemand governor in the
>>>>>>>>>>>>>>> kernel. It sets the cpu frequency according to the cpu's
>>>>>>>>>>>>>>> current load. So some people consider that everybody should
>>>>>>>>>>>>>>> use that governor, and let CPUs finish their jobs asap and
>>>>>>>>>>>>>>> then enter C-states for power saving. Compared to P-states,
>>>>>>>>>>>>>>> C-states do save more power. That's why gnome removed it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>> This is also model specific and depends on whether frequency,
>>>>>>>>>>>>> voltage, and power scale linearly. That's true on the latest
>>>>>>>>>>>>> processors but not on earlier ones.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm not sure why gnome removed it, but it doesn't seem like a
>>>>>>>>>>>>> good idea to me. Some users want max perf and others want
>>>>>>>>>>>>> longer battery life.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, a good p-state + c-state implementation is not easy to
>>>>>>>>>>>>>> tune for more power savings.  Running in lower p-states when
>>>>>>>>>>>>>> a CPU is busy burns more power due to shorter time in deeper
>>>>>>>>>>>>>> C-states. Entering deeper C-states too aggressively also
>>>>>>>>>>>>>> burns more power (on both an idle and busy system) due to
>>>>>>>>>>>>>> unnecessary wakeup latency.  ;-)  Without knowing the
>>>>>>>>>>>>>> details, it seems likely that the gnome-power-manager setting
>>>>>>>>>>>>>> was removed because setting it made worse decisions than a
>>>>>>>>>>>>>> runtime prediction.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Solaris currently has mechanisms to turn P-state and deeper
>>>>>>>>>>>>>> C-state support on/off.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> A requirement is that the Energy Perf Bias MSR can be set on
>>>>>>>>>>>>>> systems not running a GUI.  We would like to support a
>>>>>>>>>>>>>> possible future Gnome interface to set this MSR if/when it
>>>>>>>>>>>>>> exists. The proposal provides a mechanism that works on
>>>>>>>>>>>>>> systems without Gnome.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Right, most servers do not run gnome. I don't expect gnome
>>>>>>>>>>>>> support, but it would be great if it comes, :-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> IMHO, we should use this global cpu power policy setting
>>>>>>>>>>>>> instead of "cpupm" and "cpu-deep-idle"; it is friendlier to
>>>>>>>>>>>>> the user. Users just want more perf or more power savings; I
>>>>>>>>>>>>> think they don't care whether the system supports p-states
>>>>>>>>>>>>> and c-states at the same time. "cpupm" is a confusing name
>>>>>>>>>>>>> because it covers only p-states; it was named "cpupm" before
>>>>>>>>>>>>> we had deep idle support. Actually cpu-deep-idle is also one
>>>>>>>>>>>>> part of cpu power management, :)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> but some people don't care about power saving when comparing
>>>>>>>>>>>>>>> it to other factors. For example, if you are plagued by the
>>>>>>>>>>>>>>> noise of the CPU fan and want quiet, you can lower the cpu
>>>>>>>>>>>>>>> frequency, which results in less heat, and then the fan can
>>>>>>>>>>>>>>> be stopped.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Personally, I would vote +1 for this project if I could
>>>>>>>>>>>>>>> vote, but I don't like the names "perf-bias" etc. :)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Besides, can somebody tell me where
>>>>>>>>>>>>>>> IA32_ENERGY_PERF_BIAS_MSR comes from? Is it part of the IPS
>>>>>>>>>>>>>>> feature?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Intel's Software Developer's Manual volume 2A describes CPUID
>>>>>>>>>>>>>> detection of IA32_ENERGY_PERF_BIAS_MSR and volume 3A describes
>>>>>>>>>>>>>> the MSR.
>>>>>>>>>>>>>> http://www.intel.com/products/processor/manuals/
>>>>>>>>>>>>>> Sorry, I do not know what IPS stands for?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> cough, cough, IPS is not a released feature and should not be
>>>>>>>>>>>>> discussed here, ;p
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> -Aubrey
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Bill
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -minskey
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I remember we already support 2 profiles through
>>>>>>>>>>>>>>>> gnome-power-manager on Solaris. What's the difference
>>>>>>>>>>>>>>>> between them?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I do not understand the exact meaning of perf-bias, balanced
>>>>>>>>>>>>>>>> and power-bias either. Doesn't perf-bias mean the cpu
>>>>>>>>>>>>>>>> frequency will always be at the highest level?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Jedy
>>>>>>>>>>>>>>>> On Wed, 2009-11-04 at 08:47 +0800, Li, Aubrey wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> When we enabled the Intel energy performance bias feature, we
>>>>>>>>>>>>>>>>> found that a power profile implementation is necessary. Here
>>>>>>>>>>>>>>>>> I did a draft for a cpu-level power policy.
>>>>>>>>>>>>>>>>> http://cr.opensolaris.org/~aubrey/cpu_power_policy_v1/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The proposal adds a new keyword to /etc/power.conf,
>>>>>>>>>>>>>>>>> "cpu-power-policy", and we have 4 options for this new
>>>>>>>>>>>>>>>>> keyword:
>>>>>>>>>>>>>>>>> 1) perf-bias
>>>>>>>>>>>>>>>>> 2) balanced
>>>>>>>>>>>>>>>>> 3) power-bias
>>>>>>>>>>>>>>>>> 4) default, the same as perf-bias.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> /etc/power.conf accepts the user input, and the preferred
>>>>>>>>>>>>>>>>> policy is passed to the kernel through ioctl. Then pm_ioctl
>>>>>>>>>>>>>>>>> calls the callback to walk a cpu power policy list. Every cpu
>>>>>>>>>>>>>>>>> pm feature that wants to be adjusted by this option, and is
>>>>>>>>>>>>>>>>> verified to be supported, registers its callback function on
>>>>>>>>>>>>>>>>> the list, so that it can be called and adjusted by pmconfig.
>>>>>>>>>>>>>>>>>     ----------------------------------------------------------
>>>>>>>>>>>>>>>>>     /etc/power.conf
>>>>>>>>>>>>>>>>>     |
>>>>>>>>>>>>>>>>>     pm_ioctl(cpu_power_policy, policy)
>>>>>>>>>>>>>>>>>     |
>>>>>>>>>>>>>>>>>     cpu_power_policy_callb (policy)
>>>>>>>>>>>>>>>>>     |
>>>>>>>>>>>>>>>>>     ----> registered pm feature callback 1 (ENERGY_PERF_BIAS)
>>>>>>>>>>>>>>>>>     |
>>>>>>>>>>>>>>>>>     ----> registered pm feature callback 2
>>>>>>>>>>>>>>>>>     ...
>>>>>>>>>>>>>>>>>     ----------------------------------------------------------
>>>>>>>>>>>>>>>>> Currently, only the energy_perf_bias feature is registered,
>>>>>>>>>>>>>>>>> because my intention is to support adjusting the
>>>>>>>>>>>>>>>>> energy_perf_bias MSR without a reboot. I guess we can probably
>>>>>>>>>>>>>>>>> add p/t/c-state support later. When we add p/t/c-state
>>>>>>>>>>>>>>>>> support, my quick thought is that this option will override
>>>>>>>>>>>>>>>>> the "cpupm" and "cpu-deep-idle" settings.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Any comments and suggestions are welcome.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> -Aubrey
>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>> pm-discuss mailing list
>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>> http://mail.opensolaris.org/mailman/listinfo/pm-discuss
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> tesla-dev mailing list
>>>>>>>>> [email protected]
>>>>>>>>> http://mail.opensolaris.org/mailman/listinfo/tesla-dev
>>>>>>>>>
>>>>>>>> Liu Jiang (Gerry)
>>>>>>>> OpenSolaris, OTC, SSG, Intel
>>>>>>> --
>>>>>>>
>>>>>>> ---------------------
>>>>>>>     Julia Harper, [email protected]
>>>>>>>

_______________________________________________
pm-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pm-discuss