It looks like memory PM needs such a bias as well, so I'd like to change
the proposal to use the keyword "sys-pm-policy" instead. The mechanism
will use the existing callb implementation to pass the user policy from
/etc/power.conf to the kernel, then walk the list of registered modules
and call each module's hook function to set its PM policy individually.
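
A minimal sketch of that register-and-walk mechanism, in C. Every name here (sys_pm_policy_t, sys_pm_hook_t, sys_pm_policy_register(), sys_pm_policy_apply(), and the example "cpu" and "memory" hooks) is hypothetical and is not the existing callb interface. It borrows the "register only if supported" rule from the original cpu-power-policy proposal quoted below, and it runs in user space, so failures are reported on stderr rather than through kernel logging.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical policy values, named after the proposal's keywords. */
typedef enum {
    SYS_PM_POLICY_DEFAULT,
    SYS_PM_POLICY_PERF,
    SYS_PM_POLICY_BALANCED,
    SYS_PM_POLICY_POWER
} sys_pm_policy_t;

/* One entry per module (CPU, memory, ...) that follows the policy. */
typedef struct sys_pm_hook {
    const char *name;                   /* module name for messages */
    int (*supported)(void);             /* nonzero if usable on this system */
    int (*set_policy)(sys_pm_policy_t); /* returns 0 on success */
    struct sys_pm_hook *next;
} sys_pm_hook_t;

static sys_pm_hook_t *sys_pm_hooks;     /* head of the registered list */

/* Add a module's hook, but only if the module reports it is supported. */
int
sys_pm_policy_register(sys_pm_hook_t *hook)
{
    if (hook->supported != NULL && !hook->supported())
        return (-1);
    hook->next = sys_pm_hooks;
    sys_pm_hooks = hook;
    return (0);
}

/*
 * Called once the policy from /etc/power.conf reaches the kernel: walk
 * the list so each module sets its own policy. Returns the failure count.
 */
int
sys_pm_policy_apply(sys_pm_policy_t policy)
{
    int failures = 0;

    for (sys_pm_hook_t *h = sys_pm_hooks; h != NULL; h = h->next) {
        if (h->set_policy(policy) != 0) {
            (void) fprintf(stderr, "%s: cannot apply policy %d\n",
                h->name, (int)policy);
            failures++;
        }
    }
    return (failures);
}

/* Example modules: each records the last policy it was asked to apply. */
sys_pm_policy_t cpu_last_policy = SYS_PM_POLICY_DEFAULT;
sys_pm_policy_t mem_last_policy = SYS_PM_POLICY_DEFAULT;

static int always(void) { return (1); }
static int never(void) { return (0); }  /* e.g. no memory PM support yet */
static int cpu_set(sys_pm_policy_t p) { cpu_last_policy = p; return (0); }
static int mem_set(sys_pm_policy_t p) { mem_last_policy = p; return (0); }

sys_pm_hook_t cpu_hook = { "cpu", always, cpu_set, NULL };
sys_pm_hook_t mem_hook = { "memory", never, mem_set, NULL };
```

Registering both example hooks and applying the power policy updates only the CPU module, because the memory hook is rejected at registration time.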

I'm not sure whether any other device drivers need this or would be happy
with this proposal. It would be great if device driver developers could
share some thoughts here.

Thanks,
-Aubrey

Julia.Harper wrote:
>
>I assume that this knob (profile), when turned way down, would basically
>put the system into "power savings" mode, where the set of power states
>is restricted. That is, no matter how long the utilization level demands
>more power, the highest power states (for the CPUs, memory, whatever)
>will never be entered. We should probably use terminology that makes
>this clear.
>
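
To make the "restricted set of power states" reading concrete for CPUs, here is a hedged C sketch: the knob sets the fastest P-state that may be used, and whatever the load-based logic requests is clamped to it. The knob_t levels and the per-level limits (P0 excluded for balanced, lowest clock only for power) are illustrative assumptions, not part of the proposal.

```c
#include <assert.h>

/* Hypothetical knob levels. */
typedef enum { KNOB_PERF, KNOB_BALANCED, KNOB_POWER } knob_t;

/*
 * P-states are indexed ACPI-style: 0 is the fastest (highest power) and
 * npstates - 1 the slowest. Return the fastest index the knob allows.
 * The per-level limits are assumptions for illustration.
 */
int
fastest_allowed_pstate(knob_t knob, int npstates)
{
    assert(npstates > 0);
    switch (knob) {
    case KNOB_POWER:
        return (npstates - 1);          /* always the lowest clock */
    case KNOB_BALANCED:
        return (npstates > 1 ? 1 : 0);  /* exclude only P0 */
    case KNOB_PERF:
    default:
        return (0);                     /* no restriction */
    }
}

/*
 * Clamp the P-state requested by the load-based logic. However long
 * utilization stays high, a restricted knob never lets the CPU enter the
 * excluded high-power states.
 */
int
clamp_pstate(int requested, knob_t knob, int npstates)
{
    int fastest = fastest_allowed_pstate(knob, npstates);

    if (requested < fastest)
        return (fastest);
    if (requested > npstates - 1)
        return (npstates - 1);
    return (requested);
}
```

With eight P-states, a request for P0 becomes P7 under the power knob and P1 under the balanced knob, while requests already within the allowed range pass through unchanged.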
>-- jdh
>
>
>Liu, Jiang wrote:
>> I prefer the solution of introducing a global power profile for all
>> devices. Currently we need such a profile for CPUPM. In the future, when
>> supporting memory power management, we may need a similar profile for
>> memory PM, and users won't want two variables/profiles for the same
>> objective.
>>
>> Li, Aubrey <> wrote:
>>> Bill Holler wrote:
>>>> Hi,
>>>>
>>>> I forgot to mention that cpu_pm_policy is just a policy.
>>>> There is no guarantee it maps to a specific MSR or hardware
>>>> implementation.
>>> Yes, I would like to propose a new option for CPU power management
>>> policy. This policy is a CPU bias between performance and power, and
>>> future CPU power management enhancements can be based on it.
>>> - The default policy should keep the current "out of the box"
>>>   behavior unchanged; we'll try to save more power without hurting
>>>   performance.
>>> - More power management features are coming in future processors,
>>>   such as ENERGY_PERFORMANCE_BIAS. We can register these new features
>>>   under the policy framework and offer the user a knob to change these
>>>   settings on the fly.
>>> - Laptop users who want longer battery life, less heat, and less fan
>>>   noise may want the system to run at an extreme setting. For example,
>>>   currently the CPU can run at its highest clock if cpupm is disabled,
>>>   but there is no way to make the CPU always run at its lowest clock.
>>>   Similarly, always entering the deepest C-state is another way to
>>>   save more power. What's more, a power-aware dispatcher could be more
>>>   flexible in picking a CPU and dispatching threads if there is a
>>>   policy indicator.
>>> - Some users don't care about power. We already have options that let
>>>   them set ENERGY_PERFORMANCE_BIAS to a performance bias, disable
>>>   C-states/P-states, and so on, but it's friendlier to the user to
>>>   change only one option.
>>>
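
A rough sketch of how one policy option could drive the separate knobs listed above. The struct fields, the enum, and the balanced hint value of 7 are hypothetical; only the hint range (0 for maximum performance through 15 for maximum energy saving) comes from Intel's description of the energy/performance bias MSR.

```c
#include <stdbool.h>

/* Hypothetical policy values. */
typedef enum { POL_DEFAULT, POL_PERF, POL_BALANCED, POL_POWER } cpu_pm_policy_t;

/* Settings a single policy option would drive; all fields are hypothetical. */
typedef struct {
    int  epb_hint;              /* 0 = max perf .. 15 = max saving; -1 = leave as is */
    bool pstates_enabled;       /* what "cpupm" controls today */
    bool deep_idle_enabled;     /* what "cpu-deep-idle" controls today */
    bool lowest_clock_only;     /* pin the CPU at its lowest frequency */
    bool prefer_deepest_cstate; /* always go to the deepest C-state */
} cpu_pm_settings_t;

cpu_pm_settings_t
cpu_pm_settings_for(cpu_pm_policy_t policy)
{
    /* Start from the current out-of-the-box behavior. */
    cpu_pm_settings_t s = { -1, true, true, false, false };

    switch (policy) {
    case POL_PERF:                  /* user does not care about power */
        s.epb_hint = 0;
        s.pstates_enabled = false;  /* stay at the highest clock */
        s.deep_idle_enabled = false;
        break;
    case POL_BALANCED:
        s.epb_hint = 7;             /* assumed midpoint */
        break;
    case POL_POWER:                 /* battery life, heat, fan noise */
        s.epb_hint = 15;
        s.lowest_clock_only = true;
        s.prefer_deepest_cstate = true;
        break;
    case POL_DEFAULT:
    default:
        break;                      /* keep current behavior */
    }
    return (s);
}
```

The point of the design is that the user changes one value and each underlying mechanism reads its own field, so new features can be added as new fields without new power.conf keywords.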
>>> Here, the policy focuses only on the CPU. If you think we should have a
>>> policy for memory or for devices, or a system-wide policy, let's do
>>> that; cpu_pm_policy can be one part of a system-wide policy.
>>> If nobody has other thoughts on it, I'll continue preparing a PSARC
>>> file to add the cpu_pm_policy keyword.
>>>
>>>> For example Solaris could be dynamically setting the
>>>> ENERGY_PERFORMANCE_BIAS register to different settings depending on
>>>> things such as system-load,
>>> Yes, these settings can be changed dynamically if we see a
>>> benefit.
>>>
>>>> the priority of the application being scheduled, a power policy of
>>>> the application,
>>> Making threads power aware would need another set of interfaces, I
>>> think. For example, cmt_balance() could choose a different processor
>>> group according to the perf/power bias of the thread.
>>>
>>>> or power policy of the zone.
>>> Zone policy is an interesting topic. Different zones could have
>>> different CPU resources or share the global CPU resources, and likewise
>>> could have different power policies or inherit the global
>>> cpu_pm_policy setting. There can be many virtual containers, but there
>>> is only one set of hardware resources. I think this can be addressed
>>> as an enhancement to zone management, which my proposal will not
>>> cover. :)
>>>
>>> Thanks,
>>> -Aubrey
>>>
>>>> Regards,
>>>> Bill
>>>>
>>>>
>>>> On 03/03/10 16:21, Bill Holler wrote:
>>>>> +1.
>>>>>
>>>>> Hi Aubrey,
>>>>>
>>>>> I also think it is time to move forward with this proposal.
>>>>> Generally we want the system to work best "out of the box"
>>>>> with no tuning.  On the other hand, vendors will keep improving
>>>>> products with new features, and there will always be some specific
>>>>> applications where custom settings may be better. I feel this
>>>>> proposal supports innovation and application specific customization
>>>>> in line with the OpenSolaris community goals.
>>>>>
>>>>> This proposal applies to all types of CPUs. It uses "cpu_pm_policy"
>>>>> instead of, for example, mentioning a specific CPU's MSR. ;-) This
>>>>> proposal will be useful with other CPUs if/when they have hardware
>>>>> mechanisms for tuning power/performance.
>>>>>
>>>>>
>>>>> In the ARC case we want to mention that there could be a policy
>>>>> conflict between this component setting and a system-power-policy,
>>>>> external Power Capping, etc. Generally we want users to use the
>>>>> default or a higher-level policy such as the system power policy.
>>>>> Unfortunately, the system power policy may not be fine-grained or
>>>>> diverse enough for some applications to specify a CPU power policy.
>>>>> In that case cpu_pm_policy will be useful. My thought is: the user
>>>>> must really know what they want if they specify a component policy
>>>>> such as cpu_pm_policy instead of just using the system power
>>>>> policy. For that reason I feel cpu_pm_policy should override the
>>>>> system-power-policy at the cpupm level.
>>>>>
>>>>> Power Capping is different. Power Capping is an external policy. It
>>>>> is currently "owned" by the SP, external to the OS. Power Capping
>>>>> should override a local cpu_pm_policy.
>>>>>
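
The precedence proposed above (power cap over cpu_pm_policy over system-power-policy) could be resolved with a small helper like this sketch. It treats an active power cap as just another policy value, which is a simplification since a real cap limits watts; the enum and function names are hypothetical.

```c
/* Hypothetical encoding: POLICY_UNSET means "no opinion from this source". */
typedef enum {
    POLICY_UNSET = -1,
    POLICY_PERF,
    POLICY_BALANCED,
    POLICY_POWER
} pm_policy_t;

/*
 * Resolve the policy cpupm should act on, highest precedence first:
 *   1. an external power cap (owned by the SP),
 *   2. an explicit cpu_pm_policy,
 *   3. the system power policy,
 *   4. otherwise the built-in default.
 */
pm_policy_t
effective_cpu_policy(pm_policy_t power_cap, pm_policy_t cpu_pm_policy,
    pm_policy_t system_policy, pm_policy_t builtin_default)
{
    if (power_cap != POLICY_UNSET)
        return (power_cap);
    if (cpu_pm_policy != POLICY_UNSET)
        return (cpu_pm_policy);
    if (system_policy != POLICY_UNSET)
        return (system_policy);
    return (builtin_default);
}
```

Keeping the resolution in one function makes the ARC-case rule easy to state and to test: each source only overrides the ones below it.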
>>>>>
>>>>> Implementation comments:
>>>>> IMHO the mcpu_pm_policy pointer should be in the mcpu_pm_mach_state
>>>>> structure instead of in the machcpu structure.
>>>>> We may want to allow the user to specify a number instead of just
>>>>> Perf, Balanced, Power, Default?
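
If numbers were accepted alongside the keywords, parsing could look like this sketch. The 0..15 numeric range and the keyword-to-number mapping are assumptions for illustration; the keywords themselves are the ones from the original cpu-power-policy proposal, where "default" is the same as perf-bias.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define CPU_PM_POLICY_MAX 15    /* assumed numeric range 0..15 */

/*
 * Parse a cpu_pm_policy value given either as a keyword or as a number.
 * On success store a value in 0..CPU_PM_POLICY_MAX and return 0;
 * return -1 on invalid input. The keyword mapping is hypothetical.
 */
int
parse_cpu_pm_policy(const char *s, int *out)
{
    static const struct { const char *name; int value; } names[] = {
        { "perf-bias", 0 },
        { "default", 0 },       /* the same as perf-bias */
        { "balanced", 7 },
        { "power-bias", 15 },
    };
    char *end;
    long v;

    for (size_t i = 0; i < sizeof (names) / sizeof (names[0]); i++) {
        if (strcmp(s, names[i].name) == 0) {
            *out = names[i].value;
            return (0);
        }
    }

    /* Not a keyword: accept a plain decimal number in range. */
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' ||
        v < 0 || v > CPU_PM_POLICY_MAX)
        return (-1);
    *out = (int)v;
    return (0);
}
```

Mapping the keywords onto the same numeric scale means the kernel side only ever sees a number, and the keywords become presets.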
>>>>>
>>>>> Regards,
>>>>> Bill
>>>>>
>>>>>
>>>>> On 02/20/10 18:43, Li, Aubrey wrote:
>>>>>> Hi Bill,
>>>>>>
>>>>>> I think it's time to resume this proposal, since b134 is closed
>>>>>> and the build is no longer restricted. The power/perf bias setting
>>>>>> is a starting point for future power-related work; I'll prepare a
>>>>>> PSARC file for the new option if this is acceptable. "No" is also
>>>>>> a good answer, if there is a good reason.
>>>>>>
>>>>>> Thanks,
>>>>>> -Aubrey
>>>>>>
>>>>>>
>>>>>>> Bill.Holler Wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> This proposal is for a mechanism to set the new MSR
>>>>>>>> IA32_ENERGY_PERF_BIAS_MSR. This is a new hardware
>>>>>>>> feature. The MSR affects overall power/performance.
>>>>>>>> It gives a hint to the processor & package for desired
>>>>>>>> power/performance characteristics. It is related to P-states and
>>>>>>>> C-states (and may affect these features), but this feature can
>>>>>>>> have other socket/system-level effects as well.
>>>>>>>> The programmer's guides do not go into detail about what the
>>>>>>>> other effects can be. :-(
>>>>>>>>
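
For reference, updating the hint amounts to a read-modify-write of the low four bits of the MSR (address 0x1B0 in Intel's SDM; 0 means maximum performance, 15 maximum energy saving). The sketch below, with the hypothetical helper epb_merge(), shows only the value computation; reading and writing the MSR on every CPU is left to the kernel's own MSR accessors and is omitted here.

```c
#include <stdint.h>

#define MSR_IA32_ENERGY_PERF_BIAS 0x1B0 /* MSR address per Intel SDM vol. 3 */
#define EPB_HINT_MASK 0xFULL            /* bits 3:0 hold the hint */

/*
 * Compute the value to write back: replace the 4-bit hint
 * (0 = max performance, 15 = max energy saving) and keep all other bits.
 * Out-of-range hints are clamped to 15.
 */
uint64_t
epb_merge(uint64_t old_msr, unsigned int hint)
{
    if (hint > 15)
        hint = 15;
    return ((old_msr & ~EPB_HINT_MASK) | (uint64_t)hint);
}
```

Preserving the upper bits matters because only bits 3:0 are defined as the hint.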
>>>>>>> The perf and power impact of this MSR is model specific.
>>>>>>> It can throttle turbo on WSM (Westmere), and will probably help the
>>>>>>> hardware make more decisions in the future. For example, when a
>>>>>>> short interrupt storm is detected, it can demote a CC6 request to
>>>>>>> CC3.
>>>>>>>
>>>>>>>
>>>>>>>> On 11/05/09 05:15, minskey guo wrote:
>>>>>>>>
>>>>>>>>> Jedy Wang wrote:
>>>>>>>>>
>>>>>>>>>> Hi Li,
>>>>>>>>>>
>>>>>>>>>> As far as I know, gnome-power-manager has removed support for
>>>>>>>>>> changing the governor, which I think is the same as a profile. I
>>>>>>>>>> remember someone wrote a blog post explaining the reason, but I
>>>>>>>>>> cannot find it now. I wonder what makes us still need to
>>>>>>>>>> implement this feature.
>>>>>>>>>>
>>>>>>>>> In the Linux world, there is the ondemand governor in the kernel.
>>>>>>>>> It sets the CPU frequency according to the CPU's current load. So
>>>>>>>>> some people think that everybody should use that governor, and
>>>>>>>>> let CPUs finish their jobs ASAP and then enter C-states to save
>>>>>>>>> power. Compared to P-states, C-states do save more power. That's
>>>>>>>>> why gnome removed it.
>>>>>>>>>
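
For readers unfamiliar with it, here is a simplified sketch of load-based frequency selection in the spirit of an "ondemand" governor: jump to the maximum frequency when load is high, otherwise scale frequency with load. The 80% threshold and the function name are assumptions, and the real Linux governor differs in detail.

```c
#include <assert.h>

#define UP_THRESHOLD_PCT 80     /* assumed: above this load, go to max */

/*
 * Pick a frequency for the measured load (0..100 percent): straight to
 * the maximum above the threshold, otherwise proportional to load, but
 * never below the minimum frequency.
 */
unsigned int
pick_freq_mhz(unsigned int load_pct, unsigned int min_mhz, unsigned int max_mhz)
{
    unsigned int f;

    assert(min_mhz <= max_mhz && load_pct <= 100);
    if (load_pct > UP_THRESHOLD_PCT)
        return (max_mhz);
    f = load_pct * max_mhz / 100;
    return (f < min_mhz ? min_mhz : f);
}
```

Between 800 and 3000 MHz, a 90% load selects 3000 MHz, a 50% load selects 1500 MHz, and a 10% load is held at the 800 MHz floor.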
>>>>>>> This is also model specific, and depends on whether frequency,
>>>>>>> voltage, and power scale linearly. That holds on the latest
>>>>>>> processors but not on earlier ones.
>>>>>>>
>>>>>>> I'm not sure why gnome removed it, but it doesn't seem like a good
>>>>>>> idea to me. Some users want max perf and others want longer battery
>>>>>>> life.
>>>>>>>
>>>>>>>
>>>>>>>> Yes, a good P-state + C-state implementation is not easy to tune
>>>>>>>> for more power savings. Running in lower P-states when a CPU is
>>>>>>>> busy burns more power because it spends less time in deeper
>>>>>>>> C-states. Entering deeper C-states too aggressively also burns
>>>>>>>> more power (on both idle and busy systems) due to unnecessary
>>>>>>>> wakeup latency. ;-) Without knowing the details, it seems likely
>>>>>>>> that the gnome-power-manager setting was removed because using it
>>>>>>>> made worse decisions than a runtime prediction.
>>>>>>>>
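
The trade-off above is easy to see with a little arithmetic. This sketch computes the energy needed to finish a fixed amount of work within a fixed window, idling in a C-state for the rest of the window. All operating points and power figures are made-up numbers, not measurements, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical operating point: frequency and active power. */
typedef struct {
    uint64_t freq_mhz;
    uint64_t active_mw;
} op_point_t;

/*
 * Energy in microjoules (mW x ms = uJ) to run work_mcycles of work at
 * operating point op within window_ms, spending the rest of the window
 * idle at idle_mw.
 */
uint64_t
energy_uj(op_point_t op, uint64_t work_mcycles, uint64_t window_ms,
    uint64_t idle_mw)
{
    /* Megacycles / MHz gives seconds; scale by 1000 for milliseconds. */
    uint64_t busy_ms = work_mcycles * 1000 / op.freq_mhz;

    assert(busy_ms <= window_ms);   /* the work must fit in the window */
    return (op.active_mw * busy_ms + idle_mw * (window_ms - busy_ms));
}
```

With 1000 Mcycles of work in a 1000 ms window and 1000 mW idle power, a 2000 MHz / 20000 mW point uses 10,500,000 uJ, while a 1000 MHz / 12000 mW point uses 12,000,000 uJ, so racing to idle wins. If the slow point drew only 8000 mW, it would use 8,000,000 uJ and win instead. This is the model-specific dependence on how power scales with frequency mentioned earlier in the thread.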
>>>>>>>>
>>>>>>>> Solaris currently has mechanisms to turn P-state and deeper
>>>>>>>> C-state support on/off.
>>>>>>>>
>>>>>>>> A requirement is that the Energy Perf Bias MSR can be set on
>>>>>>>> systems not running a GUI.  We would like to support a possible
>>>>>>>> future Gnome interface to set this MSR if/when it exists.  The
>>>>>>>> proposal provides a mechanism that works on systems without
>>>>>>>> Gnome.
>>>>>>>>
>>>>>>> Right, most servers do not run GNOME. I don't expect GNOME
>>>>>>> support, but it would be great if it comes. :-)
>>>>>>>
>>>>>>> IMHO, we should use this global CPU power policy setting instead
>>>>>>> of "cpupm" and "cpu-deep-idle"; this is friendlier to the user.
>>>>>>> Users just want more performance or more power savings; I don't
>>>>>>> think they care whether the system supports P-states and C-states.
>>>>>>> The name "cpupm" is confusing because it covers only P-states: we
>>>>>>> named it "cpupm" before we had deep idle support. Actually,
>>>>>>> cpu-deep-idle is also part of CPU power management. :)
>>>>>>>
>>>>>>>>> But some people care less about power saving than about other
>>>>>>>>> factors. For example, if you are plagued by CPU fan noise and
>>>>>>>>> want it quiet, you can lower the CPU frequency, which results in
>>>>>>>>> less heat, and then the fan can be stopped.
>>>>>>>>>
>>>>>>>>> Personally, I'd vote +1 for this project if I could vote, but I
>>>>>>>>> don't like the names "perf-bias" etc. :)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Also, can somebody tell me where IA32_ENERGY_PERF_BIAS_MSR
>>>>>>>>> comes from? Is it part of the IPS feature?
>>>>>>>>>
>>>>>>>> Volume 2A of Intel's Software Developer's Manual describes CPUID
>>>>>>>> detection of IA32_ENERGY_PERF_BIAS_MSR, and volume 3A describes
>>>>>>>> the MSR.
>>>>>>>> http://www.intel.com/products/processor/manuals/
>>>>>>>> Sorry, I do not know what IPS stands for.
>>>>>>>>
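
A sketch of the CPUID check: leaf 06H, ECX bit 3 indicates support for IA32_ENERGY_PERF_BIAS. The register values are passed in so the function stays portable; on x86 with GCC or Clang they can be read with __get_cpuid() from <cpuid.h>, which itself returns 0 when a leaf is not available. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.06H:ECX bit 3: energy/performance bias preference supported. */
#define CPUID6_ECX_EPB (1u << 3)

/*
 * max_basic_leaf is CPUID.0H:EAX; cpuid6_ecx is ECX from leaf 06H.
 * Leaf 6 is only meaningful if the processor reports that it exists.
 */
bool
cpu_has_energy_perf_bias(uint32_t max_basic_leaf, uint32_t cpuid6_ecx)
{
    if (max_basic_leaf < 6)
        return (false);
    return ((cpuid6_ecx & CPUID6_ECX_EPB) != 0);
}
```

A feature callback would check this once at boot and register itself only when it returns true, which matches the "verified to be supported" rule in the original proposal.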
>>>>>>> cough, cough, IPS is not a released feature and should not be
>>>>>>> discussed here, ;p
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -Aubrey
>>>>>>>
>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Bill
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> -minskey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> I remember we already support two profiles through
>>>>>>>>>> gnome-power-manager on Solaris. What's the difference between
>>>>>>>>>> them?
>>>>>>>>>>
>>>>>>>>>> I do not understand the exact meaning of perf-bias, balanced and
>>>>>>>>>> power-bias either. Doesn't perf-bias mean the CPU frequency will
>>>>>>>>>> always be at the highest level?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Jedy
>>>>>>>>>> On Wed, 2009-11-04 at 08:47 +0800, Li, Aubrey wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> When we enabled the Intel energy performance bias feature, we
>>>>>>>>>>> found that a power profile implementation is necessary. Here is
>>>>>>>>>>> a draft of a CPU-level power policy:
>>>>>>>>>>> http://cr.opensolaris.org/~aubrey/cpu_power_policy_v1/
>>>>>>>>>>>
>>>>>>>>>>> The proposal adds a new keyword, "cpu-power-policy", to
>>>>>>>>>>> /etc/power.conf, with four options:
>>>>>>>>>>> 1) perf-bias
>>>>>>>>>>> 2) balanced
>>>>>>>>>>> 3) power-bias
>>>>>>>>>>> 4) default, the same as perf-bias
>>>>>>>>>>>
>>>>>>>>>>> /etc/power.conf accepts the user input, and the preferred policy
>>>>>>>>>>> is passed to the kernel through an ioctl. pm_ioctl then calls the
>>>>>>>>>>> callback to walk a CPU power policy list. Every CPU PM feature
>>>>>>>>>>> that wants to be adjusted by this option, and is verified to be
>>>>>>>>>>> supported, registers its callback function on the list, so that
>>>>>>>>>>> it can be called and adjusted by pmconfig.
>>>>>>>>>>>     --------------------------------------------------------
>>>>>>>>>>>     /etc/power.conf
>>>>>>>>>>>         |
>>>>>>>>>>>     pm_ioctl(cpu_power_policy, policy)
>>>>>>>>>>>         |
>>>>>>>>>>>     cpu_power_policy_callb(policy)
>>>>>>>>>>>         |
>>>>>>>>>>>         +----> registered pm feature callback 1 (ENERGY_PERF_BIAS)
>>>>>>>>>>>         |
>>>>>>>>>>>         +----> registered pm feature callback 2
>>>>>>>>>>>         ...
>>>>>>>>>>>     --------------------------------------------------------
>>>>>>>>>>> Currently, only the energy_perf_bias feature is registered,
>>>>>>>>>>> because my intention is to support adjusting the
>>>>>>>>>>> energy_perf_bias MSR without a reboot. We can probably add
>>>>>>>>>>> P/T/C-state support later. When we add P/T/C-state support, my
>>>>>>>>>>> quick thought is that this option will override the "cpupm" and
>>>>>>>>>>> "cpu-deep-idle" settings.
>>>>>>>>>>>
>>>>>>>>>>> Any comments and suggestions are welcome.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> -Aubrey
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> pm-discuss mailing list
>>>>>>>>>>> pm-discuss at opensolaris.org
>>>>>>>>>>> http://mail.opensolaris.org/mailman/listinfo/pm-discuss
>>>>>>>>>>>
>>>>>>>>>>>
>>> _______________________________________________
>>> tesla-dev mailing list
>>> tesla-dev at opensolaris.org
>>> http://mail.opensolaris.org/mailman/listinfo/tesla-dev
>>
>> Liu Jiang (Gerry)
>> OpenSolaris, OTC, SSG, Intel
>
>--
>
>---------------------
>     Julia Harper, julia.harper at oracle.com
