I prefer introducing a global power profile for all devices. Currently
we need such a profile for CPUPM. In the future, when we support memory
power management, we may need a similar profile for memory PM, and users
won't like having two variables/profiles for the same objective.

Li, Aubrey wrote:
> Bill Holler wrote:
>>
>> Hi,
>>
>> I forgot to mention that cpu_pm_policy is just a policy.
>> There is no guarantee it maps to a specific MSR or hardware
>> implementation.
>
> Yes, I would like to propose a new option for CPU power management
> policy. This policy is a CPU bias between performance and power;
> future CPU power management enhancement work can be based on this
> policy.
> - The default policy should keep the current "out of the box"
>   behavior unchanged; we'll try to save more power without hurting
>   performance.
> - More power management features are coming on future processors,
>   like ENERGY_PERFORMANCE_BIAS. We can register these new features
>   under the policy framework and offer the user a knob to change
>   these settings on the fly.
> - Laptop users who want longer battery life, less heat, and less fan
>   noise may want the system to work in some edge situations. For
>   example, currently the CPU can run at the highest clock if cpupm is
>   disabled, but there is no way to make the CPU always run at the
>   lowest clock. Similarly, always entering the deepest C-state is
>   another way to save more power. What's more, the power-aware
>   dispatcher could be more flexible in picking a CPU and dispatching
>   a thread if there is a policy indicator.
> - Some users don't care about power. Yes, we already have options
>   that let them set ENERGY_PERFORMANCE_BIAS to performance bias, turn
>   off c-states/p-states, and so on. But it's friendlier to the user
>   to change just one option.
>
> Here, the policy focuses only on the CPU. If you think we should have
> a policy for memory, for devices, or a system-wide policy, let's do
> this. cpu_pm_policy can be one part of a system-wide policy.
> If nobody has further thoughts on it, I'll continue to prepare a PSARC
> file to add the cpu_pm_policy keyword.
>
>>
>> For example Solaris could be dynamically setting the
>> ENERGY_PERFORMANCE_BIAS register to different settings depending on
>> things such as system-load,
>
> Yes, settings such as these can be changed dynamically if we see the
> benefit.
>
>> the priority of the application being scheduled, a power policy of
>> the application,
>
> Making threads power aware needs another set of interfaces, I
> think. For example, cmt_balance() could choose a different processor
> group according to the perf/power bias of the thread.
>
>> or power policy of the zone.
>
> Zone policy is an interesting topic. Different zones could have
> different CPU resources or share the global CPU resource; different
> zones could have different power policies or inherit the global
> cpu_pm_policy setting. There can be many virtual containers, but the
> hardware resource is unique. I think this can be enhanced in zone
> management, which will not be covered in my proposal, :)
>
> Thanks,
> -Aubrey
>
>>
>> Regards,
>> Bill
>>
>>
>> On 03/03/10 16:21, Bill Holler wrote:
>>> +1.
>>>
>>> Hi Aubrey,
>>>
>>> I also think it is time to move forward with this proposal.
>>> Generally we want the system to work best "out of the box"
>>> with no tuning.  On the other hand, vendors will keep improving
>>> products with new features, and there will always be some specific
>>> applications where custom settings may be better.  I feel this
>>> proposal supports innovation and application specific customization
>>> in line with the OpenSolaris community goals.
>>>
>>> This proposal applies to all types of CPUs.  It uses "cpu_pm_policy"
>>> instead of for example mentioning a specific CPU's MSR.  ;-)  This
>>> proposal will be useful with other CPUs if/when they have hardware
>>> mechanisms for tuning power / performance.
>>>
>>>
>>> In the ARC case we want to mention that there could be a policy
>>> conflict between this component setting and a system-power-policy,
>>> external Power Capping, etc. Generally we want users to use the
>>> default or a higher level policy such as the system power policy.
>>> Unfortunately the system power policy may not be fine-grained or
>>> diverse enough for some applications to specify cpu power policy.
>>> In that case cpu_pm_policy will be useful.  My thought is: the user
>>> must really know what they want if they specify a component policy
>>> such as cpu_pm_policy instead of just using the system power
>>> policy.  For that reason I feel cpu_pm_policy should override the
>>> system-power-policy at the cpupm level.
>>>
>>> Power Capping is different.  Power Capping is an external policy.  It
>>> is currently "owned" by the SP external to the OS.  Power Capping
>>> should override a local cpu_pm_policy.
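
The precedence described above (an external power cap overrides
cpu_pm_policy, which in turn overrides the system power policy at the
cpupm level) could be sketched in C roughly as follows. This is only an
illustration: pm_policy_t, POLICY_UNSET, and effective_cpu_policy() are
hypothetical names, and modeling a power cap as a policy value is a
simplification. The fallback to perf-bias follows the "default, the
same as perf-bias" option in the original draft.

```c
#include <assert.h>

/* Hypothetical policy values; the thread only names Perf, Balanced, Power. */
typedef enum {
	POLICY_UNSET = 0,	/* nothing specified at this level */
	POLICY_PERF,
	POLICY_BALANCED,
	POLICY_POWER
} pm_policy_t;

/*
 * Resolve the policy the cpupm layer should act on.
 * Precedence, highest first:
 *   1. external power cap (owned by the SP, outside the OS)
 *   2. component-level cpu_pm_policy
 *   3. system power policy
 * If none is set, fall back to perf-bias (assumed default).
 */
static pm_policy_t
effective_cpu_policy(pm_policy_t power_cap, pm_policy_t cpu_pm_policy,
    pm_policy_t system_policy)
{
	/* All inputs must be one of the enum values above. */
	assert(power_cap <= POLICY_POWER);
	assert(cpu_pm_policy <= POLICY_POWER);
	assert(system_policy <= POLICY_POWER);

	if (power_cap != POLICY_UNSET)
		return (power_cap);
	if (cpu_pm_policy != POLICY_UNSET)
		return (cpu_pm_policy);
	if (system_policy != POLICY_UNSET)
		return (system_policy);
	return (POLICY_PERF);
}
```

With this ordering, a user who sets cpu_pm_policy explicitly gets that
setting unless an external cap is active.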
>>>
>>>
>>> Implementation comments:
>>> IMHO mcpu_pm_policy pointer should be in the mcpu_pm_mach_state
>>> structure instead of in the machcpu.
>>> We may want to allow the user to specify a number instead of just
>>> Perf, Balanced, Power, Default?
>>>
>>> Regards,
>>> Bill
>>>
>>>
>>> On 02/20/10 18:43, Li, Aubrey wrote:
>>>> Hi Bill,
>>>>
>>>> I think it's time to continue this proposal, since b134 is closed
>>>> and the build is no longer restricted. The power/perf bias setting
>>>> is a starting point for future power-related work. I'll prepare a
>>>> PSARC file for the new option if this is acceptable. "No" is also
>>>> a fine answer, given a good reason.
>>>>
>>>> Thanks,
>>>> -Aubrey
>>>>
>>>>
>>>>> Bill.Holler Wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This proposal is for a mechanism to set the new MSR
>>>>>> IA32_ENERGY_PERF_BIAS_MSR.   This is a new hardware
>>>>>> feature.  The MSR affects overall power/performance.
>>>>>> It gives a hint to the processor & package for desired
>>>>>> power/performance characteristics.  It is related to p-states and
>>>>>> c-states (and may affect these features), but this feature can
>>>>>> have other socket/system-level effects as well.
>>>>>> The programmer's guides do not go into detail about what the
>>>>>> other effects can be.  :-(
>>>>>>
>>>>> The perf and power impact of this MSR is model specific.
>>>>> It's able to throttle turbo on WSM and will probably help the
>>>>> hardware make more decisions in the future. For example, when a
>>>>> short interrupt storm is detected, it can demote a CC6 request to
>>>>> CC3.
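
For reference, a small sketch of how a policy hint could be folded into
the MSR value. Per the Intel SDM, IA32_ENERGY_PERF_BIAS is MSR 0x1B0,
the hint occupies bits 3:0 (0 = highest performance, 15 = maximum
energy saving), and support is reported in CPUID.06H:ECX bit 3. The
helper names and the middle value 7 for "balanced" are assumptions made
for illustration; the actual rdmsr/wrmsr access is kernel-specific and
is left out.

```c
#include <assert.h>
#include <stdint.h>

#define	IA32_ENERGY_PERF_BIAS	0x1B0	/* MSR address (Intel SDM) */
#define	EPB_MASK		0xFULL	/* hint lives in bits 3:0 */
#define	CPUID6_ECX_EPB		(1U << 3) /* CPUID.06H:ECX bit 3 */

/* 0 and 15 are the architectural extremes; 7 is an arbitrary midpoint. */
enum { EPB_PERF = 0, EPB_BALANCED = 7, EPB_POWER = 15 };

/* Nonzero if the CPUID leaf 6 ECX value advertises the MSR. */
static int
epb_supported(uint32_t cpuid6_ecx)
{
	return ((cpuid6_ecx & CPUID6_ECX_EPB) != 0);
}

/*
 * Return the new MSR value: replace bits 3:0 with the hint and
 * preserve all other (reserved) bits as read.
 */
static uint64_t
epb_apply_hint(uint64_t old_msr, unsigned int hint)
{
	assert(hint <= EPB_MASK);
	return ((old_msr & ~EPB_MASK) | (uint64_t)hint);
}
```

A caller would check epb_supported() first, read the current MSR
value, pass it through epb_apply_hint(), and write the result back.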
>>>>>
>>>>>
>>>>>> On 11/05/09 05:15, minskey guo wrote:
>>>>>>
>>>>>>> Jedy Wang wrote:
>>>>>>>
>>>>>>>> Hi Li,
>>>>>>>>
>>>>>>>> As far as I know, gnome-power-manager has removed support
>>>>>>>> for changing the governor, which I think is the same as a
>>>>>>>> profile. I remember someone wrote a blog post explaining the
>>>>>>>> reason, but I cannot find it now. I wonder what makes us
>>>>>>>> still need to implement this feature.
>>>>>>>>
>>>>>>> In the Linux world, there is an ondemand governor in the
>>>>>>> kernel. It sets the CPU frequency according to the CPU's
>>>>>>> current load. So some people consider that everybody should
>>>>>>> use that governor, and let CPUs finish their jobs ASAP and then
>>>>>>> enter C-states for power saving. Compared to P-states,
>>>>>>> C-states do save more power. That's why gnome removed it.
>>>>>>>
>>>>> This is also model specific and depends on whether frequency,
>>>>> voltage, and power scale linearly. That's true on the latest
>>>>> processors but not on earlier ones.
>>>>>
>>>>> I'm not sure why gnome removed it, but it doesn't seem like a good
>>>>> idea to me. Some users want max perf and others want longer
>>>>> battery life.
>>>>>
>>>>>
>>>>>> Yes, a good p-state + c-state implementation is not easy to tune
>>>>>> for more power savings.  Running in lower p-states when a CPU is
>>>>>> busy burns more power due to shorter time in deeper C-states.
>>>>>> Entering deeper C-states too aggressively also burns more power
>>>>>> (on both an idle and busy system) due to unnecessary wakeup
>>>>>> latency.  ;-)  Without knowing the details, it seems likely that
>>>>>> the gnome-power-manager was removed because setting it made worse
>>>>>> decisions than a runtime prediction.
>>>>>>
>>>>>>
>>>>>> Solaris currently has mechanisms to turn P-state and deeper
>>>>>> C-state support on/off.
>>>>>>
>>>>>> A requirement is that the Energy Perf Bias MSR can be set on
>>>>>> systems not running a GUI.  We would like to support a possible
>>>>>> future Gnome interface to set this MSR if/when it exists.  The
>>>>>> proposal provides a mechanism that works on systems without
>>>>>> Gnome.
>>>>>>
>>>>> Right, most servers do not run gnome. I don't expect gnome
>>>>> support, but it would be great if it comes, :-)
>>>>>
>>>>> IMHO, we should use this global cpu power policy setting instead
>>>>> of "cpupm" and "cpu-deep-idle"; this is friendlier to the user.
>>>>> Users just want more perf or more power savings; I think they
>>>>> don't care whether the system supports p-states and c-states at
>>>>> the same time. "cpupm" is confusing because it covers only
>>>>> p-states; we named it "cpupm" before we had deep idle support.
>>>>> Actually, cpu-deep-idle is also part of cpu power management, :)
>>>>>
>>>>>>
>>>>>>> But some people don't care about power saving compared to
>>>>>>> other factors. For example, if you are plagued by the noise of
>>>>>>> the CPU fan and want it quiet, you can lower the CPU frequency,
>>>>>>> which results in less heat, and then the fan can be stopped.
>>>>>>>
>>>>>>> Personally, I vote +1 for this project if I could vote, but I
>>>>>>> don't like the names "perf-bias" etc. :)
>>>>>>>
>>>>>>>
>>>>>>> Besides, can somebody tell me where IA32_ENERGY_PERF_BIAS_MSR
>>>>>>> comes from? Is it part of the IPS feature?
>>>>>>>
>>>>>> Volume 2A of Intel's Software Developer's Manuals describes CPUID
>>>>>> detection of IA32_ENERGY_PERF_BIAS_MSR, and volume 3A describes
>>>>>> the MSR.
>>>>>> http://www.intel.com/products/processor/manuals/
>>>>>> Sorry, I do not know what IPS stands for.
>>>>>>
>>>>> cough, cough, IPS is not a released feature and should not be
>>>>> discussed here, ;p
>>>>>
>>>>> Thanks,
>>>>> -Aubrey
>>>>>
>>>>>
>>>>>> Regards,
>>>>>> Bill
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -minskey
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I remember we already support 2 profiles through
>>>>>>>> gnome-power-manager on Solaris. What's the difference between
>>>>>>>> them?
>>>>>>>>
>>>>>>>> I do not understand the exact meaning of perf-bias, balanced,
>>>>>>>> and power-bias either. Doesn't perf-bias mean the CPU
>>>>>>>> frequency will always be at the highest level?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Jedy
>>>>>>>> On Wed, 2009-11-04 at 08:47 +0800, Li, Aubrey wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> When we enabled the Intel energy performance bias feature, we
>>>>>>>>> found that a power profile implementation is necessary. Here
>>>>>>>>> is my draft for a CPU-level power policy.
>>>>>>>>> http://cr.opensolaris.org/~aubrey/cpu_power_policy_v1/
>>>>>>>>>
>>>>>>>>> The proposal adds a new keyword to /etc/power.conf,
>>>>>>>>> "cpu-power-policy", with 4 options:
>>>>>>>>> 1) perf-bias
>>>>>>>>> 2) balanced
>>>>>>>>> 3) power-bias
>>>>>>>>> 4) default, the same as perf-bias.
>>>>>>>>>
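
To make the four options concrete, here is a minimal sketch of how the
keyword's value might be parsed. The enum and function names are
hypothetical and are not taken from the webrev; the only facts assumed
from the proposal are the four option strings and that "default" means
the same as "perf-bias".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical internal representation of the three distinct policies. */
typedef enum {
	CPU_POLICY_PERF_BIAS,
	CPU_POLICY_BALANCED,
	CPU_POLICY_POWER_BIAS
} cpu_power_policy_t;

/*
 * Parse the value given for the "cpu-power-policy" keyword.
 * Returns 0 and stores the policy in *out on success,
 * or -1 if the value is not one of the four proposed options.
 */
static int
parse_cpu_power_policy(const char *value, cpu_power_policy_t *out)
{
	if (value == NULL || out == NULL)
		return (-1);

	if (strcmp(value, "perf-bias") == 0 ||
	    strcmp(value, "default") == 0) {
		/* "default" is defined as the same as perf-bias */
		*out = CPU_POLICY_PERF_BIAS;
	} else if (strcmp(value, "balanced") == 0) {
		*out = CPU_POLICY_BALANCED;
	} else if (strcmp(value, "power-bias") == 0) {
		*out = CPU_POLICY_POWER_BIAS;
	} else {
		return (-1);
	}
	return (0);
}
```

Rejecting unknown strings rather than silently falling back to the
default keeps typos in /etc/power.conf visible to the user.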
>>>>>>>>> /etc/power.conf accepts the user input and passes the
>>>>>>>>> preferred policy to the kernel through ioctl. Then pm_ioctl
>>>>>>>>> calls the callback to walk a cpu power policy list. Every cpu
>>>>>>>>> pm feature that wants to be adjusted by this option, and is
>>>>>>>>> verified to be supported, will register its callback function
>>>>>>>>> on the list, so that it can be called and adjusted by
>>>>>>>>> pmconfig.
>>>>>>>>>     --------------------------------------------------------
>>>>>>>>>     /etc/power.conf | pm_ioctl(cpu_power_policy, policy)
>>>>>>>>>     |
>>>>>>>>>     cpu_power_policy_callb (policy)
>>>>>>>>>     |
>>>>>>>>>     ----> registered pm feature callback 1 (ENERGY_PERF_BIAS)
>>>>>>>>>     |
>>>>>>>>>     ----> registered pm feature callback 2
>>>>>>>>>     ...
>>>>>>>>>     --------------------------------------------------------
>>>>>>>>> Currently, only the energy_perf_bias feature is registered,
>>>>>>>>> because my intention is to support adjusting the
>>>>>>>>> energy_perf_bias MSR without a reboot. I guess we can
>>>>>>>>> probably add p/t/c-state support later. When we add
>>>>>>>>> p/t/c-state support, my quick thought is that this option
>>>>>>>>> will override the "cpupm" and "cpu-deep-idle" settings.
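
The register-and-walk flow in the draft could look roughly like the
sketch below. It is a user-space illustration under stated
assumptions, not the code in the webrev: only cpu_power_policy_callb
is a name from the diagram, while the node type, cpu_policy_register(),
energy_perf_bias_cb(), and the hint values 0/7/15 are hypothetical. A
kernel version would also need locking around the list.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
	CPU_POLICY_PERF_BIAS,
	CPU_POLICY_BALANCED,
	CPU_POLICY_POWER_BIAS
} cpu_power_policy_t;

/* Each pm feature supplies a callback that applies a policy. */
typedef void (*cpu_policy_cb_t)(cpu_power_policy_t policy, void *arg);

typedef struct cpu_policy_node {
	cpu_policy_cb_t		cb;
	void			*arg;
	struct cpu_policy_node	*next;
} cpu_policy_node_t;

static cpu_policy_node_t *cpu_policy_list = NULL;

/* Register a supported feature; the caller owns the node storage. */
static void
cpu_policy_register(cpu_policy_node_t *node, cpu_policy_cb_t cb, void *arg)
{
	assert(node != NULL && cb != NULL);
	node->cb = cb;
	node->arg = arg;
	node->next = cpu_policy_list;	/* push on the list head */
	cpu_policy_list = node;
}

/* Walk the list and hand the new policy to every registered feature. */
static int
cpu_power_policy_callb(cpu_power_policy_t policy)
{
	int called = 0;

	for (cpu_policy_node_t *n = cpu_policy_list; n != NULL; n = n->next) {
		n->cb(policy, n->arg);
		called++;
	}
	return (called);	/* number of features notified */
}

/* Example feature: record the bias hint that would go into the MSR. */
static void
energy_perf_bias_cb(cpu_power_policy_t policy, void *arg)
{
	unsigned int *hint = arg;

	switch (policy) {
	case CPU_POLICY_PERF_BIAS:	*hint = 0;  break;
	case CPU_POLICY_BALANCED:	*hint = 7;  break;	/* assumed */
	case CPU_POLICY_POWER_BIAS:	*hint = 15; break;
	}
}
```

Adding p/t/c-state support later would then mean registering further
callbacks rather than changing the ioctl path.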
>>>>>>>>>
>>>>>>>>> Any comments and suggestions are welcome.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> -Aubrey
>>>>>>>>> _______________________________________________
>>>>>>>>> pm-discuss mailing list
>>>>>>>>> pm-discuss at opensolaris.org
>>>>>>>>> http://mail.opensolaris.org/mailman/listinfo/pm-discuss
>>>>>>>>>
> _______________________________________________
> tesla-dev mailing list
> tesla-dev at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/tesla-dev

Liu Jiang (Gerry)
OpenSolaris, OTC, SSG, Intel
