Hi Bill,

Here is a change that proposes system-wide policy support:
http://cr.opensolaris.org/~aubrey/sys_pm_policy_v1/
The user profile from /etc/power.conf is still passed to the kernel
through pm_ioctl, which then calls pm_set_system_policy(). Currently only
the CPU PM policy is set there; if memory or other devices need a bias as
well, they can be added to that function.
The CPU PM policy implementation has only a minor change from the last
webrev: the mcpu_pm_policy pointer has been moved from machcpu to the
mcpu_pm_mach_state structure, as you suggested.
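
To make the shape of pm_set_system_policy() concrete, here is a minimal sketch and not the code in the webrev. The enum values, cpu_pm_set_policy() and the return convention are assumptions made up for illustration; only the function name pm_set_system_policy comes from the description above.

```c
#include <assert.h>

/* Hypothetical policy values; the webrev may name or order them differently. */
typedef enum {
	SYS_PM_POLICY_DEFAULT,
	SYS_PM_POLICY_PERF,
	SYS_PM_POLICY_BALANCED,
	SYS_PM_POLICY_POWER
} sys_pm_policy_t;

/* Stand-in for the per-CPU policy state; only the CPU has a bias today. */
static sys_pm_policy_t cpu_pm_policy = SYS_PM_POLICY_DEFAULT;

/* Hypothetical CPU hook: records the policy for the CPU PM code to use. */
static int
cpu_pm_set_policy(sys_pm_policy_t policy)
{
	assert(policy <= SYS_PM_POLICY_POWER);
	cpu_pm_policy = policy;
	return (0);
}

/*
 * Called from the ioctl path in this sketch. Returns 0 on success and
 * -1 if the policy value is out of range or a subsystem rejects it.
 */
int
pm_set_system_policy(int policy)
{
	if (policy < SYS_PM_POLICY_DEFAULT || policy > SYS_PM_POLICY_POWER)
		return (-1);

	/* The CPU is the only consumer for now. */
	if (cpu_pm_set_policy((sys_pm_policy_t)policy) != 0)
		return (-1);

	/* A memory or device bias would get its own call here. */
	return (0);
}
```

Keeping one entry point means the ioctl path does not need to change when another subsystem starts honoring the policy.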

Any comments and suggestions are highly appreciated.

Thanks,
-Aubrey

Li, Aubrey wrote:
>
>It looks like memory PM needs such a bias as well, so I'd like to change
>the proposal to use the keyword "sys-pm-policy" instead. The mechanism
>will use the existing callb implementation to pass the user policy from
>/etc/power.conf to the kernel and walk the list of registered modules,
>calling each module's hook function to set its PM policy individually.
>
>I'm not sure whether any other device drivers need this proposal or would
>be happy with it. It would be great if device driver developers could
>share some thoughts here.
>
>Thanks,
>-Aubrey
>
>Julia.Harper wrote:
>>
>>I assume that this knob (profile), when turned way down, would basically
>>put the system into "power savings" mode, where the set of power states
>>is restricted. That is, no matter how long the utilization level demands
>>more power, the highest power states (for the CPUs, memory, whatever)
>>will never be entered. We should probably use terminology that makes
>>this clear.
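
One way to read "the set of power states is restricted" is as a clamp on whatever state the governor asks for. The sketch below assumes ACPI-style numbering (index 0 is the fastest state) and uses a hypothetical pstate_limit_t; nothing here is taken from the webrev.

```c
#include <assert.h>

/*
 * Hypothetical limit set by the policy. State 0 is the fastest and
 * nstates - 1 the slowest (an assumption, following ACPI numbering).
 */
typedef struct {
	int fastest_allowed;	/* lowest index the policy permits */
	int nstates;		/* total number of states */
} pstate_limit_t;

/* Return the state actually entered for a requested state. */
int
pstate_clamp(const pstate_limit_t *lim, int requested)
{
	assert(lim->nstates > 0);
	assert(lim->fastest_allowed >= 0 &&
	    lim->fastest_allowed < lim->nstates);

	/* Requests for states faster than the policy allows are capped. */
	if (requested < lim->fastest_allowed)
		return (lim->fastest_allowed);
	/* Requests past the slowest state are pulled back into range. */
	if (requested > lim->nstates - 1)
		return (lim->nstates - 1);
	return (requested);
}
```

With a limit of { 2, 5 }, a request for state 0 is served by state 2 however long the load lasts, which is the behavior the terminology would need to make clear.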
>>
>>-- jdh
>>
>>
>>Liu, Jiang wrote:
>>> I prefer the solution of introducing a global power profile for all
>>> devices. Currently we need such a profile for CPUPM. In the future, when
>>> supporting memory power management, we may need a similar profile for
>>> memory PM, and users won't like two variables/profiles for the same
>>> objective.
>>>
>>> Li, Aubrey <> wrote:
>>>> Bill Holler wrote:
>>>>> Hi,
>>>>>
>>>>> I forgot to mention that cpu_pm_policy is just a policy.
>>>>> There is no guarantee it maps to a specific MSR or hardware
>>>>> implementation.
>>>> Yes, I would like to propose a new option for CPU power management
>>>> policy. This policy is a CPU bias between performance and power, and
>>>> future CPU power management enhancement work can be based on it.
>>>> - The default policy should keep the current "out of the box"
>>>> behavior unchanged; we'll try to save more power without hurting
>>>> performance.
>>>> - More power management features are coming on future processors,
>>>> like ENERGY_PERFORMANCE_BIAS. We can register these new features
>>>> under the policy framework and offer the user a knob to change
>>>> these settings on the fly.
>>>> - Laptop users who want longer battery life, less heat, and less
>>>> fan noise may want the system to work in some edge situations. For
>>>> example, currently the CPU can run at the highest clock if cpupm is
>>>> disabled, but there is no way to make the CPU always run at the
>>>> lowest clock. Similarly, always entering the deepest C-state is
>>>> another way to save more power. What's more, a power-aware
>>>> dispatcher could be more flexible in picking a CPU and dispatching
>>>> threads if there is a policy indicator.
>>>> - Some users don't care about power. Yes, we already have options
>>>> that let them set ENERGY_PERFORMANCE_BIAS to performance bias, turn
>>>> off C-states/P-states, and so on. But it's friendlier to the user
>>>> to change just one option.
>>>>
>>>> Here, the policy only focuses on the CPU. If you think we should have
>>>> a policy for memory, for devices, or a system-wide policy, let's do
>>>> this. cpu_pm_policy can be one part of a system-wide policy.
>>>> If nobody has thoughts on it, I'll continue to prepare a PSARC file
>>>> to add the cpu_pm_policy keyword.
>>>>
>>>>> For example Solaris could be dynamically setting the
>>>>> ENERGY_PERFORMANCE_BIAS register to different settings depending on
>>>>> things such as system-load,
>>>> Yes, settings such as these can be changed dynamically if we see the
>>>> benefit.
>>>>
>>>>> the priority of the application being scheduled, a power policy of
>>>>> the application,
>>>> Making threads power aware needs another bunch of interfaces, I
>>>> think. For example, cmt_balance() can choose a different processor
>>>> group according to the perf/power bias of the thread.
>>>>
>>>>> or power policy of the zone.
>>>> Zone policy is an interesting topic. Different zones could have
>>>> different CPU resources or share the global CPU resource; different
>>>> zones could have different power policies or inherit the global
>>>> cpu_pm_policy setting. There can be many virtual containers, but
>>>> the hardware resource is unique. I think this can be enhanced in
>>>> zone management, which will not be covered in my proposal, :)
>>>>
>>>> Thanks,
>>>> -Aubrey
>>>>
>>>>> Regards,
>>>>> Bill
>>>>>
>>>>>
>>>>> On 03/03/10 16:21, Bill Holler wrote:
>>>>>> +1.
>>>>>>
>>>>>> Hi Aubrey,
>>>>>>
>>>>>> I also think it is time to move forward with this proposal.
>>>>>> Generally we want the system to work best "out of the box"
>>>>>> with no tuning.  On the other hand, vendors will keep improving
>>>>>> products with new features, and there will always be some specific
>>>>>> applications where custom settings may be better.  I feel this
>>>>>> proposal supports innovation and application-specific customization
>>>>>> in line with the OpenSolaris community goals.
>>>>>>
>>>>>> This proposal applies to all types of CPUs.  It uses "cpu_pm_policy"
>>>>>> instead of, for example, mentioning a specific CPU's MSR.  ;-)  This
>>>>>> proposal will be useful with other CPUs if/when they have hardware
>>>>>> mechanisms for tuning power / performance.
>>>>>>
>>>>>>
>>>>>> In the ARC case we want to mention that there could be a policy
>>>>>> conflict between this component setting and a system-power-policy,
>>>>>> external Power Capping, etc. Generally we want users to use the
>>>>>> default or a higher-level policy such as the system power policy.
>>>>>> Unfortunately the system power policy may not be fine-grained or
>>>>>> diverse enough for some applications to specify CPU power policy.
>>>>>> In that case cpu_pm_policy will be useful.  My thought is: the user
>>>>>> must really know what they want if they specify a component policy
>>>>>> such as cpu_pm_policy instead of just using the system power
>>>>>> policy.  For that reason I feel cpu_pm_policy should override the
>>>>>> system-power-policy at the cpupm level.
>>>>>>
>>>>>> Power Capping is different.  Power Capping is an external policy.
>>>>>> It is currently "owned" by the SP external to the OS.  Power Capping
>>>>>> should override a local cpu_pm_policy.
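
The precedence described above (Power Capping over cpu_pm_policy, cpu_pm_policy over the system power policy) can be written down as a small resolver. This is only a sketch: pm_policy_t, PM_POLICY_UNSET and the idea that an external cap can be expressed as one of the same policy values are all assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical policy values; UNSET means "no opinion at this level". */
typedef enum {
	PM_POLICY_UNSET = 0,
	PM_POLICY_PERF,
	PM_POLICY_BALANCED,
	PM_POLICY_POWER
} pm_policy_t;

/*
 * Pick the policy the cpupm level should act on.
 * Precedence, highest first: external cap, component policy, system policy.
 */
pm_policy_t
cpu_effective_policy(pm_policy_t cap, pm_policy_t cpu, pm_policy_t sys)
{
	/* In this sketch the system-wide policy always has a value. */
	assert(sys != PM_POLICY_UNSET);

	if (cap != PM_POLICY_UNSET)
		return (cap);
	if (cpu != PM_POLICY_UNSET)
		return (cpu);
	return (sys);
}
```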
>>>>>>
>>>>>>
>>>>>> Implementation comments:
>>>>>> IMHO mcpu_pm_policy pointer should be in the mcpu_pm_mach_state
>>>>>> structure instead of in the machcpu.
>>>>>> We may want to allow the user to specify a number instead of just
>>>>>> Perf, Balanced, Power, Default?
>>>>>>
>>>>>> Regards,
>>>>>> Bill
>>>>>>
>>>>>>
>>>>>> On 02/20/10 18:43, Li, Aubrey wrote:
>>>>>>> Hi Bill,
>>>>>>>
>>>>>>> I think it's time to continue this proposal, since b134 is closed
>>>>>>> and the build is no longer restricted. The power/perf bias setting
>>>>>>> is a starting point for future power-related work. I'll prepare a
>>>>>>> PSARC file for the new option if this is acceptable. "No" is also a
>>>>>>> good answer, given a good reason.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -Aubrey
>>>>>>>
>>>>>>>
>>>>>>>> Bill.Holler Wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> This proposal is for a mechanism to set the new MSR
>>>>>>>>> IA32_ENERGY_PERF_BIAS_MSR.   This is a new hardware
>>>>>>>>> feature.  The MSR affects overall power/performance.
>>>>>>>>> It gives a hint to the processor & package for desired
>>>>>>>>> power/performance characteristics.  It is related to p-states and
>>>>>>>>> c-states (and may affect these features), but this feature can
>>>>>>>>> have other socket/system-level effects as well.
>>>>>>>>> The programmer's guides do not go into detail about what the other
>>>>>>>>> effects can be.  :-(
>>>>>>>>>
>>>>>>>> The perf and power impact of this MSR is model specific.
>>>>>>>> It can throttle turbo on WSM and will probably help the hardware
>>>>>>>> make more decisions in the future. For example, when a short
>>>>>>>> interrupt storm is detected, it can demote a CC6 request to CC3.
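
For reference, the hint in IA32_ENERGY_PERF_BIAS lives in bits 3:0 of the MSR, where 0 asks for the highest performance and 15 for the most energy saving. The sketch below only computes the new register value; the helper name epb_compose() and the "balanced" value of 7 are assumptions, and the actual rdmsr/wrmsr calls are left out on purpose.

```c
#include <assert.h>
#include <stdint.h>

#define	EPB_MASK	0xFULL	/* bits 3:0 carry the hint */

/* Hypothetical mapping from a policy to a hint; only 0 and 15 are fixed ends. */
enum {
	EPB_PERF = 0,		/* highest performance */
	EPB_BALANCED = 7,	/* assumed midpoint */
	EPB_POWER = 15		/* maximum energy saving */
};

/* Build the value to write back, preserving the bits above the hint. */
uint64_t
epb_compose(uint64_t old_msr, unsigned hint)
{
	assert(hint <= 15);
	return ((old_msr & ~EPB_MASK) | (uint64_t)hint);
}
```

Because the effect is model specific, the same hint value may behave differently from one processor generation to the next.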
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 11/05/09 05:15, minskey guo wrote:
>>>>>>>>>
>>>>>>>>>> Jedy Wang wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Li,
>>>>>>>>>>>
>>>>>>>>>>> As far as I know, gnome-power-manager has removed support for
>>>>>>>>>>> changing the governor, which I think is the same as a profile. I
>>>>>>>>>>> remember someone wrote a blog explaining the reason, but I cannot
>>>>>>>>>>> find it now. I wonder what makes us still need to implement this
>>>>>>>>>>> feature.
>>>>>>>>>>>
>>>>>>>>>> In the Linux world, there is an ondemand governor in the kernel.
>>>>>>>>>> It sets the CPU frequency according to the CPU's current load. So
>>>>>>>>>> some people consider that everybody should use that governor and
>>>>>>>>>> let CPUs finish their jobs ASAP and then enter C-states for power
>>>>>>>>>> saving. Compared to P-states, C-states do save more power. That's
>>>>>>>>>> why gnome removed it.
>>>>>>>>>>
>>>>>>>> This is also model specific and depends on whether frequency,
>>>>>>>> voltage and power scale linearly. That's true on the latest
>>>>>>>> processors but not on earlier ones.
>>>>>>>>
>>>>>>>> I'm not sure why gnome removed it, but it does not seem a good idea
>>>>>>>> to me. Some users want max perf and others want longer battery life.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Yes, a good p-state + c-state implementation is not easy to tune
>>>>>>>>> for more power savings.  Running in lower p-states when a CPU is
>>>>>>>>> busy burns more power due to shorter time in deeper C-states.
>>>>>>>>> Entering deeper C-states too aggressively also burns more power
>>>>>>>>> (on both an idle and busy system) due to unnecessary wakeup
>>>>>>>>> latency.  ;-)  Without knowing the details, it seems likely that
>>>>>>>>> the gnome-power-manager setting was removed because setting it made
>>>>>>>>> worse decisions than a runtime prediction.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Solaris currently has mechanisms to turn P-state and deeper
>>>>>>>>> C-state support on/off.
>>>>>>>>>
>>>>>>>>> A requirement is that the Energy Perf Bias MSR can be set on
>>>>>>>>> systems not running a GUI.  We would like to support a possible
>>>>>>>>> future Gnome interface to set this MSR if/when it exists.  The
>>>>>>>>> proposal provides a mechanism that works on systems without
>>>>>>>>> Gnome.
>>>>>>>>>
>>>>>>>> Right, most servers do not run gnome. I don't expect gnome
>>>>>>>> support, but it would be great to have, :-)
>>>>>>>>
>>>>>>>> IMHO, we should use this global cpu power policy setting instead
>>>>>>>> of "cpupm" and "cpu-deep-idle"; it is friendlier to the user.
>>>>>>>> Users just want more perf or more power savings, and I think they
>>>>>>>> don't care whether the system supports p-states and c-states at
>>>>>>>> the same time. "cpupm" is a confusing name that covers only
>>>>>>>> p-states; we named it "cpupm" before we had deep idle support.
>>>>>>>> Actually cpu-deep-idle is also one part of cpu power management, :)
>>>>>>>>
>>>>>>>>>> But some people care less about power saving than about other
>>>>>>>>>> factors. For example, if you are plagued by the noise of the CPU
>>>>>>>>>> fan and want it quiet, you can lower the CPU frequency, which
>>>>>>>>>> results in less heat, and then the fan can be stopped.
>>>>>>>>>>
>>>>>>>>>> Personally, I vote +1 for this project if I could vote, but I
>>>>>>>>>> don't like the names "perf-bias" etc. :)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Besides, can somebody tell me where IA32_ENERGY_PERF_BIAS_MSR
>>>>>>>>>> comes from? Is it a part of the IPS feature?
>>>>>>>>>>
>>>>>>>>> Intel's Software Developer's Manual volume 2A describes CPUID
>>>>>>>>> detection of IA32_ENERGY_PERF_BIAS_MSR and volume 3A describes the
>>>>>>>>> MSR.
>>>>>>>>> http://www.intel.com/products/processor/manuals/
>>>>>>>>> Sorry, I do not know what IPS stands for.
>>>>>>>>>
>>>>>>>> cough, cough, IPS is not a released feature and should not be
>>>>>>>> discussed here, ;p
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -Aubrey
>>>>>>>>
>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Bill
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -minskey
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> I remember we already support 2 profiles through
>>>>>>>>>>> gnome-power-manager on Solaris. What's the difference between
>>>>>>>>>>> them?
>>>>>>>>>>>
>>>>>>>>>>> I do not understand the exact meaning of perf-bias, balanced and
>>>>>>>>>>> power-bias either. Doesn't perf-bias mean the cpu frequency will
>>>>>>>>>>> always be at the highest level?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Jedy
>>>>>>>>>>> On Wed, 2009-11-04 at 08:47 +0800, Li, Aubrey wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> When we enabled the Intel energy performance bias feature, we
>>>>>>>>>>>> found that a power profile implementation is necessary. Here is
>>>>>>>>>>>> a draft for a CPU-level power policy.
>>>>>>>>>>>> http://cr.opensolaris.org/~aubrey/cpu_power_policy_v1/
>>>>>>>>>>>>
>>>>>>>>>>>> The proposal adds a new keyword to /etc/power.conf,
>>>>>>>>>>>> "cpu-power-policy", with 4 options:
>>>>>>>>>>>> 1) perf-bias
>>>>>>>>>>>> 2) balanced
>>>>>>>>>>>> 3) power-bias
>>>>>>>>>>>> 4) default, the same as perf-bias.
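
The four keyword values map naturally onto three internal policies, since "default" is the same as "perf-bias". A rough sketch of that mapping follows; the enum, the function name cpu_power_policy_parse() and the -1 error convention are hypothetical and are not the pmconfig code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical internal representation of the cpu-power-policy value. */
typedef enum {
	CPU_POLICY_PERF_BIAS,
	CPU_POLICY_BALANCED,
	CPU_POLICY_POWER_BIAS
} cpu_power_policy_t;

/*
 * Translate the value that follows "cpu-power-policy" in power.conf.
 * Returns 0 and fills *out on success, -1 for an unknown value.
 */
int
cpu_power_policy_parse(const char *value, cpu_power_policy_t *out)
{
	assert(value != NULL && out != NULL);

	/* "default" is defined as the same as perf-bias. */
	if (strcmp(value, "perf-bias") == 0 || strcmp(value, "default") == 0) {
		*out = CPU_POLICY_PERF_BIAS;
		return (0);
	}
	if (strcmp(value, "balanced") == 0) {
		*out = CPU_POLICY_BALANCED;
		return (0);
	}
	if (strcmp(value, "power-bias") == 0) {
		*out = CPU_POLICY_POWER_BIAS;
		return (0);
	}
	return (-1);
}
```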
>>>>>>>>>>>>
>>>>>>>>>>>> The user input in /etc/power.conf is passed to the kernel as
>>>>>>>>>>>> the preferred policy through ioctl. Then pm_ioctl calls the
>>>>>>>>>>>> callback to walk a cpu power policy list. Every cpu pm feature
>>>>>>>>>>>> that wants to be adjusted by this option, and is verified to be
>>>>>>>>>>>> supported, registers its callback function on the list, so that
>>>>>>>>>>>> it can be called and adjusted by pmconfig.
>>>>>>>>>>>> ---------------------------------------------------------
>>>>>>>>>>>> /etc/power.conf
>>>>>>>>>>>>     |
>>>>>>>>>>>> pm_ioctl(cpu_power_policy, policy)
>>>>>>>>>>>>     |
>>>>>>>>>>>> cpu_power_policy_callb(policy)
>>>>>>>>>>>>     |
>>>>>>>>>>>>     |----> registered pm feature callback 1 (ENERGY_PERF_BIAS)
>>>>>>>>>>>>     |
>>>>>>>>>>>>     |----> registered pm feature callback 2
>>>>>>>>>>>>     ...
>>>>>>>>>>>> ---------------------------------------------------------
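
The register-then-walk flow in the diagram can be sketched in a few lines. A fixed array stands in here for the kernel's callb mechanism, and every name except cpu_power_policy_callb is invented for the example, so treat it as an illustration of the idea and not as the draft's code.

```c
#include <assert.h>
#include <stddef.h>

/* A feature's hook: receives the policy chosen by the user. */
typedef void (*cpu_pm_policy_cb_t)(int policy);

#define	CPU_PM_MAX_CALLBACKS	8

/* Hypothetical registration list; the draft uses callb instead. */
static cpu_pm_policy_cb_t cpu_pm_cbs[CPU_PM_MAX_CALLBACKS];
static size_t cpu_pm_ncbs;

/* Add a feature callback; returns -1 when the list is full. */
int
cpu_power_policy_register(cpu_pm_policy_cb_t cb)
{
	assert(cb != NULL);
	if (cpu_pm_ncbs == CPU_PM_MAX_CALLBACKS)
		return (-1);
	cpu_pm_cbs[cpu_pm_ncbs++] = cb;
	return (0);
}

/* Walk the list and hand the policy to every registered feature. */
size_t
cpu_power_policy_callb(int policy)
{
	size_t i;

	for (i = 0; i < cpu_pm_ncbs; i++)
		cpu_pm_cbs[i](policy);
	return (cpu_pm_ncbs);	/* number of features notified */
}

/* Example feature: remembers the policy it would program into the MSR. */
static int energy_perf_bias_policy = -1;

static void
energy_perf_bias_cb(int policy)
{
	energy_perf_bias_policy = policy;
}
```

A feature would call cpu_power_policy_register(energy_perf_bias_cb) only after verifying hardware support, so unsupported features never appear on the list.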
>>>>>>>>>>>> Currently, only the energy_perf_bias feature is registered,
>>>>>>>>>>>> because my intention is to support adjusting the
>>>>>>>>>>>> energy_perf_bias MSR without a reboot. I guess we can probably
>>>>>>>>>>>> add p/t/c-state support later. When we add p/t/c-state support,
>>>>>>>>>>>> my quick thought is that this option will override the "cpupm"
>>>>>>>>>>>> and "cpu-deep-idle" settings.
>>>>>>>>>>>>
>>>>>>>>>>>> Any comments and suggestions are welcome.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> -Aubrey
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> pm-discuss mailing list
>>>>>>>>>>>> pm-discuss at opensolaris.org
>>>>>>>>>>>> http://mail.opensolaris.org/mailman/listinfo/pm-discuss
>>>> _______________________________________________
>>>> tesla-dev mailing list
>>>> tesla-dev at opensolaris.org
>>>> http://mail.opensolaris.org/mailman/listinfo/tesla-dev
>>>
>>> Liu Jiang (Gerry)
>>> OpenSolaris, OTC, SSG, Intel
>>
>>--
>>
>>---------------------
>>     Julia Harper, julia.harper at oracle.com
>
