Bill Holler wrote:
>
>Hi,
>
>I forgot to mention that cpu_pm_policy is just a policy.
>There is no guarantee it maps to a specific MSR or hardware
>implementation.

Yes, I would like to propose a new option for a CPU power management
policy. The policy expresses a CPU bias between performance and power,
and future CPU power management enhancements can build on it.
- The default policy should keep the current "out of the box"
behavior unchanged; we'll try to save more power without hurting
performance.
- More power management features are coming on future processors,
such as ENERGY_PERFORMANCE_BIAS. We can register these new features
under the policy framework and offer the user a knob to change these
settings on the fly.
- Laptop users who want longer battery life, less heat and less fan
noise may want the system to run at an extreme setting. For example,
today the CPU can run at the highest clock if cpupm is disabled, but
there is no way to keep the CPU at the lowest clock. Similarly, always
entering the deepest C-state is another way to save more power. What's
more, the power-aware dispatcher could be more flexible in picking a
CPU and dispatching a thread if there is a policy indicator.
- Some users don't care about power. Yes, we already have options that
let them set ENERGY_PERFORMANCE_BIAS to performance bias, turn off
C-states/P-states, and so on. But it is friendlier to the user to
change just one option.
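
To make the "register features under the policy framework" idea concrete,
here is a minimal user-space sketch of a callback list. It is not the code
in the webrev: every name in it (cpu_pm_policy_t, cpu_pm_policy_register(),
cpu_pm_policy_set(), epb_policy_cb(), the table size) is hypothetical, and
a kernel version would also need locking.

```c
#include <assert.h>
#include <stddef.h>

/* Policy values mirror the keyword options discussed in this thread. */
typedef enum {
	CPU_PM_POLICY_DEFAULT,
	CPU_PM_POLICY_PERF_BIAS,
	CPU_PM_POLICY_BALANCED,
	CPU_PM_POLICY_POWER_BIAS
} cpu_pm_policy_t;

/* Hypothetical callback type: a feature adjusts itself to the new policy. */
typedef void (*cpu_pm_policy_cb_t)(cpu_pm_policy_t policy, void *arg);

#define	CPU_PM_MAX_FEATURES	8	/* arbitrary limit for this sketch */

static struct {
	cpu_pm_policy_cb_t cb;
	void *arg;
} cpu_pm_features[CPU_PM_MAX_FEATURES];
static size_t cpu_pm_nfeatures;

/* Register a feature callback; returns 0 on success, -1 if the table is full. */
int
cpu_pm_policy_register(cpu_pm_policy_cb_t cb, void *arg)
{
	assert(cb != NULL);
	if (cpu_pm_nfeatures == CPU_PM_MAX_FEATURES)
		return (-1);
	cpu_pm_features[cpu_pm_nfeatures].cb = cb;
	cpu_pm_features[cpu_pm_nfeatures].arg = arg;
	cpu_pm_nfeatures++;
	return (0);
}

/* Walk the list and notify every registered feature; returns how many ran. */
size_t
cpu_pm_policy_set(cpu_pm_policy_t policy)
{
	size_t i;

	for (i = 0; i < cpu_pm_nfeatures; i++)
		cpu_pm_features[i].cb(policy, cpu_pm_features[i].arg);
	return (cpu_pm_nfeatures);
}

/* Example feature: just records the last policy it was told about. */
void
epb_policy_cb(cpu_pm_policy_t policy, void *arg)
{
	*(cpu_pm_policy_t *)arg = policy;
}
```

A feature such as ENERGY_PERFORMANCE_BIAS would register once after it has
verified hardware support, and a single policy change then reaches every
registered feature.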

Here, the policy only focuses on the CPU. If you think we should have a
policy for memory, for devices, or a system-wide policy, let's do that;
cpu_pm_policy can be one part of a system-wide policy. If nobody has
further thoughts on it, I'll go ahead and prepare a PSARC file to add
the cpu_pm_policy keyword.

>
>For example Solaris could be dynamically setting the
>ENERGY_PERFORMANCE_BIAS register to different
>settings depending on things such as system-load,

Yes, settings such as these can be changed dynamically if we see a benefit.
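
As a rough illustration of a load-driven setting, the sketch below maps a
system load percentage to an energy/performance hint. The 4-bit hint range
(0 = highest performance, 15 = maximum energy saving, in bits 3:0 of the
MSR) follows Intel's documentation of IA32_ENERGY_PERF_BIAS; the load
thresholds, the "balanced" value of 7 and the function names are my
assumptions only, not a tuned policy.

```c
#include <assert.h>
#include <stdint.h>

#define	EPB_HINT_MASK		0xFULL	/* hint lives in bits 3:0 */
#define	EPB_HINT_PERF		0	/* highest performance */
#define	EPB_HINT_BALANCED	7	/* assumed midpoint, not a spec value */
#define	EPB_HINT_POWER		15	/* maximum energy saving */

/* Pick a hint from the current load (0..100); thresholds are assumptions. */
unsigned
epb_hint_for_load(unsigned load_pct)
{
	assert(load_pct <= 100);
	if (load_pct >= 75)
		return (EPB_HINT_PERF);
	if (load_pct >= 25)
		return (EPB_HINT_BALANCED);
	return (EPB_HINT_POWER);
}

/* Merge a hint into an existing MSR value, preserving the upper bits. */
uint64_t
epb_msr_value(uint64_t old, unsigned hint)
{
	assert(hint <= 15);
	return ((old & ~EPB_HINT_MASK) | (uint64_t)hint);
}
```

The actual MSR write is left out on purpose; in the kernel it would go
through the usual per-CPU MSR write path on each CPU that supports the
feature.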

>the priority of the application being scheduled, a power policy
>of the application,

Making threads power aware needs another set of interfaces, I think. For
example, cmt_balance() could choose a different processor group according
to the perf/power bias of the thread.
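
A sketch of what such a choice could look like, under my own assumptions:
a perf-biased thread goes to the least loaded group (spread out), and a
power-biased thread goes to the busiest group that still has room, so the
other groups can stay idle. This is not the real cmt_balance() code or its
signature; pick_group(), pg_load_t and thread_bias_t are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { THREAD_BIAS_PERF, THREAD_BIAS_POWER } thread_bias_t;

typedef struct {
	unsigned running;	/* threads currently running in the group */
	unsigned capacity;	/* hardware threads in the group */
} pg_load_t;

/* Return the index of the chosen group, or -1 if every group is full. */
int
pick_group(const pg_load_t *pg, size_t n, thread_bias_t bias)
{
	int best = -1;
	size_t i;

	assert(pg != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (pg[i].running >= pg[i].capacity)
			continue;		/* no room in this group */
		if (best < 0) {
			best = (int)i;		/* first group with room */
			continue;
		}
		if (bias == THREAD_BIAS_PERF) {
			/* spread: prefer the least loaded group */
			if (pg[i].running < pg[best].running)
				best = (int)i;
		} else {
			/* coalesce: prefer the busiest group with room */
			if (pg[i].running > pg[best].running)
				best = (int)i;
		}
	}
	return (best);
}
```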

> or power policy of the zone.

Zone policy is an interesting topic. Different zones could have their
own CPU resources or share the global CPU resources, and different zones
could have their own power policy or inherit the global cpu_pm_policy
setting. There can be many virtual containers, but there is only one set
of hardware resources. I think this can be addressed in zone management,
which will not be covered in my proposal, :)

Thanks,
-Aubrey

>
>Regards,
>Bill
>
>
>On 03/03/10 16:21, Bill Holler wrote:
>> +1.
>>
>> Hi Aubrey,
>>
>> I also think it is time to move forward with this proposal.
>> Generally we want the system to work best "out of the box"
>> with no tuning.  On the other hand, vendors will keep
>> improving products with new features, and there will
>> always be some specific applications where custom settings
>> may be better.  I feel this proposal supports innovation and
>> application specific customization in line with the
>> OpenSolaris community goals.
>>
>> This proposal applies to all types of CPUs.  It uses
>> "cpu_pm_policy" instead of for example mentioning a
>> specific CPU's MSR.  ;-)  This proposal will be useful
>> with other CPUs if/when they have hardware mechanisms
>> for tuning power / performance.
>>
>>
>> In the ARC case we want to mention that there could
>> be a policy conflict between this component setting and
>> a system-power-policy, external Power Capping, etc.
>> Generally we want users to use the default or a higher
>> level policy such as the system power policy.
>> Unfortunately the system power policy may not be
>> fine-grain or diverse enough for some applications to
>> specify cpu power policy.  In that case cpu_pm_policy
>> will be useful.  My thought is: the user must really know
>> what they want if they specify a component policy
>> such as cpu_pm_policy instead of just using the
>> system power policy.  For that reason I feel cpu_pm_policy
>> should override the system-power-policy at the cpupm level.
>>
>> Power Capping is different.  Power Capping is an external
>> policy.  It is currently "owned" by the SP external to the
>> OS.  Power Capping should override a local cpu_pm_policy.
>>
>>
>> Implementation comments:
>> IMHO mcpu_pm_policy pointer should be in the
>> mcpu_pm_mach_state structure instead of in the machcpu.
>> We may want to allow the user to specify a number
>> instead of just Perf, Balanced, Power, Default?
>>
>> Regards,
>> Bill
>>
>>
>> On 02/20/10 18:43, Li, Aubrey wrote:
>>> Hi Bill,
>>>
>>> I think it's time to continue this proposal, since b134 is closed and
>>> the build is no longer restricted. The power/perf bias setting is a
>>> starting point for future power-related work. I'll prepare a PSARC
>>> file for the new option if this is acceptable. "No" is also a good
>>> answer, given a good reason.
>>>
>>> Thanks,
>>> -Aubrey
>>>
>>>
>>>> Bill.Holler Wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> This proposal is for a mechanism to set the new MSR
>>>>> IA32_ENERGY_PERF_BIAS_MSR.   This is a new hardware
>>>>> feature.  The MSR affects overall power/performance.
>>>>> It gives a hint to the processor & package for desired
>>>>> power/performance characteristics.  It is related to p-states
>>>>> and c-states (and may affect these features), but this feature
>>>>> can have other socket/system-level effects as well.
>>>>> The programmer's guides do not go into detail about what the
>>>>> other effects can be.  :-(
>>>>>
>>>> The perf and power impact of this MSR is model specific.
>>>> It can throttle turbo on WSM and will probably let the hardware
>>>> make more decisions in the future. For example, when a short
>>>> interrupt storm is detected, it can demote a CC6 request to CC3.
>>>>
>>>>
>>>>> On 11/05/09 05:15, minskey guo wrote:
>>>>>
>>>>>> Jedy Wang wrote:
>>>>>>
>>>>>>> Hi Li,
>>>>>>>
>>>>>>> As far as I know, gnome-power-manager has removed support for
>>>>>>> changing the governor, which I think is the same thing as a
>>>>>>> profile. I remember someone wrote a blog post explaining the
>>>>>>> reason, but I cannot find it now. I wonder what makes us still
>>>>>>> need to implement this feature.
>>>>>>>
>>>>>> In the Linux world, there is an ondemand governor in the kernel.
>>>>>> It sets the CPU frequency according to the CPU's current load. So
>>>>>> some people consider that everybody should use that governor and
>>>>>> let CPUs finish their jobs asap and then enter C-states for power
>>>>>> saving. Compared to P-states, C-states do save more power. That's
>>>>>> why gnome removed it.
>>>>>>
>>>> This is also model specific and depends on whether frequency,
>>>> voltage and power are linearly related. That's true on the latest
>>>> processors but not on earlier ones.
>>>>
>>>> I'm not sure why gnome removed it, but it does not seem like a good
>>>> idea to me. Some users want max perf and others want longer battery
>>>> life.
>>>>
>>>>
>>>>> Yes, a good p-state + c-state implementation is not easy
>>>>> to tune for more power savings.  Running in lower p-states
>>>>> when a CPU is busy burns more power due to shorter time
>>>>> in deeper C-states.  Entering deeper C-states too aggressively
>>>>> also burns more power (on both an idle and busy system) due
>>>>> to unnecessary wakeup latency.  ;-)  Without knowing the
>>>>> details, it seems likely that the gnome-power-manager
>>>>> was removed because setting it made worse decisions
>>>>> than a runtime prediction.
>>>>>
>>>>>
>>>>> Solaris currently has mechanisms to turn P-state and
>>>>> deeper C-state support on/off.
>>>>>
>>>>> A requirement is that the Energy Perf Bias MSR can be
>>>>> set on systems not running a GUI.  We would like to support
>>>>> a possible future Gnome interface to set this MSR if/when it
>>>>> exists.  The proposal provides a mechanism that works on
>>>>> systems without Gnome.
>>>>>
>>>> Right, most servers do not run gnome. I don't expect gnome support,
>>>> but it would be great if it comes, :-)
>>>>
>>>> IMHO, we should use this global CPU power policy setting instead of
>>>> "cpupm" and "cpu-deep-idle"; this is friendlier to the user. Users
>>>> just want more perf or more power savings; I don't think they care
>>>> whether the system supports P-states and C-states at the same time.
>>>> "cpupm" is a confusing name because it covers only P-states. We
>>>> named it "cpupm" before we had deep idle support. Actually,
>>>> cpu-deep-idle is also a part of CPU power management, :)
>>>>
>>>>>
>>>>>> But some people don't care about power saving compared to other
>>>>>> factors. For example, if you are plagued by CPU fan noise and
>>>>>> want it quiet, you can lower the CPU frequency, which results in
>>>>>> less heat, and then the fan can stop.
>>>>>>
>>>>>> Personally, I vote +1 for this project if I could vote, but I
>>>>>> don't like the names "perf-bias" etc. :)
>>>>>>
>>>>>>
>>>>>> Besides, can somebody tell me where IA32_ENERGY_PERF_BIAS_MSR
>>>>>> comes from? Is it part of the IPS feature?
>>>>>>
>>>>> Volume 2A of Intel's Software Developer's Manuals describes
>>>>> CPUID detection of IA32_ENERGY_PERF_BIAS_MSR,
>>>>> and volume 3A describes the MSR.
>>>>> http://www.intel.com/products/processor/manuals/
>>>>> Sorry, I do not know what IPS stands for.
>>>>>
>>>> cough, cough, IPS is not a released feature and should not be
>>>> discussed here, ;p
>>>>
>>>> Thanks,
>>>> -Aubrey
>>>>
>>>>
>>>>> Regards,
>>>>> Bill
>>>>>
>>>>>
>>>>>
>>>>>> -minskey
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> I remember we already support 2 profiles through
>>>>>>> gnome-power-manager on Solaris. What's the difference between
>>>>>>> them?
>>>>>>>
>>>>>>> I do not understand the exact meaning of perf-bias, balanced and
>>>>>>> power-bias either. Doesn't perf-bias mean the CPU frequency will
>>>>>>> always be at the highest level?
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Jedy
>>>>>>> On Wed, 2009-11-04 at 08:47 +0800, Li, Aubrey wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> When we enabled the Intel energy performance bias feature, we
>>>>>>>> found that a power profile implementation is necessary. Here is
>>>>>>>> a draft for a CPU-level power policy.
>>>>>>>> http://cr.opensolaris.org/~aubrey/cpu_power_policy_v1/
>>>>>>>>
>>>>>>>> The proposal adds a new keyword, "cpu-power-policy", to
>>>>>>>> /etc/power.conf, with 4 options:
>>>>>>>> 1) perf-bias
>>>>>>>> 2) balanced
>>>>>>>> 3) power-bias
>>>>>>>> 4) default, the same as perf-bias.
>>>>>>>>
>>>>>>>> /etc/power.conf accepts the user input, and the preferred
>>>>>>>> policy is passed to the kernel through an ioctl. pm_ioctl then
>>>>>>>> calls the callback that walks a CPU power policy list. Every CPU
>>>>>>>> PM feature that wants to be adjusted by this option, and is
>>>>>>>> verified to be supported, registers its callback function on the
>>>>>>>> list so that it can be called and adjusted by pmconfig.
>>>>>>>> --------------------------------------------------------
>>>>>>>> /etc/power.conf
>>>>>>>>     |
>>>>>>>>     pm_ioctl(cpu_power_policy, policy)
>>>>>>>>     |
>>>>>>>> cpu_power_policy_callb (policy)
>>>>>>>>     |
>>>>>>>>     ----> registered pm feature callback 1 (ENERGY_PERF_BIAS)
>>>>>>>>     |
>>>>>>>>     ----> registered pm feature callback 2
>>>>>>>>     ...
>>>>>>>> ---------------------------------------------------------
>>>>>>>> Currently only the energy_perf_bias feature is registered,
>>>>>>>> because my intention is to support adjusting the
>>>>>>>> energy_perf_bias MSR without a reboot. I guess we can probably
>>>>>>>> add p/t/c-state support later. When we add p/t/c-state support,
>>>>>>>> my quick thought is that this option will override the "cpupm"
>>>>>>>> and "cpu-deep-idle" settings.
>>>>>>>>
>>>>>>>> Any comments and suggestions are welcome.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -Aubrey
>>>>>>>> _______________________________________________
>>>>>>>> pm-discuss mailing list
>>>>>>>> pm-discuss at opensolaris.org
>>>>>>>> http://mail.opensolaris.org/mailman/listinfo/pm-discuss
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>