Hi Bill,

I have updated the proposal to add system-wide policy support:
http://cr.opensolaris.org/~aubrey/sys_pm_policy_v1/

The user profile from /etc/power.conf is still passed to the kernel
through pm_ioctl, which then calls pm_set_system_policy(). Currently that
function only sets the CPU PM policy; if memory or other devices need a
bias as well, they can be added there. The CPU PM policy implementation
has only a minor change since the last webrev: following your suggestion,
the mcpu_pm_policy pointer has moved from machcpu to the
mcpu_pm_mach_state structure.
Any comments and suggestions are highly appreciated.

Thanks,
-Aubrey

Li, Aubrey wrote:
>
> It looks like memory PM needs such a bias as well. So I'd like to change
> the proposal to use the keyword "sys-pm-policy" instead. The mechanism
> will use the existing callb implementation to pass the user policy from
> /etc/power.conf to the kernel and walk the list of registered modules,
> calling each module's hook function to set its PM policy individually.
>
> I'm not sure whether any other device drivers need, or would be happy
> with, this proposal. It would be great if device driver developers could
> share some thoughts here.
>
> Thanks,
> -Aubrey
>
> Julia.Harper wrote:
>>
>> I assume that this knob (profile) when turned way down would basically
>> put the system into "power savings" mode -- where the set of power
>> states is restricted. That is, no matter how long the utilization level
>> demands more power, the highest power states (for the cpus, memory,
>> whatever) will never be entered. We should probably use terminology
>> that makes this clear.
>>
>> -- jdh
>>
>> Liu, Jiang wrote:
>>> I prefer the solution of introducing a global power profile for all
>>> devices. Currently we need such a profile for CPUPM. In future, when
>>> supporting memory power management, we may need a similar profile for
>>> memory PM. And users won't like two variables/profiles for the same
>>> objective.
>>>
>>> Li, Aubrey wrote:
>>>> Bill Holler wrote:
>>>>> Hi,
>>>>>
>>>>> I forgot to mention that cpu_pm_policy is just a policy.
>>>>> There is no guarantee it maps to a specific MSR or hardware
>>>>> implementation.
>>>> Yes, I would like to propose a new option for CPU power management
>>>> policy. This policy is a CPU bias between performance and power;
>>>> future CPU power management enhancement work can be based on this
>>>> policy.
>>>> - the default policy should keep the current "out of the box"
>>>>   behavior unchanged; we'll try to save more power without hurting
>>>>   performance.
>>>> - there will be more power management features coming on future
>>>>   processors, like ENERGY_PERFORMANCE_BIAS; we can register these
>>>>   new features under the policy framework, and offer a knob to the
>>>>   user to change these settings on the fly.
>>>> - laptop users who want longer battery life, less heat and less fan
>>>>   noise may want the system to work in some edge situations: for
>>>>   example, currently the CPU can run at the highest clock if cpupm
>>>>   is disabled, but there is no way to let the CPU always run at the
>>>>   lowest clock. Similarly, always entering the deepest c-state is
>>>>   another choice to save more power. What's more, the power aware
>>>>   dispatcher could be more flexible in picking a CPU and dispatching
>>>>   a thread if there is a policy indicator.
>>>> - Some users don't care about power. Yes, we already have the
>>>>   options to let them set ENERGY_PERFORMANCE_BIAS to performance
>>>>   bias, to turn off c-states/p-states, and so on and so forth. But
>>>>   it's more friendly to the user to change just one option.
>>>>
>>>> Here, the policy only focuses on the CPU. If you think we should
>>>> have a policy for the memory, for the devices, or we should have a
>>>> system-wide policy, let's do this. cpu_pm_policy can be one part of
>>>> the system-wide policy.
>>>> If nobody has thoughts on it, I'll continue to prepare a PSARC file
>>>> to add the cpu_pm_policy keyword.
>>>>
>>>>> For example Solaris could be dynamically setting the
>>>>> ENERGY_PERFORMANCE_BIAS register to different settings depending on
>>>>> things such as system-load,
>>>> Yes, such settings can be dynamically changed if we see the
>>>> benefit.
>>>>
>>>>> the priority of the application being scheduled, a power policy of
>>>>> the application,
>>>> Making the thread power aware needs another bunch of interfaces, I
>>>> think. For example, cmt_balance() can choose a different processor
>>>> group according to the perf/power bias of the thread.
>>>>
>>>>> or power policy of the zone.
>>>> Zone policy is an interesting topic. Different zones could have
>>>> different CPU resources, or can share the global CPU resource;
>>>> different zones could have different power policies, or they can
>>>> inherit the global cpu_pm_policy setting. There can be many virtual
>>>> containers, but the hardware resource is unique. I think this can be
>>>> enhanced in zone management, which will not be covered in my
>>>> proposal, :)
>>>>
>>>> Thanks,
>>>> -Aubrey
>>>>
>>>>> Regards,
>>>>> Bill
>>>>>
>>>>> On 03/03/10 16:21, Bill Holler wrote:
>>>>>> +1.
>>>>>>
>>>>>> Hi Aubrey,
>>>>>>
>>>>>> I also think it is time to move forward with this proposal.
>>>>>> Generally we want the system to work best "out of the box"
>>>>>> with no tuning. On the other hand, vendors will keep improving
>>>>>> products with new features, and there will always be some specific
>>>>>> applications where custom settings may be better. I feel this
>>>>>> proposal supports innovation and application-specific customization
>>>>>> in line with the OpenSolaris community goals.
>>>>>>
>>>>>> This proposal applies to all types of CPUs. It uses "cpu_pm_policy"
>>>>>> instead of, for example, mentioning a specific CPU's MSR. ;-) This
>>>>>> proposal will be useful with other CPUs if/when they have hardware
>>>>>> mechanisms for tuning power / performance.
>>>>>>
>>>>>> In the ARC case we want to mention that there could be a policy
>>>>>> conflict between this component setting and a system-power-policy,
>>>>>> external Power Capping, etc. Generally we want users to use the
>>>>>> default or a higher level policy such as the system power policy.
>>>>>> Unfortunately the system power policy may not be fine-grained or
>>>>>> diverse enough for some applications to specify cpu power policy.
>>>>>> In that case cpu_pm_policy will be useful.
>>>>>> My thought is: the user
>>>>>> must really know what they want if they specify a component policy
>>>>>> such as cpu_pm_policy instead of just using the system power
>>>>>> policy. For that reason I feel cpu_pm_policy should override the
>>>>>> system-power-policy at the cpupm level.
>>>>>>
>>>>>> Power Capping is different. Power Capping is an external policy. It
>>>>>> is currently "owned" by the SP external to the OS. Power Capping
>>>>>> should override a local cpu_pm_policy.
>>>>>>
>>>>>> Implementation comments:
>>>>>> IMHO the mcpu_pm_policy pointer should be in the mcpu_pm_mach_state
>>>>>> structure instead of in the machcpu.
>>>>>> We may want to allow the user to specify a number instead of just
>>>>>> Perf, Balanced, Power, Default?
>>>>>>
>>>>>> Regards,
>>>>>> Bill
>>>>>>
>>>>>> On 02/20/10 18:43, Li, Aubrey wrote:
>>>>>>> Hi Bill,
>>>>>>>
>>>>>>> I think it's time to continue this proposal, since b134 is closed
>>>>>>> and the build is not limited now. The power/perf bias setting is a
>>>>>>> starting point for future power related work; I'll prepare a PSARC
>>>>>>> file for the new option if this is acceptable. No is also a good
>>>>>>> answer with good reason.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -Aubrey
>>>>>>>
>>>>>>>> Bill.Holler Wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> This proposal is for a mechanism to set the new MSR
>>>>>>>>> IA32_ENERGY_PERF_BIAS_MSR. This is a new hardware
>>>>>>>>> feature. The MSR affects overall power/performance.
>>>>>>>>> It gives a hint to the processor & package for desired
>>>>>>>>> power/performance characteristics. It is related to p-states and
>>>>>>>>> c-states (and may affect these features), but this feature can
>>>>>>>>> have other socket/system-level effects as well.
>>>>>>>>> The programmers guides do not go into details about what the
>>>>>>>>> other effects can be. :-(
>>>>>>>>>
>>>>>>>> The perf and power impact of this MSR is model specific.
>>>>>>>> It's able to throttle turbo on WSM and will probably help to make
>>>>>>>> more hardware decisions in future. For example, when a short
>>>>>>>> interrupt storm is detected, it can demote a CC6 request to CC3.
>>>>>>>>
>>>>>>>>> On 11/05/09 05:15, minskey guo wrote:
>>>>>>>>>
>>>>>>>>>> Jedy Wang wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Li,
>>>>>>>>>>>
>>>>>>>>>>> As far as I know, gnome-power-manager has removed the support
>>>>>>>>>>> for changing governor, which is the same as profile I think. I
>>>>>>>>>>> remember someone wrote a blog explaining the reason but I can
>>>>>>>>>>> not find it now. I wonder what makes us still need to
>>>>>>>>>>> implement this feature.
>>>>>>>>>>>
>>>>>>>>>> In the Linux world, there is the ondemand governor in the
>>>>>>>>>> kernel. It sets cpu frequency according to the cpu's current
>>>>>>>>>> load. So, some people consider that everybody should use that
>>>>>>>>>> governor, and let CPUs finish their jobs asap and then enter
>>>>>>>>>> C states for power-saving. Compared to P states, c-states do
>>>>>>>>>> save more power. That's why gnome removed it.
>>>>>>>>>>
>>>>>>>> This is also model specific and depends on whether the frequency,
>>>>>>>> voltage and power are linear. That's true on the latest
>>>>>>>> processors but not on earlier processors.
>>>>>>>>
>>>>>>>> I'm not sure why gnome removed it, but it seems not a good idea
>>>>>>>> to me. Some users want max perf and others want longer battery
>>>>>>>> life.
>>>>>>>>
>>>>>>>>> Yes, a good p-state + c-state implementation is not easy to tune
>>>>>>>>> for more power savings. Running in lower p-states when a CPU is
>>>>>>>>> busy burns more power due to shorter time in deeper C-states.
>>>>>>>>> Entering deeper C-states too aggressively also burns more power
>>>>>>>>> (on both an idle and busy system) due to unnecessary wakeup
>>>>>>>>> latency.
>>>>>>>>> ;-) Without knowing the details, it seems likely that
>>>>>>>>> the gnome-power-manager was removed because setting it made
>>>>>>>>> worse decisions than a runtime prediction.
>>>>>>>>>
>>>>>>>>> Solaris currently has mechanisms to turn P-state and deeper
>>>>>>>>> C-state support on/off.
>>>>>>>>>
>>>>>>>>> A requirement is that the Energy Perf Bias MSR can be set on
>>>>>>>>> systems not running a GUI. We would like to support a possible
>>>>>>>>> future Gnome interface to set this MSR if/when it exists. The
>>>>>>>>> proposal provides a mechanism that works on systems without
>>>>>>>>> Gnome.
>>>>>>>>>
>>>>>>>> Right, most servers do not run gnome. I don't expect gnome
>>>>>>>> support but it would be great if it comes, :-)
>>>>>>>>
>>>>>>>> IMHO, we should use this global cpu power policy setting instead
>>>>>>>> of "cpupm" and "cpu-deep-idle"; this is more friendly to the
>>>>>>>> user. The users just want more perf or more power savings, I
>>>>>>>> think they don't care if the system supports p/c-states at the
>>>>>>>> same time. "cpupm" is a confusing name since it covers only
>>>>>>>> p-states. We called it "cpupm" before we had deep idle support.
>>>>>>>> Actually cpu-deep-idle is also one part of cpu power
>>>>>>>> management, :)
>>>>>>>>
>>>>>>>>>> But some people don't care about power-saving when comparing it
>>>>>>>>>> to other factors. For example, if you are plagued by the noise
>>>>>>>>>> of the CPU fan and want quiet, then you can lower the cpu
>>>>>>>>>> frequency, which results in lower heat, and then the fan can be
>>>>>>>>>> stopped.
>>>>>>>>>>
>>>>>>>>>> Personally, I vote +1 for this project if I could vote, but I
>>>>>>>>>> don't like the names of "perf-bias" etc :)
>>>>>>>>>>
>>>>>>>>>> Besides, can somebody tell me where IA32_ENERGY_PERF_BIAS_MSR
>>>>>>>>>> comes from? Is it a part of the IPS feature?
>>>>>>>>>>
>>>>>>>>> Intel's Software Developer's Manual volume 2A describes CPUID
>>>>>>>>> detection of IA32_ENERGY_PERF_BIAS_MSR and volume 3A describes
>>>>>>>>> the MSR.
>>>>>>>>> http://www.intel.com/products/processor/manuals/
>>>>>>>>> Sorry, I do not know what IPS stands for?
>>>>>>>>>
>>>>>>>> cough, cough, IPS is not a released feature and should not be
>>>>>>>> discussed here, ;p
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -Aubrey
>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Bill
>>>>>>>>>
>>>>>>>>>> -minskey
>>>>>>>>>>
>>>>>>>>>>> I remember we already support 2 profiles through
>>>>>>>>>>> gnome-power-manager on Solaris. What's the difference between
>>>>>>>>>>> them?
>>>>>>>>>>>
>>>>>>>>>>> I do not understand the exact meaning of perf-bias, balanced
>>>>>>>>>>> and power-bias either. Doesn't perf-bias mean the cpu
>>>>>>>>>>> frequency will always be at the highest level?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Jedy
>>>>>>>>>>> On Wed, 2009-11-04 at 08:47 +0800, Li, Aubrey wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> When we enabled the Intel energy performance bias feature, we
>>>>>>>>>>>> found that a power profile implementation is necessary. Here
>>>>>>>>>>>> I did a draft for cpu level power policy.
>>>>>>>>>>>> http://cr.opensolaris.org/~aubrey/cpu_power_policy_v1/
>>>>>>>>>>>>
>>>>>>>>>>>> The proposal adds a new keyword to /etc/power.conf,
>>>>>>>>>>>> "cpu-power-policy", and we have 4 options for this new
>>>>>>>>>>>> keyword:
>>>>>>>>>>>> 1) perf-bias
>>>>>>>>>>>> 2) balanced
>>>>>>>>>>>> 3) power-bias
>>>>>>>>>>>> 4) default, the same as perf-bias.
>>>>>>>>>>>>
>>>>>>>>>>>> /etc/power.conf accepts the user input and passes the
>>>>>>>>>>>> preferred policy to the kernel thru ioctl.
>>>>>>>>>>>> Then pm_ioctl calls the callback to walk a cpu power policy
>>>>>>>>>>>> list. Every cpu pm feature which wants to be adjusted by
>>>>>>>>>>>> this option and is verified to be supported will register its
>>>>>>>>>>>> callback function to the list, so that it can be called and
>>>>>>>>>>>> adjusted by pmconfig.
>>>>>>>>>>>> --------------------------------------------------------
>>>>>>>>>>>> /etc/power.conf
>>>>>>>>>>>>       |
>>>>>>>>>>>> pm_ioctl(cpu_power_policy, policy)
>>>>>>>>>>>>       |
>>>>>>>>>>>> cpu_power_policy_callb (policy)
>>>>>>>>>>>>       |
>>>>>>>>>>>>       ----> registered pm feature callback 1 (ENERGY_PERF_BIAS)
>>>>>>>>>>>>       |
>>>>>>>>>>>>       ----> registered pm feature callback 2
>>>>>>>>>>>>       ...
>>>>>>>>>>>> --------------------------------------------------------
>>>>>>>>>>>> Currently, only the energy_perf_bias feature is registered,
>>>>>>>>>>>> because my intention is to support adjusting the
>>>>>>>>>>>> energy_perf_bias MSR without reboot. I guess we can probably
>>>>>>>>>>>> add p/t/c-state support later. When we add p/t/c-state
>>>>>>>>>>>> support, my quick thought is that this option will override
>>>>>>>>>>>> the "cpupm" and "cpu-deep-idle" settings.
>>>>>>>>>>>>
>>>>>>>>>>>> Any comments and suggestions are welcome.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> -Aubrey
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> pm-discuss mailing list
>>>>>>>>>>>> pm-discuss at opensolaris.org
>>>>>>>>>>>> http://mail.opensolaris.org/mailman/listinfo/pm-discuss
>>>> _______________________________________________
>>>> tesla-dev mailing list
>>>> tesla-dev at opensolaris.org
>>>> http://mail.opensolaris.org/mailman/listinfo/tesla-dev
>>>
>>> Liu Jiang (Gerry)
>>> OpenSolaris, OTC, SSG, Intel
>>
>> --
>> ---------------------
>> Julia Harper, julia.harper at oracle.com
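
As a rough illustration of the registration mechanism discussed in this
thread (each PM feature registers a hook, and the ioctl path walks the list
and hands the user's policy to every hook), here is a minimal user-space
sketch in C. It is not the code from either webrev: pm_policy_t,
pm_policy_register(), pm_policy_parse(), pm_policy_set_all() and epb_hook()
are names made up for this example, the fixed-size table is a simplification,
and the example hook only records the policy instead of writing an MSR. The
only details taken from the thread are the keyword values and the rule that
"default" means the same as perf-bias.

```c
/*
 * Illustrative user-space sketch only. Every identifier below is
 * hypothetical; none is taken from the webrev or the Solaris kernel.
 */
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    PM_POLICY_PERF_BIAS,
    PM_POLICY_BALANCED,
    PM_POLICY_POWER_BIAS
} pm_policy_t;

/* Each PM feature supplies a hook that applies the policy its own way. */
typedef void (*pm_policy_hook_t)(pm_policy_t policy, void *arg);

#define PM_MAX_HOOKS 8

static struct {
    pm_policy_hook_t hook;
    void *arg;
} pm_hooks[PM_MAX_HOOKS];
static size_t pm_nhooks;

/* Register a feature hook. Returns 0 on success, -1 if the table is full. */
int pm_policy_register(pm_policy_hook_t hook, void *arg)
{
    assert(hook != NULL);
    if (pm_nhooks == PM_MAX_HOOKS)
        return -1;
    pm_hooks[pm_nhooks].hook = hook;
    pm_hooks[pm_nhooks].arg = arg;
    pm_nhooks++;
    return 0;
}

/*
 * Map a power.conf-style value to a policy. "default" is treated as
 * perf-bias, as in the original proposal. Returns 0 on success and
 * -1 for an unknown value (in which case *out is left untouched).
 */
int pm_policy_parse(const char *s, pm_policy_t *out)
{
    if (strcmp(s, "perf-bias") == 0 || strcmp(s, "default") == 0)
        *out = PM_POLICY_PERF_BIAS;
    else if (strcmp(s, "balanced") == 0)
        *out = PM_POLICY_BALANCED;
    else if (strcmp(s, "power-bias") == 0)
        *out = PM_POLICY_POWER_BIAS;
    else
        return -1;
    return 0;
}

/* Walk the list and pass the policy to every registered hook.
 * Returns the number of hooks that were called. */
size_t pm_policy_set_all(pm_policy_t policy)
{
    size_t i;

    for (i = 0; i < pm_nhooks; i++)
        pm_hooks[i].hook(policy, pm_hooks[i].arg);
    return pm_nhooks;
}

/* Stand-in for an ENERGY_PERF_BIAS handler: it only records the last
 * policy it was given in the pm_policy_t that arg points to. */
void epb_hook(pm_policy_t policy, void *arg)
{
    *(pm_policy_t *)arg = policy;
}
```

A caller would register epb_hook() once with a pointer to a pm_policy_t,
parse the value read from the configuration file with pm_policy_parse(), and
then call pm_policy_set_all(). Keeping the walk separate from the hooks is
what would let memory or other device PM join later without changing the
ioctl path, which is the point of the system-wide variant.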
