+1.

Hi Aubrey,

I also think it is time to move forward with this proposal.

Generally we want the system to work best "out of the box" with no
tuning. On the other hand, vendors will keep improving products with
new features, and there will always be some specific applications
where custom settings may be better. I feel this proposal supports
innovation and application-specific customization in line with the
OpenSolaris community goals.

This proposal applies to all types of CPUs. It uses "cpu_pm_policy"
instead of, for example, mentioning a specific CPU's MSR. ;-)
This proposal will be useful with other CPUs if/when they have
hardware mechanisms for tuning power/performance.

In the ARC case we want to mention that there could be a policy
conflict between this component setting and a system-power-policy,
external Power Capping, etc. Generally we want users to use the
default or a higher-level policy such as the system power policy.
Unfortunately the system power policy may not be fine-grained or
diverse enough for some applications to specify CPU power policy.
In that case cpu_pm_policy will be useful.

My thought is: users must really know what they want if they
specify a component policy such as cpu_pm_policy instead of just
using the system power policy. For that reason I feel cpu_pm_policy
should override the system-power-policy at the cpupm level.

Power Capping is different. Power Capping is an external policy.
It is currently "owned" by the SP external to the OS. Power Capping
should override a local cpu_pm_policy.

Implementation comments:
IMHO the mcpu_pm_policy pointer should be in the mcpu_pm_mach_state
structure instead of in the machcpu.
We may want to allow the user to specify a number instead of just
Perf, Balanced, Power, Default?

Regards,
Bill

On 02/20/10 18:43, Li, Aubrey wrote:
> Hi Bill,
>
> I think it's time to continue this proposal, since b134 is closed and the
> build is not limited now.
> power/perf bias setting is a start point for
> future power related work, I'll prepare a PSARC file for the new option if
> this is acceptable. No is also a good answer with good reason.
>
> Thanks,
> -Aubrey
>
>> Bill.Holler Wrote:
>>
>>> Hi,
>>>
>>> This proposal is for a mechanism to set the new MSR
>>> IA32_ENERGY_PERF_BIAS_MSR. This is a new hardware
>>> feature. The MSR affects overall power/performance.
>>> It gives a hint to the processor & package for desired
>>> power/performance characteristics. It is related to p-states
>>> and c-states (and may affect these features), but this feature
>>> can have other socket/system-level effects as well.
>>> The programmers guides do not go into details what the
>>> other effects can be. :-(
>>
>> The perf and power impact of this MSR is model specific.
>> It's able to throttle turbo on WSM and probably help to do more
>> hardware decision in future. For example, when the short interrupt
>> storm is detected, it can demote CC6 request to CC3.
>>
>>> On 11/05/09 05:15, minskey guo wrote:
>>>
>>>> Jedy Wang wrote:
>>>>
>>>>> Hi Li,
>>>>>
>>>>> As far as I know, gnome-power-manager has removed the support for
>>>>> changing governor which is the same as profile I think. I remember
>>>>> someone wrote a blog explaining the reason but I can not find it now.
>>>>> I wonder why what makes us still need to implement this feature.
>>>>
>>>> In linux world, there is ondemand governor in kernel. It sets cpu
>>>> frequency according to cpu's current load. So, somebody consider
>>>> that everybody should use that governor, and let CPUs finish their
>>>> jobs asap and then enter into C states for power-saving. Comparing
>>>> to P state, c-state does save more power. That's why gnome removed it.
>>
>> This is also model specific and depends on if the frequency and voltage
>> and power are linear. That's true on latest processor but not on earlier
>> processor.
>>
>> I'm not sure why gnome removed it, but seems not a good idea to me. Some
>> users want max perf and others want longer battery life.
>>
>>> Yes, a good p-state + c-state implementation is not easy
>>> to tune for more power savings. Running in lower p-states
>>> when a CPU is busy burns more power due to shorter time
>>> in deeper C-states. Entering deeper C-states too aggressively
>>> also burns more power (on both an idle and busy system) due
>>> to unnecessary wakeup latency. ;-) Without knowing the
>>> details, it seems likely that the gnome-power-manager
>>> was removed because setting it made worse decisions
>>> than a runtime prediction.
>>>
>>> Solaris currently has mechanisms to turn P-state and
>>> deeper C-state support on/off.
>>>
>>> A requirement is that the Energy Perf Bias MSR can be
>>> set on systems not running a GUI. We would like to support
>>> a possible future Gnome interface to set this MSR if/when it
>>> exists. The proposal provides a mechanism that works on
>>> systems without Gnome.
>>
>> Right, most of servers do not run gnome. I don't expect gnome support
>> but it would be great if it will, :-)
>>
>> IMHO, we should use this global cpu power policy setting instead of
>> "cpupm" and "cpu-deep-idle", this is more friendly to the user. The
>> users just want more perf or more power, I think they don't care if
>> the system support p/c-state at the same time. "cpupm" is a confusion
>> only for p-state. we call "cpupm" before we have deep idle support.
>> Actually cpu-deep-idle is also one part of cpu power management, :)
>>
>>>> but, someone doesn't care power-saving, when comparing it to other
>>>> factors. For example, if you are plagued by the noise of CPU fan, and
>>>> expect quiet it then you can lower cpu frequency, which results in
>>>> lower heat, and then fan can be stopped.
>>>>
>>>> personally, I vote +1 for this project if I could vote, but I don't
>>>> like the names of "perf-bias" etc :)
>>>>
>>>> Besides, can somebody tell me where IA32_ENERGY_PERF_BIAS_MSR
>>>> comes ? Is it a part of IPS feature ?
>>>
>>> Intel's Software Developer's Manuals 2A describes
>>> CPUID detection of IA32_ENERGY_PERF_BIAS_MSR
>>> and volume 3A describes the MSR.
>>> http://www.intel.com/products/processor/manuals/
>>> Sorry, I do not know what IPS stands for?
>>
>> cough, cough, IPS is not a released feature and should not be discussed
>> here, ;p
>>
>> Thanks,
>> -Aubrey
>>
>>> Regards,
>>> Bill
>>>
>>>> -minskey
>>>>
>>>>> I remember we already support 2 profiles through gnome-power-manager
>>>>> on Solaris. What's the difference between them?
>>>>>
>>>>> I do not understand the exact meaning perf-bias, balanced and
>>>>> power-bias either. Does not perf-bias means the cpu frequency will
>>>>> be always at the highest level?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Jedy
>>>>> On Wed, 2009-11-04 at 08:47 +0800, Li, Aubrey wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> When we enable intel energy performance bias feature, we found the
>>>>>> power profile implementation is necessary. Here I did a draft for
>>>>>> cpu level power policy.
>>>>>> http://cr.opensolaris.org/~aubrey/cpu_power_policy_v1/
>>>>>>
>>>>>> The proposal added a new keyword to /etc/power.conf
>>>>>> "cpu-power-policy",
>>>>>> And we have 4 options for this new keyword:
>>>>>> 1) perf-bias
>>>>>> 2) balanced
>>>>>> 3) power-bias
>>>>>> 4) default, the same as perf-bias.
>>>>>>
>>>>>> /etc/power.conf accepts the user input and passes the preferred
>>>>>> policy to the kernel thru ioctl. Then pm_ioctl calls the callback
>>>>>> to walk a cpu power policy list.
>>>>>> Every cpu pm feature which wants to be adjusted by
>>>>>> this option and verified to be supported will register its callback
>>>>>> function to the list, so that it can be called and adjusted by
>>>>>> pmconfig.
>>>>>> --------------------------------------------------------
>>>>>> /etc/power.conf
>>>>>>        |
>>>>>> pm_ioctl(cpu_power_policy, policy)
>>>>>>        |
>>>>>> cpu_power_policy_callb (policy)
>>>>>>        |
>>>>>>        ----> registered pm feature callback 1 (ENERGY_PERF_BIAS)
>>>>>>        |
>>>>>>        ----> registered pm feature callback 2
>>>>>>        ...
>>>>>> --------------------------------------------------------
>>>>>> Currently, only energy_perf_bias feature is registered, because my
>>>>>> intention is to support adjusting energy_perf_bias MSR without
>>>>>> reboot. I guess we probably can add p/t/c-state support later.
>>>>>> When we add p/t/c-state support, my quick thought is, this option
>>>>>> will override "cpupm" and "cpu-deep-idle" setting.
>>>>>>
>>>>>> Welcome your any comments and suggestions.
>>>>>>
>>>>>> Thanks,
>>>>>> -Aubrey
>>>>>> _______________________________________________
>>>>>> pm-discuss mailing list
>>>>>> pm-discuss at opensolaris.org
>>>>>> http://mail.opensolaris.org/mailman/listinfo/pm-discuss
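
As a rough illustration of the thread above, the callback walk in Aubrey's original mail and the precedence suggested at the top of this mail (Power Capping over cpu_pm_policy over the system power policy) might be sketched in C as below. This is only a sketch under assumptions: every name in it (cpu_policy_t, cpu_policy_register, cpu_policy_walk, cpu_policy_effective, epb_policy_cb, MAX_POLICY_CALLBACKS) is hypothetical and not taken from the webrev, it is plain user-space C rather than kernel code, and it treats a power cap as if it simply mapped to one of the four policy values, which a real SP-owned cap may not.

```c
#include <assert.h>
#include <stddef.h>

/* The four values mirror the power.conf options listed in the proposal. */
typedef enum {
    CPU_POLICY_DEFAULT = 0,   /* proposal: same as perf-bias */
    CPU_POLICY_PERF_BIAS,
    CPU_POLICY_BALANCED,
    CPU_POLICY_POWER_BIAS
} cpu_policy_t;

/* Hypothetical callback type: one per CPU PM feature. */
typedef void (*cpu_policy_cb_t)(cpu_policy_t policy, void *arg);

#define MAX_POLICY_CALLBACKS 8   /* arbitrary limit for this sketch */

static struct {
    cpu_policy_cb_t fn;
    void *arg;
} policy_cbs[MAX_POLICY_CALLBACKS];
static size_t policy_cb_count;

/* A feature that is verified as supported adds itself to the list. */
int cpu_policy_register(cpu_policy_cb_t fn, void *arg)
{
    assert(fn != NULL);
    if (policy_cb_count == MAX_POLICY_CALLBACKS)
        return -1;               /* list is full */
    policy_cbs[policy_cb_count].fn = fn;
    policy_cbs[policy_cb_count].arg = arg;
    policy_cb_count++;
    return 0;
}

/* Walk the list with the requested policy; returns callbacks invoked. */
size_t cpu_policy_walk(cpu_policy_t policy)
{
    size_t i;

    if (policy == CPU_POLICY_DEFAULT)
        policy = CPU_POLICY_PERF_BIAS;   /* "default, the same as perf-bias" */
    for (i = 0; i < policy_cb_count; i++)
        policy_cbs[i].fn(policy, policy_cbs[i].arg);
    return policy_cb_count;
}

/*
 * Precedence suggested in this mail: an external power cap wins over a
 * local cpu_pm_policy, which wins over the system power policy.
 * A NULL pointer means "not set" (an assumption of this sketch).
 */
cpu_policy_t cpu_policy_effective(const cpu_policy_t *cap,
    const cpu_policy_t *cpu, cpu_policy_t system_policy)
{
    if (cap != NULL)
        return *cap;
    if (cpu != NULL)
        return *cpu;
    return system_policy;
}

/* Stand-in for the ENERGY_PERF_BIAS feature: records the last policy. */
void epb_policy_cb(cpu_policy_t policy, void *arg)
{
    *(cpu_policy_t *)arg = policy;
}
```

A caller would register epb_policy_cb once, then call cpu_policy_walk(cpu_policy_effective(cap, cpu, system_policy)) whenever any of the three inputs changes. Using NULL pointers for "not set" is just one way to express that; a numeric bias value, as asked about above, would need a different type.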
