I prefer a solution that introduces a global power profile for all devices. Currently we need such a profile for CPUPM; in the future, when we support memory power management, we may need a similar profile for memory PM. Users won't want two variables/profiles for the same objective.
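A minimal C sketch of that idea, assuming one system-wide profile dispatched through the registered-callback list from Aubrey's original proposal quoted below (pm_ioctl -> cpu_power_policy_callb -> per-feature callbacks). Every name here (pm_profile_t, pm_profile_register(), pm_profile_set(), the memory PM callback) is hypothetical. The EPB callback writes a plain variable standing in for the MSR rather than touching hardware. The ENERGY_PERF_BIAS hint is 4 bits wide, with 0 favoring performance and 15 favoring energy saving; the value 6 used for "balanced" is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical system-wide profile; the names mirror the proposal's options. */
typedef enum {
    PM_PROFILE_DEFAULT,     /* behaves like perf-bias, as in the proposal */
    PM_PROFILE_PERF_BIAS,
    PM_PROFILE_BALANCED,
    PM_PROFILE_POWER_BIAS
} pm_profile_t;

/* Each PM feature (CPU EPB today, memory PM later) supplies one callback. */
typedef int (*pm_profile_cb_t)(pm_profile_t profile, void *arg);

typedef struct {
    const char      *name;
    pm_profile_cb_t  cb;
    void            *arg;
} pm_profile_entry_t;

#define PM_PROFILE_MAX_CB 8

static pm_profile_entry_t pm_profile_list[PM_PROFILE_MAX_CB];
static size_t pm_profile_ncb;

/* Register a feature; callers should only do this once support is verified. */
static void pm_profile_register(const char *name, pm_profile_cb_t cb, void *arg)
{
    assert(pm_profile_ncb < PM_PROFILE_MAX_CB);
    pm_profile_list[pm_profile_ncb].name = name;
    pm_profile_list[pm_profile_ncb].cb = cb;
    pm_profile_list[pm_profile_ncb].arg = arg;
    pm_profile_ncb++;
}

/* Walk the list (the role cpu_power_policy_callb() plays in the proposal).
 * Returns how many callbacks reported failure. */
static int pm_profile_set(pm_profile_t profile)
{
    int failures = 0;

    for (size_t i = 0; i < pm_profile_ncb; i++) {
        if (pm_profile_list[i].cb(profile, pm_profile_list[i].arg) != 0)
            failures++;
    }
    return failures;
}

/* Map the profile to the 4-bit ENERGY_PERF_BIAS hint: 0 favors performance,
 * 15 favors energy saving. The "balanced" value 6 is an assumption. */
static uint8_t epb_from_profile(pm_profile_t profile)
{
    switch (profile) {
    case PM_PROFILE_BALANCED:
        return 6;
    case PM_PROFILE_POWER_BIAS:
        return 15;
    case PM_PROFILE_PERF_BIAS:
    case PM_PROFILE_DEFAULT:
    default:
        return 0;
    }
}

/* CPU feature: arg points at a variable standing in for the MSR's low
 * 4 bits; a real driver would write the MSR on every CPU instead. */
static int cpu_epb_cb(pm_profile_t profile, void *arg)
{
    uint8_t *simulated_epb = arg;

    *simulated_epb = epb_from_profile(profile);
    return 0;
}

/* Hypothetical memory PM feature: only records whether an aggressive
 * memory power-down mode would be allowed under this profile. */
static int mem_pm_cb(pm_profile_t profile, void *arg)
{
    int *allow_powerdown = arg;

    *allow_powerdown = (profile == PM_PROFILE_POWER_BIAS);
    return 0;
}
```

With this shape, adding memory PM later means registering one more callback rather than adding a second user-visible variable, which is the point of a single global profile.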
Li, Aubrey <> wrote:
> Bill Holler wrote:
>>
>> Hi,
>>
>> I forgot to mention that cpu_pm_policy is just a policy.
>> There is no guarantee it maps to a specific MSR or hardware
>> implementation.
>
> Yes, I would like to propose a new option for CPU power management
> policy. This policy is a CPU bias between performance and power;
> future CPU power management enhancement work can be based on it.
> - The default policy should keep the current "out of the box"
>   behavior unchanged; we'll try to save more power without hurting
>   performance.
> - More power management features are coming on future processors,
>   like ENERGY_PERFORMANCE_BIAS. We can register these new features
>   under the policy framework and offer the user a knob to change
>   these settings on the fly.
> - Laptop users who want longer battery life, less heat and less fan
>   noise may want the system to work at an extreme: for example,
>   currently the CPU can run at the highest clock if cpupm is
>   disabled, but there is no way to make the CPU always run at the
>   lowest clock. Similarly, always entering the deepest C-state is
>   another way to save more power. What's more, a power-aware
>   dispatcher could be more flexible in picking a CPU and dispatching
>   a thread if there is a policy indicator.
> - Some users don't care about power. Yes, we already have options
>   that let them set ENERGY_PERFORMANCE_BIAS to performance bias,
>   disable C-states/P-states, and so on. But it's friendlier to the
>   user to change only one option.
>
> Here, the policy only focuses on the CPU. If you think we should have
> a policy for memory, for devices, or a system-wide policy, let's do
> it. cpu_pm_policy can be one part of a system-wide policy.
> If nobody has thoughts on it, I'll continue to prepare a PSARC file
> to add the cpu_pm_policy keyword.
>
>>
>> For example, Solaris could be dynamically setting the
>> ENERGY_PERFORMANCE_BIAS register to different settings depending on
>> things such as system load,
>
> Yes, such settings can be changed dynamically if we see a
> benefit.
>
>> the priority of the application being scheduled, a power policy of
>> the application,
>
> Making threads power-aware needs another set of interfaces, I
> think. For example, cmt_balance() could choose a different processor
> group according to the perf/power bias of the thread.
>
>> or the power policy of the zone.
>
> Zone policy is an interesting topic. Different zones could have
> different CPU resources or share the global CPU resources, and
> different zones could have different power policies or inherit the
> global cpu_pm_policy setting. There can be many virtual containers,
> but the hardware resource is unique. I think this can be handled in
> zone management, which will not be covered in my proposal. :)
>
> Thanks,
> -Aubrey
>
>>
>> Regards,
>> Bill
>>
>>
>> On 03/03/10 16:21, Bill Holler wrote:
>>> +1.
>>>
>>> Hi Aubrey,
>>>
>>> I also think it is time to move forward with this proposal.
>>> Generally we want the system to work best "out of the box"
>>> with no tuning. On the other hand, vendors will keep improving
>>> products with new features, and there will always be some specific
>>> applications where custom settings may be better. I feel this
>>> proposal supports innovation and application-specific customization
>>> in line with the OpenSolaris community goals.
>>>
>>> This proposal applies to all types of CPUs. It uses "cpu_pm_policy"
>>> instead of, for example, mentioning a specific CPU's MSR. ;-) This
>>> proposal will be useful with other CPUs if/when they have hardware
>>> mechanisms for tuning power / performance.
>>>
>>>
>>> In the ARC case we want to mention that there could be a policy
>>> conflict between this component setting and a system-power-policy,
>>> external Power Capping, etc. Generally we want users to use the
>>> default or a higher-level policy such as the system power policy.
>>> Unfortunately, the system power policy may not be fine-grained or
>>> diverse enough for some applications to specify a CPU power policy.
>>> In that case cpu_pm_policy will be useful. My thought is: the user
>>> must really know what they want if they specify a component policy
>>> such as cpu_pm_policy instead of just using the system power
>>> policy. For that reason I feel cpu_pm_policy should override the
>>> system-power-policy at the cpupm level.
>>>
>>> Power Capping is different. Power Capping is an external policy. It
>>> is currently "owned" by the SP external to the OS. Power Capping
>>> should override a local cpu_pm_policy.
>>>
>>>
>>> Implementation comments:
>>> IMHO the mcpu_pm_policy pointer should be in the mcpu_pm_mach_state
>>> structure instead of in the machcpu.
>>> We may want to allow the user to specify a number instead of just
>>> Perf, Balanced, Power, Default?
>>>
>>> Regards,
>>> Bill
>>>
>>>
>>> On 02/20/10 18:43, Li, Aubrey wrote:
>>>> Hi Bill,
>>>>
>>>> I think it's time to continue this proposal, since b134 is closed
>>>> and the build is not limited now. The power/perf bias setting is a
>>>> starting point for future power-related work. I'll prepare a PSARC
>>>> file for the new option if this is acceptable. No is also a good
>>>> answer, given a good reason.
>>>>
>>>> Thanks,
>>>> -Aubrey
>>>>
>>>>
>>>>> Bill.Holler Wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This proposal is for a mechanism to set the new MSR
>>>>>> IA32_ENERGY_PERF_BIAS_MSR. This is a new hardware
>>>>>> feature. The MSR affects overall power/performance.
>>>>>> It gives a hint to the processor & package for desired
>>>>>> power/performance characteristics.
>>>>>> It is related to P-states and
>>>>>> C-states (and may affect these features), but this feature can
>>>>>> have other socket/system-level effects as well.
>>>>>> The programmer's guides do not go into detail about what the
>>>>>> other effects can be. :-(
>>>>>>
>>>>> The perf and power impact of this MSR is model-specific.
>>>>> It can throttle turbo on WSM and will probably help the hardware
>>>>> make more decisions in the future. For example, when a short
>>>>> interrupt storm is detected, it can demote a CC6 request to CC3.
>>>>>
>>>>>
>>>>>> On 11/05/09 05:15, minskey guo wrote:
>>>>>>
>>>>>>> Jedy Wang wrote:
>>>>>>>
>>>>>>>> Hi Li,
>>>>>>>>
>>>>>>>> As far as I know, gnome-power-manager has removed support
>>>>>>>> for changing the governor, which I think is the same as a
>>>>>>>> profile. I remember someone wrote a blog post explaining the
>>>>>>>> reason, but I cannot find it now.
>>>>>>>>
>>>>>>>> I wonder what makes us still need to implement this feature.
>>>>>>>>
>>>>>>> In the Linux world, there is the ondemand governor in the kernel.
>>>>>>> It sets the CPU frequency according to the CPU's current load. So
>>>>>>> some people think that everybody should use that governor, let
>>>>>>> CPUs finish their jobs ASAP, and then enter C-states for power
>>>>>>> saving. Compared to P-states, C-states do save more power. That's
>>>>>>> why GNOME removed it.
>>>>>>>
>>>>> This is also model-specific and depends on whether frequency,
>>>>> voltage and power scale linearly. That's true on the latest
>>>>> processors but not on earlier ones.
>>>>>
>>>>> I'm not sure why GNOME removed it, but it doesn't seem like a good
>>>>> idea to me. Some users want max perf and others want longer
>>>>> battery life.
>>>>>
>>>>>
>>>>>> Yes, a good P-state + C-state implementation is not easy to tune
>>>>>> for more power savings. Running in lower P-states when a CPU is
>>>>>> busy burns more power due to shorter time in deeper C-states.
>>>>>> Entering deeper C-states too aggressively also burns more power
>>>>>> (on both an idle and a busy system) due to unnecessary wakeup
>>>>>> latency. ;-) Without knowing the details, it seems likely that
>>>>>> the gnome-power-manager setting was removed because it made worse
>>>>>> decisions than runtime prediction.
>>>>>>
>>>>>>
>>>>>> Solaris currently has mechanisms to turn P-state and deeper
>>>>>> C-state support on/off.
>>>>>>
>>>>>> A requirement is that the Energy Perf Bias MSR can be set on
>>>>>> systems not running a GUI. We would like to support a possible
>>>>>> future GNOME interface to set this MSR if/when it exists. The
>>>>>> proposal provides a mechanism that works on systems without
>>>>>> GNOME.
>>>>>>
>>>>> Right, most servers do not run GNOME. I don't expect GNOME
>>>>> support, but it would be great if it comes. :-)
>>>>>
>>>>> IMHO, we should use this global CPU power policy setting instead
>>>>> of "cpupm" and "cpu-deep-idle"; it is friendlier to the user.
>>>>> Users just want more perf or more power savings; I don't think
>>>>> they care whether the system supports P-states and C-states at the
>>>>> same time. "cpupm" is confusing because it covers only P-states;
>>>>> we named it "cpupm" before we had deep idle support. Actually,
>>>>> cpu-deep-idle is also part of CPU power management. :)
>>>>>
>>>>>>
>>>>>>> But some people care less about power saving than about other
>>>>>>> factors. For example, if you are plagued by CPU fan noise and
>>>>>>> want it quieter, you can lower the CPU frequency, which results
>>>>>>> in less heat, and then the fan can be stopped.
>>>>>>>
>>>>>>> Personally, I would vote +1 for this project if I could vote, but
>>>>>>> I don't like names like "perf-bias". :)
>>>>>>>
>>>>>>>
>>>>>>> Besides, can somebody tell me where IA32_ENERGY_PERF_BIAS_MSR
>>>>>>> comes from? Is it part of the IPS feature?
>>>>>>>
>>>>>> Volume 2A of Intel's Software Developer's Manuals describes CPUID
>>>>>> detection of IA32_ENERGY_PERF_BIAS_MSR, and volume 3A describes
>>>>>> the MSR.
>>>>>> http://www.intel.com/products/processor/manuals/
>>>>>> Sorry, I do not know what IPS stands for.
>>>>>>
>>>>> Cough, cough, IPS is not a released feature and should not be
>>>>> discussed here. ;p
>>>>>
>>>>> Thanks,
>>>>> -Aubrey
>>>>>
>>>>>
>>>>>> Regards,
>>>>>> Bill
>>>>>>
>>>>>>
>>>>>>> -minskey
>>>>>>>
>>>>>>>
>>>>>>>> I remember we already support 2 profiles through
>>>>>>>> gnome-power-manager on Solaris. What's the difference between
>>>>>>>> them?
>>>>>>>>
>>>>>>>> I do not understand the exact meaning of perf-bias, balanced and
>>>>>>>> power-bias either. Doesn't perf-bias mean the CPU frequency will
>>>>>>>> always be at the highest level?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Jedy
>>>>>>>> On Wed, 2009-11-04 at 08:47 +0800, Li, Aubrey wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> While enabling the Intel energy performance bias feature, we
>>>>>>>>> found that a power profile implementation is necessary. Here is
>>>>>>>>> my draft of a CPU-level power policy:
>>>>>>>>> http://cr.opensolaris.org/~aubrey/cpu_power_policy_v1/
>>>>>>>>>
>>>>>>>>> The proposal adds a new keyword, "cpu-power-policy", to
>>>>>>>>> /etc/power.conf, with 4 options:
>>>>>>>>> 1) perf-bias
>>>>>>>>> 2) balanced
>>>>>>>>> 3) power-bias
>>>>>>>>> 4) default, the same as perf-bias.
>>>>>>>>>
>>>>>>>>> /etc/power.conf accepts the user input, and the preferred policy
>>>>>>>>> is passed to the kernel through an ioctl. Then pm_ioctl calls
>>>>>>>>> the callback to walk a CPU power policy list.
>>>>>>>>> Every CPU PM feature that wants to be adjusted by this option,
>>>>>>>>> and is verified to be supported, registers its callback function
>>>>>>>>> on the list so that it can be called and adjusted by pmconfig.
>>>>>>>>> --------------------------------------------------------
>>>>>>>>> /etc/power.conf
>>>>>>>>>      |
>>>>>>>>> pm_ioctl(cpu_power_policy, policy)
>>>>>>>>>      |
>>>>>>>>> cpu_power_policy_callb(policy)
>>>>>>>>>      |
>>>>>>>>>      ----> registered pm feature callback 1 (ENERGY_PERF_BIAS)
>>>>>>>>>      |
>>>>>>>>>      ----> registered pm feature callback 2
>>>>>>>>>      ...
>>>>>>>>> ---------------------------------------------------------
>>>>>>>>> Currently only the energy_perf_bias feature is registered,
>>>>>>>>> because my intention is to support adjusting the
>>>>>>>>> energy_perf_bias MSR without a reboot. I guess we can probably
>>>>>>>>> add P/T/C-state support later. When we add P/T/C-state support,
>>>>>>>>> my quick thought is that this option will override the "cpupm"
>>>>>>>>> and "cpu-deep-idle" settings.
>>>>>>>>>
>>>>>>>>> Any comments and suggestions are welcome.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> -Aubrey
>>>>>>>>> _______________________________________________
>>>>>>>>> pm-discuss mailing list
>>>>>>>>> pm-discuss at opensolaris.org
>>>>>>>>> http://mail.opensolaris.org/mailman/listinfo/pm-discuss
> _______________________________________________
> tesla-dev mailing list
> tesla-dev at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/tesla-dev

Liu Jiang (Gerry)
OpenSolaris, OTC, SSG, Intel
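Bill raises two open points in the quoted thread: accepting a number as well as the Perf/Balanced/Power/Default names, and an override order (an external power cap overrides cpu_pm_policy, which overrides the system power policy). A minimal C sketch of both follows. The function names, the BIAS_UNSET sentinel, the 0..15 numeric range borrowed from the 4-bit ENERGY_PERF_BIAS field, and the keyword-to-number mapping are all assumptions for illustration, not the proposal's interface.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Sentinel for "this layer expresses no preference" (hypothetical). */
#define BIAS_UNSET (-1)

/* Parse a cpu_pm_policy value: one of the proposal's keywords, or a number
 * 0..15 (the ENERGY_PERF_BIAS range). The keyword values are assumptions.
 * Returns BIAS_UNSET for anything it does not accept. */
static int parse_cpu_pm_policy(const char *s)
{
    static const struct {
        const char *name;
        int         bias;
    } keywords[] = {
        { "default",    0 },   /* same as perf-bias in the proposal */
        { "perf-bias",  0 },
        { "balanced",   6 },
        { "power-bias", 15 },
    };
    char *end;
    long v;

    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strcmp(s, keywords[i].name) == 0)
            return keywords[i].bias;
    }

    /* Accept a plain decimal number only if the whole string parses. */
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v < 0 || v > 15)
        return BIAS_UNSET;
    return (int)v;
}

/* Pick the effective bias using the order suggested in the thread:
 * an external power cap overrides cpu_pm_policy, which overrides the
 * system power policy; with nothing set, fall back to perf bias (0). */
static int effective_bias(int power_cap, int cpu_pm_policy, int system_policy)
{
    if (power_cap != BIAS_UNSET)
        return power_cap;
    if (cpu_pm_policy != BIAS_UNSET)
        return cpu_pm_policy;
    if (system_policy != BIAS_UNSET)
        return system_policy;
    return 0;
}
```

Here the cap simply wins outright; treating the cap as a floor on the power bias would be another reasonable reading of "override".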
