My apologies for not telling you guys sooner.

Will definitely keep you in the loop.

Thanks
Margot



On 04/01/10 09:16 AM, Li, Aubrey wrote:
okay, thanks for the info, that sounds good.
Margot, could you please add me to the ARC case interested list?
Or please keep me updated on the progress.

Thanks,
-Aubrey

Garrett D'Amore wrote:
On 04/01/10 08:28 AM, Margot Hackett Miller wrote:
We are in the process now of moving what's in power.conf
to SMF.   I will be filing an ARC case in a few weeks for it.
Then I recommend doing that case *before* introducing another power.conf
tunable.

     - Garrett
Margot


On 03/31/10 10:29 PM, Garrett D'Amore wrote:
On 03/31/10 09:09 PM, Randy Fishel wrote:
    This might be a bit contentious: not only is there an effort to
migrate the configuration to SMF, there is also consideration of
defining something similar to system-pm-policy.  On the other hand,
the architecture is still lacking and there doesn't seem to be much
momentum in providing it.
Strongly concur on this.  This (and other PM management settings)
belongs in SMF now.

This will probably be derailed if you try to integrate this change
without doing it via SMF.

     - Garrett
    I am also leaving for vacation on Friday morning.  I will take a
printout with me in hopes of maybe reviewing it over the next week.
It may also give others the opportunity to see how this might fit into
the "new" architecture.

    Cheers!

     ---- Randy

On Thu, 1 Apr 2010, Li, Aubrey wrote:

Just want to move forward with this work. Here is a PSARC one-pager;
any input is really appreciated!

Thanks,
-Aubrey

======== system-pm-policy_onepager_v1.txt =================================
Template Version: @(#)onepager.txt 1.35 07/11/07 SMI

1. Introduction
     1.1. Project/Component Working Name:
          system-pm-policy keyword

     1.2. Name of Document Author/Supplier:
          Author: Aubrey Li <[email protected]>

     1.3. Date of This Document:
          April 28, 2010

2. Project Summary
     2.1. Project Description:
          Solaris support for the system-pm-policy keyword in
          power.conf(4).
          A mechanism is desired to set a system-wide power/performance
          bias.

     2.2. Risks and Assumptions:
          Very few customers will use this keyword. Most customers will
          want the power/performance balanced policy to be the default.

4. Technical Description:
      4.1. Details:

          pmconfig(1M) parses /etc/power.conf. If the system-pm-policy
          keyword is in power.conf(4), pmconfig passes the user's
          preferred policy to the kernel through pm_ioctl() with the
          command PM_SET_SYSTEM_POLICY. pm_ioctl() then calls
          pm_set_system_policy() to set the global policy variable and
          calls the power manageable modules to pass the policy down.

          Currently pm_set_system_policy() only sets the CPU power
          management policy; it could set the power management policy
          for memory and other devices in the future. The CPU pm policy
          setting is machine specific.

          The CPU has several power management features, such as
          C-states, P-states and the energy performance bias. Every CPU
          pm feature that wants to inherit the system-pm-policy registers
          its callback function on a list. When pmconfig passes the
          policy to the kernel, the kernel walks the list and calls each
          callback function, thereby setting the user's preferred policy
          in the different modules.
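
          A minimal sketch of the register-and-walk mechanism described
          above, written as a small Python model rather than kernel C.
          Only the name pm_set_system_policy comes from this one-pager;
          register_cpu_pm_callback, energy_perf_bias_cb and the module
          variables are hypothetical names chosen for illustration.

```python
# Illustrative model only. The real mechanism would live in kernel C; every
# name here except pm_set_system_policy is a hypothetical stand-in.
from typing import Callable, Dict, List

POLICIES = ("default", "perf-bias", "balanced", "power-bias")

# Hooks registered by CPU pm features (C-state, P-state, energy perf bias...).
_cpu_pm_callbacks: List[Callable[[str], None]] = []
system_policy = "default"


def register_cpu_pm_callback(cb: Callable[[str], None]) -> None:
    """A feature that wants to inherit the system policy adds its hook."""
    _cpu_pm_callbacks.append(cb)


def pm_set_system_policy(policy: str) -> None:
    """Record the global policy, then walk the list and notify each feature."""
    global system_policy
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    system_policy = policy
    for cb in _cpu_pm_callbacks:
        cb(policy)


# Example feature: remembers the last policy it was asked to apply.
applied: Dict[str, str] = {}


def energy_perf_bias_cb(policy: str) -> None:
    applied["energy_perf_bias"] = policy


register_cpu_pm_callback(energy_perf_bias_cb)
pm_set_system_policy("power-bias")
```

          In this model a feature that never registers is simply not
          told about policy changes, which matches the opt-in wording
          above.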

          /etc/power.conf may have [system-pm-policy <value>]
            |
            v
          pmconfig
            |
            v
          pm_ioctl(PM_SET_SYSTEM_POLICY, policy)
            |
            v
          pm_set_system_policy(policy)
            |
            ---->   CPU pm policy callback
            |     |
            |     ---->   registered CPU pm feature 1 callback (ENERGY_PERF_BIAS)
            |     |
            |     ---->   ...
            |
            ---->   Memory pm policy callback in future
            |
            ---->   ...


          The power/performance balanced policy will be set by default;
          this keeps the current out-of-box setting unchanged. A system
          with extreme performance requirements could disable the power
          management features with the performance bias setting. If a
          laptop runs on battery, or a system at low utilization prefers
          power savings over performance, system-pm-policy could be set
          to power bias to save more power; this could lead to the lowest
          CPU clock and always the deepest idle state.

          Different power manageable devices could inherit the
          system-wide policy completely, or they can maintain a specific
          pm policy themselves, but the system-wide policy must carry
          the largest weight in their own mechanism.


      4.2. Bug/RFE Number(s): xxxxxxx

      4.5. Interfaces:
          This project will import these existing interfaces.
          Interface stability will be "committed".

          Import:
                  power.conf(4) (PSARC/1992/202)
                  pmconfig(1M)

          Export:
                  system-pm-policy

          system-pm-policy keyword.
          A system-pm-policy entry can be added to power.conf(4) to set
          the system-wide power policy. If this entry is present and set
          to default, or it is not present, then the default balanced
          policy will be used; this keeps the current behavior unchanged.
          The other options will tune the policy to power bias or
          performance bias.

          power.conf(4) man page addition:

          A system-pm-policy entry may be used to set the system-wide
          power policy. The format of the entry is:
          system-pm-policy policy

       Acceptable policy values are:

       default    Power performance balanced policy.

       perf-bias  The system drives to maximum performance at any energy cost.

       balanced   Balanced performance vs. power and energy.

       power-bias Maximum energy efficiency.

       absent     If the system-pm-policy keyword is absent from power.conf(4),
                  the behavior is the same as the default case.
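
       As a rough illustration of the entry format and the values listed
       above, the following Python sketch parses a system-pm-policy line.
       It is not pmconfig(1M) code: parse_system_pm_policy and
       effective_policy are hypothetical helpers, and the handling of "#"
       comments and of malformed entries is an assumption.

```python
# Hypothetical helpers, not part of pmconfig(1M). Comment handling and the
# error behavior are assumptions made for this sketch.
from typing import Iterable

VALID_POLICIES = {"default", "perf-bias", "balanced", "power-bias"}


def parse_system_pm_policy(lines: Iterable[str]) -> str:
    """Return the policy named by a system-pm-policy entry, or "default"."""
    policy = "default"  # keyword absent: same as the default case
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if fields[0] != "system-pm-policy":
            continue  # some other power.conf keyword
        if len(fields) != 2 or fields[1] not in VALID_POLICIES:
            raise ValueError(f"bad system-pm-policy entry: {raw!r}")
        policy = fields[1]
    return policy


def effective_policy(policy: str) -> str:
    """Per the list above, "default" means the balanced policy."""
    return "balanced" if policy == "default" else policy
```

       For example, a file containing only "system-pm-policy power-bias"
       yields "power-bias", while a file without the keyword yields
       "default", which effective_policy maps to "balanced".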

      4.6. Doc Impact:
          power.conf man page.  See above.

      4.7. Admin/Config Impact:
          System administrators can use this option to match differing
          power/performance requirements.

      4.8. HA Impact: None.

      4.9. I18N/L10N Impact: No.

      4.10. Packaging & Delivery:
          This change will be delivered as part of the Deep C-State RFE.
          These changes will be made at the same time:
                  kernel package
                  power.conf package
                  pmconfig package

      4.11. Security Impact: None.

      4.12. Dependencies: power.conf, pmconfig(1M)

6. Resources and Schedule:
     6.1. Projected Availability: April 2010

     6.4. Product Approval Committee requested information:
          6.4.1. Consolidation C-team Name:
                  ON
     6.5. ARC review type: FastTrack
     6.6. ARC Exposure:   open

7. Prototype Availability:
     7.1. Prototype Availability:
          Prototype available on OpenSolaris in April 2010.

===================================================================================

Li, Aubrey wrote:
Hi Bill,

Here I made a change to propose system-wide policy support.
http://cr.opensolaris.org/~aubrey/sys_pm_policy_v1/
The user profile from /etc/power.conf is still passed to the kernel
through pm_ioctl, which then calls pm_set_system_policy(). Currently
only the CPU pm policy is set there; if memory or other devices need a
bias as well, they can also be added to that function.
The CPU pm policy implementation has a minor change against the last
webrev: the mcpu_pm_policy pointer has been moved from machcpu to the
mcpu_pm_mach_state structure according to your suggestion.

Any comments and suggestions are highly appreciated.

Thanks,
-Aubrey

Li, Aubrey wrote:
It looks like memory PM needs such a bias as well. So I'd like to
change the proposal to use the keyword "sys-pm-policy" instead. The
mechanism will use the existing callb implementation to pass the user
policy from /etc/power.conf to the kernel, and walk the list of
registered modules to call each module's hook function and set the pm
policy individually.

I'm not sure whether any other device driver needs, or would be happy
with, this proposal.
It would be great if device driver developers could share some
thoughts here.

Thanks,
-Aubrey

Julia.Harper wrote:
I assume that this knob (profile) when turned way down would basically
put the system into "power savings" mode -- where the set of power
states is restricted.
   That is, no matter how long the utilization level demands more
power, the highest power states (for the cpus, memory, whatever) will
never be entered.  We should probably use terminology that makes this
clear.

-- jdh


Liu, Jiang wrote:
I prefer the solution of introducing a global power profile for all
devices. Currently we need such a profile for CPUPM. In future, when
supporting memory power management, we may need a similar profile for
memory PM. And users won't like two variables/profiles for the same
objective.

Li, Aubrey wrote:
Bill Holler wrote:
Hi,

I forgot to mention that cpu_pm_policy is just a policy.
There is no guarantee it maps to a specific MSR or hardware
implementation.
Yes, I would like to propose a new option for CPU power management
policy. This policy is a CPU bias between performance and power;
future CPU power management enhancement work can be based on this
policy.
- The default policy should keep the current "out of the box" behavior
  unchanged; we'll try to save more power without hurting performance.
- There will be more power management features coming on future
  processors, like ENERGY_PERFORMANCE_BIAS. We can register these new
  features under the policy framework, and offer a knob to the user to
  change these settings on the fly.
- Laptop users who want longer battery life, less heat and less fan
  noise may want the system to work in some edge situations. For
  example, currently the CPU can work at the highest clock if cpupm is
  disabled, but there is no way to let the CPU always work at the
  lowest clock. Similarly, always entering the deepest C-state is
  another choice to save more power. What's more, the power-aware
  dispatcher could be more flexible in picking a CPU and dispatching
  threads if there is a policy indicator.
- Some users don't care about power. Yes, we already have the options
  to let them set ENERGY_PERFORMANCE_BIAS to performance bias, to
  disable C-states/P-states, and so on and so forth. But it's
  friendlier to the user to change only one option.

Here, the policy only focuses on the CPU. If you think we should have
a policy for memory, for devices, or a system-wide policy, let's do
this. cpu_pm_policy can be one part of the system-wide policy.
If nobody has thoughts on it, I'll continue to prepare a PSARC file to
add the cpu_pm_policy keyword.

For example Solaris could be dynamically setting the
ENERGY_PERFORMANCE_BIAS register to different settings depending on
things such as system-load,
Yes, such settings can be changed dynamically if we see the benefit.

the priority of the application being scheduled, a power policy of
the application,
Making threads power aware needs another set of interfaces, I think.
For example, cmt_balance() could choose a different processor group
according to the perf/power bias of the thread.

or power policy of the zone.
Zone policy is an interesting topic. Different zones could have
different CPU resources, or share the global CPU resource; different
zones could have different power policies, or inherit the global
cpu_pm_policy setting. There can be many virtual containers, but the
hardware resource is unique. I think this can be enhanced in zone
management, which will not be covered in my proposal, :)

Thanks,
-Aubrey

Regards,
Bill


On 03/03/10 16:21, Bill Holler wrote:
+1.

Hi Aubrey,

I also think it is time to move forward with this proposal.
Generally we want the system to work best "out of the box" with no
tuning.  On the other hand, vendors will keep improving products with
new features, and there will always be some specific applications
where custom settings may be better.  I feel this proposal supports
innovation and application specific customization in line with the
OpenSolaris community goals.

This proposal applies to all types of CPUs.  It uses "cpu_pm_policy"
instead of for example mentioning a specific CPU's MSR.  ;-)  This
proposal will be useful with other CPUs if/when they have hardware
mechanisms for tuning power / performance.


In the ARC case we want to mention that there could be a policy
conflict between this component setting and a system-power-policy,
external Power Capping, etc. Generally we want users to use the
default or a higher level policy such as the system power policy.
Unfortunately the system power policy may not be fine-grained or
diverse enough for some applications to specify cpu power policy.
In that case cpu_pm_policy will be useful.  My thought is: the user
must really know what they want if they specify a component policy
such as cpu_pm_policy instead of just using the system power policy.
For that reason I feel cpu_pm_policy should override the
system-power-policy at the cpupm level.

Power Capping is different.  Power Capping is an external policy.  It
is currently "owned" by the SP external to the OS.  Power Capping
should override a local cpu_pm_policy.
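
The precedence suggested above can be written down as a small Python
sketch. This is only a model of the ordering, not an existing
interface: effective_cpu_policy is a hypothetical name, and treating
Power Capping as an optional policy imposed from outside the OS is a
simplifying assumption.

```python
# Hypothetical model of the suggested precedence at the cpupm level:
# external Power Capping > cpu_pm_policy > system power policy.
from typing import Optional


def effective_cpu_policy(system_policy: str,
                         cpu_pm_policy: Optional[str] = None,
                         power_cap_policy: Optional[str] = None) -> str:
    """Pick the policy the CPU pm code would act on under this ordering."""
    if power_cap_policy is not None:
        return power_cap_policy  # external policy owned by the SP wins
    if cpu_pm_policy is not None:
        return cpu_pm_policy     # explicit component setting beats system-wide
    return system_policy         # otherwise inherit the system power policy
```

With only a system policy of "balanced" the result is "balanced";
adding a cpu_pm_policy of "perf-bias" yields "perf-bias"; adding a
capping policy on top of both yields the capping policy.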


Implementation comments:
IMHO mcpu_pm_policy pointer should be in the mcpu_pm_mach_state
structure instead of in the machcpu.
We may want to allow the user to specify a number instead of just
Perf, Balanced, Power, Default?

Regards,
Bill


On 02/20/10 18:43, Li, Aubrey wrote:
Hi Bill,

I think it's time to continue this proposal, since b134 is closed and
the build is not limited now. The power/perf bias setting is a
starting point for future power related work. I'll prepare a PSARC
file for the new option if this is acceptable. "No" is also a good
answer, given a good reason.

Thanks,
-Aubrey


Bill.Holler Wrote:

Hi,

This proposal is for a mechanism to set the new MSR
IA32_ENERGY_PERF_BIAS_MSR.   This is a new hardware
feature.  The MSR affects overall power/performance.
It gives a hint to the processor & package for desired
power/performance characteristics.  It is related to p-states and
c-states (and may affect these features), but this feature can
have other socket/system-level effects as well.
The programmer's guides do not go into details about what the other
effects can be.  :-(

The perf and power impact of this MSR is model specific.
It's able to throttle turbo on WSM and will probably help the hardware
make more decisions in future. For example, when a short interrupt
storm is detected, it can demote a CC6 request to CC3.


On 11/05/09 05:15, minskey guo wrote:

Jedy Wang wrote:

Hi Li,

As far as I know, gnome-power-manager has removed the support for
changing the governor, which is the same as a profile, I think. I
remember someone wrote a blog explaining the reason but I cannot find
it now.

I wonder what makes us still need to implement this feature.
In the Linux world, there is an ondemand governor in the kernel. It
sets the CPU frequency according to the CPU's current load. So some
people consider that everybody should use that governor, and let CPUs
finish their jobs ASAP and then enter C-states for power saving.
Compared to P-states, C-states do save more power. That's why GNOME
removed it.

This is also model specific and depends on whether the relationship
between frequency, voltage and power is linear. That's true on the
latest processors but not on earlier processors.

I'm not sure why gnome removed it, but it does not seem a good idea to
me. Some users want max perf and others want longer battery life.
Yes, a good p-state + c-state implementation is not easy to tune for
more power savings.  Running in lower p-states when a CPU is busy
burns more power due to shorter time in deeper C-states.  Entering
deeper C-states too aggressively also burns more power (on both an
idle and busy system) due to unnecessary wakeup latency.  ;-)  Without
knowing the details, it seems likely that the gnome-power-manager was
removed because setting it made worse decisions than a runtime
prediction.


Solaris currently has mechanisms to turn P-state and deeper C-state
support on/off.

A requirement is that the Energy Perf Bias MSR can be set on systems
not running a GUI.  We would like to support a possible future Gnome
interface to set this MSR if/when it exists.  The proposal provides a
mechanism that works on systems without Gnome.

Right, most servers do not run gnome. I don't expect gnome support
but it would be great if it comes, :-)

IMHO, we should use this global cpu power policy setting instead of
"cpupm" and "cpu-deep-idle"; this is friendlier to the user. Users
just want more perf or more power savings; I think they don't care
whether the system supports p/c-states at the same time.
"cpupm" is confusing because it covers only p-states. We named it
"cpupm" before we had deep idle support. Actually cpu-deep-idle is
also one part of cpu power management, :)

but, some people don't care about power saving when comparing it to
other factors. For example, if you are plagued by the noise of the CPU
fan and want it quiet, then you can lower the cpu frequency, which
results in lower heat, and then the fan can be stopped.

personally, I vote +1 for this project if I could vote, but I don't
like the names of "perf-bias" etc :)


Besides, can somebody tell me where IA32_ENERGY_PERF_BIAS_MSR comes
from? Is it a part of the IPS feature?

Intel's Software Developer's Manual volume 2A describes CPUID
detection of IA32_ENERGY_PERF_BIAS_MSR and volume 3A describes the
MSR.
http://www.intel.com/products/processor/manuals/
Sorry, I do not know what IPS stands for?

cough, cough, IPS is not a released feature and should not be
discussed here, ;p

Thanks,
-Aubrey


Regards,
Bill



-minskey




I remember we already support 2 profiles through gnome-power-manager
on Solaris. What's the difference between them?

I do not understand the exact meaning of perf-bias, balanced and
power-bias either. Doesn't perf-bias mean the cpu frequency will
always be at the highest level?

Regards,

Jedy
On Wed, 2009-11-04 at 08:47 +0800, Li, Aubrey wrote:


Hi,

When we enabled the Intel energy performance bias feature, we found
that a power profile implementation is necessary. Here I did a draft
for a cpu level power policy.
http://cr.opensolaris.org/~aubrey/cpu_power_policy_v1/

The proposal adds a new keyword to /etc/power.conf,
"cpu-power-policy", and we have 4 options for this new keyword:
1) perf-bias
2) balanced
3) power-bias
4) default, the same as perf-bias.

/etc/power.conf accepts the user input and passes the preferred policy
to the kernel through ioctl. Then pm_ioctl calls the callback to walk
a cpu power policy list. Every cpu pm feature which wants to be
adjusted by this option, and is verified to be supported, will
register its callback function on the list, so that it can be called
and adjusted by pmconfig.

-------------------------------------------------------
      /etc/power.conf
      |
      pm_ioctl(cpu_power_policy, policy)
      |
      cpu_power_policy_callb (policy)
      |
      ---->   registered pm feature callback 1 (ENERGY_PERF_BIAS)
      |
      ---->   registered pm feature callback 2
      ...
-------------------------------------------------------
Currently, only the energy_perf_bias feature is registered, because
my intention is to support adjusting the energy_perf_bias MSR without
a reboot. I guess we can probably add p/t/c-state support later. When
we add p/t/c-state support, my quick thought is that this option will
override the "cpupm" and "cpu-deep-idle" settings.

Any comments and suggestions are welcome.

Thanks,
-Aubrey
_______________________________________________
pm-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pm-discuss

_______________________________________________
tesla-dev mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/tesla-dev
Liu Jiang (Gerry)
OpenSolaris, OTC, SSG, Intel
--

---------------------
      Julia Harper, [email protected]