I have almost finished reorganizing the c-state driver. The current code
is for c-state observation only; the system can't enter deep c-states
without Bill's HPET work. Because the cpu driver was restructured
recently, the c-state code has to follow it. I'm now requesting a code
review.

The webrev for the c-state driver can be found at:
http://cr.opensolaris.org/~aubrey/cstate/

The patch is against onnv_97 (rev 7367). The changes are as follows:

1) A kstat member has been added to the cpu_info module.
------------------
$kstat -m cpu_info | grep supported_max_cstates
        supported_max_cstates           3
------snip------
        
2) A kstat module named "cpudrv" has been added. It exports the c-state
latency (us), the method used to enter the c-state (FFH, SIO) and the
power (mW). We could add more, such as the number of times each c-state
is entered and the c-state residency time, for development and
observation.
----------------
$kstat -m cpudrv
module: cpudrv                          instance: 0     
name:   c1                              class:    misc
        address_space_id                FFixedHW
        crtime                          24.073615727
        latency                         1
        power                           1000
        snaptime                        262.816570865

module: cpudrv                          instance: 0     
name:   c2                              class:    misc
        address_space_id                SystemIO
        crtime                          24.073622285
        latency                         1
        power                           500
        snaptime                        262.81687418

module: cpudrv                          instance: 0     
name:   c3                              class:    misc
        address_space_id                SystemIO
        crtime                          24.073627073
        latency                         57
        power                           100
        snaptime                        262.817009506
------snip------

3) C-state info is obtained from the ACPI _CST objects, so we can't do
anything if the BIOS doesn't export this object to the OS.

4) Currently, we only support c-states on the Nehalem platform. A check
has been added in the driver to enforce this.

5) Theoretically, there are 3 types of c-state coordination, but the
Nehalem platform only supports the HW_ALL type, so c-state domain
creation currently supports only this type. The dependency is
determined by the core_id.
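
To illustrate what "determined by the core_id" could look like, here is
a rough sketch, not the code in the webrev. It assumes that CPUs
(hardware threads) reporting the same core_id belong to one domain and
that core_id values are unique across the system. The struct and
function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU topology record, for illustration only. */
typedef struct cpu_topo_sketch {
	int cpu_id;	/* logical CPU number */
	int core_id;	/* core the logical CPU belongs to */
	int domain;	/* filled in by cstate_build_domains() */
} cpu_topo_sketch_t;

/*
 * Assign a c-state domain to every CPU: CPUs with the same core_id
 * share a domain. Returns the number of domains created.
 */
int
cstate_build_domains(cpu_topo_sketch_t *cpus, size_t ncpus)
{
	int ndomains = 0;

	assert(cpus != NULL || ncpus == 0);

	for (size_t i = 0; i < ncpus; i++) {
		cpus[i].domain = -1;
		/* Reuse the domain of an earlier CPU on the same core. */
		for (size_t j = 0; j < i; j++) {
			if (cpus[j].core_id == cpus[i].core_id) {
				cpus[i].domain = cpus[j].domain;
				break;
			}
		}
		/* First CPU seen on this core: open a new domain. */
		if (cpus[i].domain < 0)
			cpus[i].domain = ndomains++;
	}
	return (ndomains);
}
```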

6) A _CST notification handler has been added to handle dynamic changes
of the c-state types and their number.

7) The idle thread proc pointer "idle_cpu" has been changed to a per-cpu
function pointer, so that we can support a different max c-state in
each c-state domain. This has to touch the common code, including SPARC.
I'd be glad to accept a better idea.

8) During early boot, "cp->idle_cpu" is assigned "generic_idle_cpu()"
and then "cpu_idle()" or "cpu_idle_mwait()". When cpudrv attaches, or
when a _CST notification event occurs, if deep c-state (C2 or higher)
support is detected, cp->idle_cpu is changed to point to
"cpu_acpi_idle()", which supports entering deep c-states. A shadow
pointer (cp->shadow_idle_cpu) saves the old "cp->idle_cpu", so that if
the next idle type is C1 we don't need to check whether monitor/mwait is
supported; we call "cp->shadow_idle_cpu" to enter C1 directly.
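
The pointer swap can be sketched roughly as below. This is a simplified
user-level model, not the code in the webrev: the "_sketch" names, the
cpu argument passed to the idle routines, the next_cstate/last_entered
fields, and the restore path taken when deep c-states disappear are all
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct cpu_sketch cpu_sketch_t;
typedef void (*idle_fn_t)(cpu_sketch_t *);

/* Hypothetical, stripped-down stand-in for the per-CPU structure. */
struct cpu_sketch {
	idle_fn_t idle_cpu;		/* routine the idle thread calls */
	idle_fn_t shadow_idle_cpu;	/* saved C1 routine */
	int next_cstate;		/* 1 = C1, 2 and up = deep c-state */
	int last_entered;		/* what the model "entered" last */
};

/* Stand-in for the C1 routine chosen at boot (e.g. the mwait variant). */
void
cpu_idle_mwait_sketch(cpu_sketch_t *cp)
{
	cp->last_entered = 1;
}

/* Stand-in for the deep c-state idle routine. */
void
cpu_acpi_idle_sketch(cpu_sketch_t *cp)
{
	if (cp->next_cstate <= 1) {
		/* C1: reuse the saved routine, no monitor/mwait re-check. */
		cp->shadow_idle_cpu(cp);
		return;
	}
	cp->last_entered = cp->next_cstate;
}

/* Called at attach time or from the _CST notification path. */
void
cstate_update_idle(cpu_sketch_t *cp, int max_cstate)
{
	assert(cp != NULL && cp->idle_cpu != NULL);

	if (max_cstate >= 2 && cp->idle_cpu != cpu_acpi_idle_sketch) {
		cp->shadow_idle_cpu = cp->idle_cpu;
		cp->idle_cpu = cpu_acpi_idle_sketch;
	} else if (max_cstate < 2 && cp->idle_cpu == cpu_acpi_idle_sketch) {
		/* Assumption: fall back to the saved C1 routine. */
		cp->idle_cpu = cp->shadow_idle_cpu;
	}
}
```

Keeping the old pointer means the C1 path stays exactly what was chosen
at boot, so the deep-idle routine never has to repeat the feature check.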

9) The next c-state type is determined by a prediction algorithm based
on the last c-state residency. If the time is long enough, we consider
entering a deeper c-state next time. Conversely, if the time becomes
shorter than the current c-state latency, we demote and enter a
shallower c-state next time.

Any suggestions and comments are greatly appreciated!

Thanks,
-Aubrey
