[tesla-dev] [Bug 6232] Implement code review feedback for Saurabh Misra

[email protected] Mon, 26 Jan 2009 13:09:05 -0800 (PST)

http://defect.opensolaris.org/bz/show_bug.cgi?id=6232

--- Comment #2 from Bill Holler <bill.holler at sun.com>  2009-01-26 13:09:06 ---
(In reply to comment #0)

> usr/src/uts/i86pc/os/cpupm/cpu_idle.c
>   - Deep C state and C1 state is represented in mc_haltset. Will it be more
> expensive to wakeup arbitrary CPU in cstate_wakeup() when the CPU passed is
> not in halted cpuset. We may prefer to wakeup CPUs in C1 state rather than in
> deep C state. Line 162-177.


This is http://defect.opensolaris.org/bz/show_bug.cgi?id=4616.
Power savings went up nicely.  Unfortunately performance went
down when the "shallowest idle" CPU is selected.  Such is the
frustrating nature of trading performance for power.  :-(
We need a better C-state throttle mechanism before
switching this on.  (Also the scheduling algorithm you mentioned
tends to favor scheduling threads onto CPUs with high interrupt
loads etc.)
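
For illustration only, the "shallowest idle" selection discussed above
could be sketched roughly as follows. This is not the cpu_idle.c code:
the idle_cpu_t type, its fields, and pick_shallowest_idle() are
hypothetical names made up for this example.

```c
#include <stddef.h>

/* Hypothetical per-CPU idle record; not the real cpu_idle.c structures. */
typedef struct idle_cpu {
    int ic_id;      /* CPU id */
    int ic_cstate;  /* 0 = running, 1 = C1, 2 and up = deeper C-states */
} idle_cpu_t;

/*
 * Return the id of the idle CPU in the shallowest C-state, or -1 if no
 * CPU in the array is idle.  Ties go to the first match.
 */
int
pick_shallowest_idle(const idle_cpu_t *cpus, size_t ncpus)
{
    int best_id = -1;
    int best_cstate = 0;

    for (size_t i = 0; i < ncpus; i++) {
        if (cpus[i].ic_cstate < 1)
            continue;   /* not idle, nothing to wake */
        if (best_id == -1 || cpus[i].ic_cstate < best_cstate) {
            best_id = cpus[i].ic_id;
            best_cstate = cpus[i].ic_cstate;
        }
    }
    return (best_id);
}
```

A selection like this favors cheap wakeups, which is where the
performance and power trade-off described above shows up.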

In general we would like to move the dispatcher towards looking
at Power Domains (cores) instead of looking at individual CPUs.
CMT load balancing levels are only aware of their CPUs.
Instead, cmt_pgs should be aware of their child cmt_pgs.
The motivation is: on hyper-threaded architectures the C-state of
the hardware core is the shallowest (highest-power) C-state of its
sibling CPUs.  :-(
The C-state of a CPU alone may not be sufficient to know its
hardware C-state.
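
A minimal sketch of that point, assuming the core can only go as deep
as its shallowest sibling (i.e. reading "higher" above as
"higher-power"). The function name and the integer encoding of
C-states are assumptions for this example only.

```c
#include <stddef.h>

/*
 * Effective C-state of a hardware core, given the C-state of each
 * sibling CPU (0 = running, 1 = C1, 2 and up = deeper).
 * Assumption: the core is held at the shallowest sibling state.
 * Returns -1 if there are no siblings.
 */
int
core_effective_cstate(const int *sibling_cstates, size_t nsiblings)
{
    int shallowest;

    if (nsiblings == 0)
        return (-1);

    shallowest = sibling_cstates[0];
    for (size_t i = 1; i < nsiblings; i++) {
        if (sibling_cstates[i] < shallowest)
            shallowest = sibling_cstates[i];
    }
    return (shallowest);
}
```

With siblings in C3 and C1 the core is effectively in C1, so a
dispatcher that sees only the C3 CPU would misjudge the wakeup cost.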

A higher power savings policy could prefer consolidating on
cores instead of looking for the shallowest idle CPU.
Higher power saving policies are a future OpenSolaris project.


>    - We can also consider time-stamping CPU idle loop. If a CPU has been in
> idle state for long then put the CPU in deep-C state. Starting with mwait
> first and then progressing to deep-C sleep state. This way if the CPU were to
> awakened soon then we will not go through expensive deep-C sleep state
> transition.


This will require testing.
The current thinking is to attempt to predict short idle periods
and just go to C1 when the system thinks the CPU will not be
able to enter deeper C-states.
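
One possible shape for such a predictor, purely as a sketch: keep a
running average of recent idle durations and stay in C1 when the
predicted idle time is below the deep state's break-even time. The
averaging scheme, the break-even parameter, and all names here are
assumptions for illustration; they are not the algorithms under test.

```c
#include <stdint.h>

/* Hypothetical per-CPU predictor state. */
typedef struct idle_pred {
    uint64_t ip_avg_us; /* running average of idle durations, in us */
} idle_pred_t;

/* Fold one observed idle period into the average (new sample weighs 1/4). */
void
idle_pred_update(idle_pred_t *ip, uint64_t idle_us)
{
    ip->ip_avg_us = (3 * ip->ip_avg_us + idle_us) / 4;
}

/*
 * Pick a C-state for the next idle period: C1 if the predicted idle
 * time is shorter than the deep state's break-even time, otherwise
 * the deep state passed in by the caller.
 */
int
idle_pred_choose(const idle_pred_t *ip, uint64_t deep_breakeven_us,
    int deep_cstate)
{
    return (ip->ip_avg_us < deep_breakeven_us ? 1 : deep_cstate);
}
```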

We have been spending the last few months basically just
testing different idle policy algorithms.  We will probably
put back more changes in this area before Nevada putback. 

>   - I think we would want to consider the size of the system while waking up
> CPUs and/or runq of active CPUs. For instance, on a laptop/desktop system, we
> shouldn't end up waking up other CPUs through setbackdq()/setfrontdq() if
> there is a sudden burst of workload (callback disp_enq_thread is invoked
> whenever there is a thread being enqueued in the run queue). I guess runq
> balance code may take care of this but just checking. On large systems it
> will have cascading effect but I guess we have to do better in numbers and
> saving power is secondary at that point of time.

As you noted: maintaining performance is the highest
priority because deep c-states will be enabled by default.
Currently the scheduler looks through the CMT levels
for thread placement based on LOAD_BALANCING or
COALESCING at each level.  (A level is something like
a shared socket, shared cache, shared pipeline etc.)
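
As a rough sketch of how the two per-level policies differ (not the
actual CMT code): load balancing picks the least busy sibling group,
while coalescing picks the busiest group that still has room. The
grp_t type, the use of a raw running-thread count as the load metric,
and choose_group() are all hypothetical.

```c
#include <stddef.h>

typedef enum {
    POLICY_LOAD_BALANCING,
    POLICY_COALESCING
} level_policy_t;

/* Hypothetical sibling group at one level (e.g. cores sharing a socket). */
typedef struct grp {
    int g_running;  /* threads currently running in the group */
    int g_capacity; /* hardware threads in the group */
} grp_t;

/*
 * Return the index of the group to place a thread on, or -1 if every
 * group is full.  Ties go to the first match.
 */
int
choose_group(const grp_t *g, size_t n, level_policy_t policy)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (g[i].g_running >= g[i].g_capacity)
            continue;   /* no free hardware thread here */
        if (best == -1) {
            best = (int)i;
            continue;
        }
        if (policy == POLICY_LOAD_BALANCING) {
            if (g[i].g_running < g[best].g_running)
                best = (int)i;  /* spread: least busy wins */
        } else {
            if (g[i].g_running > g[best].g_running)
                best = (int)i;  /* pack: busiest with room wins */
        }
    }
    return (best);
}
```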

Certainly a "prefer-power" policy may use different scheduling
policies.

We are actively investigating other algorithms.

-- 
Configure bugmail: http://defect.opensolaris.org/bz/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
