Hi Bill,
        Thanks for your great suggestions. 
        I'll update the onepager according to your suggestions and repost it 
after getting enough feedbacks.

Bill.Holler at Sun.COM <mailto:Bill.Holler at Sun.COM> wrote:
> Hi Liu Jiang,
> 
> This proposal looks great!
> 
> Here are some suggestions:
> 
> 1) This proposal does not specify the context the callbacks will
> execute in. Presumably idle callbacks will run in the idle-thread
> context.   This proposal should state this.
> Idle threads cannot block.  Idle threads also cannot spin when the
> owner of the synchronization object expects to be able to wakeup
> the CPU from idle.
For cpu_idle_enter_cbfn, it will be called only in idle thread context.
For cpu_idle_exit_cbfn, it may be called in idle thread context or interrupt 
context which depends on the hardware mechanism(instructions) used to put CPU 
into low power state. For example, cpu_idle_exit_cbfn will be called in 
interrupt context if OS uses hlt instruction to put CPU into C1 state on x86. 
On the other hand, it will be called in idle thread context if OS uses mwait to 
put CPU into low power state with interrupt disabled.
So there's a flag to indicate whether cpu_idle_exit_cbfn is called in interrupt 
context.

> 
> 2) Clients may only be interested in a sub-set of c-states.
> cpu_idle_register_callback() could take another argument
> that specified which c-state(s) the callback should be invoked.
> I do not know if clients will want this option.
> 
> CPUs will likely only enter C1 idle when they do not expect to be
> idle very long on C-State enabled kernels.
I don't know whether this is a common requirement too.
How about to let the callback filter out C states not interested at?

> 
> 3) A "max idle time" argument could be added to cpu_idle_enter_cbfn().
> As part of the C-State project CPUs now read their LAPIC Current
> Count register to see when the next Local APIC Timer interrupt
> will fire.  Would FIPE be interested in this max value?
In the original design and implementation, we do have a parameter named 'hint' 
to pass "predicted max idle time" to cpu_idle_enter_cbfn.
I have removed it from the onepager due to following reasons,
1) parameter cstate roughly has the same info.
2) There are arguments whether we should extend parameter 'hint' as a structure 
to contain more info other than "predicted max idle time" for extensibility.
3) Our code is based on onnv_92 which has no support for deeper C state and 
tickless timer yet. I have no idea about the tickless timer interface at that 
time.
If tickless timer or C-State project provides some interface to query next 
timer expiration time, we could pass the predicted max idle time to callbacks.

> 
> 4) Clients may not want their callback invoked if the CPU will not
> be idle longer than a specified time due to the LAPIC Current Count
> register.  Do you want cpu_idle_register_callback() to be able to
> specify a minimum "max idle-time threshold" argument?  The callbeck
> would not be called if the CPU knows it will not be idle that long.
How about to let callbacks make such a decision to simplify the framework? Or 
any concern about performance issues?

> 
> 5) Should any of the FIPE "system idleness" calculations be part
> of the core kernel?  Other systems may be interested in this
> information.
We may optionally expose the idleness prediction algorithm if it's useful to 
other componets.

> 
> 6) What does this sentence in 4.2.4 mean?
> "If it returns non-zero value, CPU should cancel entering idle state"?
> Does this mean "enter ACPI C1 instead of a deeper C-State"?
> Does FIPE need this?
> 
> Can this functionality be removed?
> CPUs have to at least enter C1 when idle.  There can be large
> performance penalties if CPUs never halt in their idle loop....
> 
Currently FIPE doesn't make use of this feature. It's here just for future 
extensibility.
According to your questions about C1, it would be better to remove this feature.
How about the idea to demote deep C state instead of cancel entering idle state 
according to callback return value?
For example, cpu_idle_enter_cbfn is called with an indicator to enter C3, it 
may optionally argue about that by returning a indicator to suggest entering C1 
instead of C3.
By adopting sucn a change, all questions relative to C1 could be solved.

> 7) How will rollback be handled when a callback cancels the idle
> after other callbacks have already been called?
> 
> Will the cpu_idle_exit_cbfn() functions be called for all
> callbacks that have already been invoked?
Yes, currently cpu_idle_exit_cbfn will be called with a flag indicating 
cancellation for all callbacks that have already been invoked.

> 
> Will the CPU then just enter C1 idle without calling callbacks
> again?  The lower priority callbacks will not get invoked in this
> case.
Yes, currently lower priority callbacks will not get invoked.

> 
> The CPU needs to eventually at least enter the C1 idle state.
Great thanks to your reminder. This is a question not covered by current 
design, I need to change it.

> 
> Best regards,
> Bill
> 
> 
> On 11/12/08 06:25, Liu, Jiang wrote:
>> Hi all,
>>      We're planning to develop a memory power management driver for some
>> platforms which need to add new interface to current OpenSolaris
>> kernel. I have written a PSARC onepager  for the new interfaces,
>> please help to comment on the interface design. The attachment is
>> information about the memory power management driver for your
>> reference.   Thanks!     
>> 
>> Template Version: @(#)onepager.txt 1.35 07/11/07 SMI
>> 
>> 1. Introduction
>>     1.1. Project/Component Working Name:
>>      CPU idle notification interface
>> 
>>     1.2. Name of Document Author/Supplier:
>>      Author: Gerry Liu <jiang.liu at intel.com>
>> 
>>     1.3. Date of This Document:
>>      October 21, 2008
>> 
>> 4. Technical Description:
>>     4.1. Problem
>>      A CPU idle state change notification mechanism is needed to signal
>>      other components which are interesting in the state change events
>>      when CPU enters/leaves idle state. This mechanism could be used by
>>      following components: A) Memory power saving driver
>>      B) Lazy TLB flush on x86 system
>>      C) CPU power management framework
>> 
>>     4.2. Proposal
>>      We propose to add following data structures/interfaces to
>> OpenSolaris kernel. 
>> 
>>     4.2.1. CPU idle state change notification callback data
>>      structures typedef int  (*cpu_idle_enter_cbfn)(void *arg, int
>>      flags,      int cstate); Callback function prototype for entering
>> idle state notification. 
>> 
>>      typedef void (*cpu_idle_exit_cbfn)(void *arg, int flags);
>>      Callback function prototype for leaving idle state notification.
>> 
>>      typedef struct cpu_idle_callback {
>>              int                     version;
>>              cpu_idle_enter_cbfn     idle_enter;
>>              cpu_idle_exit_cbfn      idle_exit;
>>      } cpu_idle_callback_t;
>>      Data structure for CPU idle state change notification callback.
>> 
>>     4.2.2. Register CPU idle state change notification callback
>>      int cpu_idle_register_callback(uint_t priority,
>>          cpu_idle_callback_t *callbackp, void *arg);
>>      This interface registers a callback to be called when CPU idle state
>>      changes. All registered callbacks will be called in priority order
>>      from high to low when CPU enters idle state and will be called in
>>      the reverse order when CPU leaves idle state.
>> 
>>     4.2.3. Unregister CPU idle callback interface
>>      int cpu_idle_unregister_callback(uint_t priority,
>>          cpu_idle_callback_t *callbackp, void *arg);
>>      This interface deregisters a registered callback.
>> 
>>     4.2.4. Notify entering idle state
>>      int cpu_idle_enter(int flags, int cstate, int64_t hints);
>>      This interface signals that a specific CPU is entering idle state.
>>      If it returns non-zero value, CPU should cancel entering idle state.
>> 
>>     4.2.5. Signal exiting idle state interface
>>      void cpu_idle_exit(int flags);
>>      This interface signals that a specific CPU has left idle state.
>> 
>> 
>> 6. Resources and Schedule:
>>     6.4. Steering Committee requested information
>>      6.4.1. Consolidation C-team Name:
>>              ON
>>     6.5. ARC review type: FastTrack
>>     6.6. ARC Exposure: open
>> 
>> Liu Jiang (Gerry)
>> Senior Software Engineer
>> OpenSolaris, OTC
>> Tel: (8610)82611515-1643
>> iNet: 8758-1643
>> ------------------------------------------------------------------------
>> 
>> _______________________________________________
>> tesla-dev mailing list
>> tesla-dev at opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/tesla-dev

Reply via email to