---------- Forwarded message ----------
Date: Wed, 18 Feb 2009 12:20:32 -0800
From: Garrett D'Amore <[email protected]>
To: Randy Fishel <randy.fishel at sun.com>
Cc: PSARC-ext at sun.com, jiang.liu at intel.com, frank.wang at intel.com,
tesla-dev at opensolaris.org
Subject: Re: CPU Idle Notification [PSARC/2009/115 FastTrack timeout 02/25/2009]
Randy Fishel wrote:
> I am sponsoring the following fasttrack for Gerry Liu from Intel. It
> proposes a notification interface initially used and provided by memory
> power management work done by Intel engineers in the OpenSolaris
> Tesla project.
>
> It requests micro/patch release binding. All interfaces will have
> a "Volatile" stability. Timeout is February 25th.
>
> This project, release binding, and stability has been reviewed and
> approved by the OpenSolaris Tesla project leaders and contributors.
>
>
>
> 1. Introduction
> 1.1. Project/Component Working Name:
> CPU idle notification interface
>
> 1.2. Name of Document Author/Supplier:
> Author: Gerry Liu <jiang.liu at intel.com>
>
> 1.3. Date of This Document:
> November 16, 2008
>
> 4. Technical Description:
> 4.1. Problem
> A CPU idle notification mechanism is needed to signal components
> that are interested in CPU idle state change events when a CPU
> enters or exits the idle state. This mechanism could be used by the
> following components:
> A) Memory power saving driver
> B) Lazy TLB flush on x86 system
> C) CPU power management framework
>
> 4.2. Proposal
> We propose to add the following data structures/interfaces to the
> OpenSolaris kernel.
>
> 4.2.1 CPU idle notification information data structure
> typedef struct cpu_idle_info {
>     int ci_flags;
>     int ci_intr_count;           /* Interrupt count. */
>     int ci_idle_state;           /* Idle state to enter. */
>     hrtime_t ci_idle_latency;    /* Idle round trip latency. */
>     hrtime_t ci_max_idle_time;   /* Predicted max idle time. */
>     hrtime_t ci_last_idle_time;  /* Last idle period. */
>     hrtime_t ci_last_busy_time;  /* Last busy period. */
> } cpu_idle_info_t;
> This structure is used to pass CPU idle information to callbacks.
> It could be extended to support architecture/platform specific
> information in the future.
> Valid flags for cpu_idle_info_t:
> CPU_IDLE_CI_FLAG_IDLE_STATE: field ci_idle_state is valid.
> CPU_IDLE_CI_FLAG_IDLE_LATENCY: field ci_idle_latency is valid.
> CPU_IDLE_CB_FLAG_MAX_IDLE_TIME: field ci_max_idle_time is valid.
>
>
Is there a reason that a single structure is required? (Are some of these
fields linked, so that they need to be dealt with atomically?) If not, then
I'd prefer to have separate name-value pairs for each kind of datum. Perhaps
something like a named property interface.
My experience is that structures with lots of detail tend to have problems
evolving over time (you wind up with lots of "reserved" or "obsolete" fields...)
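
A minimal sketch of what such a named property lookup could look like, purely
as an illustration. Every name below (cpu_idle_prop_t, cpu_idle_prop_get, the
property strings in the usage note) is hypothetical and not part of the
proposal; a userland type stands in for the kernel's hrtime_t.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: one property is a name plus a 64-bit value. */
typedef struct cpu_idle_prop {
	const char *cip_name;
	int64_t cip_value;
} cpu_idle_prop_t;

/*
 * Hypothetical lookup. Returns 0 and stores the value through valp if the
 * named property is present, or -1 if it is absent. An absent property
 * plays the role of a cleared "field is valid" flag in the struct version.
 */
static int
cpu_idle_prop_get(const cpu_idle_prop_t *props, size_t nprops,
    const char *name, int64_t *valp)
{
	for (size_t i = 0; i < nprops; i++) {
		if (strcmp(props[i].cip_name, name) == 0) {
			*valp = props[i].cip_value;
			return (0);
		}
	}
	return (-1);
}
```

A consumer would ask for, say, "idle-latency" and simply get -1 back when the
platform does not supply it, so new properties can be added without changing
the layout seen by existing callbacks.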
> 4.2.2 Prototype of entering idle state notification callback
> typedef void (*cpu_idle_enter_cbfn)(void *arg,
> cpu_idle_info_t *infop);
> The entering idle state notification callback must obey all constraints
> that apply to the idle thread because it will be called in idle thread
> context.
> Argument arg is the parameter passed in when registering the callback.
>
> 4.2.3 Prototype of exiting idle state notification callback
> typedef void (*cpu_idle_exit_cbfn)(void *arg, int flags);
> Exiting idle state notification callback will be called in idle
> thread context or interrupt context. There are flags to distinguish
> the calling contexts.
> arg is the parameter passed in when registering callback.
> Valid flags for exiting idle state notification callback:
> CPU_IDLE_CB_FLAG_INTR: called in interrupt context
> CPU_IDLE_CB_FLAG_IDLE: called in idle thread context
>
> 4.2.4 CPU idle notification callback data structures
> typedef struct cpu_idle_callback {
>     int version;
>     cpu_idle_enter_cbfn idle_enter;
>     cpu_idle_exit_cbfn idle_exit;
> } cpu_idle_callback_t;
> At least one of idle_enter and idle_exit must be non-NULL.
>
What is the value of "version" in this structure?
> 4.2.5. Register CPU idle notification callback
> int cpu_idle_register_callback(uint_t priority,
> cpu_idle_callback_t *callbackp, void *arg, void **hdlpp);
> This interface registers a callback to be called when the CPU idle
> state changes. All registered callbacks will be called in priority
> order from high to low when a CPU enters the idle state and will be
> called in reverse order when a CPU exits the idle state.
> Argument priority is used to determine the calling order of
> registered callbacks.
> Argument arg will be passed back to the registered callback; how it
> is used is determined by the callback.
>
> 4.2.6. Deregister CPU idle notification callback
> int cpu_idle_unregister_callback(uint_t priority,
> cpu_idle_callback_t *callbackp, void *arg, void *hdlp);
> This interface deregisters a registered callback.
>
> 4.2.7. Signal entering idle state event
> void cpu_idle_enter(cpu_idle_info_t *infop);
> This interface notifies the idle notification subsystem that a
> specific CPU is entering the idle state.
>
> 4.2.8. Signal exiting idle state event
> void cpu_idle_exit(int flags);
> This interface notifies the idle notification subsystem that a
> specific CPU is exiting the idle state.
>
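
To make the calling order described in 4.2.5 through 4.2.8 concrete, here is a
userland mock, not the proposed kernel implementation. The typedefs and
prototypes are copied from the proposal; everything else is an assumption made
for the sketch: the fixed-size table, the 0/-1 return convention, the handle
being the callback pointer, the "version" value of 1 in the usage note, and
the trace_* example client.

```c
#include <stddef.h>

typedef unsigned int uint_t;			/* stand-in for the kernel type */
typedef struct cpu_idle_info cpu_idle_info_t;	/* opaque in this mock */

typedef void (*cpu_idle_enter_cbfn)(void *arg, cpu_idle_info_t *infop);
typedef void (*cpu_idle_exit_cbfn)(void *arg, int flags);

typedef struct cpu_idle_callback {
	int version;
	cpu_idle_enter_cbfn idle_enter;
	cpu_idle_exit_cbfn idle_exit;
} cpu_idle_callback_t;

#define	MOCK_MAX_CB	8			/* assumption: tiny fixed table */

typedef struct mock_reg {
	uint_t priority;
	cpu_idle_callback_t *cb;
	void *arg;
} mock_reg_t;

static mock_reg_t mock_regs[MOCK_MAX_CB];	/* kept sorted, highest first */
static size_t mock_nregs;

/* Mock: insert in priority order; assumed to return 0 on success, -1 on error. */
static int
cpu_idle_register_callback(uint_t priority, cpu_idle_callback_t *callbackp,
    void *arg, void **hdlpp)
{
	size_t i, pos = 0;

	if (callbackp == NULL || hdlpp == NULL || mock_nregs == MOCK_MAX_CB)
		return (-1);
	if (callbackp->idle_enter == NULL && callbackp->idle_exit == NULL)
		return (-1);
	while (pos < mock_nregs && mock_regs[pos].priority >= priority)
		pos++;
	for (i = mock_nregs; i > pos; i--)
		mock_regs[i] = mock_regs[i - 1];
	mock_regs[pos].priority = priority;
	mock_regs[pos].cb = callbackp;
	mock_regs[pos].arg = arg;
	mock_nregs++;
	*hdlpp = callbackp;	/* mock handle: the callback pointer itself */
	return (0);
}

/* Mock: remove the entry matching the handle; priority and arg are unused. */
static int
cpu_idle_unregister_callback(uint_t priority, cpu_idle_callback_t *callbackp,
    void *arg, void *hdlp)
{
	(void) priority;
	(void) arg;
	for (size_t i = 0; i < mock_nregs; i++) {
		if (mock_regs[i].cb == callbackp && hdlp == callbackp) {
			for (; i + 1 < mock_nregs; i++)
				mock_regs[i] = mock_regs[i + 1];
			mock_nregs--;
			return (0);
		}
	}
	return (-1);
}

/* Entering idle: walk the table from high priority to low. */
static void
cpu_idle_enter(cpu_idle_info_t *infop)
{
	for (size_t i = 0; i < mock_nregs; i++) {
		if (mock_regs[i].cb->idle_enter != NULL)
			mock_regs[i].cb->idle_enter(mock_regs[i].arg, infop);
	}
}

/* Exiting idle: walk the table in the reverse order. */
static void
cpu_idle_exit(int flags)
{
	for (size_t i = mock_nregs; i > 0; i--) {
		if (mock_regs[i - 1].cb->idle_exit != NULL)
			mock_regs[i - 1].cb->idle_exit(mock_regs[i - 1].arg, flags);
	}
}

/* Hypothetical client: record a one-character tag for each notification. */
static char trace[16];
static size_t trace_len;

static void
trace_enter(void *arg, cpu_idle_info_t *infop)
{
	(void) infop;
	if (trace_len < sizeof (trace) - 1)
		trace[trace_len++] = *(const char *)arg;
}

static void
trace_exit(void *arg, int flags)
{
	(void) flags;
	if (trace_len < sizeof (trace) - 1)
		trace[trace_len++] = *(const char *)arg;
}
```

With one client registered at priority 1 and another at priority 9, a single
enter/exit pair notifies the priority 9 client first on entry and last on
exit, which is the nesting the proposal describes.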
This seems specific to "idle" notification. Would it be useful to have
something that dealt with other kinds of CPU state transitions as well?
(C-states, for example?)
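
As a rough sketch of what a more general interface might look like: a single
callback type that receives an event kind, with listeners subscribing through
a mask. None of these names (cpu_event_t, cpu_event_listener_t,
cpu_event_dispatch, count_event) come from the proposal; they are made up for
illustration.

```c
#include <stddef.h>

/* Hypothetical event kinds, one bit each so they can be combined in a mask. */
typedef enum cpu_event {
	CPU_EV_IDLE_ENTER    = 1 << 0,
	CPU_EV_IDLE_EXIT     = 1 << 1,
	CPU_EV_CSTATE_CHANGE = 1 << 2
} cpu_event_t;

/* Hypothetical callback: "state" is event specific (e.g. the new C-state). */
typedef void (*cpu_event_cbfn)(void *arg, cpu_event_t ev, int state);

typedef struct cpu_event_listener {
	unsigned int mask;	/* events this listener wants to see */
	cpu_event_cbfn fn;
	void *arg;
} cpu_event_listener_t;

/* Deliver one event to every listener whose mask includes it. */
static void
cpu_event_dispatch(const cpu_event_listener_t *ls, size_t n,
    cpu_event_t ev, int state)
{
	for (size_t i = 0; i < n; i++) {
		if ((ls[i].mask & (unsigned int)ev) != 0 && ls[i].fn != NULL)
			ls[i].fn(ls[i].arg, ev, state);
	}
}

/* Example listener: count how many events were delivered. */
static void
count_event(void *arg, cpu_event_t ev, int state)
{
	(void) ev;
	(void) state;
	(*(int *)arg)++;
}
```

A listener that only cares about C-state changes would set its mask to
CPU_EV_CSTATE_CHANGE and never be called for plain idle enter/exit events.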
-- Garrett
> 6. Resources and Schedule:
> 6.4. Steering Committee requested information
> 6.4.1. Consolidation C-team Name:
> ON
> 6.5. ARC review type: FastTrack
> 6.6. ARC Exposure: open
>
>