David Vengerov wrote:
>>
>> What sorts of inputs (observability) do you think will be needed for
>> this?
>
> I think that whatever is currently observable in the Solaris kernel
> should be sufficient to start with. That is, even if the controller
> just knows the run queue length on each CPU, it can already improve
> the system's power efficiency by lowering the clock rate of CPUs that
> are idle or have few threads in their run queues (if the workload
> consists of many short-lived transactions). A more effective approach,
> however, is for the thread migration policy to cooperate with the
> power management policy and try to keep as many CPUs idle as possible
> without "infringing" on the SLAs made with the running applications.
Agreed. The dispatcher will need to be aware of CPU power states so that
it can avoid scheduling threads to run on CPUs that have been clocked
down. An interface between the dispatcher and the controller for
conveying utilization information would be good as well...since that
would lessen the extent to which the controller needs to "poll" the
CPUs for their idleness/busyness.
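To make the poll vs. push distinction concrete, here is a rough,
self-contained sketch of the kind of controller/dispatcher interaction
described above. Every name in it (pm_governor_tick,
pm_utilization_update, the P-state constants, the per-CPU arrays) is
made up for illustration; none of these are existing Solaris
interfaces, and the "empty run queue => clock down" rule is just the
simplest possible policy.

#include <stdio.h>

#define NCPUS        4
#define PSTATE_FULL  0   /* highest clock rate */
#define PSTATE_SLOW  2   /* a clocked-down state */

static int runq_len[NCPUS];   /* threads queued per CPU (assumed input) */
static int pstate[NCPUS];     /* current power state per CPU */

/* Polling variant: the controller scans the run queues itself and
 * clocks down CPUs that are idle. */
static void
pm_governor_tick(void)
{
    for (int c = 0; c < NCPUS; c++)
        pstate[c] = (runq_len[c] == 0) ? PSTATE_SLOW : PSTATE_FULL;
}

/* Push variant: the dispatcher notifies the controller on enqueue and
 * dequeue, so the controller does not have to poll for idleness. */
static void
pm_utilization_update(int cpu, int qlen)
{
    runq_len[cpu] = qlen;
    pstate[cpu] = (qlen == 0) ? PSTATE_SLOW : PSTATE_FULL;
}

int
main(void)
{
    pm_utilization_update(0, 3);   /* CPU 0 busy -> full clock */
    pm_utilization_update(1, 0);   /* CPU 1 idle -> clock down */
    pm_governor_tick();
    for (int c = 0; c < NCPUS; c++)
        printf("cpu%d: runq=%d pstate=%d\n", c, runq_len[c], pstate[c]);
    return (0);
}

The push variant seems preferable here, since the dispatcher already
knows exactly when a run queue goes empty or becomes non-empty.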
> However, in order to do this the controller should be aware of the
> service quality received by the applications, by receiving some
> continual feedback from the system about the response time of
> completed transactions (or better yet about the SLA rewards/penalties,
> which can already include many different performance considerations).
> So the administrators can be given an option of specifying a
> performance measure that should be sent to the power management/thread
> migration controller.
Yes, we need a way to enable performance and SLA observability.
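As a strawman for what that observability could look like: a
per-transaction feedback record that the controller smooths and
compares against an administrator-supplied target. The record layout,
the EWMA, and the target value are all hypothetical; this only
illustrates the shape of the interface, not anything that exists today.

#include <stdio.h>

struct perf_feedback {
    double resp_time_ms;   /* response time of a completed transaction */
    double sla_reward;     /* SLA reward/penalty, if one is defined */
};

static double avg_resp_ms;            /* smoothed response time */
static double target_resp_ms = 50.0;  /* admin-specified performance target */

/* Fold one completed transaction into the controller's view. */
static void
perf_report(const struct perf_feedback *fb)
{
    const double alpha = 0.1;   /* smoothing factor */
    avg_resp_ms = (1.0 - alpha) * avg_resp_ms + alpha * fb->resp_time_ms;
}

/* The power/migration policy can consult this before coalescing work. */
static int
sla_has_headroom(void)
{
    return (avg_resp_ms < target_resp_ms);
}

int
main(void)
{
    struct perf_feedback fb = { 30.0, 1.0 };

    for (int i = 0; i < 20; i++)
        perf_report(&fb);
    printf("avg = %.1f ms, headroom = %d\n", avg_resp_ms, sla_has_headroom());
    return (0);
}

If administrators supply SLA rewards/penalties instead of raw response
times, the same scheme applies with the reward as the smoothed quantity.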
> If they choose not to provide a performance measure, then the system
> will behave according to the default policy, which can be chosen from
> the set: {most performance oriented (spread the load equally), most
> power efficient (keep all threads on a single CPU and turn others
> off), 50-50 tradeoff between performance and efficiency (use half of
> the CPUs), etc.}.
For a default, I was leaning towards "maximize performance, but don't
squander power". That way, without doing anything, system administrators
and users will still get the performance levels they have come to
expect...but will also see overall efficiency improvements, since average
system utilization is generally very low.
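The policy set above, plus that default, could be exposed as a simple
tunable along these lines. The enum values and the active-CPU heuristic
are purely illustrative (PM_POLICY_ELASTIC is just my reading of
"maximize performance, but don't squander power"), not an existing
interface:

#include <stdio.h>

typedef enum {
    PM_POLICY_PERFORMANCE,  /* spread the load equally, all CPUs at full clock */
    PM_POLICY_ELASTIC,      /* default: full performance, but CPUs with
                             * nothing to do get clocked down */
    PM_POLICY_BALANCED,     /* roughly half of the CPUs kept active */
    PM_POLICY_POWER         /* coalesce onto as few CPUs as possible */
} pm_policy_t;

/* How many CPUs to keep at full clock, given the number of runnable threads. */
static int
pm_active_cpus(pm_policy_t policy, int ncpus, int nrunnable)
{
    switch (policy) {
    case PM_POLICY_PERFORMANCE:
        return (ncpus);
    case PM_POLICY_BALANCED:
        return (ncpus / 2 > 0 ? ncpus / 2 : 1);
    case PM_POLICY_POWER:
        return (1);
    case PM_POLICY_ELASTIC:
    default:
        /* one thread per CPU where possible, but never more CPUs
         * than there is work for */
        if (nrunnable < 1)
            return (1);
        return (nrunnable < ncpus ? nrunnable : ncpus);
    }
}

int
main(void)
{
    printf("elastic, 8 CPUs, 3 runnable -> %d active\n",
        pm_active_cpus(PM_POLICY_ELASTIC, 8, 3));
    printf("balanced, 8 CPUs, 3 runnable -> %d active\n",
        pm_active_cpus(PM_POLICY_BALANCED, 8, 3));
    return (0);
}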
> There can be some special cases when performance feedback from
> applications is not needed. For example,
> we can decide that more than one thread per CPU results in a
> noticeable performance degradation, and so the power management policy
> can then learn what clock frequencies to assign to existing CPUs based
> on observing how many of those CPUs have some threads running on them.
> Do you think this is an important case to address? If so, then a
> separate management policy should be developed for this case, which
> will be "turned on" whenever such case is detected.
I'm not sure I understand the scenario. The existing CMT scheduling
policy will still be active...trying to load balance work
across CPUs that have a shared physical relationship. The power
components of the scheduling policy may be trying to counter
this by coalescing work (where possible). Finding the optimal balance
between coalescence and load balancing is the interesting objective. Is
that what you mean?
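For what it's worth, here is one crude way to express that coalescence
vs. load-balancing tension as a CPU-selection heuristic, reusing the
hypothetical SLA-headroom signal from the earlier sketch. A real policy
would also have to respect the existing CMT sharing relationships
rather than treat CPUs as interchangeable:

#include <stdio.h>

#define NCPUS 4

static int runq_len[NCPUS];    /* per-CPU run queue length */
static int sla_headroom = 1;   /* fed by the SLA feedback path */

/* Pick a CPU for a newly runnable thread. */
static int
pm_choose_cpu(void)
{
    int best = 0;

    for (int c = 1; c < NCPUS; c++) {
        if (sla_headroom) {
            /* Coalesce: prefer the busiest CPU so the others can stay
             * idle and clocked down. (A real heuristic would also cap
             * the per-CPU load.) */
            if (runq_len[c] > runq_len[best])
                best = c;
        } else {
            /* Load balance: prefer the least-loaded CPU, as the
             * existing CMT policy would. */
            if (runq_len[c] < runq_len[best])
                best = c;
        }
    }
    return (best);
}

int
main(void)
{
    runq_len[2] = 3;              /* CPU 2 already has work */
    printf("headroom: place on cpu%d\n", pm_choose_cpu());
    sla_headroom = 0;             /* SLA at risk: spread instead */
    printf("no headroom: place on cpu%d\n", pm_choose_cpu());
    return (0);
}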
Thanks,
-Eric