Eric Saxe wrote:

> David Vengerov wrote:
>
>>>
>>> From your description, it sounds like the opportunity here is using
>>> ML to learn the best policies for controlling power manageable
>>> resources, such that efficiency, performance, and adaptability (in
>>> the face of changing system utilization) are maximized. Is that
>>> right?
>>
>>
>> Yes, and the system can also jointly adapt the thread migration 
>> policies so as to achieve the same objective.
>
> What sorts of inputs (observability) do you think will be needed for 
> this? 

I think that whatever is currently observable in the Solaris kernel
should be sufficient to start with. Even if the controller knows only
the run queue length on each CPU, it can already improve the system's
power efficiency by lowering the clock rate of CPUs that are idle or
have few threads in their run queues (if the workload consists of many
short-lived transactions). A more effective approach is for the thread
migration policy to cooperate with the power management policy and try
to keep as many CPUs idle as possible without violating the SLAs made
with the running applications. To do this, the controller needs to
know the service quality the applications receive, through continual
feedback from the system about the response time of completed
transactions (or, better yet, about the SLA rewards/penalties, which
can already include many different performance considerations).

Administrators can therefore be given the option of specifying a
performance measure to be sent to the power management/thread
migration controller. If they choose not to provide one, the system
will behave according to a default policy chosen from a set such as:
{most performance oriented (spread the load equally), most power
efficient (keep all threads on a single CPU and turn the others off),
50-50 tradeoff between performance and efficiency (use half of the
CPUs), etc.}.
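
To make the two ideas above concrete, here is a rough sketch in
Python. It is only an illustration, not an existing Solaris interface:
the names DefaultPolicy, active_cpu_count and pick_frequency, the
busy_threshold parameter, and the linear mapping from run queue length
to frequency level are all assumptions made for the example.

```python
from enum import Enum


class DefaultPolicy(Enum):
    """Hypothetical names for the default policies listed above."""
    PERFORMANCE = "performance"  # spread the load equally over all CPUs
    POWER = "power"              # keep all threads on a single CPU
    BALANCED = "balanced"        # 50-50 tradeoff: use half of the CPUs


def active_cpu_count(policy, n_cpus):
    """Number of CPUs kept powered on under a given default policy."""
    if policy is DefaultPolicy.PERFORMANCE:
        return n_cpus
    if policy is DefaultPolicy.POWER:
        return 1
    return max(1, n_cpus // 2)


def pick_frequency(run_queue_len, freqs_mhz, busy_threshold=2):
    """Choose a clock rate from the run queue length alone.

    Idle CPUs get the lowest available frequency, CPUs whose queue has
    reached busy_threshold get the highest, and queue lengths in
    between are mapped linearly onto the available levels.
    """
    levels = sorted(freqs_mhz)
    if run_queue_len <= 0:
        return levels[0]
    if run_queue_len >= busy_threshold:
        return levels[-1]
    index = (run_queue_len * (len(levels) - 1)) // busy_threshold
    return levels[index]
```

A learned policy would replace the fixed linear mapping in
pick_frequency; the sketch only shows that the run queue length is
enough of an input to get started.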

There can be some special cases in which performance feedback from
applications is not needed. For example, we can decide that more than
one thread per CPU results in a noticeable performance degradation.
The power management policy can then learn what clock frequencies to
assign to the CPUs by observing how many of them have threads running
on them. Do you think this is an important case to address? If so, a
separate management policy should be developed for it and "turned on"
whenever such a case is detected.
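
As a rough illustration of this special case, the sketch below detects
when no CPU runs more than one thread and keeps a small table of
estimated rewards indexed by (number of occupied CPUs, frequency).
Everything here is an assumption for the sake of the example: the
names, the running-average update, and the reward signal itself
(which could be, say, the negative of measured power draw). Exploration
is left out to keep it short, so this is a greedy learner only.

```python
from collections import defaultdict


def occupied_cpus(threads_per_cpu):
    """Count the CPUs that have at least one thread on them."""
    return sum(1 for n in threads_per_cpu if n > 0)


def special_case_applies(threads_per_cpu):
    """True when no CPU carries more than one thread."""
    return all(n <= 1 for n in threads_per_cpu)


class FrequencyLearner:
    """Greedy tabular learner: state is the occupied-CPU count."""

    def __init__(self, freqs_mhz, step=0.1):
        self.freqs = list(freqs_mhz)
        self.step = step
        # (occupied CPU count, frequency) -> estimated reward
        self.value = defaultdict(float)

    def choose(self, occupied):
        """Pick the frequency with the highest estimate for this state."""
        return max(self.freqs, key=lambda f: self.value[(occupied, f)])

    def update(self, occupied, freq, reward):
        """Move the estimate a fraction `step` toward the observed reward."""
        key = (occupied, freq)
        self.value[key] += self.step * (reward - self.value[key])
```

The separate policy would be "turned on" only while
special_case_applies() returns True for the current thread placement.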

David

