subject:"\\\\\\\[RFC PATCH 0\\\\\\\/2\\\\\\\] cpufreq\\\\\\\: Introduce LAB cpufreq governor."

On 10 April 2013 11:38, Lukasz Majewski  wrote:
> Hi Vincent,
>
>> On 10 April 2013 10:44, Lukasz Majewski 
>> wrote:
>> > Hi Vincent,
>> >
>> >>
>> >>
>> >> On Tuesday, 9 April 2013, Lukasz Majewski 
>> >> wrote:
>> >> > Hi Viresh and Vincent,
>> >> >
>> >> >> On 9 April 2013 16:07, Lukasz Majewski 
>> >> >> wrote:
>> >> >> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
>> >> >> > Our approach is a bit different than cpufreq_ondemand one.
>> >> >> > Ondemand takes the per CPU idle time, then on that basis
>> >> >> > calculates per cpu load. The next step is to choose the
>> >> >> > highest load and then use this value to properly scale
>> >> >> > frequency.
>> >> >> >
>> >> >> > On the other hand LAB tries to model different behavior:
>> >> >> >
>> >> >> > As a first step we applied Vincent Guittot's "pack small
>> >> >> > tasks" [*] patch to improve "race to idle" behavior:
>> >> >> > http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
>> >> >>
>> >> >> Luckily he is part of my team :)
>> >> >>
>> >> >> http://www.linaro.org/linux-on-arm/meet-the-team/power-management
>> >> >>
>> >> >> BTW, he is using ondemand governor for all his work.
>> >> >>
>> >> >> > Afterwards, we decided to investigate different approach for
>> >> >> > power governing:
>> >> >> >
>> >> >> > Use the number of sleeping CPUs (not the maximal per-CPU
>> >> >> > load) to change frequency. We thereof depend on [*] to "pack"
>> >> >> > as many tasks to CPU as possible and allow other to sleep.
>> >> >>
>> >> >> He packs only small tasks.
>> >> >
>> >> > What's about packing not only small tasks? I will investigate the
>> >> > possibility to aggressively pack (even with a cost of performance
>> >> > degradation) as many tasks as possible to a single CPU.
>> >>
>> >> Hi Lukasz,
>> >>
>> >> I've got same comment on my current patch and I'm preparing a new
>> >> version that can pack tasks more agressively based on the same
>> >> buddy mecanism. This will be done at the cost of performance of
>> >> course.
>> >
>> > Can you share your development tree?
>>
>> The dev is not finished yet but i will share it as soon as possible
>
> Ok
>
>>
>> >
>> >>
>> >>
>> >> >
>> >> > It seems a good idea for a power consumption reduction.
>> >>
>> >> In fact, it's not always true and depends several inputs like the
>> >> number of tasks that run simultaneously
>> >
>> > In my understanding, we can try to couple (affine) maximal number of
>> > task with a CPU. Performance shall decrease, but we will avoid
>> > costs of tasks migration.
>> >
>> > If I remember correctly, I've asked you about some testbench/test
>> > program for scheduler evaluation. I assume that nothing has changed
>> > and there isn't any "common" set of scheduler tests?
>>
>> There are a bunch of bench that are used to evaluate scheduler like
>> hackbench, pgbench but they generally fills all CPU in order to test
>> max performance. Are you looking for such kind of bench ?
>
> I'd rather see a bit different set of tests - something similar to
> "cyclic" tests for PREEMPT_RT patch.
>
> For sched work it would be welcome to spawn a lot of processes with
> different duration and workload. And on this basis observe if e.g. 2 or
> 3 processors are idle.

sanjay is working on something like that:
https://git.linaro.org/gitweb?p=people/sanjayrawat/cyclicTest.git;a=shortlog;h=refs/heads/master

>
>>
>> >
>> >>
>> >> >
>> >> >> And if there are many small tasks we are
>> >> >> packing, then load must be high and so ondemand gov will
>> >> >> increase freq.
>> >> >
>> >> > This is of course true for "packing" all tasks to a single CPU.
>> >> > If we stay at the power consumption envelope, we can even
>> >> > overclock the frequency.
>> >> >
>> >> > But what if other - lets say 3 CPUs - are under heavy workload?
>> >> > Ondemand will switch frequency to maximum, and as Jonghwa pointed
>> >> > out this can cause dangerous temperature increase.
>> >>
>> >> IIUC, your main concern is to stay in a power consumption budget to
>> >> not over heat and have to face the side effect of high temperature
>> >> like a decrease of power efficiency. So your governor modifies the
>> >> max frequency based on the number of running/idle CPU
>> > Yes, this is correct.
>> >
>> >> to have an
>> >> almost stable power consumtpion ?
>> >
>> > From our observation it seems, that for 3 or 4 running CPUs under
>> > heavy load we see much more power consumption reduction.
>>
>> That's logic because you will reduce the voltage
>>
>> >
>> > To put it in another way - ondemand would increase frequency to max
>> > for all 4 CPUs. On the other hand, if user experience drops to the
>> > acceptable level we can reduce power consumption.
>> >
>> > Reducing frequency and CPU voltage (by DVS) causes as a side effect,
>> > that temperature stays at acceptable level.
>> >
>> >>
>> >> Have you also looked at the power clamp driver that have similar
>> >> target ?
>> >
>> > I might be wrong here, but in my

Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.

2013-04-10 Thread Lorenzo Pieralisi

On Wed, Apr 10, 2013 at 09:44:52AM +0100, Lukasz Majewski wrote:

[...]

> > Have you also looked at the power clamp driver that have similar
> > target ?
> 
> I might be wrong here, but in my opinion the power clamp driver is a bit
> different:
> 
> 1. It is dedicated to Intel SoCs, which provide special set of
> registers (i.e. MSR_PKG_Cx_RESIDENCY [*]), which forces a processor to
> enter certain C state for a given duration. Idle duration is calculated
> by per CPU set of high priority kthreads (which also program [*]
> registers). 
> 

Those registers are used for compensation (ie user asked a given idle
ratio but HW stats show a mismatch) and they are not "programmed"
they are just read. That code is Intel specific but it can be easily ported
to ARM, I did that and most of the code is common with zero dependency
on the architecture.

> 2. ARM SoCs don't have such infrastructure, so we depend on SW here.

Well, it is true that most of the SoCs I am working on do not have
a programming interface to monitor C-state residency, granted, this is
a problem. If those stats can be retrieved somehow (I did that on our TC2
platform) then power clamp can be used on ARM with minor modifications.

Lorenzo

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.

Hi Vincent,

> On 10 April 2013 10:44, Lukasz Majewski 
> wrote:
> > Hi Vincent,
> >
> >>
> >>
> >> On Tuesday, 9 April 2013, Lukasz Majewski 
> >> wrote:
> >> > Hi Viresh and Vincent,
> >> >
> >> >> On 9 April 2013 16:07, Lukasz Majewski 
> >> >> wrote:
> >> >> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
> >> >> > Our approach is a bit different than cpufreq_ondemand one.
> >> >> > Ondemand takes the per CPU idle time, then on that basis
> >> >> > calculates per cpu load. The next step is to choose the
> >> >> > highest load and then use this value to properly scale
> >> >> > frequency.
> >> >> >
> >> >> > On the other hand LAB tries to model different behavior:
> >> >> >
> >> >> > As a first step we applied Vincent Guittot's "pack small
> >> >> > tasks" [*] patch to improve "race to idle" behavior:
> >> >> > http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
> >> >>
> >> >> Luckily he is part of my team :)
> >> >>
> >> >> http://www.linaro.org/linux-on-arm/meet-the-team/power-management
> >> >>
> >> >> BTW, he is using ondemand governor for all his work.
> >> >>
> >> >> > Afterwards, we decided to investigate different approach for
> >> >> > power governing:
> >> >> >
> >> >> > Use the number of sleeping CPUs (not the maximal per-CPU
> >> >> > load) to change frequency. We thereof depend on [*] to "pack"
> >> >> > as many tasks to CPU as possible and allow other to sleep.
> >> >>
> >> >> He packs only small tasks.
> >> >
> >> > What's about packing not only small tasks? I will investigate the
> >> > possibility to aggressively pack (even with a cost of performance
> >> > degradation) as many tasks as possible to a single CPU.
> >>
> >> Hi Lukasz,
> >>
> >> I've got same comment on my current patch and I'm preparing a new
> >> version that can pack tasks more agressively based on the same
> >> buddy mecanism. This will be done at the cost of performance of
> >> course.
> >
> > Can you share your development tree?
> 
> The dev is not finished yet but i will share it as soon as possible

Ok

> 
> >
> >>
> >>
> >> >
> >> > It seems a good idea for a power consumption reduction.
> >>
> >> In fact, it's not always true and depends several inputs like the
> >> number of tasks that run simultaneously
> >
> > In my understanding, we can try to couple (affine) maximal number of
> > task with a CPU. Performance shall decrease, but we will avoid
> > costs of tasks migration.
> >
> > If I remember correctly, I've asked you about some testbench/test
> > program for scheduler evaluation. I assume that nothing has changed
> > and there isn't any "common" set of scheduler tests?
> 
> There are a bunch of bench that are used to evaluate scheduler like
> hackbench, pgbench but they generally fills all CPU in order to test
> max performance. Are you looking for such kind of bench ?

I'd rather see a bit different set of tests - something similar to
"cyclic" tests for PREEMPT_RT patch.

For sched work it would be welcome to spawn a lot of processes with
different duration and workload. And on this basis observe if e.g. 2 or
3 processors are idle.

> 
> >
> >>
> >> >
> >> >> And if there are many small tasks we are
> >> >> packing, then load must be high and so ondemand gov will
> >> >> increase freq.
> >> >
> >> > This is of course true for "packing" all tasks to a single CPU.
> >> > If we stay at the power consumption envelope, we can even
> >> > overclock the frequency.
> >> >
> >> > But what if other - lets say 3 CPUs - are under heavy workload?
> >> > Ondemand will switch frequency to maximum, and as Jonghwa pointed
> >> > out this can cause dangerous temperature increase.
> >>
> >> IIUC, your main concern is to stay in a power consumption budget to
> >> not over heat and have to face the side effect of high temperature
> >> like a decrease of power efficiency. So your governor modifies the
> >> max frequency based on the number of running/idle CPU
> > Yes, this is correct.
> >
> >> to have an
> >> almost stable power consumtpion ?
> >
> > From our observation it seems, that for 3 or 4 running CPUs under
> > heavy load we see much more power consumption reduction.
> 
> That's logic because you will reduce the voltage
> 
> >
> > To put it in another way - ondemand would increase frequency to max
> > for all 4 CPUs. On the other hand, if user experience drops to the
> > acceptable level we can reduce power consumption.
> >
> > Reducing frequency and CPU voltage (by DVS) causes as a side effect,
> > that temperature stays at acceptable level.
> >
> >>
> >> Have you also looked at the power clamp driver that have similar
> >> target ?
> >
> > I might be wrong here, but in my opinion the power clamp driver is
> > a bit different:
> 
> yes, it periodically forces the cluster in a low power state
> 
> >
> > 1. It is dedicated to Intel SoCs, which provide special set of
> > registers (i.e. MSR_PKG_Cx_RESIDENCY [*]), which forces a processor
> > to enter certain C state for a given duration. Idle

Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.

On 10 April 2013 10:44, Lukasz Majewski  wrote:
> Hi Vincent,
>
>>
>>
>> On Tuesday, 9 April 2013, Lukasz Majewski 
>> wrote:
>> > Hi Viresh and Vincent,
>> >
>> >> On 9 April 2013 16:07, Lukasz Majewski 
>> >> wrote:
>> >> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
>> >> > Our approach is a bit different than cpufreq_ondemand one.
>> >> > Ondemand takes the per CPU idle time, then on that basis
>> >> > calculates per cpu load. The next step is to choose the highest
>> >> > load and then use this value to properly scale frequency.
>> >> >
>> >> > On the other hand LAB tries to model different behavior:
>> >> >
>> >> > As a first step we applied Vincent Guittot's "pack small
>> >> > tasks" [*] patch to improve "race to idle" behavior:
>> >> > http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
>> >>
>> >> Luckily he is part of my team :)
>> >>
>> >> http://www.linaro.org/linux-on-arm/meet-the-team/power-management
>> >>
>> >> BTW, he is using ondemand governor for all his work.
>> >>
>> >> > Afterwards, we decided to investigate different approach for
>> >> > power governing:
>> >> >
>> >> > Use the number of sleeping CPUs (not the maximal per-CPU load) to
>> >> > change frequency. We thereof depend on [*] to "pack" as many
>> >> > tasks to CPU as possible and allow other to sleep.
>> >>
>> >> He packs only small tasks.
>> >
>> > What's about packing not only small tasks? I will investigate the
>> > possibility to aggressively pack (even with a cost of performance
>> > degradation) as many tasks as possible to a single CPU.
>>
>> Hi Lukasz,
>>
>> I've got same comment on my current patch and I'm preparing a new
>> version that can pack tasks more agressively based on the same buddy
>> mecanism. This will be done at the cost of performance of course.
>
> Can you share your development tree?

The dev is not finished yet but i will share it as soon as possible

>
>>
>>
>> >
>> > It seems a good idea for a power consumption reduction.
>>
>> In fact, it's not always true and depends several inputs like the
>> number of tasks that run simultaneously
>
> In my understanding, we can try to couple (affine) maximal number of
> task with a CPU. Performance shall decrease, but we will avoid costs of
> tasks migration.
>
> If I remember correctly, I've asked you about some testbench/test
> program for scheduler evaluation. I assume that nothing has changed and
> there isn't any "common" set of scheduler tests?

There are a bunch of bench that are used to evaluate scheduler like
hackbench, pgbench but they generally fills all CPU in order to test
max performance. Are you looking for such kind of bench ?

>
>>
>> >
>> >> And if there are many small tasks we are
>> >> packing, then load must be high and so ondemand gov will increase
>> >> freq.
>> >
>> > This is of course true for "packing" all tasks to a single CPU. If
>> > we stay at the power consumption envelope, we can even overclock the
>> > frequency.
>> >
>> > But what if other - lets say 3 CPUs - are under heavy workload?
>> > Ondemand will switch frequency to maximum, and as Jonghwa pointed
>> > out this can cause dangerous temperature increase.
>>
>> IIUC, your main concern is to stay in a power consumption budget to
>> not over heat and have to face the side effect of high temperature
>> like a decrease of power efficiency. So your governor modifies the
>> max frequency based on the number of running/idle CPU
> Yes, this is correct.
>
>> to have an
>> almost stable power consumtpion ?
>
> From our observation it seems, that for 3 or 4 running CPUs under heavy
> load we see much more power consumption reduction.

That's logic because you will reduce the voltage

>
> To put it in another way - ondemand would increase frequency to max for
> all 4 CPUs. On the other hand, if user experience drops to the
> acceptable level we can reduce power consumption.
>
> Reducing frequency and CPU voltage (by DVS) causes as a side effect,
> that temperature stays at acceptable level.
>
>>
>> Have you also looked at the power clamp driver that have similar
>> target ?
>
> I might be wrong here, but in my opinion the power clamp driver is a bit
> different:

yes, it periodically forces the cluster in a low power state

>
> 1. It is dedicated to Intel SoCs, which provide special set of
> registers (i.e. MSR_PKG_Cx_RESIDENCY [*]), which forces a processor to
> enter certain C state for a given duration. Idle duration is calculated
> by per CPU set of high priority kthreads (which also program [*]
> registers).

IIRC, a trial on ARM platform have been done by lorenzo and daniel.
Lorenzo, Daniel, have you more information ?

>
> 2. ARM SoCs don't have such infrastructure, so we depend on SW here.
> Scheduler has to remove tasks from a particular CPU and "execute" on
> it the idle_task.
> Moreover at Exynos4 thermal control loop depends on SW, since we can
> only read SoC temperature via TMU (Thermal Management Unit) block.

The idle duration is quite

Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.

Hi Vincent,

> 
> 
> On Tuesday, 9 April 2013, Lukasz Majewski 
> wrote:
> > Hi Viresh and Vincent,
> >
> >> On 9 April 2013 16:07, Lukasz Majewski 
> >> wrote:
> >> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
> >> > Our approach is a bit different than cpufreq_ondemand one.
> >> > Ondemand takes the per CPU idle time, then on that basis
> >> > calculates per cpu load. The next step is to choose the highest
> >> > load and then use this value to properly scale frequency.
> >> >
> >> > On the other hand LAB tries to model different behavior:
> >> >
> >> > As a first step we applied Vincent Guittot's "pack small
> >> > tasks" [*] patch to improve "race to idle" behavior:
> >> > http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
> >>
> >> Luckily he is part of my team :)
> >>
> >> http://www.linaro.org/linux-on-arm/meet-the-team/power-management
> >>
> >> BTW, he is using ondemand governor for all his work.
> >>
> >> > Afterwards, we decided to investigate different approach for
> >> > power governing:
> >> >
> >> > Use the number of sleeping CPUs (not the maximal per-CPU load) to
> >> > change frequency. We thereof depend on [*] to "pack" as many
> >> > tasks to CPU as possible and allow other to sleep.
> >>
> >> He packs only small tasks.
> >
> > What's about packing not only small tasks? I will investigate the
> > possibility to aggressively pack (even with a cost of performance
> > degradation) as many tasks as possible to a single CPU.
> 
> Hi Lukasz,
> 
> I've got same comment on my current patch and I'm preparing a new
> version that can pack tasks more agressively based on the same buddy
> mecanism. This will be done at the cost of performance of course.

Can you share your development tree?

> 
> 
> >
> > It seems a good idea for a power consumption reduction.
> 
> In fact, it's not always true and depends several inputs like the
> number of tasks that run simultaneously

In my understanding, we can try to couple (affine) maximal number of
task with a CPU. Performance shall decrease, but we will avoid costs of
tasks migration. 

If I remember correctly, I've asked you about some testbench/test
program for scheduler evaluation. I assume that nothing has changed and
there isn't any "common" set of scheduler tests?

> 
> >
> >> And if there are many small tasks we are
> >> packing, then load must be high and so ondemand gov will increase
> >> freq.
> >
> > This is of course true for "packing" all tasks to a single CPU. If
> > we stay at the power consumption envelope, we can even overclock the
> > frequency.
> >
> > But what if other - lets say 3 CPUs - are under heavy workload?
> > Ondemand will switch frequency to maximum, and as Jonghwa pointed
> > out this can cause dangerous temperature increase.
> 
> IIUC, your main concern is to stay in a power consumption budget to
> not over heat and have to face the side effect of high temperature
> like a decrease of power efficiency. So your governor modifies the
> max frequency based on the number of running/idle CPU
Yes, this is correct.

> to have an
> almost stable power consumtpion ?

>From our observation it seems, that for 3 or 4 running CPUs under heavy
load we see much more power consumption reduction.

To put it in another way - ondemand would increase frequency to max for
all 4 CPUs. On the other hand, if user experience drops to the
acceptable level we can reduce power consumption. 

Reducing frequency and CPU voltage (by DVS) causes as a side effect,
that temperature stays at acceptable level.

> 
> Have you also looked at the power clamp driver that have similar
> target ?

I might be wrong here, but in my opinion the power clamp driver is a bit
different:

1. It is dedicated to Intel SoCs, which provide special set of
registers (i.e. MSR_PKG_Cx_RESIDENCY [*]), which forces a processor to
enter certain C state for a given duration. Idle duration is calculated
by per CPU set of high priority kthreads (which also program [*]
registers). 

2. ARM SoCs don't have such infrastructure, so we depend on SW here.
Scheduler has to remove tasks from a particular CPU and "execute" on
it the idle_task.
Moreover at Exynos4 thermal control loop depends on SW, since we can
only read SoC temperature via TMU (Thermal Management Unit) block.

Correct me again, but it seems to me that on ARM we can use CPU hotplug
(which as Tomas Glexner stated recently is going to be "refactored" :-)
) or "ask" scheduler to use smallest possible number of CPUs and enter C
state for idling CPUs.

> 
> 
> Vincent
> 
> >
> >>
> >> > Contrary, when all cores are heavily loaded, we decided to reduce
> >> > frequency by around 30%. With this approach user experience
> >> > recution is still acceptable (with much less power consumption).
> >>
> >> Don't know.. running many cpus at lower freq for long duration will
> >> probably take more power than running them at high freq for short
> >> duration and making system idle again.
> >>
> >> > We have

Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.

On 9 April 2013 20:52, Vincent Guittot  wrote:
>
>
> On Tuesday, 9 April 2013, Lukasz Majewski  wrote:
>> Hi Viresh and Vincent,
>>
>>> On 9 April 2013 16:07, Lukasz Majewski  wrote:
>>> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
>>> > Our approach is a bit different than cpufreq_ondemand one. Ondemand
>>> > takes the per CPU idle time, then on that basis calculates per cpu
>>> > load. The next step is to choose the highest load and then use this
>>> > value to properly scale frequency.
>>> >
>>> > On the other hand LAB tries to model different behavior:
>>> >
>>> > As a first step we applied Vincent Guittot's "pack small tasks" [*]
>>> > patch to improve "race to idle" behavior:
>>> >
>>> > http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
>>>
>>> Luckily he is part of my team :)
>>>
>>> http://www.linaro.org/linux-on-arm/meet-the-team/power-management
>>>
>>> BTW, he is using ondemand governor for all his work.
>>>
>>> > Afterwards, we decided to investigate different approach for power
>>> > governing:
>>> >
>>> > Use the number of sleeping CPUs (not the maximal per-CPU load) to
>>> > change frequency. We thereof depend on [*] to "pack" as many tasks
>>> > to CPU as possible and allow other to sleep.
>>>
>>> He packs only small tasks.
>>
>> What's about packing not only small tasks? I will investigate the
>> possibility to aggressively pack (even with a cost of performance
>> degradation) as many tasks as possible to a single CPU.
>
> Hi Lukasz,
>
> I've got same comment on my current patch and I'm preparing a new version
> that can pack tasks more agressively based on the same buddy mecanism. This
> will be done at the cost of performance of course.
>
>
>
>>
>> It seems a good idea for a power consumption reduction.
>
> In fact, it's not always true and depends several inputs like the number of
> tasks that run simultaneously
>
>
>>
>>> And if there are many small tasks we are
>>> packing, then load must be high and so ondemand gov will increase
>>> freq.
>>
>> This is of course true for "packing" all tasks to a single CPU. If we
>> stay at the power consumption envelope, we can even overclock the
>> frequency.
>>
>> But what if other - lets say 3 CPUs - are under heavy workload?
>> Ondemand will switch frequency to maximum, and as Jonghwa pointed out
>> this can cause dangerous temperature increase.
>
> IIUC, your main concern is to stay in a power consumption budget to not over
> heat and have to face the side effect of high temperature like a decrease of
> power efficiency. So your governor modifies the max frequency based on the
> number of running/idle CPU to have an almost stable power consumtpion ?
>
> Have you also looked at the power clamp driver that have similar target ?
>
>
> Vincent
>
>
>>
>>>
>>> > Contrary, when all cores are heavily loaded, we decided to reduce
>>> > frequency by around 30%. With this approach user experience
>>> > recution is still acceptable (with much less power consumption).
>>>
>>> Don't know.. running many cpus at lower freq for long duration will
>>> probably take more power than running them at high freq for short
>>> duration and making system idle again.
>>>
>>> > We have posted this "RFC" patch mainly for discussion, and I think
>>> > it fits its purpose :-).
>>>
>>> Yes, no issues with your RFC idea.. its perfect..
>>>
>>> @Vincent: Can you please follow this thread a bit and tell us what
>>> your views are?
>>>
>>> --
>>> viresh
>>
>>
>>
>> --
>> Best regards,
>>
>> Lukasz Majewski
>>
>> Samsung R Poland (SRPOL) | Linux Platform Group
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.

On 9 April 2013 20:52, Vincent Guittot vincent.guit...@linaro.org wrote:


 On Tuesday, 9 April 2013, Lukasz Majewski l.majew...@samsung.com wrote:
 Hi Viresh and Vincent,

 On 9 April 2013 16:07, Lukasz Majewski l.majew...@samsung.com wrote:
  On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
  Our approach is a bit different than cpufreq_ondemand one. Ondemand
  takes the per CPU idle time, then on that basis calculates per cpu
  load. The next step is to choose the highest load and then use this
  value to properly scale frequency.
 
  On the other hand LAB tries to model different behavior:
 
  As a first step we applied Vincent Guittot's pack small tasks [*]
  patch to improve race to idle behavior:
 
  http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks

 Luckily he is part of my team :)

 http://www.linaro.org/linux-on-arm/meet-the-team/power-management

 BTW, he is using ondemand governor for all his work.

  Afterwards, we decided to investigate different approach for power
  governing:
 
  Use the number of sleeping CPUs (not the maximal per-CPU load) to
  change frequency. We thereof depend on [*] to pack as many tasks
  to CPU as possible and allow other to sleep.

 He packs only small tasks.

 What's about packing not only small tasks? I will investigate the
 possibility to aggressively pack (even with a cost of performance
 degradation) as many tasks as possible to a single CPU.

 Hi Lukasz,

 I've got same comment on my current patch and I'm preparing a new version
 that can pack tasks more agressively based on the same buddy mecanism. This
 will be done at the cost of performance of course.




 It seems a good idea for a power consumption reduction.

 In fact, it's not always true and depends several inputs like the number of
 tasks that run simultaneously



 And if there are many small tasks we are
 packing, then load must be high and so ondemand gov will increase
 freq.

 This is of course true for packing all tasks to a single CPU. If we
 stay at the power consumption envelope, we can even overclock the
 frequency.

 But what if other - lets say 3 CPUs - are under heavy workload?
 Ondemand will switch frequency to maximum, and as Jonghwa pointed out
 this can cause dangerous temperature increase.

 IIUC, your main concern is to stay in a power consumption budget to not over
 heat and have to face the side effect of high temperature like a decrease of
 power efficiency. So your governor modifies the max frequency based on the
 number of running/idle CPU to have an almost stable power consumtpion ?

 Have you also looked at the power clamp driver that have similar target ?


 Vincent




  Contrary, when all cores are heavily loaded, we decided to reduce
  frequency by around 30%. With this approach user experience
  recution is still acceptable (with much less power consumption).

 Don't know.. running many cpus at lower freq for long duration will
 probably take more power than running them at high freq for short
 duration and making system idle again.

  We have posted this RFC patch mainly for discussion, and I think
  it fits its purpose :-).

 Yes, no issues with your RFC idea.. its perfect..

 @Vincent: Can you please follow this thread a bit and tell us what
 your views are?

 --
 viresh



 --
 Best regards,

 Lukasz Majewski

 Samsung RD Poland (SRPOL) | Linux Platform Group

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.

Hi Vincent,

 
 
 On Tuesday, 9 April 2013, Lukasz Majewski l.majew...@samsung.com
 wrote:
  Hi Viresh and Vincent,
 
  On 9 April 2013 16:07, Lukasz Majewski l.majew...@samsung.com
  wrote:
   On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
   Our approach is a bit different than cpufreq_ondemand one.
   Ondemand takes the per CPU idle time, then on that basis
   calculates per cpu load. The next step is to choose the highest
   load and then use this value to properly scale frequency.
  
   On the other hand LAB tries to model different behavior:
  
   As a first step we applied Vincent Guittot's pack small
   tasks [*] patch to improve race to idle behavior:
   http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
 
  Luckily he is part of my team :)
 
  http://www.linaro.org/linux-on-arm/meet-the-team/power-management
 
  BTW, he is using ondemand governor for all his work.
 
   Afterwards, we decided to investigate different approach for
   power governing:
  
   Use the number of sleeping CPUs (not the maximal per-CPU load) to
   change frequency. We thereof depend on [*] to pack as many
   tasks to CPU as possible and allow other to sleep.
 
  He packs only small tasks.
 
  What's about packing not only small tasks? I will investigate the
  possibility to aggressively pack (even with a cost of performance
  degradation) as many tasks as possible to a single CPU.
 
 Hi Lukasz,
 
 I've got same comment on my current patch and I'm preparing a new
 version that can pack tasks more agressively based on the same buddy
 mecanism. This will be done at the cost of performance of course.

Can you share your development tree?

 
 
 
  It seems a good idea for a power consumption reduction.
 
 In fact, it's not always true and depends several inputs like the
 number of tasks that run simultaneously

In my understanding, we can try to couple (affine) maximal number of
task with a CPU. Performance shall decrease, but we will avoid costs of
tasks migration. 

If I remember correctly, I've asked you about some testbench/test
program for scheduler evaluation. I assume that nothing has changed and
there isn't any common set of scheduler tests?

 
 
  And if there are many small tasks we are
  packing, then load must be high and so ondemand gov will increase
  freq.
 
  This is of course true for packing all tasks to a single CPU. If
  we stay at the power consumption envelope, we can even overclock the
  frequency.
 
  But what if other - lets say 3 CPUs - are under heavy workload?
  Ondemand will switch frequency to maximum, and as Jonghwa pointed
  out this can cause dangerous temperature increase.
 
 IIUC, your main concern is to stay in a power consumption budget to
 not over heat and have to face the side effect of high temperature
 like a decrease of power efficiency. So your governor modifies the
 max frequency based on the number of running/idle CPU
Yes, this is correct.

 to have an
 almost stable power consumtpion ?

From our observation it seems, that for 3 or 4 running CPUs under heavy
load we see much more power consumption reduction.

To put it in another way - ondemand would increase frequency to max for
all 4 CPUs. On the other hand, if user experience drops to the
acceptable level we can reduce power consumption. 

Reducing frequency and CPU voltage (by DVS) causes as a side effect,
that temperature stays at acceptable level.

 
 Have you also looked at the power clamp driver that have similar
 target ?

I might be wrong here, but in my opinion the power clamp driver is a bit
different:

1. It is dedicated to Intel SoCs, which provide special set of
registers (i.e. MSR_PKG_Cx_RESIDENCY [*]), which forces a processor to
enter certain C state for a given duration. Idle duration is calculated
by per CPU set of high priority kthreads (which also program [*]
registers). 

2. ARM SoCs don't have such infrastructure, so we depend on SW here.
Scheduler has to remove tasks from a particular CPU and execute on
it the idle_task.
Moreover at Exynos4 thermal control loop depends on SW, since we can
only read SoC temperature via TMU (Thermal Management Unit) block.


Correct me again, but it seems to me that on ARM we can use CPU hotplug
(which as Tomas Glexner stated recently is going to be refactored :-)
) or ask scheduler to use smallest possible number of CPUs and enter C
state for idling CPUs.



 
 
 Vincent
 
 
 
   Contrary, when all cores are heavily loaded, we decided to reduce
   frequency by around 30%. With this approach user experience
   recution is still acceptable (with much less power consumption).
 
  Don't know.. running many cpus at lower freq for long duration will
  probably take more power than running them at high freq for short
  duration and making system idle again.
 
   We have posted this RFC patch mainly for discussion, and I
   think it fits its purpose :-).
 
  Yes, no issues with your RFC idea.. its perfect..
 
  @Vincent: Can you please follow this thread

Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.

On 10 April 2013 10:44, Lukasz Majewski l.majew...@samsung.com wrote:
 Hi Vincent,



 On Tuesday, 9 April 2013, Lukasz Majewski l.majew...@samsung.com
 wrote:
  Hi Viresh and Vincent,
 
  On 9 April 2013 16:07, Lukasz Majewski l.majew...@samsung.com
  wrote:
   On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
   Our approach is a bit different than cpufreq_ondemand one.
   Ondemand takes the per CPU idle time, then on that basis
   calculates per cpu load. The next step is to choose the highest
   load and then use this value to properly scale frequency.
  
   On the other hand LAB tries to model different behavior:
  
   As a first step we applied Vincent Guittot's pack small
   tasks [*] patch to improve race to idle behavior:
   http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
 
  Luckily he is part of my team :)
 
  http://www.linaro.org/linux-on-arm/meet-the-team/power-management
 
  BTW, he is using ondemand governor for all his work.
 
   Afterwards, we decided to investigate different approach for
   power governing:
  
   Use the number of sleeping CPUs (not the maximal per-CPU load) to
   change frequency. We thereof depend on [*] to pack as many
   tasks to CPU as possible and allow other to sleep.
 
  He packs only small tasks.
 
  What's about packing not only small tasks? I will investigate the
  possibility to aggressively pack (even with a cost of performance
  degradation) as many tasks as possible to a single CPU.

 Hi Lukasz,

 I've got same comment on my current patch and I'm preparing a new
 version that can pack tasks more agressively based on the same buddy
 mecanism. This will be done at the cost of performance of course.

 Can you share your development tree?

The dev is not finished yet but i will share it as soon as possible




 
  It seems a good idea for a power consumption reduction.

 In fact, it's not always true and depends several inputs like the
 number of tasks that run simultaneously

 In my understanding, we can try to couple (affine) maximal number of
 task with a CPU. Performance shall decrease, but we will avoid costs of
 tasks migration.

 If I remember correctly, I've asked you about some testbench/test
 program for scheduler evaluation. I assume that nothing has changed and
 there isn't any common set of scheduler tests?

There are a bunch of bench that are used to evaluate scheduler like
hackbench, pgbench but they generally fills all CPU in order to test
max performance. Are you looking for such kind of bench ?



 
  And if there are many small tasks we are
  packing, then load must be high and so ondemand gov will increase
  freq.
 
  This is of course true for packing all tasks to a single CPU. If
  we stay at the power consumption envelope, we can even overclock the
  frequency.
 
  But what if other - lets say 3 CPUs - are under heavy workload?
  Ondemand will switch frequency to maximum, and as Jonghwa pointed
  out this can cause dangerous temperature increase.

 IIUC, your main concern is to stay in a power consumption budget to
 not over heat and have to face the side effect of high temperature
 like a decrease of power efficiency. So your governor modifies the
 max frequency based on the number of running/idle CPU
 Yes, this is correct.

 to have an
 almost stable power consumtpion ?

 From our observation it seems, that for 3 or 4 running CPUs under heavy
 load we see much more power consumption reduction.

That's logic because you will reduce the voltage


 To put it in another way - ondemand would increase frequency to max for
 all 4 CPUs. On the other hand, if user experience drops to the
 acceptable level we can reduce power consumption.

 Reducing frequency and CPU voltage (by DVS) causes as a side effect,
 that temperature stays at acceptable level.


 Have you also looked at the power clamp driver that have similar
 target ?

 I might be wrong here, but in my opinion the power clamp driver is a bit
 different:

yes, it periodically forces the cluster in a low power state


 1. It is dedicated to Intel SoCs, which provide special set of
 registers (i.e. MSR_PKG_Cx_RESIDENCY [*]), which forces a processor to
 enter certain C state for a given duration. Idle duration is calculated
 by per CPU set of high priority kthreads (which also program [*]
 registers).

IIRC, a trial on ARM platform have been done by lorenzo and daniel.
Lorenzo, Daniel, have you more information ?


 2. ARM SoCs don't have such infrastructure, so we depend on SW here.
 Scheduler has to remove tasks from a particular CPU and execute on
 it the idle_task.
 Moreover at Exynos4 thermal control loop depends on SW, since we can
 only read SoC temperature via TMU (Thermal Management Unit) block.

The idle duration is quite small and should not perturb normal behavior

Vincent


 Correct me again, but it seems to me that on ARM we can use CPU hotplug
 (which as Tomas Glexner stated recently is going to be refactored :-)
 ) or ask scheduler to use

Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.

Hi Vincent,

 On 10 April 2013 10:44, Lukasz Majewski l.majew...@samsung.com
 wrote:
  Hi Vincent,
 
 
 
  On Tuesday, 9 April 2013, Lukasz Majewski l.majew...@samsung.com
  wrote:
   Hi Viresh and Vincent,
  
   On 9 April 2013 16:07, Lukasz Majewski l.majew...@samsung.com
   wrote:
On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
Our approach is a bit different than cpufreq_ondemand one.
Ondemand takes the per CPU idle time, then on that basis
calculates per cpu load. The next step is to choose the
highest load and then use this value to properly scale
frequency.
   
On the other hand LAB tries to model different behavior:
   
As a first step we applied Vincent Guittot's pack small
tasks [*] patch to improve race to idle behavior:
http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
  
   Luckily he is part of my team :)
  
   http://www.linaro.org/linux-on-arm/meet-the-team/power-management
  
   BTW, he is using ondemand governor for all his work.
  
Afterwards, we decided to investigate different approach for
power governing:
   
Use the number of sleeping CPUs (not the maximal per-CPU
load) to change frequency. We thereof depend on [*] to pack
as many tasks to CPU as possible and allow other to sleep.
  
   He packs only small tasks.
  
   What's about packing not only small tasks? I will investigate the
   possibility to aggressively pack (even with a cost of performance
   degradation) as many tasks as possible to a single CPU.
 
  Hi Lukasz,
 
  I've got same comment on my current patch and I'm preparing a new
  version that can pack tasks more agressively based on the same
  buddy mecanism. This will be done at the cost of performance of
  course.
 
  Can you share your development tree?
 
 The dev is not finished yet but i will share it as soon as possible

Ok

 
 
 
 
  
   It seems a good idea for a power consumption reduction.
 
  In fact, it's not always true and depends several inputs like the
  number of tasks that run simultaneously
 
  In my understanding, we can try to couple (affine) maximal number of
  task with a CPU. Performance shall decrease, but we will avoid
  costs of tasks migration.
 
  If I remember correctly, I've asked you about some testbench/test
  program for scheduler evaluation. I assume that nothing has changed
  and there isn't any common set of scheduler tests?
 
 There are a bunch of bench that are used to evaluate scheduler like
 hackbench, pgbench but they generally fills all CPU in order to test
 max performance. Are you looking for such kind of bench ?

I'd rather see a bit different set of tests - something similar to
cyclic tests for PREEMPT_RT patch.

For sched work it would be welcome to spawn a lot of processes with
different duration and workload. And on this basis observe if e.g. 2 or
3 processors are idle.

 
 
 
  
   And if there are many small tasks we are
   packing, then load must be high and so ondemand gov will
   increase freq.
  
   This is of course true for packing all tasks to a single CPU.
   If we stay at the power consumption envelope, we can even
   overclock the frequency.
  
   But what if other - lets say 3 CPUs - are under heavy workload?
   Ondemand will switch frequency to maximum, and as Jonghwa pointed
   out this can cause dangerous temperature increase.
 
  IIUC, your main concern is to stay in a power consumption budget to
  not over heat and have to face the side effect of high temperature
  like a decrease of power efficiency. So your governor modifies the
  max frequency based on the number of running/idle CPU
  Yes, this is correct.
 
  to have an
  almost stable power consumtpion ?
 
  From our observation it seems, that for 3 or 4 running CPUs under
  heavy load we see much more power consumption reduction.
 
 That's logic because you will reduce the voltage
 
 
  To put it in another way - ondemand would increase frequency to max
  for all 4 CPUs. On the other hand, if user experience drops to the
  acceptable level we can reduce power consumption.
 
  Reducing frequency and CPU voltage (by DVS) causes as a side effect,
  that temperature stays at acceptable level.
 
 
  Have you also looked at the power clamp driver that have similar
  target ?
 
  I might be wrong here, but in my opinion the power clamp driver is
  a bit different:
 
 yes, it periodically forces the cluster in a low power state
 
 
  1. It is dedicated to Intel SoCs, which provide special set of
  registers (i.e. MSR_PKG_Cx_RESIDENCY [*]), which forces a processor
  to enter certain C state for a given duration. Idle duration is
  calculated by per CPU set of high priority kthreads (which also
  program [*] registers).
 
 IIRC, a trial on ARM platform have been done by lorenzo and daniel.
 Lorenzo, Daniel, have you more information ?

More information would be welcome :-)

 
 
  2. ARM SoCs don't have such infrastructure, so we depend on SW here.
  Scheduler has to

Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.

2013-04-10 Thread Lorenzo Pieralisi

On Wed, Apr 10, 2013 at 09:44:52AM +0100, Lukasz Majewski wrote:

[...]

  Have you also looked at the power clamp driver that have similar
  target ?
 
 I might be wrong here, but in my opinion the power clamp driver is a bit
 different:
 
 1. It is dedicated to Intel SoCs, which provide special set of
 registers (i.e. MSR_PKG_Cx_RESIDENCY [*]), which forces a processor to
 enter certain C state for a given duration. Idle duration is calculated
 by per CPU set of high priority kthreads (which also program [*]
 registers). 
 

Those registers are used for compensation (ie user asked a given idle
ratio but HW stats show a mismatch) and they are not programmed
they are just read. That code is Intel specific but it can be easily ported
to ARM, I did that and most of the code is common with zero dependency
on the architecture.

 2. ARM SoCs don't have such infrastructure, so we depend on SW here.

Well, it is true that most of the SoCs I am working on do not have
a programming interface to monitor C-state residency, granted, this is
a problem. If those stats can be retrieved somehow (I did that on our TC2
platform) then power clamp can be used on ARM with minor modifications.

Lorenzo

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.

On 10 April 2013 11:38, Lukasz Majewski l.majew...@samsung.com wrote:
Hi Vincent,

On 10 April 2013 10:44, Lukasz Majewski l.majew...@samsung.com
wrote:
Hi Vincent,

On Tuesday, 9 April 2013, Lukasz Majewski l.majew...@samsung.com
wrote:
Hi Viresh and Vincent,

On 9 April 2013 16:07, Lukasz Majewski l.majew...@samsung.com
wrote:
On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
Our approach is a bit different than cpufreq_ondemand one.
Ondemand takes the per CPU idle time, then on that basis
calculates per cpu load. The next step is to choose the
highest load and then use this value to properly scale
frequency.

On the other hand LAB tries to model different behavior:

As a first step we applied Vincent Guittot's pack small
tasks [*] patch to improve race to idle behavior:
http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks

Luckily he is part of my team :)

http://www.linaro.org/linux-on-arm/meet-the-team/power-management

BTW, he is using ondemand governor for all his work.

Afterwards, we decided to investigate different approach for
power governing:

Use the number of sleeping CPUs (not the maximal per-CPU
load) to change frequency. We thereof depend on [*] to pack
as many tasks to CPU as possible and allow other to sleep.

He packs only small tasks.

What's about packing not only small tasks? I will investigate the
possibility to aggressively pack (even with a cost of performance
degradation) as many tasks as possible to a single CPU.

Hi Lukasz,

I've got same comment on my current patch and I'm preparing a new
version that can pack tasks more agressively based on the same
buddy mecanism. This will be done at the cost of performance of
course.

Can you share your development tree?

The dev is not finished yet but i will share it as soon as possible

It seems a good idea for a power consumption reduction.

In fact, it's not always true and depends several inputs like the
number of tasks that run simultaneously

In my understanding, we can try to couple (affine) maximal number of
task with a CPU. Performance shall decrease, but we will avoid
costs of tasks migration.

If I remember correctly, I've asked you about some testbench/test
program for scheduler evaluation. I assume that nothing has changed
and there isn't any common set of scheduler tests?

There are a bunch of bench that are used to evaluate scheduler like
hackbench, pgbench but they generally fills all CPU in order to test
max performance. Are you looking for such kind of bench ?

I'd rather see a bit different set of tests - something similar to
cyclic tests for PREEMPT_RT patch.

For sched work it would be welcome to spawn a lot of processes with
different duration and workload. And on this basis observe if e.g. 2 or
3 processors are idle.

sanjay is working on something like that:
https://git.linaro.org/gitweb?p=people/sanjayrawat/cyclicTest.git;a=shortlog;h=refs/heads/master

And if there are many small tasks we are
packing, then load must be high and so ondemand gov will
increase freq.

This is of course true for packing all tasks to a single CPU.
If we stay at the power consumption envelope, we can even
overclock the frequency.

But what if other - lets say 3 CPUs - are under heavy workload?
Ondemand will switch frequency to maximum, and as Jonghwa pointed
out this can cause dangerous temperature increase.

IIUC, your main concern is to stay in a power consumption budget to
not over heat and have to face the side effect of high temperature
like a decrease of power efficiency. So your governor modifies the
max frequency based on the number of running/idle CPU
Yes, this is correct.

to have an
almost stable power consumtpion ?

From our observation it seems, that for 3 or 4 running CPUs under
heavy load we see much more power consumption reduction.

That's logic because you will reduce the voltage

To put it in another way - ondemand would increase frequency to max
for all 4 CPUs. On the other hand, if user experience drops to the
acceptable level we can reduce power consumption.

Reducing frequency and CPU voltage (by DVS) causes as a side effect,
that temperature stays at acceptable level.

Have you also looked at the power clamp driver that have similar
target ?

I might be wrong here, but in my opinion the power clamp driver is
a bit different:

yes, it periodically forces the cluster in a low power state

1. It is dedicated to Intel SoCs, which provide special set of
registers (i.e. MSR_PKG_Cx_RESIDENCY [*]), which forces a processor
to enter certain C state for a given duration. Idle duration is
calculated by per CPU set of high priority kthreads (which also
program [*] registers).

IIRC, a trial on ARM platform have been

Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.

Hi Viresh and Vincent,

> On 9 April 2013 16:07, Lukasz Majewski  wrote:
> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
> > Our approach is a bit different than cpufreq_ondemand one. Ondemand
> > takes the per CPU idle time, then on that basis calculates per cpu
> > load. The next step is to choose the highest load and then use this
> > value to properly scale frequency.
> >
> > On the other hand LAB tries to model different behavior:
> >
> > As a first step we applied Vincent Guittot's "pack small tasks" [*]
> > patch to improve "race to idle" behavior:
> > http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
> 
> Luckily he is part of my team :)
> 
> http://www.linaro.org/linux-on-arm/meet-the-team/power-management
> 
> BTW, he is using ondemand governor for all his work.
> 
> > Afterwards, we decided to investigate different approach for power
> > governing:
> >
> > Use the number of sleeping CPUs (not the maximal per-CPU load) to
> > change frequency. We thereof depend on [*] to "pack" as many tasks
> > to CPU as possible and allow other to sleep.
> 
> He packs only small tasks. 

What's about packing not only small tasks? I will investigate the
possibility to aggressively pack (even with a cost of performance
degradation) as many tasks as possible to a single CPU. 

It seems a good idea for a power consumption reduction. 

> And if there are many small tasks we are
> packing, then load must be high and so ondemand gov will increase
> freq.

This is of course true for "packing" all tasks to a single CPU. If we
stay at the power consumption envelope, we can even overclock the
frequency.

But what if other - lets say 3 CPUs - are under heavy workload? 
Ondemand will switch frequency to maximum, and as Jonghwa pointed out
this can cause dangerous temperature increase.

> 
> > Contrary, when all cores are heavily loaded, we decided to reduce
> > frequency by around 30%. With this approach user experience
> > recution is still acceptable (with much less power consumption).
> 
> Don't know.. running many cpus at lower freq for long duration will
> probably take more power than running them at high freq for short
> duration and making system idle again.
> 
> > We have posted this "RFC" patch mainly for discussion, and I think
> > it fits its purpose :-).
> 
> Yes, no issues with your RFC idea.. its perfect..
> 
> @Vincent: Can you please follow this thread a bit and tell us what
> your views are?
> 
> --
> viresh



-- 
Best regards,

Lukasz Majewski

Samsung R Poland (SRPOL) | Linux Platform Group
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.

2013-04-09 Thread jonghwa3 . lee

Hi, sorry for my late reply.
I just want to add comment to assist Lukasz's.
I put my comments below of Lukasz's.

On 2013년 04월 09일 19:37, Lukasz Majewski wrote:

> Hi Viresh,
> 
> First of all I'd like to apologize for a late response.
> Please find my comments below. 
> 
>> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
>>  wrote:
>>> <>
>>>  One of the problem of ondemand is that it considers the most busy
>>> cpu only while doesn't care how many cpu is in busy state at the
>>> moment. This may results in unnecessary power consumption, and it'll
>>> be critical for the system having limited power source.
>>>
>>>  To get the best energy efficiency, LAB governor considers not only
>>> idle time but also the number of idle cpus. It primarily focuses on
>>> supplying adequate performance to users within the limited resource.
>>> It checks the number of idle cpus then controls the maximum
>>> frequency dynamically. And it also applies different frequency
>>> increasing level depends on that information. In simple terms the
>>> less the number of busy cpus, the better performance will be given.
>>>  In addition, stable system's power consumption in the busy state
>>> can be achieved also with using LAB governor. This will help to
>>> manage and estimate power consumption in certain system.
>>
>> Hi Jonghwa,
>>
>> First of all, i should accept that i haven't got to the minute details
>> about your
>> patch until now but have done a broad review of it.
>>
>> There are many things that i am concerned about:
>> - I don't want an additional governor to be added to cpufreq unless
>> there is a very very strong reason for it. See what happened to
>> earlier attempts:
>>
>> https://lkml.org/lkml/2012/2/7/504
>>
>> But it doesn't mean you can't get it in. :)
>>
>> - What the real logic behind your patchset: I haven't got it
>> completely with your
>> mails. So what you said is:
>>
>>   - The lesser the number of busy cpus: you want to run at higher
>> freqs
>>   - The more the number of busy cpus: you want to run at lower freqs
>>
>> But the basic idea i had about this stuff was: The more the number of
>> busy cpus, the more loaded the system is, otherwise scheduler wouldn't
>> have used so many cpus and so there is need to run at higher frequency
>> rather than a lower one. Which would save power in a sense.. Finish
>> work early and let most of the cpus enter idle state as early as
>> possible. But with your solution we would run at lower frequencies
>> and so these cpus will take longer to get into idle state again. This
>> will really kill lot of power.
> 
> Our approach is a bit different than cpufreq_ondemand one. Ondemand
> takes the per CPU idle time, then on that basis calculates per cpu load.
> The next step is to choose the highest load and then use this value to
> properly scale frequency.
> 
> On the other hand LAB tries to model different behavior:
> 
> As a first step we applied Vincent Guittot's "pack small tasks" [*]
> patch to improve "race to idle" behavior:
> http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
>   
> 
> Afterwards, we decided to investigate different approach for power
> governing:
> 
> Use the number of sleeping CPUs (not the maximal per-CPU load) to
> change frequency. We thereof depend on [*] to "pack" as many tasks to
> CPU as possible and allow other to sleep. 
> On this basis we can increase (even overclock) frequency (when other
> CPUs sleep) to improve performance of the only running CPU. 
> 
> Contrary, when all cores are heavily loaded, we decided to reduce
> frequency by around 30%. With this approach user experience recution is
> still acceptable (with much less power consumption).
> When system is "idle" - only small background tasks are running, the
> frequency is reduced to minimum. 
> 
> To sum up:
> 
> Different approach (number of idle CPUs) is used by us to control
> ondemand governor. We also heavily depend on [*] patch set.
>


In additions, it is hard to say just letting high performance to busy system is
the best way for reducing the power consumption. Yes, as like Viresh said, we
can save the time of busy working, but that is not always perfect
solution. If we push all CPUs to keep high performance as much as they can,
the temperature will increase rapidly. In my test, on-demand governor reached
the thermal limits frequently, while lab governor didn't. And the high
temperature also effects the power consumption increasing. I got a rough result
which has 10% differences in power consumption between 20% temperature differed
conditions.

<>

  Temperature Power consumption(mWh) Loss(%)
 65'C  53.89  Base
 80'C  59.88   10

So, to reduce the power consumption, it looks we have to give our cares more to
avoid the situation where system goes to high temperature. It is more meaningful
for the mobile environment. In mobile devices, high

Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.

2013-04-09 Thread Viresh Kumar

On 9 April 2013 16:07, Lukasz Majewski  wrote:
>> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
> Our approach is a bit different than cpufreq_ondemand one. Ondemand
> takes the per CPU idle time, then on that basis calculates per cpu load.
> The next step is to choose the highest load and then use this value to
> properly scale frequency.
>
> On the other hand LAB tries to model different behavior:
>
> As a first step we applied Vincent Guittot's "pack small tasks" [*]
> patch to improve "race to idle" behavior:
> http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks

Luckily he is part of my team :)

http://www.linaro.org/linux-on-arm/meet-the-team/power-management

BTW, he is using ondemand governor for all his work.

> Afterwards, we decided to investigate different approach for power
> governing:
>
> Use the number of sleeping CPUs (not the maximal per-CPU load) to
> change frequency. We thereof depend on [*] to "pack" as many tasks to
> CPU as possible and allow other to sleep.

He packs only small tasks. And if there are many small tasks we are
packing, then load must be high and so ondemand gov will increase freq.

> Contrary, when all cores are heavily loaded, we decided to reduce
> frequency by around 30%. With this approach user experience recution is
> still acceptable (with much less power consumption).

Don't know.. running many cpus at lower freq for long duration will probably
take more power than running them at high freq for short duration and making
system idle again.

> We have posted this "RFC" patch mainly for discussion, and I think it
> fits its purpose :-).

Yes, no issues with your RFC idea.. its perfect..

@Vincent: Can you please follow this thread a bit and tell us what your views
are?

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.

Hi Viresh,

First of all I'd like to apologize for a late response.
Please find my comments below. 

> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
>  wrote:
> > <>
> >  One of the problem of ondemand is that it considers the most busy
> > cpu only while doesn't care how many cpu is in busy state at the
> > moment. This may results in unnecessary power consumption, and it'll
> > be critical for the system having limited power source.
> >
> >  To get the best energy efficiency, LAB governor considers not only
> > idle time but also the number of idle cpus. It primarily focuses on
> > supplying adequate performance to users within the limited resource.
> > It checks the number of idle cpus then controls the maximum
> > frequency dynamically. And it also applies different frequency
> > increasing level depends on that information. In simple terms the
> > less the number of busy cpus, the better performance will be given.
> >  In addition, stable system's power consumption in the busy state
> > can be achieved also with using LAB governor. This will help to
> > manage and estimate power consumption in certain system.
> 
> Hi Jonghwa,
> 
> First of all, i should accept that i haven't got to the minute details
> about your
> patch until now but have done a broad review of it.
> 
> There are many things that i am concerned about:
> - I don't want an additional governor to be added to cpufreq unless
> there is a very very strong reason for it. See what happened to
> earlier attempts:
> 
> https://lkml.org/lkml/2012/2/7/504
> 
> But it doesn't mean you can't get it in. :)
> 
> - What the real logic behind your patchset: I haven't got it
> completely with your
> mails. So what you said is:
> 
>   - The lesser the number of busy cpus: you want to run at higher
> freqs
>   - The more the number of busy cpus: you want to run at lower freqs
> 
> But the basic idea i had about this stuff was: The more the number of
> busy cpus, the more loaded the system is, otherwise scheduler wouldn't
> have used so many cpus and so there is need to run at higher frequency
> rather than a lower one. Which would save power in a sense.. Finish
> work early and let most of the cpus enter idle state as early as
> possible. But with your solution we would run at lower frequencies
> and so these cpus will take longer to get into idle state again. This
> will really kill lot of power.

Our approach is a bit different than cpufreq_ondemand one. Ondemand
takes the per CPU idle time, then on that basis calculates per cpu load.
The next step is to choose the highest load and then use this value to
properly scale frequency.

On the other hand LAB tries to model different behavior:

As a first step we applied Vincent Guittot's "pack small tasks" [*]
patch to improve "race to idle" behavior:
http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks

Afterwards, we decided to investigate different approach for power
governing:

Use the number of sleeping CPUs (not the maximal per-CPU load) to
change frequency. We thereof depend on [*] to "pack" as many tasks to
CPU as possible and allow other to sleep. 
On this basis we can increase (even overclock) frequency (when other
CPUs sleep) to improve performance of the only running CPU. 

Contrary, when all cores are heavily loaded, we decided to reduce
frequency by around 30%. With this approach user experience recution is
still acceptable (with much less power consumption).
When system is "idle" - only small background tasks are running, the
frequency is reduced to minimum. 

To sum up:

Different approach (number of idle CPUs) is used by us to control
ondemand governor. We also heavily depend on [*] patch set.

> 
> Think about it.
> 
> - In case you need some sort of support on this use case, why
> replicate ondemand governor again by creating another governor. I
> have had some hard time removing the amount of redundancy inside
> governors and you are again going towards that direction. Modifying
> ondemand governor for this response would be a better option.

We have only posted the "RFC" so, we are open for suggestions. 

At cpufreq_governor.c file the dbs_check_cpu function is responsible
for calculating the maximal load (among CPUs). I think that we could
also count the number of sleeping CPUs (as the average of time
accumulated data). Then we could pass this data to newly written
function (e.g. lab_check_cpu) defined at cpufreq_ondemand.c (and
pointed to by gov_check_cpu).

This would require change to dbs_check_cpu function and extending
ONDEMAND governor by lab_check_cpu() function.

> 
> - You haven't rebased of latest code from linux-next :)
> 

We have posted this "RFC" patch mainly for discussion, and I think it
fits its purpose :-).

-- 
Best regards,

Lukasz Majewski

Samsung R Poland (SRPOL) | Linux Platform Group
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo

Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.

Hi Viresh,

First of all I'd like to apologize for a late response.
Please find my comments below. 

 On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
 jonghwa3@samsung.com wrote:
  Purpose
   One of the problem of ondemand is that it considers the most busy
  cpu only while doesn't care how many cpu is in busy state at the
  moment. This may results in unnecessary power consumption, and it'll
  be critical for the system having limited power source.
 
   To get the best energy efficiency, LAB governor considers not only
  idle time but also the number of idle cpus. It primarily focuses on
  supplying adequate performance to users within the limited resource.
  It checks the number of idle cpus then controls the maximum
  frequency dynamically. And it also applies different frequency
  increasing level depends on that information. In simple terms the
  less the number of busy cpus, the better performance will be given.
   In addition, stable system's power consumption in the busy state
  can be achieved also with using LAB governor. This will help to
  manage and estimate power consumption in certain system.
 
 Hi Jonghwa,
 
 First of all, i should accept that i haven't got to the minute details
 about your
 patch until now but have done a broad review of it.
 
 There are many things that i am concerned about:
 - I don't want an additional governor to be added to cpufreq unless
 there is a very very strong reason for it. See what happened to
 earlier attempts:
 
 https://lkml.org/lkml/2012/2/7/504
 
 But it doesn't mean you can't get it in. :)
 
 - What the real logic behind your patchset: I haven't got it
 completely with your
 mails. So what you said is:
 
   - The lesser the number of busy cpus: you want to run at higher
 freqs
   - The more the number of busy cpus: you want to run at lower freqs
 
 But the basic idea i had about this stuff was: The more the number of
 busy cpus, the more loaded the system is, otherwise scheduler wouldn't
 have used so many cpus and so there is need to run at higher frequency
 rather than a lower one. Which would save power in a sense.. Finish
 work early and let most of the cpus enter idle state as early as
 possible. But with your solution we would run at lower frequencies
 and so these cpus will take longer to get into idle state again. This
 will really kill lot of power.

Our approach is a bit different than cpufreq_ondemand one. Ondemand
takes the per CPU idle time, then on that basis calculates per cpu load.
The next step is to choose the highest load and then use this value to
properly scale frequency.

On the other hand LAB tries to model different behavior:

As a first step we applied Vincent Guittot's pack small tasks [*]
patch to improve race to idle behavior:
http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks


Afterwards, we decided to investigate different approach for power
governing:

Use the number of sleeping CPUs (not the maximal per-CPU load) to
change frequency. We thereof depend on [*] to pack as many tasks to
CPU as possible and allow other to sleep. 
On this basis we can increase (even overclock) frequency (when other
CPUs sleep) to improve performance of the only running CPU. 

Contrary, when all cores are heavily loaded, we decided to reduce
frequency by around 30%. With this approach user experience recution is
still acceptable (with much less power consumption).
When system is idle - only small background tasks are running, the
frequency is reduced to minimum. 

To sum up:

Different approach (number of idle CPUs) is used by us to control
ondemand governor. We also heavily depend on [*] patch set.

 
 Think about it.
 
 - In case you need some sort of support on this use case, why
 replicate ondemand governor again by creating another governor. I
 have had some hard time removing the amount of redundancy inside
 governors and you are again going towards that direction. Modifying
 ondemand governor for this response would be a better option.

We have only posted the RFC so, we are open for suggestions. 

At cpufreq_governor.c file the dbs_check_cpu function is responsible
for calculating the maximal load (among CPUs). I think that we could
also count the number of sleeping CPUs (as the average of time
accumulated data). Then we could pass this data to newly written
function (e.g. lab_check_cpu) defined at cpufreq_ondemand.c (and
pointed to by gov_check_cpu).

This would require change to dbs_check_cpu function and extending
ONDEMAND governor by lab_check_cpu() function.

 
 - You haven't rebased of latest code from linux-next :)
 

We have posted this RFC patch mainly for discussion, and I think it
fits its purpose :-).



-- 
Best regards,

Lukasz Majewski

Samsung RD Poland (SRPOL) | Linux Platform Group
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please

Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.

2013-04-09 Thread Viresh Kumar

On 9 April 2013 16:07, Lukasz Majewski l.majew...@samsung.com wrote:
 On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
 Our approach is a bit different than cpufreq_ondemand one. Ondemand
 takes the per CPU idle time, then on that basis calculates per cpu load.
 The next step is to choose the highest load and then use this value to
 properly scale frequency.

 On the other hand LAB tries to model different behavior:

 As a first step we applied Vincent Guittot's pack small tasks [*]
 patch to improve race to idle behavior:
 http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks

Luckily he is part of my team :)

http://www.linaro.org/linux-on-arm/meet-the-team/power-management

BTW, he is using ondemand governor for all his work.

 Afterwards, we decided to investigate different approach for power
 governing:

 Use the number of sleeping CPUs (not the maximal per-CPU load) to
 change frequency. We thereof depend on [*] to pack as many tasks to
 CPU as possible and allow other to sleep.

He packs only small tasks. And if there are many small tasks we are
packing, then load must be high and so ondemand gov will increase freq.

 Contrary, when all cores are heavily loaded, we decided to reduce
 frequency by around 30%. With this approach user experience recution is
 still acceptable (with much less power consumption).

Don't know.. running many cpus at lower freq for long duration will probably
take more power than running them at high freq for short duration and making
system idle again.

 We have posted this RFC patch mainly for discussion, and I think it
 fits its purpose :-).

Yes, no issues with your RFC idea.. its perfect..

@Vincent: Can you please follow this thread a bit and tell us what your views
are?

--
viresh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.

2013-04-09 Thread jonghwa3 . lee

Hi, sorry for my late reply.
I just want to add comment to assist Lukasz's.
I put my comments below of Lukasz's.

On 2013년 04월 09일 19:37, Lukasz Majewski wrote:

 Hi Viresh,
 
 First of all I'd like to apologize for a late response.
 Please find my comments below. 
 
 On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
 jonghwa3@samsung.com wrote:
 Purpose
  One of the problem of ondemand is that it considers the most busy
 cpu only while doesn't care how many cpu is in busy state at the
 moment. This may results in unnecessary power consumption, and it'll
 be critical for the system having limited power source.

  To get the best energy efficiency, LAB governor considers not only
 idle time but also the number of idle cpus. It primarily focuses on
 supplying adequate performance to users within the limited resource.
 It checks the number of idle cpus then controls the maximum
 frequency dynamically. And it also applies different frequency
 increasing level depends on that information. In simple terms the
 less the number of busy cpus, the better performance will be given.
  In addition, stable system's power consumption in the busy state
 can be achieved also with using LAB governor. This will help to
 manage and estimate power consumption in certain system.

 Hi Jonghwa,

 First of all, i should accept that i haven't got to the minute details
 about your
 patch until now but have done a broad review of it.

 There are many things that i am concerned about:
 - I don't want an additional governor to be added to cpufreq unless
 there is a very very strong reason for it. See what happened to
 earlier attempts:

 https://lkml.org/lkml/2012/2/7/504

 But it doesn't mean you can't get it in. :)

 - What the real logic behind your patchset: I haven't got it
 completely with your
 mails. So what you said is:

   - The lesser the number of busy cpus: you want to run at higher
 freqs
   - The more the number of busy cpus: you want to run at lower freqs

 But the basic idea i had about this stuff was: The more the number of
 busy cpus, the more loaded the system is, otherwise scheduler wouldn't
 have used so many cpus and so there is need to run at higher frequency
 rather than a lower one. Which would save power in a sense.. Finish
 work early and let most of the cpus enter idle state as early as
 possible. But with your solution we would run at lower frequencies
 and so these cpus will take longer to get into idle state again. This
 will really kill lot of power.
 
 Our approach is a bit different than cpufreq_ondemand one. Ondemand
 takes the per CPU idle time, then on that basis calculates per cpu load.
 The next step is to choose the highest load and then use this value to
 properly scale frequency.
 
 On the other hand LAB tries to model different behavior:
 
 As a first step we applied Vincent Guittot's pack small tasks [*]
 patch to improve race to idle behavior:
 http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
   
 
 Afterwards, we decided to investigate different approach for power
 governing:
 
 Use the number of sleeping CPUs (not the maximal per-CPU load) to
 change frequency. We thereof depend on [*] to pack as many tasks to
 CPU as possible and allow other to sleep. 
 On this basis we can increase (even overclock) frequency (when other
 CPUs sleep) to improve performance of the only running CPU. 
 
 Contrary, when all cores are heavily loaded, we decided to reduce
 frequency by around 30%. With this approach user experience recution is
 still acceptable (with much less power consumption).
 When system is idle - only small background tasks are running, the
 frequency is reduced to minimum. 
 
 To sum up:
 
 Different approach (number of idle CPUs) is used by us to control
 ondemand governor. We also heavily depend on [*] patch set.



In additions, it is hard to say just letting high performance to busy system is
the best way for reducing the power consumption. Yes, as like Viresh said, we
can save the time of busy working, but that is not always perfect
solution. If we push all CPUs to keep high performance as much as they can,
the temperature will increase rapidly. In my test, on-demand governor reached
the thermal limits frequently, while lab governor didn't. And the high
temperature also effects the power consumption increasing. I got a rough result
which has 10% differences in power consumption between 20% temperature differed
conditions.

Consumed power with maximum frequency at different temperature

  Temperature Power consumption(mWh) Loss(%)
 65'C  53.89  Base
 80'C  59.88   10

So, to reduce the power consumption, it looks we have to give our cares more to
avoid the situation where system goes to high temperature. It is more meaningful
for the mobile environment. In mobile devices, high temperature will also affect
user's experience badly and can't be

Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.