Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
On 10 April 2013 11:38, Lukasz Majewski wrote:
> Hi Vincent,
>
>> On 10 April 2013 10:44, Lukasz Majewski wrote:
>> > Hi Vincent,
>> >
>> >> On Tuesday, 9 April 2013, Lukasz Majewski wrote:
>> >> > Hi Viresh and Vincent,
>> >> >
>> >> >> On 9 April 2013 16:07, Lukasz Majewski wrote:
>> >> >> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
>> >> >> > Our approach is a bit different from the cpufreq_ondemand one. Ondemand takes the per-CPU idle time, then on that basis calculates the per-CPU load. The next step is to choose the highest load and then use this value to properly scale the frequency.
>> >> >> >
>> >> >> > On the other hand, LAB tries to model different behavior. As a first step we applied Vincent Guittot's "pack small tasks" [*] patch to improve "race to idle" behavior: http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
>> >> >>
>> >> >> Luckily he is part of my team :) http://www.linaro.org/linux-on-arm/meet-the-team/power-management
>> >> >> BTW, he is using the ondemand governor for all his work.
>> >> >>
>> >> >> > Afterwards, we decided to investigate a different approach to power governing: use the number of sleeping CPUs (not the maximal per-CPU load) to change the frequency. We therefore depend on [*] to "pack" as many tasks onto one CPU as possible and allow the others to sleep.
>> >> >>
>> >> >> He packs only small tasks.
>> >> >
>> >> > What about packing not only small tasks? I will investigate the possibility of aggressively packing (even at the cost of performance degradation) as many tasks as possible onto a single CPU.
>> >>
>> >> Hi Lukasz,
>> >>
>> >> I've got the same comment on my current patch and I'm preparing a new version that can pack tasks more aggressively based on the same buddy mechanism. This will be done at the cost of performance, of course.
>> >
>> > Can you share your development tree?
>>
>> The dev is not finished yet, but I will share it as soon as possible.
>
> Ok
>
>> >> > It seems a good idea for power consumption reduction.
>> >>
>> >> In fact, it's not always true and depends on several inputs, like the number of tasks that run simultaneously.
>> >
>> > In my understanding, we can try to couple (affine) a maximal number of tasks with a CPU. Performance shall decrease, but we will avoid the cost of task migration.
>> >
>> > If I remember correctly, I've asked you about some testbench/test program for scheduler evaluation. I assume that nothing has changed and there isn't any "common" set of scheduler tests?
>>
>> There are a bunch of benches used to evaluate the scheduler, like hackbench and pgbench, but they generally fill all CPUs in order to test max performance. Are you looking for that kind of bench?
>
> I'd rather see a slightly different set of tests - something similar to the "cyclic" tests for the PREEMPT_RT patch.
>
> For sched work it would be welcome to spawn a lot of processes with different durations and workloads, and on this basis observe whether e.g. 2 or 3 processors are idle.

Sanjay is working on something like that:
https://git.linaro.org/gitweb?p=people/sanjayrawat/cyclicTest.git;a=shortlog;h=refs/heads/master

>> >> >> And if there are many small tasks we are packing, then the load must be high, and so the ondemand governor will increase the frequency.
>> >> >
>> >> > This is of course true for "packing" all tasks onto a single CPU. If we stay within the power consumption envelope, we can even overclock the frequency.
>> >> >
>> >> > But what if the other - let's say 3 - CPUs are under heavy workload? Ondemand will switch the frequency to maximum, and as Jonghwa pointed out this can cause a dangerous temperature increase.
>> >>
>> >> IIUC, your main concern is to stay within a power consumption budget so as not to overheat and face the side effects of high temperature, like a decrease in power efficiency. So your governor modifies the max frequency based on the number of running/idle CPUs
>> > Yes, this is correct.
>> >
>> >> to have an almost stable power consumption?
>> >
>> > From our observation it seems that for 3 or 4 running CPUs under heavy load we see much more power consumption reduction.
>>
>> That's logical, because you will reduce the voltage.
>>
>> > To put it another way - ondemand would increase the frequency to max for all 4 CPUs. On the other hand, if user experience drops to an acceptable level, we can reduce power consumption.
>> >
>> > Reducing the frequency and CPU voltage (by DVS) causes, as a side effect, the temperature to stay at an acceptable level.
>> >
>> >> Have you also looked at the power clamp driver, which has a similar target?
>> >
>> > I might be wrong here, but in my
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
On Wed, Apr 10, 2013 at 09:44:52AM +0100, Lukasz Majewski wrote:
[...]
> > Have you also looked at the power clamp driver, which has a similar target?
>
> I might be wrong here, but in my opinion the power clamp driver is a bit different:
>
> 1. It is dedicated to Intel SoCs, which provide a special set of registers (i.e. MSR_PKG_Cx_RESIDENCY [*]) that force a processor to enter a certain C-state for a given duration. The idle duration is calculated by a per-CPU set of high-priority kthreads (which also program the [*] registers).

Those registers are used for compensation (i.e. the user asked for a given idle ratio but the HW stats show a mismatch), and they are not "programmed" - they are just read. That code is Intel-specific, but it can be easily ported to ARM. I did that, and most of the code is common, with zero dependency on the architecture.

> 2. ARM SoCs don't have such infrastructure, so we depend on SW here.

Well, it is true that most of the SoCs I am working on do not have a programming interface to monitor C-state residency; granted, this is a problem. If those stats can be retrieved somehow (I did that on our TC2 platform), then power clamp can be used on ARM with minor modifications.

Lorenzo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
Hi Vincent,

> On 10 April 2013 10:44, Lukasz Majewski wrote:
> > Hi Vincent,
> >
> >> On Tuesday, 9 April 2013, Lukasz Majewski wrote:
> >> > Hi Viresh and Vincent,
> >> >
> >> >> On 9 April 2013 16:07, Lukasz Majewski wrote:
> >> >> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
> >> >> > Our approach is a bit different from the cpufreq_ondemand one. Ondemand takes the per-CPU idle time, then on that basis calculates the per-CPU load. The next step is to choose the highest load and then use this value to properly scale the frequency.
> >> >> >
> >> >> > On the other hand, LAB tries to model different behavior. As a first step we applied Vincent Guittot's "pack small tasks" [*] patch to improve "race to idle" behavior: http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
> >> >>
> >> >> Luckily he is part of my team :) http://www.linaro.org/linux-on-arm/meet-the-team/power-management
> >> >> BTW, he is using the ondemand governor for all his work.
> >> >>
> >> >> > Afterwards, we decided to investigate a different approach to power governing: use the number of sleeping CPUs (not the maximal per-CPU load) to change the frequency. We therefore depend on [*] to "pack" as many tasks onto one CPU as possible and allow the others to sleep.
> >> >>
> >> >> He packs only small tasks.
> >> >
> >> > What about packing not only small tasks? I will investigate the possibility of aggressively packing (even at the cost of performance degradation) as many tasks as possible onto a single CPU.
> >>
> >> Hi Lukasz,
> >>
> >> I've got the same comment on my current patch and I'm preparing a new version that can pack tasks more aggressively based on the same buddy mechanism. This will be done at the cost of performance, of course.
> >
> > Can you share your development tree?
>
> The dev is not finished yet, but I will share it as soon as possible.

Ok

> >> > It seems a good idea for power consumption reduction.
> >>
> >> In fact, it's not always true and depends on several inputs, like the number of tasks that run simultaneously.
> >
> > In my understanding, we can try to couple (affine) a maximal number of tasks with a CPU. Performance shall decrease, but we will avoid the cost of task migration.
> >
> > If I remember correctly, I've asked you about some testbench/test program for scheduler evaluation. I assume that nothing has changed and there isn't any "common" set of scheduler tests?
>
> There are a bunch of benches used to evaluate the scheduler, like hackbench and pgbench, but they generally fill all CPUs in order to test max performance. Are you looking for that kind of bench?

I'd rather see a slightly different set of tests - something similar to the "cyclic" tests for the PREEMPT_RT patch.

For sched work it would be welcome to spawn a lot of processes with different durations and workloads, and on this basis observe whether e.g. 2 or 3 processors are idle.

> >> >> And if there are many small tasks we are packing, then the load must be high, and so the ondemand governor will increase the frequency.
> >> >
> >> > This is of course true for "packing" all tasks onto a single CPU. If we stay within the power consumption envelope, we can even overclock the frequency.
> >> >
> >> > But what if the other - let's say 3 - CPUs are under heavy workload? Ondemand will switch the frequency to maximum, and as Jonghwa pointed out this can cause a dangerous temperature increase.
> >>
> >> IIUC, your main concern is to stay within a power consumption budget so as not to overheat and face the side effects of high temperature, like a decrease in power efficiency. So your governor modifies the max frequency based on the number of running/idle CPUs
> > Yes, this is correct.
> >
> >> to have an almost stable power consumption?
> >
> > From our observation it seems that for 3 or 4 running CPUs under heavy load we see much more power consumption reduction.
>
> That's logical, because you will reduce the voltage.
>
> > To put it another way - ondemand would increase the frequency to max for all 4 CPUs. On the other hand, if user experience drops to an acceptable level, we can reduce power consumption.
> >
> > Reducing the frequency and CPU voltage (by DVS) causes, as a side effect, the temperature to stay at an acceptable level.
> >
> >> Have you also looked at the power clamp driver, which has a similar target?
> >
> > I might be wrong here, but in my opinion the power clamp driver is a bit different:
>
> yes, it periodically forces the cluster into a low power state
>
> > 1. It is dedicated to Intel SoCs, which provide a special set of registers (i.e. MSR_PKG_Cx_RESIDENCY [*]) that force a processor to enter a certain C-state for a given duration. Idle
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
On 10 April 2013 10:44, Lukasz Majewski wrote:
> Hi Vincent,
>
>> On Tuesday, 9 April 2013, Lukasz Majewski wrote:
>> > Hi Viresh and Vincent,
>> >
>> >> On 9 April 2013 16:07, Lukasz Majewski wrote:
>> >> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
>> >> > Our approach is a bit different from the cpufreq_ondemand one. Ondemand takes the per-CPU idle time, then on that basis calculates the per-CPU load. The next step is to choose the highest load and then use this value to properly scale the frequency.
>> >> >
>> >> > On the other hand, LAB tries to model different behavior. As a first step we applied Vincent Guittot's "pack small tasks" [*] patch to improve "race to idle" behavior: http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
>> >>
>> >> Luckily he is part of my team :) http://www.linaro.org/linux-on-arm/meet-the-team/power-management
>> >> BTW, he is using the ondemand governor for all his work.
>> >>
>> >> > Afterwards, we decided to investigate a different approach to power governing: use the number of sleeping CPUs (not the maximal per-CPU load) to change the frequency. We therefore depend on [*] to "pack" as many tasks onto one CPU as possible and allow the others to sleep.
>> >>
>> >> He packs only small tasks.
>> >
>> > What about packing not only small tasks? I will investigate the possibility of aggressively packing (even at the cost of performance degradation) as many tasks as possible onto a single CPU.
>>
>> Hi Lukasz,
>>
>> I've got the same comment on my current patch and I'm preparing a new version that can pack tasks more aggressively based on the same buddy mechanism. This will be done at the cost of performance, of course.
>
> Can you share your development tree?

The dev is not finished yet, but I will share it as soon as possible.

>> > It seems a good idea for power consumption reduction.
>>
>> In fact, it's not always true and depends on several inputs, like the number of tasks that run simultaneously.
>
> In my understanding, we can try to couple (affine) a maximal number of tasks with a CPU. Performance shall decrease, but we will avoid the cost of task migration.
>
> If I remember correctly, I've asked you about some testbench/test program for scheduler evaluation. I assume that nothing has changed and there isn't any "common" set of scheduler tests?

There are a bunch of benches used to evaluate the scheduler, like hackbench and pgbench, but they generally fill all CPUs in order to test max performance. Are you looking for that kind of bench?

>> >> And if there are many small tasks we are packing, then the load must be high, and so the ondemand governor will increase the frequency.
>> >
>> > This is of course true for "packing" all tasks onto a single CPU. If we stay within the power consumption envelope, we can even overclock the frequency.
>> >
>> > But what if the other - let's say 3 - CPUs are under heavy workload? Ondemand will switch the frequency to maximum, and as Jonghwa pointed out this can cause a dangerous temperature increase.
>>
>> IIUC, your main concern is to stay within a power consumption budget so as not to overheat and face the side effects of high temperature, like a decrease in power efficiency. So your governor modifies the max frequency based on the number of running/idle CPUs
> Yes, this is correct.
>
>> to have an almost stable power consumption?
>
> From our observation it seems that for 3 or 4 running CPUs under heavy load we see much more power consumption reduction.

That's logical, because you will reduce the voltage.

> To put it another way - ondemand would increase the frequency to max for all 4 CPUs. On the other hand, if user experience drops to an acceptable level, we can reduce power consumption.
>
> Reducing the frequency and CPU voltage (by DVS) causes, as a side effect, the temperature to stay at an acceptable level.
>
>> Have you also looked at the power clamp driver, which has a similar target?
>
> I might be wrong here, but in my opinion the power clamp driver is a bit different:

Yes, it periodically forces the cluster into a low power state.

> 1. It is dedicated to Intel SoCs, which provide a special set of registers (i.e. MSR_PKG_Cx_RESIDENCY [*]) that force a processor to enter a certain C-state for a given duration. The idle duration is calculated by a per-CPU set of high-priority kthreads (which also program the [*] registers).

IIRC, a trial on an ARM platform has been done by Lorenzo and Daniel. Lorenzo, Daniel, do you have more information?

> 2. ARM SoCs don't have such infrastructure, so we depend on SW here. The scheduler has to remove tasks from a particular CPU and "execute" the idle_task on it. Moreover, on Exynos4 the thermal control loop depends on SW, since we can only read the SoC temperature via the TMU (Thermal Management Unit) block.

The idle duration is quite
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
Hi Vincent,

> On Tuesday, 9 April 2013, Lukasz Majewski wrote:
> > Hi Viresh and Vincent,
> >
> >> On 9 April 2013 16:07, Lukasz Majewski wrote:
> >> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
> >> > Our approach is a bit different from the cpufreq_ondemand one. Ondemand takes the per-CPU idle time, then on that basis calculates the per-CPU load. The next step is to choose the highest load and then use this value to properly scale the frequency.
> >> >
> >> > On the other hand, LAB tries to model different behavior. As a first step we applied Vincent Guittot's "pack small tasks" [*] patch to improve "race to idle" behavior: http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
> >>
> >> Luckily he is part of my team :) http://www.linaro.org/linux-on-arm/meet-the-team/power-management
> >> BTW, he is using the ondemand governor for all his work.
> >>
> >> > Afterwards, we decided to investigate a different approach to power governing: use the number of sleeping CPUs (not the maximal per-CPU load) to change the frequency. We therefore depend on [*] to "pack" as many tasks onto one CPU as possible and allow the others to sleep.
> >>
> >> He packs only small tasks.
> >
> > What about packing not only small tasks? I will investigate the possibility of aggressively packing (even at the cost of performance degradation) as many tasks as possible onto a single CPU.
>
> Hi Lukasz,
>
> I've got the same comment on my current patch and I'm preparing a new version that can pack tasks more aggressively based on the same buddy mechanism. This will be done at the cost of performance, of course.

Can you share your development tree?

> > It seems a good idea for power consumption reduction.
>
> In fact, it's not always true and depends on several inputs, like the number of tasks that run simultaneously.

In my understanding, we can try to couple (affine) a maximal number of tasks with a CPU. Performance shall decrease, but we will avoid the cost of task migration.

If I remember correctly, I've asked you about some testbench/test program for scheduler evaluation. I assume that nothing has changed and there isn't any "common" set of scheduler tests?

> >> And if there are many small tasks we are packing, then the load must be high, and so the ondemand governor will increase the frequency.
> >
> > This is of course true for "packing" all tasks onto a single CPU. If we stay within the power consumption envelope, we can even overclock the frequency.
> >
> > But what if the other - let's say 3 - CPUs are under heavy workload? Ondemand will switch the frequency to maximum, and as Jonghwa pointed out this can cause a dangerous temperature increase.
>
> IIUC, your main concern is to stay within a power consumption budget so as not to overheat and face the side effects of high temperature, like a decrease in power efficiency. So your governor modifies the max frequency based on the number of running/idle CPUs

Yes, this is correct.

> to have an almost stable power consumption?

From our observation it seems that for 3 or 4 running CPUs under heavy load we see much more power consumption reduction.

To put it another way - ondemand would increase the frequency to max for all 4 CPUs. On the other hand, if user experience drops to an acceptable level, we can reduce power consumption.

Reducing the frequency and CPU voltage (by DVS) causes, as a side effect, the temperature to stay at an acceptable level.

> Have you also looked at the power clamp driver, which has a similar target?

I might be wrong here, but in my opinion the power clamp driver is a bit different:

1. It is dedicated to Intel SoCs, which provide a special set of registers (i.e. MSR_PKG_Cx_RESIDENCY [*]) that force a processor to enter a certain C-state for a given duration. The idle duration is calculated by a per-CPU set of high-priority kthreads (which also program the [*] registers).

2. ARM SoCs don't have such infrastructure, so we depend on SW here. The scheduler has to remove tasks from a particular CPU and "execute" the idle_task on it. Moreover, on Exynos4 the thermal control loop depends on SW, since we can only read the SoC temperature via the TMU (Thermal Management Unit) block.

Correct me again, but it seems to me that on ARM we can use CPU hotplug (which, as Thomas Gleixner stated recently, is going to be "refactored" :-) ) or "ask" the scheduler to use the smallest possible number of CPUs and enter a C-state for idling CPUs.

> Vincent
>
>> > Contrary, when all cores are heavily loaded, we decided to reduce the frequency by around 30%. With this approach the user experience reduction is still acceptable (with much less power consumption).
>>
>> Don't know.. running many CPUs at a lower freq for a long duration will probably take more power than running them at a high freq for a short duration and making the system idle again.
>>
>> > We have
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
On 9 April 2013 20:52, Vincent Guittot wrote:
>
> On Tuesday, 9 April 2013, Lukasz Majewski wrote:
>> Hi Viresh and Vincent,
>>
>>> On 9 April 2013 16:07, Lukasz Majewski wrote:
>>> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
>>> > Our approach is a bit different from the cpufreq_ondemand one. Ondemand takes the per-CPU idle time, then on that basis calculates the per-CPU load. The next step is to choose the highest load and then use this value to properly scale the frequency.
>>> >
>>> > On the other hand, LAB tries to model different behavior. As a first step we applied Vincent Guittot's "pack small tasks" [*] patch to improve "race to idle" behavior: http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
>>>
>>> Luckily he is part of my team :) http://www.linaro.org/linux-on-arm/meet-the-team/power-management
>>> BTW, he is using the ondemand governor for all his work.
>>>
>>> > Afterwards, we decided to investigate a different approach to power governing: use the number of sleeping CPUs (not the maximal per-CPU load) to change the frequency. We therefore depend on [*] to "pack" as many tasks onto one CPU as possible and allow the others to sleep.
>>>
>>> He packs only small tasks.
>>
>> What about packing not only small tasks? I will investigate the possibility of aggressively packing (even at the cost of performance degradation) as many tasks as possible onto a single CPU.
>
> Hi Lukasz,
>
> I've got the same comment on my current patch and I'm preparing a new version that can pack tasks more aggressively based on the same buddy mechanism. This will be done at the cost of performance, of course.
>
>> It seems a good idea for power consumption reduction.
>
> In fact, it's not always true and depends on several inputs, like the number of tasks that run simultaneously.
>
>>> And if there are many small tasks we are packing, then the load must be high, and so the ondemand governor will increase the frequency.
>>
>> This is of course true for "packing" all tasks onto a single CPU. If we stay within the power consumption envelope, we can even overclock the frequency.
>>
>> But what if the other - let's say 3 - CPUs are under heavy workload? Ondemand will switch the frequency to maximum, and as Jonghwa pointed out this can cause a dangerous temperature increase.
>
> IIUC, your main concern is to stay within a power consumption budget so as not to overheat and face the side effects of high temperature, like a decrease in power efficiency. So your governor modifies the max frequency based on the number of running/idle CPUs to have an almost stable power consumption?
>
> Have you also looked at the power clamp driver, which has a similar target?
>
> Vincent
>
>>> > Contrary, when all cores are heavily loaded, we decided to reduce the frequency by around 30%. With this approach the user experience reduction is still acceptable (with much less power consumption).
>>>
>>> Don't know.. running many CPUs at a lower freq for a long duration will probably take more power than running them at a high freq for a short duration and making the system idle again.
>>>
>>> > We have posted this "RFC" patch mainly for discussion, and I think it fits its purpose :-).
>>>
>>> Yes, no issues with your RFC idea.. it's perfect..
>>>
>>> @Vincent: Can you please follow this thread a bit and tell us what your views are?
>>>
>>> --
>>> viresh
>>
>> --
>> Best regards,
>>
>> Lukasz Majewski
>>
>> Samsung R&D Poland (SRPOL) | Linux Platform Group
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
On 9 April 2013 20:52, Vincent Guittot vincent.guit...@linaro.org wrote: On Tuesday, 9 April 2013, Lukasz Majewski l.majew...@samsung.com wrote: Hi Viresh and Vincent, On 9 April 2013 16:07, Lukasz Majewski l.majew...@samsung.com wrote: On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee Our approach is a bit different than cpufreq_ondemand one. Ondemand takes the per CPU idle time, then on that basis calculates per cpu load. The next step is to choose the highest load and then use this value to properly scale frequency. On the other hand LAB tries to model different behavior: As a first step we applied Vincent Guittot's pack small tasks [*] patch to improve race to idle behavior: http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks Luckily he is part of my team :) http://www.linaro.org/linux-on-arm/meet-the-team/power-management BTW, he is using ondemand governor for all his work. Afterwards, we decided to investigate different approach for power governing: Use the number of sleeping CPUs (not the maximal per-CPU load) to change frequency. We thereof depend on [*] to pack as many tasks to CPU as possible and allow other to sleep. He packs only small tasks. What's about packing not only small tasks? I will investigate the possibility to aggressively pack (even with a cost of performance degradation) as many tasks as possible to a single CPU. Hi Lukasz, I've got same comment on my current patch and I'm preparing a new version that can pack tasks more agressively based on the same buddy mecanism. This will be done at the cost of performance of course. It seems a good idea for a power consumption reduction. In fact, it's not always true and depends several inputs like the number of tasks that run simultaneously And if there are many small tasks we are packing, then load must be high and so ondemand gov will increase freq. This is of course true for packing all tasks to a single CPU. 
If we stay at the power consumption envelope, we can even overclock the frequency. But what if other - lets say 3 CPUs - are under heavy workload? Ondemand will switch frequency to maximum, and as Jonghwa pointed out this can cause dangerous temperature increase. IIUC, your main concern is to stay in a power consumption budget to not over heat and have to face the side effect of high temperature like a decrease of power efficiency. So your governor modifies the max frequency based on the number of running/idle CPU to have an almost stable power consumtpion ? Have you also looked at the power clamp driver that have similar target ? Vincent Contrary, when all cores are heavily loaded, we decided to reduce frequency by around 30%. With this approach user experience recution is still acceptable (with much less power consumption). Don't know.. running many cpus at lower freq for long duration will probably take more power than running them at high freq for short duration and making system idle again. We have posted this RFC patch mainly for discussion, and I think it fits its purpose :-). Yes, no issues with your RFC idea.. its perfect.. @Vincent: Can you please follow this thread a bit and tell us what your views are? -- viresh -- Best regards, Lukasz Majewski Samsung RD Poland (SRPOL) | Linux Platform Group -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
Hi Vincent, On Tuesday, 9 April 2013, Lukasz Majewski l.majew...@samsung.com wrote: Hi Viresh and Vincent, On 9 April 2013 16:07, Lukasz Majewski l.majew...@samsung.com wrote: On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee Our approach is a bit different than cpufreq_ondemand one. Ondemand takes the per CPU idle time, then on that basis calculates per cpu load. The next step is to choose the highest load and then use this value to properly scale frequency. On the other hand LAB tries to model different behavior: As a first step we applied Vincent Guittot's pack small tasks [*] patch to improve race to idle behavior: http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks Luckily he is part of my team :) http://www.linaro.org/linux-on-arm/meet-the-team/power-management BTW, he is using ondemand governor for all his work. Afterwards, we decided to investigate different approach for power governing: Use the number of sleeping CPUs (not the maximal per-CPU load) to change frequency. We thereof depend on [*] to pack as many tasks to CPU as possible and allow other to sleep. He packs only small tasks. What's about packing not only small tasks? I will investigate the possibility to aggressively pack (even with a cost of performance degradation) as many tasks as possible to a single CPU. Hi Lukasz, I've got same comment on my current patch and I'm preparing a new version that can pack tasks more agressively based on the same buddy mecanism. This will be done at the cost of performance of course. Can you share your development tree? It seems a good idea for a power consumption reduction. In fact, it's not always true and depends several inputs like the number of tasks that run simultaneously In my understanding, we can try to couple (affine) maximal number of task with a CPU. Performance shall decrease, but we will avoid costs of tasks migration. If I remember correctly, I've asked you about some testbench/test program for scheduler evaluation. 
I assume that nothing has changed and there isn't any common set of scheduler tests?

And if there are many small tasks we are packing, then load must be high and so the ondemand governor will increase the frequency.

This is of course true for packing all tasks to a single CPU. If we stay within the power consumption envelope, we can even overclock the frequency. But what if the other - let's say 3 - CPUs are under heavy workload? Ondemand will switch the frequency to maximum, and as Jonghwa pointed out this can cause a dangerous temperature increase.

IIUC, your main concern is to stay within a power consumption budget, to not overheat and have to face the side effects of high temperature, like a decrease of power efficiency. So your governor modifies the max frequency based on the number of running/idle CPUs

Yes, this is correct.

to have an almost stable power consumption?

From our observation it seems that for 3 or 4 running CPUs under heavy load we see a much larger power consumption reduction. To put it another way - ondemand would increase the frequency to max for all 4 CPUs. On the other hand, if user experience drops to an acceptable level, we can reduce power consumption. Reducing frequency and CPU voltage (by DVS) causes, as a side effect, the temperature to stay at an acceptable level.

Have you also looked at the power clamp driver, which has a similar target?

I might be wrong here, but in my opinion the power clamp driver is a bit different:

1. It is dedicated to Intel SoCs, which provide a special set of registers (i.e. MSR_PKG_Cx_RESIDENCY [*]) that force a processor to enter a certain C state for a given duration. The idle duration is calculated by a per-CPU set of high-priority kthreads (which also program the [*] registers).

2. ARM SoCs don't have such infrastructure, so we depend on SW here. The scheduler has to remove tasks from a particular CPU and execute the idle_task on it. Moreover, on Exynos4 the thermal control loop depends on SW, since we can only read the SoC temperature via the TMU (Thermal Management Unit) block.

Correct me again, but it seems to me that on ARM we can use CPU hotplug (which, as Thomas Gleixner stated recently, is going to be refactored :-) ) or ask the scheduler to use the smallest possible number of CPUs and enter a C state for idling CPUs.

Vincent

Conversely, when all cores are heavily loaded, we decided to reduce the frequency by around 30%. With this approach the user experience reduction is still acceptable (with much less power consumption).

Don't know.. running many cpus at a lower freq for a long duration will probably take more power than running them at a high freq for a short duration and making the system idle again.

We have posted this RFC patch mainly for discussion, and I think it fits its purpose :-).

Yes, no issues with your RFC idea.. it's perfect..

@Vincent: Can you please follow this thread
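The LAB policy discussed above - pick the allowed maximum frequency from the number of idle CPUs, boost when only one CPU runs, back off roughly 30% when all cores are busy - can be modelled in a few lines. The sketch below is a hypothetical user-space illustration, not code from the actual LAB patch; the function name and the frequency table are invented for this example:

```c
#include <assert.h>

#define NR_CPUS 4

/*
 * Hypothetical LAB-style policy model: choose a maximum frequency (kHz)
 * from the number of idle CPUs, rather than from the highest per-CPU
 * load as ondemand does. The table values are made up for illustration.
 */
static unsigned int lab_max_freq(int nr_idle_cpus)
{
	/* index = number of idle CPUs: 0 (all busy) .. 3 (one CPU running) */
	static const unsigned int max_freq[NR_CPUS] = {
		980000,		/* all 4 CPUs busy: cap ~30% below nominal */
		1200000,	/* 3 CPUs busy */
		1400000,	/* 2 CPUs busy: full nominal frequency */
		1600000,	/* 1 CPU busy: allow a boost ("overclock") */
	};

	if (nr_idle_cpus < 0)
		nr_idle_cpus = 0;
	if (nr_idle_cpus >= NR_CPUS)
		nr_idle_cpus = NR_CPUS - 1;
	return max_freq[nr_idle_cpus];
}
```

The point of the model is only the shape of the curve: the cap rises as CPUs go idle, which is the inverse of what ondemand's max-load heuristic would do on a packed system.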
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
On 10 April 2013 10:44, Lukasz Majewski l.majew...@samsung.com wrote:

Hi Vincent,

On Tuesday, 9 April 2013, Lukasz Majewski l.majew...@samsung.com wrote:

Hi Viresh and Vincent,

On 9 April 2013 16:07, Lukasz Majewski l.majew...@samsung.com wrote:

On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee

Our approach is a bit different than the cpufreq_ondemand one. Ondemand takes the per-CPU idle time, then on that basis calculates per-CPU load. The next step is to choose the highest load and then use this value to properly scale frequency.

On the other hand, LAB tries to model different behavior:

As a first step we applied Vincent Guittot's "pack small tasks" [*] patch to improve "race to idle" behavior:
http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks

Luckily he is part of my team :)
http://www.linaro.org/linux-on-arm/meet-the-team/power-management
BTW, he is using the ondemand governor for all his work.

Afterwards, we decided to investigate a different approach to power governing:

Use the number of sleeping CPUs (not the maximal per-CPU load) to change frequency. We therefore depend on [*] to "pack" as many tasks onto a CPU as possible and allow the others to sleep.

He packs only small tasks.

What about packing not only small tasks? I will investigate the possibility of aggressively packing (even at the cost of performance degradation) as many tasks as possible onto a single CPU.

Hi Lukasz,

I've got the same comment on my current patch and I'm preparing a new version that can pack tasks more aggressively based on the same buddy mechanism. This will be done at the cost of performance, of course.

Can you share your development tree?

The dev is not finished yet but I will share it as soon as possible.

It seems a good idea for power consumption reduction.

In fact, it's not always true and depends on several inputs, like the number of tasks that run simultaneously.

In my understanding, we can try to couple (affine) the maximal number of tasks with a CPU. Performance shall decrease, but we will avoid the costs of task migration.

If I remember correctly, I've asked you about some testbench/test program for scheduler evaluation. I assume that nothing has changed and there isn't any common set of scheduler tests?

There are a bunch of benchmarks used to evaluate the scheduler, like hackbench or pgbench, but they generally fill all CPUs in order to test max performance. Are you looking for that kind of bench?

And if there are many small tasks we are packing, then load must be high and so the ondemand governor will increase the frequency.

This is of course true for packing all tasks to a single CPU. If we stay within the power consumption envelope, we can even overclock the frequency. But what if the other - let's say 3 - CPUs are under heavy workload? Ondemand will switch the frequency to maximum, and as Jonghwa pointed out this can cause a dangerous temperature increase.

IIUC, your main concern is to stay within a power consumption budget, to not overheat and have to face the side effects of high temperature, like a decrease of power efficiency. So your governor modifies the max frequency based on the number of running/idle CPUs

Yes, this is correct.

to have an almost stable power consumption?

From our observation it seems that for 3 or 4 running CPUs under heavy load we see a much larger power consumption reduction.

That's logical, because you will reduce the voltage.

To put it another way - ondemand would increase the frequency to max for all 4 CPUs. On the other hand, if user experience drops to an acceptable level, we can reduce power consumption. Reducing frequency and CPU voltage (by DVS) causes, as a side effect, the temperature to stay at an acceptable level.

Have you also looked at the power clamp driver, which has a similar target?

I might be wrong here, but in my opinion the power clamp driver is a bit different:

yes, it periodically forces the cluster into a low power state

1. It is dedicated to Intel SoCs, which provide a special set of registers (i.e. MSR_PKG_Cx_RESIDENCY [*]) that force a processor to enter a certain C state for a given duration. The idle duration is calculated by a per-CPU set of high-priority kthreads (which also program the [*] registers).

IIRC, a trial on an ARM platform has been done by Lorenzo and Daniel. Lorenzo, Daniel, have you more information?

2. ARM SoCs don't have such infrastructure, so we depend on SW here. The scheduler has to remove tasks from a particular CPU and execute the idle_task on it. Moreover, on Exynos4 the thermal control loop depends on SW, since we can only read the SoC temperature via the TMU (Thermal Management Unit) block.

The idle duration is quite small and should not perturb normal behavior.

Vincent

Correct me again, but it seems to me that on ARM we can use CPU hotplug (which, as Thomas Gleixner stated recently, is going to be refactored :-) ) or ask the scheduler to use
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
Hi Vincent,

On 10 April 2013 10:44, Lukasz Majewski l.majew...@samsung.com wrote:

Hi Vincent,

On Tuesday, 9 April 2013, Lukasz Majewski l.majew...@samsung.com wrote:

Hi Viresh and Vincent,

On 9 April 2013 16:07, Lukasz Majewski l.majew...@samsung.com wrote:

On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee

Our approach is a bit different than the cpufreq_ondemand one. Ondemand takes the per-CPU idle time, then on that basis calculates per-CPU load. The next step is to choose the highest load and then use this value to properly scale frequency.

On the other hand, LAB tries to model different behavior:

As a first step we applied Vincent Guittot's "pack small tasks" [*] patch to improve "race to idle" behavior:
http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks

Luckily he is part of my team :)
http://www.linaro.org/linux-on-arm/meet-the-team/power-management
BTW, he is using the ondemand governor for all his work.

Afterwards, we decided to investigate a different approach to power governing:

Use the number of sleeping CPUs (not the maximal per-CPU load) to change frequency. We therefore depend on [*] to "pack" as many tasks onto a CPU as possible and allow the others to sleep.

He packs only small tasks.

What about packing not only small tasks? I will investigate the possibility of aggressively packing (even at the cost of performance degradation) as many tasks as possible onto a single CPU.

Hi Lukasz,

I've got the same comment on my current patch and I'm preparing a new version that can pack tasks more aggressively based on the same buddy mechanism. This will be done at the cost of performance, of course.

Can you share your development tree?

The dev is not finished yet but I will share it as soon as possible.

Ok

It seems a good idea for power consumption reduction.

In fact, it's not always true and depends on several inputs, like the number of tasks that run simultaneously.

In my understanding, we can try to couple (affine) the maximal number of tasks with a CPU. Performance shall decrease, but we will avoid the costs of task migration.

If I remember correctly, I've asked you about some testbench/test program for scheduler evaluation. I assume that nothing has changed and there isn't any common set of scheduler tests?

There are a bunch of benchmarks used to evaluate the scheduler, like hackbench or pgbench, but they generally fill all CPUs in order to test max performance. Are you looking for that kind of bench?

I'd rather see a bit different set of tests - something similar to the cyclic tests for the PREEMPT_RT patch. For sched work it would be welcome to spawn a lot of processes with different durations and workloads, and on this basis observe if e.g. 2 or 3 processors are idle.

And if there are many small tasks we are packing, then load must be high and so the ondemand governor will increase the frequency.

This is of course true for packing all tasks to a single CPU. If we stay within the power consumption envelope, we can even overclock the frequency. But what if the other - let's say 3 - CPUs are under heavy workload? Ondemand will switch the frequency to maximum, and as Jonghwa pointed out this can cause a dangerous temperature increase.

IIUC, your main concern is to stay within a power consumption budget, to not overheat and have to face the side effects of high temperature, like a decrease of power efficiency. So your governor modifies the max frequency based on the number of running/idle CPUs

Yes, this is correct.

to have an almost stable power consumption?

From our observation it seems that for 3 or 4 running CPUs under heavy load we see a much larger power consumption reduction.

That's logical, because you will reduce the voltage.

To put it another way - ondemand would increase the frequency to max for all 4 CPUs. On the other hand, if user experience drops to an acceptable level, we can reduce power consumption. Reducing frequency and CPU voltage (by DVS) causes, as a side effect, the temperature to stay at an acceptable level.

Have you also looked at the power clamp driver, which has a similar target?

I might be wrong here, but in my opinion the power clamp driver is a bit different:

yes, it periodically forces the cluster into a low power state

1. It is dedicated to Intel SoCs, which provide a special set of registers (i.e. MSR_PKG_Cx_RESIDENCY [*]) that force a processor to enter a certain C state for a given duration. The idle duration is calculated by a per-CPU set of high-priority kthreads (which also program the [*] registers).

IIRC, a trial on an ARM platform has been done by Lorenzo and Daniel. Lorenzo, Daniel, have you more information?

More information would be welcome :-)

2. ARM SoCs don't have such infrastructure, so we depend on SW here. The scheduler has to
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
On Wed, Apr 10, 2013 at 09:44:52AM +0100, Lukasz Majewski wrote:

[...]

Have you also looked at the power clamp driver, which has a similar target?

I might be wrong here, but in my opinion the power clamp driver is a bit different:

1. It is dedicated to Intel SoCs, which provide a special set of registers (i.e. MSR_PKG_Cx_RESIDENCY [*]) that force a processor to enter a certain C state for a given duration. The idle duration is calculated by a per-CPU set of high-priority kthreads (which also program the [*] registers).

Those registers are used for compensation (i.e. the user asked for a given idle ratio but the HW stats show a mismatch), and they are not programmed, they are just read. That code is Intel specific, but it can be easily ported to ARM; I did that, and most of the code is common, with zero dependency on the architecture.

2. ARM SoCs don't have such infrastructure, so we depend on SW here.

Well, it is true that most of the SoCs I am working on do not have a programming interface to monitor C-state residency; granted, this is a problem. If those stats can be retrieved somehow (I did that on our TC2 platform), then power clamp can be used on ARM with minor modifications.

Lorenzo

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
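Lorenzo's point - that the residency counters are only read, and used to correct a mismatch between the requested and observed idle ratio - amounts to a simple feedback step. The following is a hypothetical illustration of that idea, not code from the intel_powerclamp driver; the function name and percentages are invented:

```c
#include <assert.h>

/*
 * Hypothetical compensation step for an idle-injection driver:
 * the user requests target_pct idle time; observed_pct comes from
 * residency stats (MSR_PKG_Cx_RESIDENCY on Intel, or software
 * accounting on ARM). Return the adjusted injection percentage.
 */
static int compensate_idle_pct(int injected_pct, int target_pct,
			       int observed_pct)
{
	/*
	 * If the stats report less idle time than requested, inject
	 * more, and vice versa; clamp to the valid 0..100 range.
	 */
	int adjusted = injected_pct + (target_pct - observed_pct);

	if (adjusted < 0)
		adjusted = 0;
	if (adjusted > 100)
		adjusted = 100;
	return adjusted;
}
```

As Lorenzo notes, nothing in this loop is architecture specific: only the source of `observed_pct` differs between Intel MSRs and software residency accounting on ARM.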
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
On 10 April 2013 11:38, Lukasz Majewski l.majew...@samsung.com wrote:

Hi Vincent,

On 10 April 2013 10:44, Lukasz Majewski l.majew...@samsung.com wrote:

Hi Vincent,

On Tuesday, 9 April 2013, Lukasz Majewski l.majew...@samsung.com wrote:

Hi Viresh and Vincent,

On 9 April 2013 16:07, Lukasz Majewski l.majew...@samsung.com wrote:

On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee

Our approach is a bit different than the cpufreq_ondemand one. Ondemand takes the per-CPU idle time, then on that basis calculates per-CPU load. The next step is to choose the highest load and then use this value to properly scale frequency.

On the other hand, LAB tries to model different behavior:

As a first step we applied Vincent Guittot's "pack small tasks" [*] patch to improve "race to idle" behavior:
http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks

Luckily he is part of my team :)
http://www.linaro.org/linux-on-arm/meet-the-team/power-management
BTW, he is using the ondemand governor for all his work.

Afterwards, we decided to investigate a different approach to power governing:

Use the number of sleeping CPUs (not the maximal per-CPU load) to change frequency. We therefore depend on [*] to "pack" as many tasks onto a CPU as possible and allow the others to sleep.

He packs only small tasks.

What about packing not only small tasks? I will investigate the possibility of aggressively packing (even at the cost of performance degradation) as many tasks as possible onto a single CPU.

Hi Lukasz,

I've got the same comment on my current patch and I'm preparing a new version that can pack tasks more aggressively based on the same buddy mechanism. This will be done at the cost of performance, of course.

Can you share your development tree?

The dev is not finished yet but I will share it as soon as possible.

Ok

It seems a good idea for power consumption reduction.

In fact, it's not always true and depends on several inputs, like the number of tasks that run simultaneously.

In my understanding, we can try to couple (affine) the maximal number of tasks with a CPU. Performance shall decrease, but we will avoid the costs of task migration.

If I remember correctly, I've asked you about some testbench/test program for scheduler evaluation. I assume that nothing has changed and there isn't any common set of scheduler tests?

There are a bunch of benchmarks used to evaluate the scheduler, like hackbench or pgbench, but they generally fill all CPUs in order to test max performance. Are you looking for that kind of bench?

I'd rather see a bit different set of tests - something similar to the cyclic tests for the PREEMPT_RT patch. For sched work it would be welcome to spawn a lot of processes with different durations and workloads, and on this basis observe if e.g. 2 or 3 processors are idle.

Sanjay is working on something like that:
https://git.linaro.org/gitweb?p=people/sanjayrawat/cyclicTest.git;a=shortlog;h=refs/heads/master

And if there are many small tasks we are packing, then load must be high and so the ondemand governor will increase the frequency.

This is of course true for packing all tasks to a single CPU. If we stay within the power consumption envelope, we can even overclock the frequency. But what if the other - let's say 3 - CPUs are under heavy workload? Ondemand will switch the frequency to maximum, and as Jonghwa pointed out this can cause a dangerous temperature increase.

IIUC, your main concern is to stay within a power consumption budget, to not overheat and have to face the side effects of high temperature, like a decrease of power efficiency. So your governor modifies the max frequency based on the number of running/idle CPUs

Yes, this is correct.

to have an almost stable power consumption?

From our observation it seems that for 3 or 4 running CPUs under heavy load we see a much larger power consumption reduction.

That's logical, because you will reduce the voltage.

To put it another way - ondemand would increase the frequency to max for all 4 CPUs. On the other hand, if user experience drops to an acceptable level, we can reduce power consumption. Reducing frequency and CPU voltage (by DVS) causes, as a side effect, the temperature to stay at an acceptable level.

Have you also looked at the power clamp driver, which has a similar target?

I might be wrong here, but in my opinion the power clamp driver is a bit different:

yes, it periodically forces the cluster into a low power state

1. It is dedicated to Intel SoCs, which provide a special set of registers (i.e. MSR_PKG_Cx_RESIDENCY [*]) that force a processor to enter a certain C state for a given duration. The idle duration is calculated by a per-CPU set of high-priority kthreads (which also program the [*] registers).

IIRC, a trial on an ARM platform has been
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
Hi Viresh and Vincent,

> On 9 April 2013 16:07, Lukasz Majewski wrote:
> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
> > Our approach is a bit different than the cpufreq_ondemand one. Ondemand
> > takes the per-CPU idle time, then on that basis calculates per-CPU
> > load. The next step is to choose the highest load and then use this
> > value to properly scale frequency.
> >
> > On the other hand, LAB tries to model different behavior:
> >
> > As a first step we applied Vincent Guittot's "pack small tasks" [*]
> > patch to improve "race to idle" behavior:
> > http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
>
> Luckily he is part of my team :)
>
> http://www.linaro.org/linux-on-arm/meet-the-team/power-management
>
> BTW, he is using the ondemand governor for all his work.
>
> > Afterwards, we decided to investigate a different approach to power
> > governing:
> >
> > Use the number of sleeping CPUs (not the maximal per-CPU load) to
> > change frequency. We therefore depend on [*] to "pack" as many tasks
> > onto a CPU as possible and allow the others to sleep.
>
> He packs only small tasks.

What about packing not only small tasks? I will investigate the possibility of aggressively packing (even at the cost of performance degradation) as many tasks as possible onto a single CPU. It seems a good idea for power consumption reduction.

> And if there are many small tasks we are
> packing, then load must be high and so ondemand gov will increase
> freq.

This is of course true for "packing" all tasks to a single CPU. If we stay within the power consumption envelope, we can even overclock the frequency. But what if the other - let's say 3 - CPUs are under heavy workload? Ondemand will switch the frequency to maximum, and as Jonghwa pointed out this can cause a dangerous temperature increase.

> > Conversely, when all cores are heavily loaded, we decided to reduce
> > frequency by around 30%. With this approach user experience
> > reduction is still acceptable (with much less power consumption).
>
> Don't know.. running many cpus at lower freq for long duration will
> probably take more power than running them at high freq for short
> duration and making system idle again.
>
> > We have posted this "RFC" patch mainly for discussion, and I think
> > it fits its purpose :-).
>
> Yes, no issues with your RFC idea.. it's perfect..
>
> @Vincent: Can you please follow this thread a bit and tell us what
> your views are?
>
> --
> viresh

--
Best regards,
Lukasz Majewski
Samsung R&D Poland (SRPOL) | Linux Platform Group
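Viresh's race-to-idle objection can be made concrete with a toy energy model. Assuming (purely for illustration) that dynamic power scales as f·V² and that a fixed amount of work takes time proportional to 1/f, the active energy for the workload depends only on V² - so lowering frequency alone just stretches the busy period, and the saving Lukasz describes only appears once DVS lowers the voltage along with the frequency. A minimal sketch of that model, with invented names and units:

```c
#include <assert.h>

/*
 * Toy model: energy to complete a fixed workload `work` at voltage
 * v_mv (millivolts). With P ~ f * v^2 and runtime ~ work / f, the
 * frequency cancels out and E ~ work * v^2 (arbitrary units).
 * Idle/leakage power is deliberately ignored.
 */
static long long active_energy(long long work, long long v_mv)
{
	return work * v_mv * v_mv;
}
```

In this crude model, running at a lower operating point wins only if the voltage really drops; if a platform's lower frequency steps reuse the same voltage, Viresh's "finish early and idle" argument holds.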
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
Hi, sorry for my late reply. I just want to add a comment to assist Lukasz's. I put my comments below Lukasz's.

On 9 April 2013 19:37, Lukasz Majewski wrote:
> Hi Viresh,
>
> First of all I'd like to apologize for a late response.
> Please find my comments below.
>
>> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
>> wrote:
>>> <>
>>> One of the problems of ondemand is that it considers the most busy
>>> cpu only, while it doesn't care how many cpus are in the busy state
>>> at the moment. This may result in unnecessary power consumption, and
>>> it'll be critical for a system having a limited power source.
>>>
>>> To get the best energy efficiency, the LAB governor considers not
>>> only idle time but also the number of idle cpus. It primarily focuses
>>> on supplying adequate performance to users within the limited
>>> resource. It checks the number of idle cpus, then controls the
>>> maximum frequency dynamically. And it also applies a different
>>> frequency increasing level depending on that information. In simple
>>> terms, the fewer the busy cpus, the better the performance that will
>>> be given. In addition, a stable system power consumption in the busy
>>> state can also be achieved with the LAB governor. This will help to
>>> manage and estimate power consumption in a certain system.
>>
>> Hi Jonghwa,
>>
>> First of all, i should accept that i haven't got to the minute
>> details about your patch until now but have done a broad review of it.
>>
>> There are many things that i am concerned about:
>> - I don't want an additional governor to be added to cpufreq unless
>> there is a very very strong reason for it. See what happened to
>> earlier attempts:
>>
>> https://lkml.org/lkml/2012/2/7/504
>>
>> But it doesn't mean you can't get it in. :)
>>
>> - What the real logic behind your patchset: I haven't got it
>> completely with your mails. So what you said is:
>>
>> - The lesser the number of busy cpus: you want to run at higher freqs
>> - The more the number of busy cpus: you want to run at lower freqs
>>
>> But the basic idea i had about this stuff was: The more the number of
>> busy cpus, the more loaded the system is, otherwise the scheduler
>> wouldn't have used so many cpus, and so there is a need to run at a
>> higher frequency rather than a lower one. Which would save power in a
>> sense.. Finish work early and let most of the cpus enter idle state
>> as early as possible. But with your solution we would run at lower
>> frequencies and so these cpus will take longer to get into idle state
>> again. This will really kill a lot of power.
>
> Our approach is a bit different than the cpufreq_ondemand one. Ondemand
> takes the per-CPU idle time, then on that basis calculates per-CPU
> load. The next step is to choose the highest load and then use this
> value to properly scale frequency.
>
> On the other hand, LAB tries to model different behavior:
>
> As a first step we applied Vincent Guittot's "pack small tasks" [*]
> patch to improve "race to idle" behavior:
> http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
>
> Afterwards, we decided to investigate a different approach to power
> governing:
>
> Use the number of sleeping CPUs (not the maximal per-CPU load) to
> change frequency. We therefore depend on [*] to "pack" as many tasks
> onto a CPU as possible and allow the others to sleep.
> On this basis we can increase (even overclock) the frequency (when
> other CPUs sleep) to improve performance of the only running CPU.
>
> Conversely, when all cores are heavily loaded, we decided to reduce
> frequency by around 30%. With this approach user experience reduction
> is still acceptable (with much less power consumption).
> When the system is "idle" - only small background tasks are running -
> the frequency is reduced to minimum.
>
> To sum up:
>
> A different approach (number of idle CPUs) is used by us to control
> the ondemand governor. We also heavily depend on the [*] patch set.

In addition, it is hard to say that just giving high performance to a busy system is the best way to reduce power consumption. Yes, as Viresh said, we can save the time of busy working, but that is not always a perfect solution. If we push all CPUs to keep the highest performance they can, the temperature will increase rapidly. In my test, the ondemand governor reached the thermal limits frequently, while the LAB governor didn't. And high temperature also increases power consumption. I got a rough result showing about a 10% difference in power consumption between two conditions whose temperatures differ by about 20%:

<>
Temperature   Power consumption (mWh)   Loss (%)
65°C          53.89                     base
80°C          59.88                     10

So, to reduce power consumption, it looks like we have to take more care to avoid situations where the system goes to high temperature. This is even more meaningful for the mobile environment. In mobile devices, high
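The loss figure quoted above (53.89 mWh at 65°C vs 59.88 mWh at 80°C, stated as a rough 10%) can be checked directly: (59.88 − 53.89) / 53.89 ≈ 11%, consistent with the rough claim. A trivial helper, with an invented name, makes the arithmetic explicit:

```c
#include <assert.h>
#include <math.h>

/*
 * Percentage increase in consumption between a baseline and a hotter
 * measurement, both in mWh.
 */
static double loss_pct(double base_mwh, double hot_mwh)
{
	return (hot_mwh - base_mwh) / base_mwh * 100.0;
}
```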
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
On 9 April 2013 16:07, Lukasz Majewski wrote:
>> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
> Our approach is a bit different than the cpufreq_ondemand one. Ondemand
> takes the per-CPU idle time, then on that basis calculates per-CPU
> load. The next step is to choose the highest load and then use this
> value to properly scale frequency.
>
> On the other hand, LAB tries to model different behavior:
>
> As a first step we applied Vincent Guittot's "pack small tasks" [*]
> patch to improve "race to idle" behavior:
> http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks

Luckily he is part of my team :)

http://www.linaro.org/linux-on-arm/meet-the-team/power-management

BTW, he is using the ondemand governor for all his work.

> Afterwards, we decided to investigate a different approach to power
> governing:
>
> Use the number of sleeping CPUs (not the maximal per-CPU load) to
> change frequency. We therefore depend on [*] to "pack" as many tasks
> onto a CPU as possible and allow the others to sleep.

He packs only small tasks. And if there are many small tasks we are packing, then load must be high and so the ondemand governor will increase the frequency.

> Conversely, when all cores are heavily loaded, we decided to reduce
> frequency by around 30%. With this approach user experience reduction
> is still acceptable (with much less power consumption).

Don't know.. running many cpus at a lower freq for a long duration will probably take more power than running them at a high freq for a short duration and making the system idle again.

> We have posted this "RFC" patch mainly for discussion, and I think it
> fits its purpose :-).

Yes, no issues with your RFC idea.. it's perfect..

@Vincent: Can you please follow this thread a bit and tell us what your views are?

--
viresh
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
Hi Viresh,

First of all I'd like to apologize for a late response. Please find my comments below.

> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
> wrote:
> > <>
> > One of the problems of ondemand is that it considers the most busy
> > cpu only, while it doesn't care how many cpus are in the busy state
> > at the moment. This may result in unnecessary power consumption, and
> > it'll be critical for a system having a limited power source.
> >
> > To get the best energy efficiency, the LAB governor considers not
> > only idle time but also the number of idle cpus. It primarily focuses
> > on supplying adequate performance to users within the limited
> > resource. It checks the number of idle cpus, then controls the
> > maximum frequency dynamically. And it also applies a different
> > frequency increasing level depending on that information. In simple
> > terms, the fewer the busy cpus, the better the performance that will
> > be given. In addition, a stable system power consumption in the busy
> > state can also be achieved with the LAB governor. This will help to
> > manage and estimate power consumption in a certain system.
>
> Hi Jonghwa,
>
> First of all, i should accept that i haven't got to the minute
> details about your patch until now but have done a broad review of it.
>
> There are many things that i am concerned about:
> - I don't want an additional governor to be added to cpufreq unless
> there is a very very strong reason for it. See what happened to
> earlier attempts:
>
> https://lkml.org/lkml/2012/2/7/504
>
> But it doesn't mean you can't get it in. :)
>
> - What the real logic behind your patchset: I haven't got it
> completely with your mails. So what you said is:
>
> - The lesser the number of busy cpus: you want to run at higher freqs
> - The more the number of busy cpus: you want to run at lower freqs
>
> But the basic idea i had about this stuff was: The more the number of
> busy cpus, the more loaded the system is, otherwise the scheduler
> wouldn't have used so many cpus, and so there is a need to run at a
> higher frequency rather than a lower one. Which would save power in a
> sense.. Finish work early and let most of the cpus enter idle state
> as early as possible. But with your solution we would run at lower
> frequencies and so these cpus will take longer to get into idle state
> again. This will really kill a lot of power.

Our approach is a bit different than the cpufreq_ondemand one. Ondemand takes the per-CPU idle time, then on that basis calculates per-CPU load. The next step is to choose the highest load and then use this value to properly scale frequency.

On the other hand, LAB tries to model different behavior:

As a first step we applied Vincent Guittot's "pack small tasks" [*] patch to improve "race to idle" behavior:
http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks

Afterwards, we decided to investigate a different approach to power governing:

Use the number of sleeping CPUs (not the maximal per-CPU load) to change frequency. We therefore depend on [*] to "pack" as many tasks onto a CPU as possible and allow the others to sleep. On this basis we can increase (even overclock) the frequency (when other CPUs sleep) to improve performance of the only running CPU.

Conversely, when all cores are heavily loaded, we decided to reduce frequency by around 30%. With this approach user experience reduction is still acceptable (with much less power consumption). When the system is "idle" - only small background tasks are running - the frequency is reduced to minimum.

To sum up:

A different approach (number of idle CPUs) is used by us to control the ondemand governor. We also heavily depend on the [*] patch set.

> Think about it.
>
> - In case you need some sort of support on this use case, why
> replicate ondemand governor again by creating another governor. I
> have had some hard time removing the amount of redundancy inside
> governors and you are again going towards that direction. Modifying
> ondemand governor for this response would be a better option.

We have only posted the "RFC", so we are open to suggestions.

In the cpufreq_governor.c file, the dbs_check_cpu function is responsible for calculating the maximal load (among CPUs). I think that we could also count the number of sleeping CPUs (as the average of time-accumulated data). Then we could pass this data to a newly written function (e.g. lab_check_cpu) defined at cpufreq_ondemand.c (and pointed to by gov_check_cpu). This would require a change to the dbs_check_cpu function and extending the ONDEMAND governor by a lab_check_cpu() function.

> - You haven't rebased on the latest code from linux-next :)

We have posted this "RFC" patch mainly for discussion, and I think it fits its purpose :-).

--
Best regards,
Lukasz Majewski
Samsung R&D Poland (SRPOL) | Linux Platform Group
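The "average of time-accumulated data" idea for the sleeping-CPU count could look roughly like the sketch below. This is a hypothetical user-space model, not a patch against cpufreq_governor.c; the smoothing factor and the update function are invented for illustration (lab_check_cpu itself is only proposed in the mail above, so it is not modelled here):

```c
#include <assert.h>

/*
 * Hypothetical running average of the number of idle CPUs, updated
 * once per sampling period. Fixed-point with a /8 decay factor so
 * old samples fade gradually - an assumption, not taken from any
 * posted patch.
 */
static unsigned int avg_idle_x8;	/* average idle-CPU count, times 8 */

static unsigned int update_idle_avg(unsigned int idle_now)
{
	/* exponential moving average: new = old - old/8 + sample */
	avg_idle_x8 = avg_idle_x8 - avg_idle_x8 / 8 + idle_now;
	return avg_idle_x8 / 8;		/* smoothed idle-CPU count */
}
```

A dbs_check_cpu-style sampling loop would feed this once per period alongside the existing load calculation, and the smoothed count (rather than an instantaneous snapshot) would be what the governor hook consumes.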
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
Hi Viresh,

First of all I'd like to apologize for the late response. Please find my comments below.

On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee <jonghwa3@samsung.com> wrote:
>> Purpose
>>
>> One of the problems of ondemand is that it considers only the busiest
>> cpu while not caring how many cpus are busy at the moment. This may
>> result in unnecessary power consumption, which is critical for a
>> system with a limited power source.
>>
>> To get the best energy efficiency, the LAB governor considers not only
>> idle time but also the number of idle cpus. It primarily focuses on
>> supplying adequate performance to users within the limited resources.
>> It checks the number of idle cpus and then controls the maximum
>> frequency dynamically. It also applies a different frequency-increase
>> step depending on that information. In simple terms, the fewer busy
>> cpus there are, the better the performance that will be given. In
>> addition, stable power consumption in the busy state can be achieved
>> with the LAB governor. This helps to manage and estimate power
>> consumption in certain systems.
>
> Hi Jonghwa,
>
> First of all, I should admit that I haven't got to the minute details
> of your patch yet, but I have done a broad review of it. There are many
> things that I am concerned about:
>
> - I don't want an additional governor to be added to cpufreq unless
>   there is a very, very strong reason for it. See what happened to
>   earlier attempts: https://lkml.org/lkml/2012/2/7/504
>   But it doesn't mean you can't get it in. :)
>
> - What is the real logic behind your patchset? I haven't got it
>   completely from your mails. So what you said is:
>   - The fewer busy cpus: you want to run at higher freqs.
>   - The more busy cpus: you want to run at lower freqs.
>
>   But the basic idea I had about this was: the more busy cpus, the more
>   loaded the system is (otherwise the scheduler wouldn't have used so
>   many cpus), and so there is a need to run at a higher frequency
>   rather than a lower one. Which would save power in a sense: finish
>   work early and let most of the cpus enter idle state as early as
>   possible. But with your solution we would run at lower frequencies,
>   so these cpus will take longer to get into idle state again. This
>   will really kill a lot of power.

Our approach is a bit different from the cpufreq_ondemand one. Ondemand
takes the per-CPU idle time, then on that basis calculates the per-cpu
load. The next step is to choose the highest load and then use this
value to properly scale the frequency.

On the other hand, LAB tries to model different behavior:

As a first step we applied Vincent Guittot's "pack small tasks" [*]
patch to improve "race to idle" behavior:
http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks

Afterwards, we decided to investigate a different approach to power
governing: use the number of sleeping CPUs (not the maximal per-CPU
load) to change the frequency. We thereby depend on [*] to "pack" as
many tasks onto one CPU as possible and allow the others to sleep. On
this basis we can increase (even overclock) the frequency (when other
CPUs sleep) to improve the performance of the only running CPU.
Conversely, when all cores are heavily loaded, we decided to reduce the
frequency by around 30%. With this approach the user-experience
reduction is still acceptable (with much less power consumption). When
the system is idle - only small background tasks are running - the
frequency is reduced to a minimum.

To sum up: a different metric (the number of idle CPUs) is used by us to
control the governor. We also heavily depend on the [*] patch set.

> Think about it.
>
> - In case you need some sort of support for this use case, why
>   replicate the ondemand governor by creating another governor? I have
>   had a hard time removing the amount of redundancy inside governors,
>   and you are again going in that direction. Modifying the ondemand
>   governor for this would be a better option.

We have only posted the RFC, so we are open to suggestions.

In the cpufreq_governor.c file, the dbs_check_cpu function is
responsible for calculating the maximal load (among CPUs). I think that
we could also count the number of sleeping CPUs (as an average of the
time-accumulated data). Then we could pass this data to a newly written
function (e.g. lab_check_cpu) defined in cpufreq_ondemand.c (and pointed
to by gov_check_cpu). This would require a change to the dbs_check_cpu
function and extending the ONDEMAND governor with a lab_check_cpu()
function.

> - You haven't rebased on the latest code from linux-next :)

We have posted this RFC patch mainly for discussion, and I think it fits
its purpose :-).

--
Best regards,
Lukasz Majewski
Samsung R&D Poland (SRPOL) | Linux Platform Group

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
On 9 April 2013 16:07, Lukasz Majewski <l.majew...@samsung.com> wrote:
> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
>
> Our approach is a bit different from the cpufreq_ondemand one. Ondemand
> takes the per-CPU idle time, then on that basis calculates the per-cpu
> load. The next step is to choose the highest load and then use this
> value to properly scale the frequency.
>
> On the other hand, LAB tries to model different behavior:
>
> As a first step we applied Vincent Guittot's "pack small tasks" [*]
> patch to improve "race to idle" behavior:
> http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks

Luckily he is part of my team :)

http://www.linaro.org/linux-on-arm/meet-the-team/power-management

BTW, he is using the ondemand governor for all his work.

> Afterwards, we decided to investigate a different approach to power
> governing: use the number of sleeping CPUs (not the maximal per-CPU
> load) to change the frequency. We thereby depend on [*] to "pack" as
> many tasks onto one CPU as possible and allow the others to sleep.

He packs only small tasks. And if there are many small tasks we are
packing, then the load must be high, and so the ondemand governor will
increase the frequency.

> Contrary, when all cores are heavily loaded, we decided to reduce the
> frequency by around 30%. With this approach the user-experience
> reduction is still acceptable (with much less power consumption).

Don't know.. running many cpus at a lower freq for a long duration will
probably take more power than running them at a high freq for a short
duration and making the system idle again.

> We have posted this RFC patch mainly for discussion, and I think it
> fits its purpose :-).

Yes, no issues with your RFC idea.. it's perfect..

@Vincent: Can you please follow this thread a bit and tell us what your
views are?

--
viresh
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
Hi, sorry for my late reply. I just want to add a comment to support
Lukasz's; I put my comments below his.

On 9 April 2013 19:37, Lukasz Majewski wrote:
> Hi Viresh,
>
> First of all I'd like to apologize for the late response. Please find
> my comments below.
>
>> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee <jonghwa3@samsung.com>
>> wrote:
>>>
>>> Purpose
>>>
>>> One of the problems of ondemand is that it considers only the
>>> busiest cpu while not caring how many cpus are busy at the moment.
>>> This may result in unnecessary power consumption, which is critical
>>> for a system with a limited power source.
>>>
>>> To get the best energy efficiency, the LAB governor considers not
>>> only idle time but also the number of idle cpus. It primarily
>>> focuses on supplying adequate performance to users within the
>>> limited resources. It checks the number of idle cpus and then
>>> controls the maximum frequency dynamically. It also applies a
>>> different frequency-increase step depending on that information. In
>>> simple terms, the fewer busy cpus there are, the better the
>>> performance that will be given. In addition, stable power
>>> consumption in the busy state can be achieved with the LAB governor.
>>> This helps to manage and estimate power consumption in certain
>>> systems.
>>
>> Hi Jonghwa,
>>
>> First of all, I should admit that I haven't got to the minute details
>> of your patch yet, but I have done a broad review of it. There are
>> many things that I am concerned about:
>>
>> - I don't want an additional governor to be added to cpufreq unless
>>   there is a very, very strong reason for it. See what happened to
>>   earlier attempts: https://lkml.org/lkml/2012/2/7/504
>>   But it doesn't mean you can't get it in. :)
>>
>> - What is the real logic behind your patchset? I haven't got it
>>   completely from your mails. So what you said is:
>>   - The fewer busy cpus: you want to run at higher freqs.
>>   - The more busy cpus: you want to run at lower freqs.
>>
>>   But the basic idea I had about this was: the more busy cpus, the
>>   more loaded the system is (otherwise the scheduler wouldn't have
>>   used so many cpus), and so there is a need to run at a higher
>>   frequency rather than a lower one. Which would save power in a
>>   sense: finish work early and let most of the cpus enter idle state
>>   as early as possible. But with your solution we would run at lower
>>   frequencies, so these cpus will take longer to get into idle state
>>   again. This will really kill a lot of power.
>
> Our approach is a bit different from the cpufreq_ondemand one.
> Ondemand takes the per-CPU idle time, then on that basis calculates
> the per-cpu load. The next step is to choose the highest load and then
> use this value to properly scale the frequency.
>
> On the other hand, LAB tries to model different behavior:
>
> As a first step we applied Vincent Guittot's "pack small tasks" [*]
> patch to improve "race to idle" behavior:
> http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
>
> Afterwards, we decided to investigate a different approach to power
> governing: use the number of sleeping CPUs (not the maximal per-CPU
> load) to change the frequency. We thereby depend on [*] to "pack" as
> many tasks onto one CPU as possible and allow the others to sleep. On
> this basis we can increase (even overclock) the frequency (when other
> CPUs sleep) to improve the performance of the only running CPU.
> Conversely, when all cores are heavily loaded, we decided to reduce
> the frequency by around 30%. With this approach the user-experience
> reduction is still acceptable (with much less power consumption). When
> the system is idle - only small background tasks are running - the
> frequency is reduced to a minimum.
>
> To sum up: a different metric (the number of idle CPUs) is used by us
> to control the governor. We also heavily depend on the [*] patch set.
In addition, it is hard to say that simply giving a busy system high
performance is the best way to reduce power consumption. Yes, as Viresh
said, we can shorten the busy period, but that is not always a perfect
solution. If we push all CPUs to keep the highest performance they can,
the temperature increases rapidly. In my test, the ondemand governor
reached the thermal limits frequently, while the LAB governor didn't.
High temperature also increases power consumption: I got a rough result
showing about a 10% difference in power consumption between conditions
whose temperatures differed by about 20%.

Consumed power with maximum frequency at different temperatures:

  Temperature | Power consumption (mWh) | Loss (%)
  ------------+-------------------------+---------
  65'C        | 53.89                   | Base
  80'C        | 59.88                   | 10

So, to reduce power consumption, it looks like we have to take more care
to avoid situations where the system reaches high temperatures. This is
even more meaningful for the mobile environment. In mobile devices, high
temperature will also affect the user's experience badly and can't be
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
Hi Viresh and Vincent,

> On 9 April 2013 16:07, Lukasz Majewski <l.majew...@samsung.com> wrote:
>> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
>>
>> Our approach is a bit different from the cpufreq_ondemand one.
>> Ondemand takes the per-CPU idle time, then on that basis calculates
>> the per-cpu load. The next step is to choose the highest load and
>> then use this value to properly scale the frequency.
>>
>> On the other hand, LAB tries to model different behavior:
>>
>> As a first step we applied Vincent Guittot's "pack small tasks" [*]
>> patch to improve "race to idle" behavior:
>> http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
>
> Luckily he is part of my team :)
>
> http://www.linaro.org/linux-on-arm/meet-the-team/power-management
>
> BTW, he is using the ondemand governor for all his work.
>
>> Afterwards, we decided to investigate a different approach to power
>> governing: use the number of sleeping CPUs (not the maximal per-CPU
>> load) to change the frequency. We thereby depend on [*] to "pack" as
>> many tasks onto one CPU as possible and allow the others to sleep.
>
> He packs only small tasks.

What about packing not only small tasks? I will investigate the
possibility of aggressively packing (even at the cost of performance
degradation) as many tasks as possible onto a single CPU. It seems a
good idea for reducing power consumption.

> And if there are many small tasks we are packing, then the load must
> be high, and so the ondemand governor will increase the frequency.

This is of course true for packing all tasks onto a single CPU. If we
stay within the power consumption envelope, we can even overclock the
frequency. But what if the other - let's say 3 - CPUs are under heavy
workload? Ondemand will switch the frequency to maximum, and as Jonghwa
pointed out, this can cause a dangerous temperature increase.

>> Contrary, when all cores are heavily loaded, we decided to reduce the
>> frequency by around 30%. With this approach the user-experience
>> reduction is still acceptable (with much less power consumption).
>
> Don't know.. running many cpus at a lower freq for a long duration
> will probably take more power than running them at a high freq for a
> short duration and making the system idle again.
>
>> We have posted this RFC patch mainly for discussion, and I think it
>> fits its purpose :-).
>
> Yes, no issues with your RFC idea.. it's perfect..
>
> @Vincent: Can you please follow this thread a bit and tell us what
> your views are?
>
> --
> viresh

--
Best regards,
Lukasz Majewski
Samsung R&D Poland (SRPOL) | Linux Platform Group
Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee wrote:
> Purpose
>
> One of the problems of ondemand is that it considers only the busiest
> cpu while not caring how many cpus are busy at the moment. This may
> result in unnecessary power consumption, which is critical for a
> system with a limited power source.
>
> To get the best energy efficiency, the LAB governor considers not only
> idle time but also the number of idle cpus. It primarily focuses on
> supplying adequate performance to users within the limited resources.
> It checks the number of idle cpus and then controls the maximum
> frequency dynamically. It also applies a different frequency-increase
> step depending on that information. In simple terms, the fewer busy
> cpus there are, the better the performance that will be given.
> In addition, stable power consumption in the busy state can be
> achieved with the LAB governor. This helps to manage and estimate
> power consumption in certain systems.

Hi Jonghwa,

First of all, I should admit that I haven't got to the minute details of
your patch yet, but I have done a broad review of it. There are many
things that I am concerned about:

- I don't want an additional governor to be added to cpufreq unless
  there is a very, very strong reason for it. See what happened to
  earlier attempts: https://lkml.org/lkml/2012/2/7/504
  But it doesn't mean you can't get it in. :)

- What is the real logic behind your patchset? I haven't got it
  completely from your mails. So what you said is:
  - The fewer busy cpus: you want to run at higher freqs.
  - The more busy cpus: you want to run at lower freqs.

  But the basic idea I had about this was: the more busy cpus, the more
  loaded the system is (otherwise the scheduler wouldn't have used so
  many cpus), and so there is a need to run at a higher frequency rather
  than a lower one. Which would save power in a sense: finish work early
  and let most of the cpus enter idle state as early as possible. But
  with your solution we would run at lower frequencies, so these cpus
  will take longer to get into idle state again. This will really kill a
  lot of power. Think about it.

- In case you need some sort of support for this use case, why replicate
  the ondemand governor by creating another governor? I have had a hard
  time removing the amount of redundancy inside governors, and you are
  again going in that direction. Modifying the ondemand governor for
  this would be a better option.

- You haven't rebased on the latest code from linux-next :)

--
viresh
[RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
This patchset adds a new cpufreq governor named LAB (Legacy Application
Boost). Basically, this governor is based on the ondemand governor.

** Introduce the LAB (Legacy Application Boost) governor

Purpose

One of the problems of ondemand is that it considers only the busiest
cpu while not caring how many cpus are busy at the moment. This may
result in unnecessary power consumption, which is critical for a system
with a limited power source.

To get the best energy efficiency, the LAB governor considers not only
idle time but also the number of idle cpus. It primarily focuses on
supplying adequate performance to users within the limited resources. It
checks the number of idle cpus and then controls the maximum frequency
dynamically. It also applies a different frequency-increase step
depending on that information. In simple terms, the fewer busy cpus
there are, the better the performance that will be given. In addition,
stable power consumption in the busy state can be achieved with the LAB
governor. This helps to manage and estimate power consumption in certain
systems.

Algorithm

- Counting the idle cpus:
The number of idle cpus is determined from historical results. Every
time the cpufreq timer is called, the governor stores the cpu idle
usage, derived by dividing the cpu idling time by the period, and it
calculates the average idle usage from the most recently stored data. It
uses cpu state usage information from the per-cpu cpuidle_device.
However, because that state usage information is updated only when the
cpu exits the state, the information may differ from the real value
while the cpu is idle at checking time. To detect whether a cpu is
idling, it uses the timestamps of the cpuidle devices. It needs to
re-calculate the idle state usage in the three possible cases where the
cpu is idle at checking time (the original mail illustrated these with
timeline diagrams, shaded sections marking idle and "@" the current
checking point; the diagrams did not survive formatting):

1) During the last period, the idle state was broken more than once.
2) During the last period, no idle state was entered; the current idle
   period is the first one.
3) During the whole last period, the core was in the idle state.

After calculating the idle state usage, it decides whether the cpu was
idle in the last period using the idle threshold. If the average idle
usage is bigger than the threshold, it treats the cpu as an idle cpu.
For the test, I set the default threshold value to 90.

- Limiting the maximum frequency:
Using the information from the idle-cpu counting phase, it sets the
maximum frequency for the next period. By default, it limits the current
policy's maximum frequency by 0 to 35%, depending on the number of idle
cpus.

- Setting the next frequency:
The LAB governor changes the frequency step by step, like the
conservative governor. However, in the LAB governor the step changes
dynamically depending on how many cpus are idle:

next_freq = current_freq + current_idle_cpus * increasing_step;

To do

The prototype of this feature was developed as a cpuidle driver, so it
uses cpuidle framework information temporarily. I'd like to use the data
of the per-cpu tick_sched variable - it has exactly the information I
want - but it can't be accessed from the cpufreq side.

I tested this patch on a Pegasus quad board. Any comments are welcome.

Jonghwa Lee (2):
  cpuidle: Add idle enter/exit time stamp for notifying current idle
    state.
  cpufreq: Introduce new cpufreq governor, LAB (Legacy Application
    Boost).

 drivers/cpufreq/Kconfig            |   26 ++
 drivers/cpufreq/Makefile           |    1 +
 drivers/cpufreq/cpufreq_governor.h |   14 +
 drivers/cpufreq/cpufreq_lab.c      |  553 ++++++++++++++++++++++++++++
 drivers/cpuidle/cpuidle.c          |    8 +-
 include/linux/cpufreq.h            |    3 +
 include/linux/cpuidle.h            |    4 +
 7 files changed, 605 insertions(+), 4 deletions(-)
 create mode 100644 drivers/cpufreq/cpufreq_lab.c

--
1.7.9.5