Re: [PATCH] cpuidle: governor: menu: move repeated correction factor check to init
On 10 April 2014 16:43, Chander Kashyap wrote: > In menu_select function we check for correction factor every time. > If it is zero we are initializing to unity. Hence move it to init function > and initialise by unity, hence avoid repeated comparisons. > > Signed-off-by: Chander Kashyap > --- > drivers/cpuidle/governors/menu.c | 15 --- > 1 file changed, 8 insertions(+), 7 deletions(-) > > diff --git a/drivers/cpuidle/governors/menu.c > b/drivers/cpuidle/governors/menu.c > index cf7f2f0..048f6d9 100644 > --- a/drivers/cpuidle/governors/menu.c > +++ b/drivers/cpuidle/governors/menu.c > @@ -315,13 +315,6 @@ static int menu_select(struct cpuidle_driver *drv, > struct cpuidle_device *dev) > multiplier = performance_multiplier(); > > /* > -* if the correction factor is 0 (eg first time init or cpu hotplug > -* etc), we actually want to start out with a unity factor. > -*/ > - if (data->correction_factor[data->bucket] == 0) > - data->correction_factor[data->bucket] = RESOLUTION * DECAY; > - > - /* > * Force the result of multiplication to be 64 bits even if both > * operands are 32 bits. > * Make sure to round up for half microseconds. > @@ -453,9 +446,17 @@ static int menu_enable_device(struct cpuidle_driver *drv, > struct cpuidle_device *dev) > { > struct menu_device *data = _cpu(menu_devices, dev->cpu); > + int i; > > memset(data, 0, sizeof(struct menu_device)); > > + /* > +* if the correction factor is 0 (eg first time init or cpu hotplug > +* etc), we actually want to start out with a unity factor. > +*/ > + for(i = 0; i < BUCKETS; i++) > + data->correction_factor[i] = RESOLUTION * DECAY; > + > return 0; > } > > -- > 1.7.9.5 > Reviewed-by: Tuukka Tikkanen -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cpuidle: governor: menu: move repeated correction factor check to init
On 10 April 2014 16:43, Chander Kashyap chander.kash...@linaro.org wrote: In menu_select function we check for correction factor every time. If it is zero we are initializing to unity. Hence move it to init function and initialise by unity, hence avoid repeated comparisons. Signed-off-by: Chander Kashyap chander.kash...@linaro.org --- drivers/cpuidle/governors/menu.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index cf7f2f0..048f6d9 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -315,13 +315,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) multiplier = performance_multiplier(); /* -* if the correction factor is 0 (eg first time init or cpu hotplug -* etc), we actually want to start out with a unity factor. -*/ - if (data-correction_factor[data-bucket] == 0) - data-correction_factor[data-bucket] = RESOLUTION * DECAY; - - /* * Force the result of multiplication to be 64 bits even if both * operands are 32 bits. * Make sure to round up for half microseconds. @@ -453,9 +446,17 @@ static int menu_enable_device(struct cpuidle_driver *drv, struct cpuidle_device *dev) { struct menu_device *data = per_cpu(menu_devices, dev-cpu); + int i; memset(data, 0, sizeof(struct menu_device)); + /* +* if the correction factor is 0 (eg first time init or cpu hotplug +* etc), we actually want to start out with a unity factor. +*/ + for(i = 0; i BUCKETS; i++) + data-correction_factor[i] = RESOLUTION * DECAY; + return 0; } -- 1.7.9.5 Reviewed-by: Tuukka Tikkanen tuukka.tikka...@linaro.org -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 6/7] Cpuidle: Deal with timer expiring in the past
Hi, On 6 March 2014 09:41, Len Brown wrote: > > On Mon, Feb 24, 2014 at 1:29 AM, Tuukka Tikkanen > wrote: > > Sometimes (fairly often) when the cpuidle menu governor is making a decision > > about idle state to enter the next timer for the cpu appears to expire in > > the past. The menu governor expects the expiry to always be in the future > > and in fact stores the time delta in an unsigned variable. However, when > > the expiry is in the past, the value returned by tick_nohz_get_sleep_length > > can be negative. This patch prevents using negative values, instead making > > the governor return immediately similar to having latency requirement set > > to 0. > > > > Note: As with latency == 0, the return value is 0 with no check to see if > > the state 0 has been disabled or not. > > > > Signed-off-by: Tuukka Tikkanen > > --- > > drivers/cpuidle/governors/menu.c | 10 +- > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/cpuidle/governors/menu.c > > b/drivers/cpuidle/governors/menu.c > > index 71b5232..c414468 100644 > > --- a/drivers/cpuidle/governors/menu.c > > +++ b/drivers/cpuidle/governors/menu.c > > @@ -302,8 +302,16 @@ static int menu_select(struct cpuidle_driver *drv, > > struct cpuidle_device *dev) > > if (unlikely(latency_req == 0)) > > return 0; > > > > - /* determine the expected residency time, round up */ > > + /* > > +* Determine the expected residency time. If the time is negative, > > +* a timer interrupt has probably just expired after disabling > > +* interrupts. Return as quickly as possible in the most shallow > > +* state possible. tv_nsec is always positive, so only check the > > +* seconds. > > +*/ > > t = ktime_to_timespec(tick_nohz_get_sleep_length()); > > + if (t.tv_sec < 0) > > + return 0; > > data->next_timer_us = > > t.tv_sec * USEC_PER_SEC + t.tv_nsec / NSEC_PER_USEC; > > > > Are there special conditions that are necessary to provoke a negative > return value? > I've traced this code on several systems, and never seen a negative > return value. > By changing the if statement to if (WARN_ONCE(t.tv_sec < 0, "Next timer in the past. tv_sec: %ld, tv_usec: %lu", t.tv_sec, t.tv_usec)) return 0; I eventually got [ 3092.355144] WARNING: CPU: 4 PID: 0 at drivers/cpuidle/governors/menu.c:315 menu_select+0x410/0x420() [ 3092.355145] Next timer in the past. tv_sec: -1, tv_usec: 99645 ... [ 3092.355169] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.14.0-rc3+ #3 so the problem is still lurking there. However, as you can see from the timestamp, either something has changed recently and made it much much less frequent or I did not use the use case / hardware that triggers it more often. It has been quite a long time since I discovered and fixed this in my private trees, so I have forgotten the best way to reproduce this, sorry. In any case, existence is still confirmed. This was on a i7-3770K with 64 bit kernel, running make clean && make -j8 && make clean && make for kernel source tree. If I happen to remember the use case + hardware combination that triggers this rapidly, I'll let you know. That said, I agree dropping this patch and fixing the called function is a better idea. Tuukka > > However... > I do see values up to 300.2 seconds, and those large values seem to decay > at the rate of real-time so that after 5 minutes they are small, and then > jump back up to 300 seconds. > > Some folks at Oracle debugged it down to use of NEXT_TIMER_MAX_DELTA > when there is _no_ timer currently pending on that CPU. It seems this is > easier > to observe, the more CPUs a system has -- though I've been able to reproduce > it on a system as small as a single-package 8-cpu systems. > > One proposed way to address this is to cap large values at 1 second. > However, that will not recognize that for the period when the large > value decays to under 1 second, all of those are fiction. > > Also, if we could identify the case where there is no future timer, > it seems that re-using dev->last_residency would probably be > a more useful guess than pretending we'll have a timer expire in 1 second. > > thanks, > Len Brown, Intel Open Source Technology Cente -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 6/7] Cpuidle: Deal with timer expiring in the past
Hi, On 6 March 2014 09:41, Len Brown l...@kernel.org wrote: On Mon, Feb 24, 2014 at 1:29 AM, Tuukka Tikkanen tuukka.tikka...@linaro.org wrote: Sometimes (fairly often) when the cpuidle menu governor is making a decision about idle state to enter the next timer for the cpu appears to expire in the past. The menu governor expects the expiry to always be in the future and in fact stores the time delta in an unsigned variable. However, when the expiry is in the past, the value returned by tick_nohz_get_sleep_length can be negative. This patch prevents using negative values, instead making the governor return immediately similar to having latency requirement set to 0. Note: As with latency == 0, the return value is 0 with no check to see if the state 0 has been disabled or not. Signed-off-by: Tuukka Tikkanen tuukka.tikka...@linaro.org --- drivers/cpuidle/governors/menu.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 71b5232..c414468 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -302,8 +302,16 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) if (unlikely(latency_req == 0)) return 0; - /* determine the expected residency time, round up */ + /* +* Determine the expected residency time. If the time is negative, +* a timer interrupt has probably just expired after disabling +* interrupts. Return as quickly as possible in the most shallow +* state possible. tv_nsec is always positive, so only check the +* seconds. +*/ t = ktime_to_timespec(tick_nohz_get_sleep_length()); + if (t.tv_sec 0) + return 0; data-next_timer_us = t.tv_sec * USEC_PER_SEC + t.tv_nsec / NSEC_PER_USEC; Are there special conditions that are necessary to provoke a negative return value? I've traced this code on several systems, and never seen a negative return value. By changing the if statement to if (WARN_ONCE(t.tv_sec 0, Next timer in the past. tv_sec: %ld, tv_usec: %lu, t.tv_sec, t.tv_usec)) return 0; I eventually got [ 3092.355144] WARNING: CPU: 4 PID: 0 at drivers/cpuidle/governors/menu.c:315 menu_select+0x410/0x420() [ 3092.355145] Next timer in the past. tv_sec: -1, tv_usec: 99645 ... [ 3092.355169] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.14.0-rc3+ #3 so the problem is still lurking there. However, as you can see from the timestamp, either something has changed recently and made it much much less frequent or I did not use the use case / hardware that triggers it more often. It has been quite a long time since I discovered and fixed this in my private trees, so I have forgotten the best way to reproduce this, sorry. In any case, existence is still confirmed. This was on a i7-3770K with 64 bit kernel, running make clean make -j8 make clean make for kernel source tree. If I happen to remember the use case + hardware combination that triggers this rapidly, I'll let you know. That said, I agree dropping this patch and fixing the called function is a better idea. Tuukka However... I do see values up to 300.2 seconds, and those large values seem to decay at the rate of real-time so that after 5 minutes they are small, and then jump back up to 300 seconds. Some folks at Oracle debugged it down to use of NEXT_TIMER_MAX_DELTA when there is _no_ timer currently pending on that CPU. It seems this is easier to observe, the more CPUs a system has -- though I've been able to reproduce it on a system as small as a single-package 8-cpu systems. One proposed way to address this is to cap large values at 1 second. However, that will not recognize that for the period when the large value decays to under 1 second, all of those are fiction. Also, if we could identify the case where there is no future timer, it seems that re-using dev-last_residency would probably be a more useful guess than pretending we'll have a timer expire in 1 second. thanks, Len Brown, Intel Open Source Technology Cente -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/7] Cpuidle: poll state can measure residency
On 25 February 2014 12:56, Daniel Lezcano wrote: > On 02/24/2014 07:29 AM, Tuukka Tikkanen wrote: >> >> For some platforms, a poll state is inserted in the cpuidle driver states. >> The flags for the state do not indicate that timekeeping is not affected. >> As the state does not do anything apart from calling cpu_relax(), the >> times returned by ktime_get should remain valid. Add the missing flag. > > > Yes, but it is done with the interrupt enabled, so the interrupt handler + > softirq handler times will be accounted in the residency time. > > I am not sure adding this flag is good. Granted, with a slow CPU and very complex interrupt handing there might be some extra time added. So let's consider what might happen: Menu: Not having this flag assumes the residency matches time until next timer expiry. Any measured amount of time less than that indicates that the residency was shorter. We might not know exactly how much, but we are closer to the truth and everything is fine. So that leaves reporting time that exceeds what was established as the time until next timer expiry. That too is OK (assuming another patch in the series) as the value is limited to the time until next timer expiry. In short, we get the same or better outcome and have no issues. Ladder: Last residency is assumed to be last_state->threshold.promotion_time + 1 if the flag is not set. Obviously we won't be considering demotion in that case and will go into the promotion path. Any measured value below that is again closer to the truth and any value at or above that is simply going to result in the same outcome. The only remaining issue is any hypothetical governor not currently in the kernel tree. It would have to depend on accurate measurements while being able to work with states that do not have the valid time flag. I'm not sure if I can picture a scenario like that. Tuukka > >> Signed-off-by: Tuukka Tikkanen >> --- >> drivers/cpuidle/driver.c |2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c >> index 06dbe7c..136d6a2 100644 >> --- a/drivers/cpuidle/driver.c >> +++ b/drivers/cpuidle/driver.c >> @@ -209,7 +209,7 @@ static void poll_idle_init(struct cpuidle_driver *drv) >> state->exit_latency = 0; >> state->target_residency = 0; >> state->power_usage = -1; >> - state->flags = 0; >> + state->flags = CPUIDLE_FLAG_TIME_VALID; >> state->enter = poll_idle; >> state->disabled = false; >> } >> > > > -- > <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs > > Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | > <http://twitter.com/#!/linaroorg> Twitter | > <http://www.linaro.org/linaro-blog/> Blog > > -- > To unsubscribe from this list: send the line "unsubscribe linux-pm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/7] Cpuidle: poll state can measure residency
On 25 February 2014 12:56, Daniel Lezcano daniel.lezc...@linaro.org wrote: On 02/24/2014 07:29 AM, Tuukka Tikkanen wrote: For some platforms, a poll state is inserted in the cpuidle driver states. The flags for the state do not indicate that timekeeping is not affected. As the state does not do anything apart from calling cpu_relax(), the times returned by ktime_get should remain valid. Add the missing flag. Yes, but it is done with the interrupt enabled, so the interrupt handler + softirq handler times will be accounted in the residency time. I am not sure adding this flag is good. Granted, with a slow CPU and very complex interrupt handing there might be some extra time added. So let's consider what might happen: Menu: Not having this flag assumes the residency matches time until next timer expiry. Any measured amount of time less than that indicates that the residency was shorter. We might not know exactly how much, but we are closer to the truth and everything is fine. So that leaves reporting time that exceeds what was established as the time until next timer expiry. That too is OK (assuming another patch in the series) as the value is limited to the time until next timer expiry. In short, we get the same or better outcome and have no issues. Ladder: Last residency is assumed to be last_state-threshold.promotion_time + 1 if the flag is not set. Obviously we won't be considering demotion in that case and will go into the promotion path. Any measured value below that is again closer to the truth and any value at or above that is simply going to result in the same outcome. The only remaining issue is any hypothetical governor not currently in the kernel tree. It would have to depend on accurate measurements while being able to work with states that do not have the valid time flag. I'm not sure if I can picture a scenario like that. Tuukka Signed-off-by: Tuukka Tikkanen tuukka.tikka...@linaro.org --- drivers/cpuidle/driver.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c index 06dbe7c..136d6a2 100644 --- a/drivers/cpuidle/driver.c +++ b/drivers/cpuidle/driver.c @@ -209,7 +209,7 @@ static void poll_idle_init(struct cpuidle_driver *drv) state-exit_latency = 0; state-target_residency = 0; state-power_usage = -1; - state-flags = 0; + state-flags = CPUIDLE_FLAG_TIME_VALID; state-enter = poll_idle; state-disabled = false; } -- http://www.linaro.org/ Linaro.org │ Open source software for ARM SoCs Follow Linaro: http://www.facebook.com/pages/Linaro Facebook | http://twitter.com/#!/linaroorg Twitter | http://www.linaro.org/linaro-blog/ Blog -- To unsubscribe from this list: send the line unsubscribe linux-pm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 7/7] Cpuidle: poll state can measure residency
For some platforms, a poll state is inserted in the cpuidle driver states. The flags for the state do not indicate that timekeeping is not affected. As the state does not do anything apart from calling cpu_relax(), the times returned by ktime_get should remain valid. Add the missing flag. Signed-off-by: Tuukka Tikkanen --- drivers/cpuidle/driver.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c index 06dbe7c..136d6a2 100644 --- a/drivers/cpuidle/driver.c +++ b/drivers/cpuidle/driver.c @@ -209,7 +209,7 @@ static void poll_idle_init(struct cpuidle_driver *drv) state->exit_latency = 0; state->target_residency = 0; state->power_usage = -1; - state->flags = 0; + state->flags = CPUIDLE_FLAG_TIME_VALID; state->enter = poll_idle; state->disabled = false; } -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/7] Cpuidle: Ensure menu coefficients stay within domain
The menu governor uses coefficients as one method of actual idle period length estimation. The coefficients are, as detailed below, multipliers giving expected idle period length from time until next timer expiry. The multipliers are supposed to have domain of (0..1]. The coefficients are fractions where only the numerators are stored and denominators are a shared constant RESOLUTION*DECAY. Since the value of the coefficient should always be greater than 0 and less than or equal to 1, the numerator must have a value greater than 0 and less than or equal to RESOLUTION*DECAY. If the coefficients are updated with measured idle durations exceeding timer length, the multiplier may reach values exceeding unity (i.e. the stored numerator exceeds RESOLUTION*DECAY). This patch ensures that the multipliers are updated with durations capped to timer length. Signed-off-by: Tuukka Tikkanen --- drivers/cpuidle/governors/menu.c |3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 115386a..f0995dd 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -410,6 +410,9 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) if (measured_us > target->exit_latency) measured_us -= target->exit_latency; + /* Make sure our coefficients do not exceed unity */ + if (measured_us > data->next_timer_us) + measured_us = data->next_timer_us; /* Update our correction ratio */ new_factor = data->correction_factor[data->bucket]; -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/7] Cpuidle: Minor fixes
This set of patches makes some minor changes to menu governor and the poll idle state. Patch 1 is simply a rename of a variable to make the name better represent the contained information. Patch 2 fixes calculating actual residency in cases where the entered state is different from the state decided by the menu governor. Patch 3 makes sure the menu governor timer coefficients are not updated with values that might cause a coefficient to reach a value greater than unity. Patch 4 fixes calculation actual residency in cases where the entered state does not support measuring residency. In such cases the residency time is taken from time remaining until next timer expiry. The timer is expected to go off at the start of exit latency, not after it. Therefore the exit latency must not be substracted from the assumed value. Patch 5 moves the performance multiplier based comparison out of the state selection loop by changing it into a latency requirement. This allows using a generic state selection function accepting only (duration, latency) tuple as input. The change is possible by noting that performance multiplier is used only to check for a minimum predicted idle duration to exit latency ratio. As predicted idle duration is a constant for the loop, the maximum allowed latency can be calculated outside of the loop. Patch 6 prevents using negative values from tick_nohz_get_sleep_length() in the menu governor. If unchecked, the negative values are used as huge unsigned values. Negative values occur fairly often (e.g. on x86_64 I've seen this happen several times per minute) on a busy system, allowing the deepest state to win the selection while the shallowest should be picked. Patch 7 adds CPUIDLE_FLAG_TIME_VALID to poll_idle. I do not know of any platfrom where cpu_relax() would break ktime_get() and in fact poll_idle uses ktime_get itself. (Note: poll_idle updates dev->last_residency for some reason. Does it ever get called without going through cpuidle_enter_state, which will overwrite the value? Even if some state redirects to this state, the call will eventually return to the framework. The redundant time measurement could be removed, unless there is some obscure way of getting called on some platform that I am unable to figure out.) Tuukka Tikkanen (7): Cpuidle: rename expected_us to next_timer_us in menu governor Cpuidle: Use actual state latency in menu governor Cpuidle: Ensure menu coefficients stay within domain Cpuidle: Do not substract exit latency from assumed sleep length Cpuidle: Move perf multiplier calculation out of the selection loop Cpuidle: Deal with timer expiring in the past Cpuidle: poll state can measure residency drivers/cpuidle/driver.c |2 +- drivers/cpuidle/governors/menu.c | 85 -- 2 files changed, 54 insertions(+), 33 deletions(-) -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 5/7] Cpuidle: Move perf multiplier calculation out of the selection loop
The menu governor performance multiplier defines a minimum predicted idle duration to latency ratio. Instead of checking this separately in every iteration of the state selection loop, adjust the overall latency restriction for the whole loop if this restriction is tighter than what is set by the QoS subsystem. The original test s->exit_latency * multiplier > data->predicted_us becomes s->exit_latency > data->predicted_us / multiplier by dividing both sides of the comparison by "multiplier". While division is likely to be several times slower than multiplication, the minor performance hit allows making a generic sleep state selection function based on (sleep duration, maximum latency) tuple. Signed-off-by: Tuukka Tikkanen --- drivers/cpuidle/governors/menu.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index b347c10..71b5232 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -288,7 +288,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) struct menu_device *data = &__get_cpu_var(menu_devices); int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY); int i; - int multiplier; + unsigned int interactivity_req; struct timespec t; if (data->needs_update) { @@ -310,8 +310,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) data->bucket = which_bucket(data->next_timer_us); - multiplier = performance_multiplier(); - /* * if the correction factor is 0 (eg first time init or cpu hotplug * etc), we actually want to start out with a unity factor. @@ -331,6 +329,15 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) get_typical_interval(data); /* +* Performance multiplier defines a minimum predicted idle +* duration / latency ratio. Adjust the latency limit if +* necessary. +*/ + interactivity_req = data->predicted_us / performance_multiplier(); + if (latency_req > interactivity_req) + latency_req = interactivity_req; + + /* * We want to default to C1 (hlt), not to busy polling * unless the timer is happening really really soon. */ @@ -353,8 +360,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) continue; if (s->exit_latency > latency_req) continue; - if (s->exit_latency * multiplier > data->predicted_us) - continue; data->last_state_idx = i; } -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/7] Cpuidle: Minor fixes
This set of patches makes some minor changes to menu governor and the poll idle state. Patch 1 is simply a rename of a variable to make the name better represent the contained information. Patch 2 fixes calculating actual residency in cases where the entered state is different from the state decided by the menu governor. Patch 3 makes sure the menu governor timer coefficients are not updated with values that might cause a coefficient to reach a value greater than unity. Patch 4 fixes calculation actual residency in cases where the entered state does not support measuring residency. In such cases the residency time is taken from time remaining until next timer expiry. The timer is expected to go off at the start of exit latency, not after it. Therefore the exit latency must not be substracted from the assumed value. Patch 5 moves the performance multiplier based comparison out of the state selection loop by changing it into a latency requirement. This allows using a generic state selection function accepting only (duration, latency) tuple as input. The change is possible by noting that performance multiplier is used only to check for a minimum predicted idle duration to exit latency ratio. As predicted idle duration is a constant for the loop, the maximum allowed latency can be calculated outside of the loop. Patch 6 prevents using negative values from tick_nohz_get_sleep_length() in the menu governor. If unchecked, the negative values are used as huge unsigned values. Negative values occur fairly often (e.g. on x86_64 I've seen this happen several times per minute) on a busy system, allowing the deepest state to win the selection while the shallowest should be picked. Patch 7 adds CPUIDLE_FLAG_TIME_VALID to poll_idle. I do not know of any platfrom where cpu_relax() would break ktime_get() and in fact poll_idle uses ktime_get itself. (Note: poll_idle updates dev-last_residency for some reason. Does it ever get called without going through cpuidle_enter_state, which will overwrite the value? Even if some state redirects to this state, the call will eventually return to the framework. The redundant time measurement could be removed, unless there is some obscure way of getting called on some platform that I am unable to figure out.) Tuukka Tikkanen (7): Cpuidle: rename expected_us to next_timer_us in menu governor Cpuidle: Use actual state latency in menu governor Cpuidle: Ensure menu coefficients stay within domain Cpuidle: Do not substract exit latency from assumed sleep length Cpuidle: Move perf multiplier calculation out of the selection loop Cpuidle: Deal with timer expiring in the past Cpuidle: poll state can measure residency drivers/cpuidle/driver.c |2 +- drivers/cpuidle/governors/menu.c | 85 -- 2 files changed, 54 insertions(+), 33 deletions(-) -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 5/7] Cpuidle: Move perf multiplier calculation out of the selection loop
The menu governor performance multiplier defines a minimum predicted idle duration to latency ratio. Instead of checking this separately in every iteration of the state selection loop, adjust the overall latency restriction for the whole loop if this restriction is tighter than what is set by the QoS subsystem. The original test s-exit_latency * multiplier data-predicted_us becomes s-exit_latency data-predicted_us / multiplier by dividing both sides of the comparison by multiplier. While division is likely to be several times slower than multiplication, the minor performance hit allows making a generic sleep state selection function based on (sleep duration, maximum latency) tuple. Signed-off-by: Tuukka Tikkanen tuukka.tikka...@linaro.org --- drivers/cpuidle/governors/menu.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index b347c10..71b5232 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -288,7 +288,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) struct menu_device *data = __get_cpu_var(menu_devices); int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY); int i; - int multiplier; + unsigned int interactivity_req; struct timespec t; if (data-needs_update) { @@ -310,8 +310,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) data-bucket = which_bucket(data-next_timer_us); - multiplier = performance_multiplier(); - /* * if the correction factor is 0 (eg first time init or cpu hotplug * etc), we actually want to start out with a unity factor. @@ -331,6 +329,15 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) get_typical_interval(data); /* +* Performance multiplier defines a minimum predicted idle +* duration / latency ratio. Adjust the latency limit if +* necessary. +*/ + interactivity_req = data-predicted_us / performance_multiplier(); + if (latency_req interactivity_req) + latency_req = interactivity_req; + + /* * We want to default to C1 (hlt), not to busy polling * unless the timer is happening really really soon. */ @@ -353,8 +360,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) continue; if (s-exit_latency latency_req) continue; - if (s-exit_latency * multiplier data-predicted_us) - continue; data-last_state_idx = i; } -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/7] Cpuidle: Ensure menu coefficients stay within domain
The menu governor uses coefficients as one method of actual idle period length estimation. The coefficients are, as detailed below, multipliers giving expected idle period length from time until next timer expiry. The multipliers are supposed to have domain of (0..1]. The coefficients are fractions where only the numerators are stored and denominators are a shared constant RESOLUTION*DECAY. Since the value of the coefficient should always be greater than 0 and less than or equal to 1, the numerator must have a value greater than 0 and less than or equal to RESOLUTION*DECAY. If the coefficients are updated with measured idle durations exceeding timer length, the multiplier may reach values exceeding unity (i.e. the stored numerator exceeds RESOLUTION*DECAY). This patch ensures that the multipliers are updated with durations capped to timer length. Signed-off-by: Tuukka Tikkanen tuukka.tikka...@linaro.org --- drivers/cpuidle/governors/menu.c |3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 115386a..f0995dd 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -410,6 +410,9 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) if (measured_us target-exit_latency) measured_us -= target-exit_latency; + /* Make sure our coefficients do not exceed unity */ + if (measured_us data-next_timer_us) + measured_us = data-next_timer_us; /* Update our correction ratio */ new_factor = data-correction_factor[data-bucket]; -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 7/7] Cpuidle: poll state can measure residency
For some platforms, a poll state is inserted in the cpuidle driver states. The flags for the state do not indicate that timekeeping is not affected. As the state does not do anything apart from calling cpu_relax(), the times returned by ktime_get should remain valid. Add the missing flag. Signed-off-by: Tuukka Tikkanen tuukka.tikka...@linaro.org --- drivers/cpuidle/driver.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c index 06dbe7c..136d6a2 100644 --- a/drivers/cpuidle/driver.c +++ b/drivers/cpuidle/driver.c @@ -209,7 +209,7 @@ static void poll_idle_init(struct cpuidle_driver *drv) state-exit_latency = 0; state-target_residency = 0; state-power_usage = -1; - state-flags = 0; + state-flags = CPUIDLE_FLAG_TIME_VALID; state-enter = poll_idle; state-disabled = false; } -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/7] Cpuidle: Use actual state latency in menu governor
Currently menu governor records the exit latency of the state it has chosen for the idle period. The stored latency value is then later used to calculate the actual length of the idle period. This value may however be incorrect, as the entered state may not be the one chosen by the governor. The entered state information is available, so we can use that to obtain the real exit latency. Signed-off-by: Tuukka Tikkanen --- drivers/cpuidle/governors/menu.c |7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index e9a2a27..115386a 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -124,7 +124,6 @@ struct menu_device { unsigned intnext_timer_us; unsigned intpredicted_us; - unsigned intexit_us; unsigned intbucket; unsigned intcorrection_factor[BUCKETS]; unsigned intintervals[INTERVALS]; @@ -298,7 +297,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) } data->last_state_idx = 0; - data->exit_us = 0; /* Special case when user has set very strict latency requirement */ if (unlikely(latency_req == 0)) @@ -359,7 +357,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) continue; data->last_state_idx = i; - data->exit_us = s->exit_latency; } return data->last_state_idx; @@ -410,8 +407,8 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) * We correct for the exit latency; we are assuming here that the * exit latency happens after the event that we're interested in. */ - if (measured_us > data->exit_us) - measured_us -= data->exit_us; + if (measured_us > target->exit_latency) + measured_us -= target->exit_latency; /* Update our correction ratio */ -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/7] Cpuidle: rename expected_us to next_timer_us in menu governor
The field expected_us is used to store the time remaining until next timer expiry. The name is inaccurate, as we really do not expect all wakeups to be caused by timers. In addition, another field with a very similar name (predicted_us) is used to store the predicted time remaining until any wakeup source being active. This patch renames expected_us to next_timer_us in order to better reflect the contained information. Signed-off-by: Tuukka Tikkanen --- drivers/cpuidle/governors/menu.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index cf7f2f0..e9a2a27 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -122,7 +122,7 @@ struct menu_device { int last_state_idx; int needs_update; - unsigned intexpected_us; + unsigned intnext_timer_us; unsigned intpredicted_us; unsigned intexit_us; unsigned intbucket; @@ -257,7 +257,7 @@ again: stddev = int_sqrt(stddev); if (((avg > stddev * 6) && (divisor * 4 >= INTERVALS * 3)) || stddev <= 20) { - if (data->expected_us > avg) + if (data->next_timer_us > avg) data->predicted_us = avg; return; } @@ -306,11 +306,11 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) /* determine the expected residency time, round up */ t = ktime_to_timespec(tick_nohz_get_sleep_length()); - data->expected_us = + data->next_timer_us = t.tv_sec * USEC_PER_SEC + t.tv_nsec / NSEC_PER_USEC; - data->bucket = which_bucket(data->expected_us); + data->bucket = which_bucket(data->next_timer_us); multiplier = performance_multiplier(); @@ -326,7 +326,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) * operands are 32 bits. * Make sure to round up for half microseconds. */ - data->predicted_us = div_round64((uint64_t)data->expected_us * + data->predicted_us = div_round64((uint64_t)data->next_timer_us * data->correction_factor[data->bucket], RESOLUTION * DECAY); @@ -336,7 +336,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) * We want to default to C1 (hlt), not to busy polling * unless the timer is happening really really soon. */ - if (data->expected_us > 5 && + if (data->next_timer_us > 5 && !drv->states[CPUIDLE_DRIVER_STATE_START].disabled && dev->states_usage[CPUIDLE_DRIVER_STATE_START].disable == 0) data->last_state_idx = CPUIDLE_DRIVER_STATE_START; @@ -401,7 +401,7 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) * for the whole expected time. */ if (unlikely(!(target->flags & CPUIDLE_FLAG_TIME_VALID))) - last_idle_us = data->expected_us; + last_idle_us = data->next_timer_us; measured_us = last_idle_us; @@ -418,8 +418,8 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) new_factor = data->correction_factor[data->bucket]; new_factor -= new_factor / DECAY; - if (data->expected_us > 0 && measured_us < MAX_INTERESTING) - new_factor += RESOLUTION * measured_us / data->expected_us; + if (data->next_timer_us > 0 && measured_us < MAX_INTERESTING) + new_factor += RESOLUTION * measured_us / data->next_timer_us; else /* * we were idle so long that we count it as a perfect -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/7] Cpuidle: Do not substract exit latency from assumed sleep length
The menu governor statistics update function tries to determine the amount of time between entry to low power state and the occurrence of the wakeup event. However, the time measured by the framework includes exit latency on top of the desired value. This exit latency is substracted from the measured value to obtain the desired value. When measured value is not available, the menu governor assumes the wakeup was caused by the timer and the time is equal to remaining timer length. No exit latency should be substracted from this value. This patch prevents the erroneous substraction and clarifies the associated comment. It also removes one intermediate variable that serves no purpose. Signed-off-by: Tuukka Tikkanen --- drivers/cpuidle/governors/menu.c | 44 ++ 1 file changed, 26 insertions(+), 18 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index f0995dd..b347c10 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -387,32 +387,40 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) { struct menu_device *data = &__get_cpu_var(menu_devices); int last_idx = data->last_state_idx; - unsigned int last_idle_us = cpuidle_get_last_residency(dev); struct cpuidle_state *target = >states[last_idx]; unsigned int measured_us; unsigned int new_factor; /* -* Ugh, this idle state doesn't support residency measurements, so we -* are basically lost in the dark. As a compromise, assume we slept -* for the whole expected time. +* Try to figure out how much time passed between entry to low +* power state and occurrence of the wakeup event. +* +* If the entered idle state didn't support residency measurements, +* we are basically lost in the dark how much time passed. +* As a compromise, assume we slept for the whole expected time. +* +* Any measured amount of time will include the exit latency. +* Since we are interested in when the wakeup begun, not when it +* was completed, we must substract the exit latency. However, if +* the measured amount of time is less than the exit latency, +* assume the state was never reached and the exit latency is 0. */ - if (unlikely(!(target->flags & CPUIDLE_FLAG_TIME_VALID))) - last_idle_us = data->next_timer_us; - + if (unlikely(!(target->flags & CPUIDLE_FLAG_TIME_VALID))) { + /* Use timer value as is */ + measured_us = data->next_timer_us; - measured_us = last_idle_us; + } else { + /* Use measured value */ + measured_us = cpuidle_get_last_residency(dev); - /* -* We correct for the exit latency; we are assuming here that the -* exit latency happens after the event that we're interested in. -*/ - if (measured_us > target->exit_latency) - measured_us -= target->exit_latency; + /* Deduct exit latency */ + if (measured_us > target->exit_latency) + measured_us -= target->exit_latency; - /* Make sure our coefficients do not exceed unity */ - if (measured_us > data->next_timer_us) - measured_us = data->next_timer_us; + /* Make sure our coefficients do not exceed unity */ + if (measured_us > data->next_timer_us) + measured_us = data->next_timer_us; + } /* Update our correction ratio */ new_factor = data->correction_factor[data->bucket]; @@ -439,7 +447,7 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) data->correction_factor[data->bucket] = new_factor; /* update the repeating-pattern data */ - data->intervals[data->interval_ptr++] = last_idle_us; + data->intervals[data->interval_ptr++] = measured_us; if (data->interval_ptr >= INTERVALS) data->interval_ptr = 0; } -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 6/7] Cpuidle: Deal with timer expiring in the past
Sometimes (fairly often) when the cpuidle menu governor is making a decision about idle state to enter the next timer for the cpu appears to expire in the past. The menu governor expects the expiry to always be in the future and in fact stores the time delta in an unsigned variable. However, when the expiry is in the past, the value returned by tick_nohz_get_sleep_length can be negative. This patch prevents using negative values, instead making the governor return immediately similar to having latency requirement set to 0. Note: As with latency == 0, the return value is 0 with no check to see if the state 0 has been disabled or not. Signed-off-by: Tuukka Tikkanen --- drivers/cpuidle/governors/menu.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 71b5232..c414468 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -302,8 +302,16 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) if (unlikely(latency_req == 0)) return 0; - /* determine the expected residency time, round up */ + /* +* Determine the expected residency time. If the time is negative, +* a timer interrupt has probably just expired after disabling +* interrupts. Return as quickly as possible in the most shallow +* state possible. tv_nsec is always positive, so only check the +* seconds. +*/ t = ktime_to_timespec(tick_nohz_get_sleep_length()); + if (t.tv_sec < 0) + return 0; data->next_timer_us = t.tv_sec * USEC_PER_SEC + t.tv_nsec / NSEC_PER_USEC; -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/7] Cpuidle: Do not substract exit latency from assumed sleep length
The menu governor statistics update function tries to determine the amount of time between entry to low power state and the occurrence of the wakeup event. However, the time measured by the framework includes exit latency on top of the desired value. This exit latency is substracted from the measured value to obtain the desired value. When measured value is not available, the menu governor assumes the wakeup was caused by the timer and the time is equal to remaining timer length. No exit latency should be substracted from this value. This patch prevents the erroneous substraction and clarifies the associated comment. It also removes one intermediate variable that serves no purpose. Signed-off-by: Tuukka Tikkanen tuukka.tikka...@linaro.org --- drivers/cpuidle/governors/menu.c | 44 ++ 1 file changed, 26 insertions(+), 18 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index f0995dd..b347c10 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -387,32 +387,40 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) { struct menu_device *data = __get_cpu_var(menu_devices); int last_idx = data-last_state_idx; - unsigned int last_idle_us = cpuidle_get_last_residency(dev); struct cpuidle_state *target = drv-states[last_idx]; unsigned int measured_us; unsigned int new_factor; /* -* Ugh, this idle state doesn't support residency measurements, so we -* are basically lost in the dark. As a compromise, assume we slept -* for the whole expected time. +* Try to figure out how much time passed between entry to low +* power state and occurrence of the wakeup event. +* +* If the entered idle state didn't support residency measurements, +* we are basically lost in the dark how much time passed. +* As a compromise, assume we slept for the whole expected time. +* +* Any measured amount of time will include the exit latency. +* Since we are interested in when the wakeup begun, not when it +* was completed, we must substract the exit latency. However, if +* the measured amount of time is less than the exit latency, +* assume the state was never reached and the exit latency is 0. */ - if (unlikely(!(target-flags CPUIDLE_FLAG_TIME_VALID))) - last_idle_us = data-next_timer_us; - + if (unlikely(!(target-flags CPUIDLE_FLAG_TIME_VALID))) { + /* Use timer value as is */ + measured_us = data-next_timer_us; - measured_us = last_idle_us; + } else { + /* Use measured value */ + measured_us = cpuidle_get_last_residency(dev); - /* -* We correct for the exit latency; we are assuming here that the -* exit latency happens after the event that we're interested in. -*/ - if (measured_us target-exit_latency) - measured_us -= target-exit_latency; + /* Deduct exit latency */ + if (measured_us target-exit_latency) + measured_us -= target-exit_latency; - /* Make sure our coefficients do not exceed unity */ - if (measured_us data-next_timer_us) - measured_us = data-next_timer_us; + /* Make sure our coefficients do not exceed unity */ + if (measured_us data-next_timer_us) + measured_us = data-next_timer_us; + } /* Update our correction ratio */ new_factor = data-correction_factor[data-bucket]; @@ -439,7 +447,7 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) data-correction_factor[data-bucket] = new_factor; /* update the repeating-pattern data */ - data-intervals[data-interval_ptr++] = last_idle_us; + data-intervals[data-interval_ptr++] = measured_us; if (data-interval_ptr = INTERVALS) data-interval_ptr = 0; } -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 6/7] Cpuidle: Deal with timer expiring in the past
Sometimes (fairly often) when the cpuidle menu governor is making a decision about idle state to enter the next timer for the cpu appears to expire in the past. The menu governor expects the expiry to always be in the future and in fact stores the time delta in an unsigned variable. However, when the expiry is in the past, the value returned by tick_nohz_get_sleep_length can be negative. This patch prevents using negative values, instead making the governor return immediately similar to having latency requirement set to 0. Note: As with latency == 0, the return value is 0 with no check to see if the state 0 has been disabled or not. Signed-off-by: Tuukka Tikkanen tuukka.tikka...@linaro.org --- drivers/cpuidle/governors/menu.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 71b5232..c414468 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -302,8 +302,16 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) if (unlikely(latency_req == 0)) return 0; - /* determine the expected residency time, round up */ + /* +* Determine the expected residency time. If the time is negative, +* a timer interrupt has probably just expired after disabling +* interrupts. Return as quickly as possible in the most shallow +* state possible. tv_nsec is always positive, so only check the +* seconds. +*/ t = ktime_to_timespec(tick_nohz_get_sleep_length()); + if (t.tv_sec 0) + return 0; data-next_timer_us = t.tv_sec * USEC_PER_SEC + t.tv_nsec / NSEC_PER_USEC; -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/7] Cpuidle: rename expected_us to next_timer_us in menu governor
The field expected_us is used to store the time remaining until next timer expiry. The name is inaccurate, as we really do not expect all wakeups to be caused by timers. In addition, another field with a very similar name (predicted_us) is used to store the predicted time remaining until any wakeup source being active. This patch renames expected_us to next_timer_us in order to better reflect the contained information. Signed-off-by: Tuukka Tikkanen tuukka.tikka...@linaro.org --- drivers/cpuidle/governors/menu.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index cf7f2f0..e9a2a27 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -122,7 +122,7 @@ struct menu_device { int last_state_idx; int needs_update; - unsigned intexpected_us; + unsigned intnext_timer_us; unsigned intpredicted_us; unsigned intexit_us; unsigned intbucket; @@ -257,7 +257,7 @@ again: stddev = int_sqrt(stddev); if (((avg stddev * 6) (divisor * 4 = INTERVALS * 3)) || stddev = 20) { - if (data-expected_us avg) + if (data-next_timer_us avg) data-predicted_us = avg; return; } @@ -306,11 +306,11 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) /* determine the expected residency time, round up */ t = ktime_to_timespec(tick_nohz_get_sleep_length()); - data-expected_us = + data-next_timer_us = t.tv_sec * USEC_PER_SEC + t.tv_nsec / NSEC_PER_USEC; - data-bucket = which_bucket(data-expected_us); + data-bucket = which_bucket(data-next_timer_us); multiplier = performance_multiplier(); @@ -326,7 +326,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) * operands are 32 bits. * Make sure to round up for half microseconds. */ - data-predicted_us = div_round64((uint64_t)data-expected_us * + data-predicted_us = div_round64((uint64_t)data-next_timer_us * data-correction_factor[data-bucket], RESOLUTION * DECAY); @@ -336,7 +336,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) * We want to default to C1 (hlt), not to busy polling * unless the timer is happening really really soon. */ - if (data-expected_us 5 + if (data-next_timer_us 5 !drv-states[CPUIDLE_DRIVER_STATE_START].disabled dev-states_usage[CPUIDLE_DRIVER_STATE_START].disable == 0) data-last_state_idx = CPUIDLE_DRIVER_STATE_START; @@ -401,7 +401,7 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) * for the whole expected time. */ if (unlikely(!(target-flags CPUIDLE_FLAG_TIME_VALID))) - last_idle_us = data-expected_us; + last_idle_us = data-next_timer_us; measured_us = last_idle_us; @@ -418,8 +418,8 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) new_factor = data-correction_factor[data-bucket]; new_factor -= new_factor / DECAY; - if (data-expected_us 0 measured_us MAX_INTERESTING) - new_factor += RESOLUTION * measured_us / data-expected_us; + if (data-next_timer_us 0 measured_us MAX_INTERESTING) + new_factor += RESOLUTION * measured_us / data-next_timer_us; else /* * we were idle so long that we count it as a perfect -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/7] Cpuidle: Use actual state latency in menu governor
Currently menu governor records the exit latency of the state it has chosen for the idle period. The stored latency value is then later used to calculate the actual length of the idle period. This value may however be incorrect, as the entered state may not be the one chosen by the governor. The entered state information is available, so we can use that to obtain the real exit latency. Signed-off-by: Tuukka Tikkanen tuukka.tikka...@linaro.org --- drivers/cpuidle/governors/menu.c |7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index e9a2a27..115386a 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -124,7 +124,6 @@ struct menu_device { unsigned intnext_timer_us; unsigned intpredicted_us; - unsigned intexit_us; unsigned intbucket; unsigned intcorrection_factor[BUCKETS]; unsigned intintervals[INTERVALS]; @@ -298,7 +297,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) } data-last_state_idx = 0; - data-exit_us = 0; /* Special case when user has set very strict latency requirement */ if (unlikely(latency_req == 0)) @@ -359,7 +357,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) continue; data-last_state_idx = i; - data-exit_us = s-exit_latency; } return data-last_state_idx; @@ -410,8 +407,8 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) * We correct for the exit latency; we are assuming here that the * exit latency happens after the event that we're interested in. */ - if (measured_us data-exit_us) - measured_us -= data-exit_us; + if (measured_us target-exit_latency) + measured_us -= target-exit_latency; /* Update our correction ratio */ -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 6/8] Cpuidle: Fix variable domains in get_typical_interval()
From: Tuukka Tikkanen The menu governor uses a static function get_typical_interval() to try to detect a repeating pattern of wakeups. The previous interval durations are stored as an array of unsigned ints, but the arithmetic in the function is performed exclusively as 64 bit values, even when the value stored in a variable is known not to exceed unsigned int, which may be smaller and more efficient on some platforms. This patch changes the types of varibles used to store some intermediates, the maximum and and the cutoff threshold to unsigned ints. Average and standard deviation are still treated as 64 bit values, even when the values are known to be within the domain of unsigned int, to avoid casts to ensure correct integer promotion for arithmetic operations. Signed-off-by: Tuukka Tikkanen --- drivers/cpuidle/governors/menu.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 0b37ee4..96fd10d 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -200,18 +200,19 @@ static u64 div_round64(u64 dividend, u32 divisor) static void get_typical_interval(struct menu_device *data) { int i, divisor; - uint64_t max, avg, stddev; - int64_t thresh = LLONG_MAX; /* Discard outliers above this value. */ + unsigned int max, thresh; + uint64_t avg, stddev; + + thresh = ULONG_MAX; /* Discard outliers above this value */ again: - /* first calculate average and standard deviation of the past */ + /* First calculate the average of past intervals */ max = 0; avg = 0; divisor = 0; - stddev = 0; for (i = 0; i < INTERVALS; i++) { - int64_t value = data->intervals[i]; + unsigned int value = data->intervals[i]; if (value <= thresh) { avg += value; divisor++; @@ -221,8 +222,10 @@ again: } do_div(avg, divisor); + /* Then try to determine standard deviation */ + stddev = 0; for (i = 0; i < INTERVALS; i++) { - int64_t value = data->intervals[i]; + unsigned int value = data->intervals[i]; if (value <= thresh) { int64_t diff = value - avg; stddev += diff * diff; -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 8/8] Cpuidle: Change struct menu_device field types
From: Tuukka Tikkanen Field predicted_us value can never exceed expected_us value, but it has a potentially larger type. As there is no need for additional 32 bits of zeroes on 32 bit plaforms, change the type of predicted_us to match the type of expected_us. Field correction_factor is used to store a value that cannot exceed the product of RESOLUTION and DECAY (default 1024*8 = 8192). The constants cannot in practice be incremented to such values, that they'd overflow unsigned int even on 32 bit systems, so the type is changed to avoid unnecessary 64 bit arithmetic on 32 bit systems. One multiplication of (now) 32 bit values needs an added cast to avoid truncation of the result and has been added. In order to avoid another multiplication from 32 bit domain to 64 bit domain, the new correction_factor calculation has been changed from new = old * (DECAY-1) / DECAY to new = old - old / DECAY, which with infinite precision would yeild exactly the same result, but now changes the direction of rounding. The impact is not significant as the maximum accumulated difference cannot exceed the value of DECAY, which is relatively small compared to product of RESOLUTION and DECAY (8 / 8192). Signed-off-by: Tuukka Tikkanen --- drivers/cpuidle/governors/menu.c | 28 +--- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index f277c13..cbd9b6c 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -123,10 +123,10 @@ struct menu_device { int needs_update; unsigned intexpected_us; - u64 predicted_us; + unsigned intpredicted_us; unsigned intexit_us; unsigned intbucket; - u64 correction_factor[BUCKETS]; + unsigned intcorrection_factor[BUCKETS]; unsigned intintervals[INTERVALS]; int interval_ptr; }; @@ -321,8 +321,13 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) if (data->correction_factor[data->bucket] == 0) data->correction_factor[data->bucket] = RESOLUTION * DECAY; - /* Make sure to round up for half microseconds */ - data->predicted_us = div_round64(data->expected_us * data->correction_factor[data->bucket], + /* +* Force the result of multiplication to be 64 bits even if both +* operands are 32 bits. +* Make sure to round up for half microseconds. +*/ + data->predicted_us = div_round64((uint64_t)data->expected_us * +data->correction_factor[data->bucket], RESOLUTION * DECAY); get_typical_interval(data); @@ -388,7 +393,7 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) unsigned int last_idle_us = cpuidle_get_last_residency(dev); struct cpuidle_state *target = >states[last_idx]; unsigned int measured_us; - u64 new_factor; + unsigned int new_factor; /* * Ugh, this idle state doesn't support residency measurements, so we @@ -409,10 +414,9 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) measured_us -= data->exit_us; - /* update our correction ratio */ - - new_factor = data->correction_factor[data->bucket] - * (DECAY - 1) / DECAY; + /* Update our correction ratio */ + new_factor = data->correction_factor[data->bucket]; + new_factor -= new_factor / DECAY; if (data->expected_us > 0 && measured_us < MAX_INTERESTING) new_factor += RESOLUTION * measured_us / data->expected_us; @@ -425,9 +429,11 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) /* * We don't want 0 as factor; we always want at least -* a tiny bit of estimated time. +* a tiny bit of estimated time. Fortunately, due to rounding, +* new_factor will stay nonzero regardless of measured_us values +* and the compiler can eliminate this test as long as DECAY > 1. */ - if (new_factor == 0) + if (DECAY == 1 && unlikely(new_factor == 0)) new_factor = 1; data->correction_factor[data->bucket] = new_factor; -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 5/8] Cpuidle: Fix menu_device->intervals type
From: Tuukka Tikkanen Struct menu_device member intervals is declared as u32, but the value stored is (unsigned) int. The type is changed to match the value being stored. Signed-off-by: Tuukka Tikkanen --- drivers/cpuidle/governors/menu.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index fe2dd04..0b37ee4 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -118,7 +118,7 @@ struct menu_device { unsigned intexit_us; unsigned intbucket; u64 correction_factor[BUCKETS]; - u32 intervals[INTERVALS]; + unsigned intintervals[INTERVALS]; int interval_ptr; }; -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/8] Cpuidle: CodingStyle: Break up multiple assignments on single line
From: Tuukka Tikkanen The function get_typical_interval() initializes a number of variables that are immediately after declarations assigned constant values. In addition, there are multiple assignments on a single line, which is explicitly forbidden by Documentation/CodingStyle. This patch removes redundant initial values for the variables and breaks up the multiple assignment line. Signed-off-by: Tuukka Tikkanen --- drivers/cpuidle/governors/menu.c |9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index c24b10b..fe2dd04 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -199,14 +199,17 @@ static u64 div_round64(u64 dividend, u32 divisor) */ static void get_typical_interval(struct menu_device *data) { - int i = 0, divisor = 0; - uint64_t max = 0, avg = 0, stddev = 0; + int i, divisor; + uint64_t max, avg, stddev; int64_t thresh = LLONG_MAX; /* Discard outliers above this value. */ again: /* first calculate average and standard deviation of the past */ - max = avg = divisor = stddev = 0; + max = 0; + avg = 0; + divisor = 0; + stddev = 0; for (i = 0; i < INTERVALS; i++) { int64_t value = data->intervals[i]; if (value <= thresh) { -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/8] Cpuidle: Menu governor fixes
From: Tuukka Tikkanen This series of patches fixes some bugs, style issues and wild use of variable types in cpuidle menu governor. One of the bugs is a logic flaw where a detected previously recurring pattern is given priority over known guaranteed earlier wakeup. The others involve value overflows. Tuukka Tikkanen (8): Cpuidle: Ignore interval prediction result when timer is shorter Cpuidle: Rearrange code and comments in get_typical_interval() Cpuidle: Check called function parameter in get_typical_interval() Cpuidle: CodingStyle: Break up multiple assignments on single line Cpuidle: Fix menu_device->intervals type Cpuidle: Fix variable domains in get_typical_interval() Cpuidle: Add a comment warning about possible overflow Cpuidle: Change struct menu_device field types drivers/cpuidle/governors/menu.c | 96 ++ 1 file changed, 65 insertions(+), 31 deletions(-) -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/8] Cpuidle: Ignore interval prediction result when timer is shorter
From: Tuukka Tikkanen This patch prevents cpuidle menu governor from using repeating interval prediction result if the idle period predicted is longer than the one allowed by shortest running timer. Signed-off-by: Tuukka Tikkanen --- drivers/cpuidle/governors/menu.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index bc580b6..c8ab1c5 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -238,10 +238,13 @@ again: * * The typical interval is obtained when standard deviation is small * or standard deviation is small compared to the average interval. +* +* Use this result only if there is no timer to wake us up sooner. */ if (((avg > stddev * 6) && (divisor * 4 >= INTERVALS * 3)) || stddev <= 20) { - data->predicted_us = avg; + if (data->expected_us > avg) + data->predicted_us = avg; return; } else if ((divisor * 4) > INTERVALS * 3) { -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/8] Cpuidle: Rearrange code and comments in get_typical_interval()
From: Tuukka Tikkanen This patch rearranges a if-return-elsif-goto-fi-return sequence into if-return-fi-if-return-fi-goto sequence. The functionality remains the same. Also, a lengthy comment that did not describe the functionality in the order it occurs is split into half and top half is moved closer to actual implementation it describes. Signed-off-by: Tuukka Tikkanen --- drivers/cpuidle/governors/menu.c | 28 +++- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index c8ab1c5..b13d57c 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -228,14 +228,6 @@ again: do_div(stddev, divisor); stddev = int_sqrt(stddev); /* -* If we have outliers to the upside in our distribution, discard -* those by setting the threshold to exclude these outliers, then -* calculate the average and standard deviation again. Once we get -* down to the bottom 3/4 of our samples, stop excluding samples. -* -* This can deal with workloads that have long pauses interspersed -* with sporadic activity with a bunch of short pauses. -* * The typical interval is obtained when standard deviation is small * or standard deviation is small compared to the average interval. * @@ -246,12 +238,22 @@ again: if (data->expected_us > avg) data->predicted_us = avg; return; - - } else if ((divisor * 4) > INTERVALS * 3) { - /* Exclude the max interval */ - thresh = max - 1; - goto again; } + + /* +* If we have outliers to the upside in our distribution, discard +* those by setting the threshold to exclude these outliers, then +* calculate the average and standard deviation again. Once we get +* down to the bottom 3/4 of our samples, stop excluding samples. +* +* This can deal with workloads that have long pauses interspersed +* with sporadic activity with a bunch of short pauses. +*/ + if ((divisor * 4) <= INTERVALS * 3) + return; + + thresh = max - 1; + goto again; } /** -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/8] Cpuidle: Check called function parameter in get_typical_interval()
From: Tuukka Tikkanen get_typical_interval() uses int_sqrt() in calculation of standard deviation. The formal parameter of int_sqrt() is unsigned long, which may on some platforms be smaller than the 64 bit unsigned integer used as the actual parameter. The overflow can occur frequently when actual idle period lengths are in hundreds of milliseconds. This patch adds a check for such overflow and rejects the candidate average when an overflow would occur. Signed-off-by: Tuukka Tikkanen --- drivers/cpuidle/governors/menu.c | 18 +- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index b13d57c..c24b10b 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -226,18 +226,26 @@ again: } } do_div(stddev, divisor); - stddev = int_sqrt(stddev); /* * The typical interval is obtained when standard deviation is small * or standard deviation is small compared to the average interval. * +* int_sqrt() formal parameter type is unsigned long. When the +* greatest difference to an outlier exceeds ~65 ms * sqrt(divisor) +* the resulting squared standard deviation exceeds the input domain +* of int_sqrt on platforms where unsigned long is 32 bits in size. +* In such case reject the candidate average. +* * Use this result only if there is no timer to wake us up sooner. */ - if (((avg > stddev * 6) && (divisor * 4 >= INTERVALS * 3)) + if (likely(stddev <= ULONG_MAX)) { + stddev = int_sqrt(stddev); + if (((avg > stddev * 6) && (divisor * 4 >= INTERVALS * 3)) || stddev <= 20) { - if (data->expected_us > avg) - data->predicted_us = avg; - return; + if (data->expected_us > avg) + data->predicted_us = avg; + return; + } } /* -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 7/8] Cpuidle: Add a comment warning about possible overflow
From: Tuukka Tikkanen The menu governor has a number of tunable constants that may be changed in the source. If certain combination of values are chosen, an overflow is possible when the correction_factor is being recalculated. This patch adds a warning regarding this possibility and describes the change needed for fixing the issue. The change should not be permanently enabled, as it will hurt performance when it is not needed. Signed-off-by: Tuukka Tikkanen --- drivers/cpuidle/governors/menu.c |9 + 1 file changed, 9 insertions(+) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 96fd10d..f277c13 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -21,6 +21,15 @@ #include #include +/* + * Please note when changing the tuning values: + * If (MAX_INTERESTING-1) * RESOLUTION > ULONG_MAX, the result of + * a scaling operation multiplication may overflow on 32 bit platforms. + * In that case, #define RESOLUTION as ULL to get 64 bit result: + * #define RESOLUTION 1024ULL + * + * The default values do not overflow. + */ #define BUCKETS 12 #define INTERVALS 8 #define RESOLUTION 1024 -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 7/8] Cpuidle: Add a comment warning about possible overflow
From: Tuukka Tikkanen tuukka.tikka...@linaro.org The menu governor has a number of tunable constants that may be changed in the source. If certain combination of values are chosen, an overflow is possible when the correction_factor is being recalculated. This patch adds a warning regarding this possibility and describes the change needed for fixing the issue. The change should not be permanently enabled, as it will hurt performance when it is not needed. Signed-off-by: Tuukka Tikkanen tuukka.tikka...@linaro.org --- drivers/cpuidle/governors/menu.c |9 + 1 file changed, 9 insertions(+) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 96fd10d..f277c13 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -21,6 +21,15 @@ #include linux/math64.h #include linux/module.h +/* + * Please note when changing the tuning values: + * If (MAX_INTERESTING-1) * RESOLUTION ULONG_MAX, the result of + * a scaling operation multiplication may overflow on 32 bit platforms. + * In that case, #define RESOLUTION as ULL to get 64 bit result: + * #define RESOLUTION 1024ULL + * + * The default values do not overflow. + */ #define BUCKETS 12 #define INTERVALS 8 #define RESOLUTION 1024 -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/8] Cpuidle: Check called function parameter in get_typical_interval()
From: Tuukka Tikkanen tuukka.tikka...@linaro.org get_typical_interval() uses int_sqrt() in calculation of standard deviation. The formal parameter of int_sqrt() is unsigned long, which may on some platforms be smaller than the 64 bit unsigned integer used as the actual parameter. The overflow can occur frequently when actual idle period lengths are in hundreds of milliseconds. This patch adds a check for such overflow and rejects the candidate average when an overflow would occur. Signed-off-by: Tuukka Tikkanen tuukka.tikka...@linaro.org --- drivers/cpuidle/governors/menu.c | 18 +- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index b13d57c..c24b10b 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -226,18 +226,26 @@ again: } } do_div(stddev, divisor); - stddev = int_sqrt(stddev); /* * The typical interval is obtained when standard deviation is small * or standard deviation is small compared to the average interval. * +* int_sqrt() formal parameter type is unsigned long. When the +* greatest difference to an outlier exceeds ~65 ms * sqrt(divisor) +* the resulting squared standard deviation exceeds the input domain +* of int_sqrt on platforms where unsigned long is 32 bits in size. +* In such case reject the candidate average. +* * Use this result only if there is no timer to wake us up sooner. */ - if (((avg stddev * 6) (divisor * 4 = INTERVALS * 3)) + if (likely(stddev = ULONG_MAX)) { + stddev = int_sqrt(stddev); + if (((avg stddev * 6) (divisor * 4 = INTERVALS * 3)) || stddev = 20) { - if (data-expected_us avg) - data-predicted_us = avg; - return; + if (data-expected_us avg) + data-predicted_us = avg; + return; + } } /* -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/8] Cpuidle: Rearrange code and comments in get_typical_interval()
From: Tuukka Tikkanen tuukka.tikka...@linaro.org This patch rearranges a if-return-elsif-goto-fi-return sequence into if-return-fi-if-return-fi-goto sequence. The functionality remains the same. Also, a lengthy comment that did not describe the functionality in the order it occurs is split into half and top half is moved closer to actual implementation it describes. Signed-off-by: Tuukka Tikkanen tuukka.tikka...@linaro.org --- drivers/cpuidle/governors/menu.c | 28 +++- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index c8ab1c5..b13d57c 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -228,14 +228,6 @@ again: do_div(stddev, divisor); stddev = int_sqrt(stddev); /* -* If we have outliers to the upside in our distribution, discard -* those by setting the threshold to exclude these outliers, then -* calculate the average and standard deviation again. Once we get -* down to the bottom 3/4 of our samples, stop excluding samples. -* -* This can deal with workloads that have long pauses interspersed -* with sporadic activity with a bunch of short pauses. -* * The typical interval is obtained when standard deviation is small * or standard deviation is small compared to the average interval. * @@ -246,12 +238,22 @@ again: if (data-expected_us avg) data-predicted_us = avg; return; - - } else if ((divisor * 4) INTERVALS * 3) { - /* Exclude the max interval */ - thresh = max - 1; - goto again; } + + /* +* If we have outliers to the upside in our distribution, discard +* those by setting the threshold to exclude these outliers, then +* calculate the average and standard deviation again. Once we get +* down to the bottom 3/4 of our samples, stop excluding samples. +* +* This can deal with workloads that have long pauses interspersed +* with sporadic activity with a bunch of short pauses. +*/ + if ((divisor * 4) = INTERVALS * 3) + return; + + thresh = max - 1; + goto again; } /** -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/8] Cpuidle: Ignore interval prediction result when timer is shorter
From: Tuukka Tikkanen tuukka.tikka...@linaro.org This patch prevents cpuidle menu governor from using repeating interval prediction result if the idle period predicted is longer than the one allowed by shortest running timer. Signed-off-by: Tuukka Tikkanen tuukka.tikka...@linaro.org --- drivers/cpuidle/governors/menu.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index bc580b6..c8ab1c5 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -238,10 +238,13 @@ again: * * The typical interval is obtained when standard deviation is small * or standard deviation is small compared to the average interval. +* +* Use this result only if there is no timer to wake us up sooner. */ if (((avg stddev * 6) (divisor * 4 = INTERVALS * 3)) || stddev = 20) { - data-predicted_us = avg; + if (data-expected_us avg) + data-predicted_us = avg; return; } else if ((divisor * 4) INTERVALS * 3) { -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/8] Cpuidle: Menu governor fixes
From: Tuukka Tikkanen tuukka.tikka...@linaro.org This series of patches fixes some bugs, style issues and wild use of variable types in cpuidle menu governor. One of the bugs is a logic flaw where a detected previously recurring pattern is given priority over known guaranteed earlier wakeup. The others involve value overflows. Tuukka Tikkanen (8): Cpuidle: Ignore interval prediction result when timer is shorter Cpuidle: Rearrange code and comments in get_typical_interval() Cpuidle: Check called function parameter in get_typical_interval() Cpuidle: CodingStyle: Break up multiple assignments on single line Cpuidle: Fix menu_device-intervals type Cpuidle: Fix variable domains in get_typical_interval() Cpuidle: Add a comment warning about possible overflow Cpuidle: Change struct menu_device field types drivers/cpuidle/governors/menu.c | 96 ++ 1 file changed, 65 insertions(+), 31 deletions(-) -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/8] Cpuidle: CodingStyle: Break up multiple assignments on single line
From: Tuukka Tikkanen tuukka.tikka...@linaro.org The function get_typical_interval() initializes a number of variables that are immediately after declarations assigned constant values. In addition, there are multiple assignments on a single line, which is explicitly forbidden by Documentation/CodingStyle. This patch removes redundant initial values for the variables and breaks up the multiple assignment line. Signed-off-by: Tuukka Tikkanen tuukka.tikka...@linaro.org --- drivers/cpuidle/governors/menu.c |9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index c24b10b..fe2dd04 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -199,14 +199,17 @@ static u64 div_round64(u64 dividend, u32 divisor) */ static void get_typical_interval(struct menu_device *data) { - int i = 0, divisor = 0; - uint64_t max = 0, avg = 0, stddev = 0; + int i, divisor; + uint64_t max, avg, stddev; int64_t thresh = LLONG_MAX; /* Discard outliers above this value. */ again: /* first calculate average and standard deviation of the past */ - max = avg = divisor = stddev = 0; + max = 0; + avg = 0; + divisor = 0; + stddev = 0; for (i = 0; i INTERVALS; i++) { int64_t value = data-intervals[i]; if (value = thresh) { -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 5/8] Cpuidle: Fix menu_device-intervals type
From: Tuukka Tikkanen tuukka.tikka...@linaro.org Struct menu_device member intervals is declared as u32, but the value stored is (unsigned) int. The type is changed to match the value being stored. Signed-off-by: Tuukka Tikkanen tuukka.tikka...@linaro.org --- drivers/cpuidle/governors/menu.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index fe2dd04..0b37ee4 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -118,7 +118,7 @@ struct menu_device { unsigned intexit_us; unsigned intbucket; u64 correction_factor[BUCKETS]; - u32 intervals[INTERVALS]; + unsigned intintervals[INTERVALS]; int interval_ptr; }; -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 8/8] Cpuidle: Change struct menu_device field types
From: Tuukka Tikkanen tuukka.tikka...@linaro.org Field predicted_us value can never exceed expected_us value, but it has a potentially larger type. As there is no need for additional 32 bits of zeroes on 32 bit plaforms, change the type of predicted_us to match the type of expected_us. Field correction_factor is used to store a value that cannot exceed the product of RESOLUTION and DECAY (default 1024*8 = 8192). The constants cannot in practice be incremented to such values, that they'd overflow unsigned int even on 32 bit systems, so the type is changed to avoid unnecessary 64 bit arithmetic on 32 bit systems. One multiplication of (now) 32 bit values needs an added cast to avoid truncation of the result and has been added. In order to avoid another multiplication from 32 bit domain to 64 bit domain, the new correction_factor calculation has been changed from new = old * (DECAY-1) / DECAY to new = old - old / DECAY, which with infinite precision would yeild exactly the same result, but now changes the direction of rounding. The impact is not significant as the maximum accumulated difference cannot exceed the value of DECAY, which is relatively small compared to product of RESOLUTION and DECAY (8 / 8192). Signed-off-by: Tuukka Tikkanen tuukka.tikka...@linaro.org --- drivers/cpuidle/governors/menu.c | 28 +--- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index f277c13..cbd9b6c 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -123,10 +123,10 @@ struct menu_device { int needs_update; unsigned intexpected_us; - u64 predicted_us; + unsigned intpredicted_us; unsigned intexit_us; unsigned intbucket; - u64 correction_factor[BUCKETS]; + unsigned intcorrection_factor[BUCKETS]; unsigned intintervals[INTERVALS]; int interval_ptr; }; @@ -321,8 +321,13 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) if (data-correction_factor[data-bucket] == 0) data-correction_factor[data-bucket] = RESOLUTION * DECAY; - /* Make sure to round up for half microseconds */ - data-predicted_us = div_round64(data-expected_us * data-correction_factor[data-bucket], + /* +* Force the result of multiplication to be 64 bits even if both +* operands are 32 bits. +* Make sure to round up for half microseconds. +*/ + data-predicted_us = div_round64((uint64_t)data-expected_us * +data-correction_factor[data-bucket], RESOLUTION * DECAY); get_typical_interval(data); @@ -388,7 +393,7 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) unsigned int last_idle_us = cpuidle_get_last_residency(dev); struct cpuidle_state *target = drv-states[last_idx]; unsigned int measured_us; - u64 new_factor; + unsigned int new_factor; /* * Ugh, this idle state doesn't support residency measurements, so we @@ -409,10 +414,9 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) measured_us -= data-exit_us; - /* update our correction ratio */ - - new_factor = data-correction_factor[data-bucket] - * (DECAY - 1) / DECAY; + /* Update our correction ratio */ + new_factor = data-correction_factor[data-bucket]; + new_factor -= new_factor / DECAY; if (data-expected_us 0 measured_us MAX_INTERESTING) new_factor += RESOLUTION * measured_us / data-expected_us; @@ -425,9 +429,11 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) /* * We don't want 0 as factor; we always want at least -* a tiny bit of estimated time. +* a tiny bit of estimated time. Fortunately, due to rounding, +* new_factor will stay nonzero regardless of measured_us values +* and the compiler can eliminate this test as long as DECAY 1. */ - if (new_factor == 0) + if (DECAY == 1 unlikely(new_factor == 0)) new_factor = 1; data-correction_factor[data-bucket] = new_factor; -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 6/8] Cpuidle: Fix variable domains in get_typical_interval()
From: Tuukka Tikkanen tuukka.tikka...@linaro.org The menu governor uses a static function get_typical_interval() to try to detect a repeating pattern of wakeups. The previous interval durations are stored as an array of unsigned ints, but the arithmetic in the function is performed exclusively as 64 bit values, even when the value stored in a variable is known not to exceed unsigned int, which may be smaller and more efficient on some platforms. This patch changes the types of varibles used to store some intermediates, the maximum and and the cutoff threshold to unsigned ints. Average and standard deviation are still treated as 64 bit values, even when the values are known to be within the domain of unsigned int, to avoid casts to ensure correct integer promotion for arithmetic operations. Signed-off-by: Tuukka Tikkanen tuukka.tikka...@linaro.org --- drivers/cpuidle/governors/menu.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 0b37ee4..96fd10d 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -200,18 +200,19 @@ static u64 div_round64(u64 dividend, u32 divisor) static void get_typical_interval(struct menu_device *data) { int i, divisor; - uint64_t max, avg, stddev; - int64_t thresh = LLONG_MAX; /* Discard outliers above this value. */ + unsigned int max, thresh; + uint64_t avg, stddev; + + thresh = ULONG_MAX; /* Discard outliers above this value */ again: - /* first calculate average and standard deviation of the past */ + /* First calculate the average of past intervals */ max = 0; avg = 0; divisor = 0; - stddev = 0; for (i = 0; i INTERVALS; i++) { - int64_t value = data-intervals[i]; + unsigned int value = data-intervals[i]; if (value = thresh) { avg += value; divisor++; @@ -221,8 +222,10 @@ again: } do_div(avg, divisor); + /* Then try to determine standard deviation */ + stddev = 0; for (i = 0; i INTERVALS; i++) { - int64_t value = data-intervals[i]; + unsigned int value = data-intervals[i]; if (value = thresh) { int64_t diff = value - avg; stddev += diff * diff; -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/