Re: [RFD/RFC PATCH 4/8] sched: Split scheduler execution context
Dear Juri, just a small comment for the next round of patches (I guess after OSPM)... On 091018, 11:24, Juri Lelli wrote: > From: Peter Zijlstra > > Lets define the scheduling context as all the scheduler state in > task_struct and the execution context as all state required to run the > task. > > Currently both are intertwined in task_struct. We want to logically > split these such that we can run the execution context of one task > with the scheduling context of another. > > To this purpose introduce rq::proxy to point to the task_struct used > for scheduler state and preserve rq::curr to denote the execution > context. > > Signed-off-by: Peter Zijlstra (Intel) > [added lot of comments/questions - identifiable by XXX] > Signed-off-by: Juri Lelli > --- > kernel/sched/core.c | 62 ++-- > kernel/sched/fair.c | 4 +++ > kernel/sched/sched.h | 30 - > 3 files changed, 82 insertions(+), 14 deletions(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index fe0223121883..d3c481b734dd 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -224,12 +224,13 @@ static enum hrtimer_restart hrtick(struct hrtimer > *timer) > { > struct rq *rq = container_of(timer, struct rq, hrtick_timer); > struct rq_flags rf; > + struct task_struct *curr = rq->proxy; You may want to use a different naming for these local variables (e.g. "proxy") to help the reader in not confusing curr (i.e. scheduling ctx) with rq::curr (i.e. execution ctx). Best regards, Claudio
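As a purely illustrative sketch of the naming suggested above (assuming the hrtick() hunk quoted from the patch; only the local-variable name differs from it), the scheduling-context pointer could be named after rq::proxy so it cannot be mistaken for the execution context held in rq::curr:

static enum hrtimer_restart hrtick(struct hrtimer *timer)
{
	struct rq *rq = container_of(timer, struct rq, hrtick_timer);
	struct rq_flags rf;
	/* Scheduling context: call it "proxy", not "curr". */
	struct task_struct *proxy = rq->proxy;

	WARN_ON_ONCE(cpu_of(rq) != smp_processor_id());

	rq_lock(rq, &rf);
	update_rq_clock(rq);
	/* Drive the tick with the scheduling context. */
	proxy->sched_class->task_tick(rq, proxy, 1);
	rq_unlock(rq, &rf);

	return HRTIMER_NORESTART;
}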
Re: [PATCH] kernel/time/posix-cpu-timers: Remove useless call to check_dl_overrun
On 071118, 12:10, Juri Lelli wrote: > check_dl_overrun is used to send a SIGXCPU to users that asked to be > informed when SCHED_DEADLINE runtime overruns occur. > > The function is called by check_thread_timers already, so the call in > check_process_timers is redundant/wrong (even though harmless). > > Remove it. > > Fixes: 34be39305a77 ("sched/deadline: Implement "runtime overrun signal" > support") > Signed-off-by: Juri Lelli > Cc: Peter Zijlstra > Cc: Thomas Gleixner > Cc: Luca Abeni > Cc: Claudio Scordino > Cc: Mathieu Poirier > --- > kernel/time/posix-cpu-timers.c | 3 --- > 1 file changed, 3 deletions(-) > > diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c > index ce32cf741b25..8f0644af40be 100644 > --- a/kernel/time/posix-cpu-timers.c > +++ b/kernel/time/posix-cpu-timers.c > @@ -917,9 +917,6 @@ static void check_process_timers(struct task_struct *tsk, > struct task_cputime cputime; > unsigned long soft; > > - if (dl_task(tsk)) > - check_dl_overrun(tsk); > - > /* >* If cputimer is not running, then there are no active >* process wide timers (POSIX 1.b, itimers, RLIMIT_CPU). > -- [ Superfluous as (fortunately) already queued...] Tested-by: Claudio Scordino Thank you for taking care of this pending issue. Claudio
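For readers not following the thread, a simplified sketch of the two call paths after the patch (assuming the 4.20-era layout of kernel/time/posix-cpu-timers.c, with the bodies heavily elided):

static void check_thread_timers(struct task_struct *tsk,
				struct list_head *firing)
{
	/* Per-thread path: the only remaining DL overrun check. */
	if (dl_task(tsk))
		check_dl_overrun(tsk);

	/* ... per-thread POSIX CPU timers and RLIMIT_RTTIME handling ... */
}

static void check_process_timers(struct task_struct *tsk,
				 struct list_head *firing)
{
	/*
	 * Process-wide timers (POSIX 1.b, itimers, RLIMIT_CPU) only.
	 * No check_dl_overrun() here: check_thread_timers() already
	 * raises SIGXCPU for SCHED_DEADLINE overruns, so the removed
	 * call was redundant.
	 */

	/* ... process-wide timer expiry handling ... */
}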
Re: [PATCH v5 00/10] track CPU utilization
Hi Quentin, 2018-06-06 15:20 GMT+02:00 Quentin Perret : > > Hi Claudio, > > On Wednesday 06 Jun 2018 at 15:05:58 (+0200), Claudio Scordino wrote: > > Hi Quentin, > > > > Il 05/06/2018 16:13, Juri Lelli ha scritto: > > > On 05/06/18 15:01, Quentin Perret wrote: > > > > On Tuesday 05 Jun 2018 at 15:15:18 (+0200), Juri Lelli wrote: > > > > > On 05/06/18 14:05, Quentin Perret wrote: > > > > > > On Tuesday 05 Jun 2018 at 14:11:53 (+0200), Juri Lelli wrote: > > > > > > > Hi Quentin, > > > > > > > > > > > > > > On 05/06/18 11:57, Quentin Perret wrote: > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > What about the diff below (just a quick hack to show the idea) > > > > > > > > applied > > > > > > > > on tip/sched/core ? > > > > > > > > > > > > > > > > ---8<--- > > > > > > > > diff --git a/kernel/sched/cpufreq_schedutil.c > > > > > > > > b/kernel/sched/cpufreq_schedutil.c > > > > > > > > index a8ba6d1f262a..23a4fb1c2c25 100644 > > > > > > > > --- a/kernel/sched/cpufreq_schedutil.c > > > > > > > > +++ b/kernel/sched/cpufreq_schedutil.c > > > > > > > > @@ -180,9 +180,12 @@ static void sugov_get_util(struct > > > > > > > > sugov_cpu *sg_cpu) > > > > > > > > sg_cpu->util_dl = cpu_util_dl(rq); > > > > > > > > } > > > > > > > > +unsigned long scale_rt_capacity(int cpu); > > > > > > > > static unsigned long sugov_aggregate_util(struct sugov_cpu > > > > > > > > *sg_cpu) > > > > > > > > { > > > > > > > > struct rq *rq = cpu_rq(sg_cpu->cpu); > > > > > > > > + int cpu = sg_cpu->cpu; > > > > > > > > + unsigned long util, dl_bw; > > > > > > > > if (rq->rt.rt_nr_running) > > > > > > > > return sg_cpu->max; > > > > > > > > @@ -197,7 +200,14 @@ static unsigned long > > > > > > > > sugov_aggregate_util(struct sugov_cpu *sg_cpu) > > > > > > > >* util_cfs + util_dl as requested freq. However, > > > > > > > > cpufreq is not yet > > > > > > > >* ready for such an interface. So, we only do the > > > > > > > > latter for now. > > > > > > > >*/ > > > > > > > > - return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs)); > > > > > > > > + util = arch_scale_cpu_capacity(NULL, cpu) * > > > > > > > > scale_rt_capacity(cpu); > > > > > > > > > > > > > > Sorry to be pedantinc, but this (ATM) includes DL avg > > > > > > > contribution, so, > > > > > > > since we use max below, we will probably have the same problem > > > > > > > that we > > > > > > > discussed on Vincent's approach (overestimation of DL > > > > > > > contribution while > > > > > > > we could use running_bw). > > > > > > > > > > > > Ah no, you're right, this isn't great for long running deadline > > > > > > tasks. > > > > > > We should definitely account for the running_bw here, not the dl > > > > > > avg... > > > > > > > > > > > > I was trying to address the issue of RT stealing time from CFS > > > > > > here, but > > > > > > the DL integration isn't quite right which this patch as-is, I > > > > > > agree ... > > > > > > > > > > > > > > > > > > > > > + util >>= SCHED_CAPACITY_SHIFT; > > > > > > > > + util = arch_scale_cpu_capacity(NULL, cpu) - util; > > > > > > > > + util += sg_cpu->util_cfs; > > > > > > > > + dl_bw = (rq->dl.this_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT; > > > > > > > > > > > > > > Why this_bw instead of running_bw? > > > > > > > > > > > >
Re: [PATCH v5 00/10] track CPU utilization
Hi Quentin, Il 05/06/2018 16:13, Juri Lelli ha scritto: On 05/06/18 15:01, Quentin Perret wrote: On Tuesday 05 Jun 2018 at 15:15:18 (+0200), Juri Lelli wrote: On 05/06/18 14:05, Quentin Perret wrote: On Tuesday 05 Jun 2018 at 14:11:53 (+0200), Juri Lelli wrote: Hi Quentin, On 05/06/18 11:57, Quentin Perret wrote: [...] What about the diff below (just a quick hack to show the idea) applied on tip/sched/core ? ---8<--- diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index a8ba6d1f262a..23a4fb1c2c25 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -180,9 +180,12 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu) sg_cpu->util_dl = cpu_util_dl(rq); } +unsigned long scale_rt_capacity(int cpu); static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu) { struct rq *rq = cpu_rq(sg_cpu->cpu); + int cpu = sg_cpu->cpu; + unsigned long util, dl_bw; if (rq->rt.rt_nr_running) return sg_cpu->max; @@ -197,7 +200,14 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu) * util_cfs + util_dl as requested freq. However, cpufreq is not yet * ready for such an interface. So, we only do the latter for now. */ - return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs)); + util = arch_scale_cpu_capacity(NULL, cpu) * scale_rt_capacity(cpu); Sorry to be pedantinc, but this (ATM) includes DL avg contribution, so, since we use max below, we will probably have the same problem that we discussed on Vincent's approach (overestimation of DL contribution while we could use running_bw). Ah no, you're right, this isn't great for long running deadline tasks. We should definitely account for the running_bw here, not the dl avg... I was trying to address the issue of RT stealing time from CFS here, but the DL integration isn't quite right which this patch as-is, I agree ... + util >>= SCHED_CAPACITY_SHIFT; + util = arch_scale_cpu_capacity(NULL, cpu) - util; + util += sg_cpu->util_cfs; + dl_bw = (rq->dl.this_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT; Why this_bw instead of running_bw? So IIUC, this_bw should basically give you the absolute reservation (== the sum of runtime/deadline ratios of all DL tasks on that rq). Yep. The reason I added this max is because I'm still not sure to understand how we can safely drop the freq below that point ? If we don't guarantee to always stay at least at the freq required by DL, aren't we risking to start a deadline tasks stuck at a low freq because of rate limiting ? In this case, if that tasks uses all of its runtime then you might start missing deadlines ... We decided to avoid (software) rate limiting for DL with e97a90f7069b ("sched/cpufreq: Rate limits for SCHED_DEADLINE"). Right, I spotted that one, but yeah you could also be limited by HW ... My feeling is that the only safe thing to do is to guarantee to never go below the freq required by DL, and to optimistically add CFS tasks without raising the OPP if we have good reasons to think that DL is using less than it required (which is what we should get by using running_bw above I suppose). Does that make any sense ? Then we can't still avoid the hardware limits, so using running_bw is a trade off between safety (especially considering soft real-time scenarios) and energy consumption (which seems to be working in practice). Ok, I see ... Have you guys already tried something like my patch above (keeping the freq >= this_bw) in real world use cases ? Is this costing that much energy in practice ? 
If we fill the gaps left by DL (when it IIRC, Claudio (now Cc-ed) did experiment a bit with both approaches, so he might add some numbers to my words above. I didn't (yet). But, please consider that I might be reserving (for example) 50% of bandwidth for my heavy and time sensitive task and then have that task wake up only once in a while (but I'll be keeping clock speed up for the whole time). :/ As far as I can remember, we never tested energy consumption of running_bw vs this_bw, as at OSPM'17 we had already decided to use running_bw implementing GRUB-PA. The rationale is that, as Juri pointed out, the amount of spare (i.e. reclaimable) bandwidth in this_bw is very user-dependent. For example, the user can let this_bw be much higher than the measured bandwidth, just to be sure that the deadlines are met even in corner cases. In practice, this means that the task executes for quite a short time and then blocks (with its bandwidth reclaimed, hence the CPU frequency reduced, at the 0lag time). Using this_bw rather than running_bw, the CPU frequency would remain at the same fixed value even when the task is blocked. I understand that on some cases it could even be better (i.e. no waste of energy in frequency switch). However, IMHO, these are corner cases and in the average case it is
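To make the this_bw vs. running_bw trade-off discussed in this thread more concrete, here is a rough sketch (not actual kernel code) of the two aggregation choices for schedutil, using the same SCHED_CAPACITY_SCALE/BW_SHIFT scaling as the quoted diff; the helper names are invented:

/* Illustrative only: two ways of folding DL bandwidth into the request. */
static unsigned long util_with_this_bw(struct rq *rq, unsigned long util_cfs,
				       unsigned long max_cap)
{
	/* Admitted reservation: sum of runtime/period over all DL tasks. */
	unsigned long dl = (rq->dl.this_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;

	/* Never let the request drop below the admitted DL bandwidth. */
	return min(max_cap, max(dl, util_cfs));
}

static unsigned long util_with_running_bw(struct rq *rq, unsigned long util_cfs,
					  unsigned long max_cap)
{
	/* GRUB "active" bandwidth: reclaimed at the 0-lag time when tasks block. */
	unsigned long dl = (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;

	/* The request follows the bandwidth actually in use. */
	return min(max_cap, dl + util_cfs);
}

The first variant keeps the frequency pinned to the reservation even while the DL tasks are blocked; the second lets it drop, which is the behaviour GRUB-PA relies on and what the thread above argues is preferable in the average case.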
[tip:sched/core] sched/deadline/Documentation: Add overrun signal and GRUB-PA documentation
Commit-ID: bb4e30a48045c9cc16c4efe447489542750397cc Gitweb: https://git.kernel.org/tip/bb4e30a48045c9cc16c4efe447489542750397cc Author: Claudio Scordino AuthorDate: Tue, 3 Apr 2018 09:42:42 +0200 Committer: Ingo Molnar CommitDate: Mon, 14 May 2018 09:12:27 +0200 sched/deadline/Documentation: Add overrun signal and GRUB-PA documentation Signed-off-by: Claudio Scordino Signed-off-by: Peter Zijlstra (Intel) Cc: Jonathan Corbet Cc: Juri Lelli Cc: Linus Torvalds Cc: Luca Abeni Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1522741362-4542-1-git-send-email-clau...@evidence.eu.com Signed-off-by: Ingo Molnar --- Documentation/scheduler/sched-deadline.txt | 25 - 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt index 8ce78f82ae23..b14e03ff3528 100644 --- a/Documentation/scheduler/sched-deadline.txt +++ b/Documentation/scheduler/sched-deadline.txt @@ -49,7 +49,7 @@ CONTENTS 2.1 Main algorithm -- - SCHED_DEADLINE uses three parameters, named "runtime", "period", and + SCHED_DEADLINE [18] uses three parameters, named "runtime", "period", and "deadline", to schedule tasks. A SCHED_DEADLINE task should receive "runtime" microseconds of execution time every "period" microseconds, and these "runtime" microseconds are available within "deadline" microseconds @@ -117,6 +117,10 @@ CONTENTS scheduling deadline = scheduling deadline + period remaining runtime = remaining runtime + runtime + The SCHED_FLAG_DL_OVERRUN flag in sched_attr's sched_flags field allows a task + to get informed about runtime overruns through the delivery of SIGXCPU + signals. + 2.2 Bandwidth reclaiming @@ -279,6 +283,19 @@ CONTENTS running_bw is incremented. +2.3 Energy-aware scheduling + + + When cpufreq's schedutil governor is selected, SCHED_DEADLINE implements the + GRUB-PA [19] algorithm, reducing the CPU operating frequency to the minimum + value that still allows to meet the deadlines. This behavior is currently + implemented only for ARM architectures. + + A particular care must be taken in case the time needed for changing frequency + is of the same order of magnitude of the reservation period. In such cases, + setting a fixed CPU frequency results in a lower amount of deadline misses. + + 3. Scheduling Real-Time Tasks = @@ -505,6 +522,12 @@ CONTENTS 17 - L. Abeni, G. Lipari, A. Parri, Y. Sun, Multicore CPU reclaiming: parallel or sequential?. In Proceedings of the 31st Annual ACM Symposium on Applied Computing, 2016. + 18 - J. Lelli, C. Scordino, L. Abeni, D. Faggioli, Deadline scheduling in the + Linux kernel, Software: Practice and Experience, 46(6): 821-839, June + 2016. + 19 - C. Scordino, L. Abeni, J. Lelli, Energy-Aware Real-Time Scheduling in + the Linux Kernel, 33rd ACM/SIGAPP Symposium On Applied Computing (SAC + 2018), Pau, France, April 2018. 4. Bandwidth management
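As a userspace illustration of the documented SCHED_FLAG_DL_OVERRUN flag (not part of the commit above): struct sched_attr is defined by hand, as in the example program at the end of sched-deadline.txt, because glibc provides no sched_setattr() wrapper; __NR_sched_setattr is assumed to come from the toolchain headers, and the reservation parameters are arbitrary.

#define _GNU_SOURCE
#include <linux/types.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#define SCHED_DEADLINE		6
#define SCHED_FLAG_DL_OVERRUN	0x04

struct sched_attr {
	__u32 size;
	__u32 sched_policy;
	__u64 sched_flags;
	__s32 sched_nice;
	__u32 sched_priority;
	/* SCHED_DEADLINE parameters, in nanoseconds */
	__u64 sched_runtime;
	__u64 sched_deadline;
	__u64 sched_period;
};

static void on_sigxcpu(int sig)
{
	/* Async-signal-safe notification that the runtime was overrun. */
	static const char msg[] = "SCHED_DEADLINE runtime overrun\n";

	(void)sig;
	write(STDERR_FILENO, msg, sizeof(msg) - 1);
}

int main(void)
{
	struct sched_attr attr;

	signal(SIGXCPU, on_sigxcpu);

	memset(&attr, 0, sizeof(attr));
	attr.size	    = sizeof(attr);
	attr.sched_policy   = SCHED_DEADLINE;
	attr.sched_runtime  =  2 * 1000 * 1000;		/*  2 ms */
	attr.sched_deadline = 10 * 1000 * 1000;		/* 10 ms */
	attr.sched_period   = 10 * 1000 * 1000;		/* 10 ms */
	attr.sched_flags    = SCHED_FLAG_DL_OVERRUN;	/* ask for SIGXCPU on overrun */

	if (syscall(__NR_sched_setattr, 0, &attr, 0)) {
		perror("sched_setattr");
		return 1;
	}

	for (;;)
		;	/* burn CPU: overruns the 2 ms budget and receives SIGXCPU */
}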
Re: [RFC PATCH] sched/cpufreq/schedutil: handling urgent frequency requests
Il 08/05/2018 08:54, Viresh Kumar ha scritto: On 07-05-18, 16:43, Claudio Scordino wrote: At OSPM, it was mentioned the issue about urgent CPU frequency requests arriving when a frequency switch is already in progress. Besides the various issues (physical time for switching frequency, on-going kthread activity, etc.) one (minor) issue is the kernel "forgetting" such request, thus waiting the next switch time for recomputing the needed frequency and behaving accordingly. This patch makes the kthread serve any urgent request occurred during the previous frequency switch. It introduces a specific flag, only set when the SCHED_DEADLINE scheduling class increases the CPU utilization, aiming at decreasing the likelihood of a deadline miss. Indeed, some preliminary tests in critical conditions (i.e. SCHED_DEADLINE tasks with short periods) have shown reductions of more than 10% of the average number of deadline misses. On the other hand, the increase in terms of energy consumption when running SCHED_DEADLINE tasks (not yet measured) is likely to be not negligible (especially in case of critical scenarios like "ramp up" utilizations). The patch is meant as follow-up discussion after OSPM. Signed-off-by: Claudio Scordino CC: Viresh Kumar CC: Rafael J. Wysocki CC: Peter Zijlstra CC: Ingo Molnar CC: Patrick Bellasi CC: Juri Lelli Cc: Luca Abeni CC: Joel Fernandes CC: linux...@vger.kernel.org --- kernel/sched/cpufreq_schedutil.c | 20 ++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index d2c6083..4de06b0 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -41,6 +41,7 @@ struct sugov_policy { boolwork_in_progress; bool need_freq_update; + boolurgent_freq_update; }; struct sugov_cpu { @@ -92,6 +93,14 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time) !cpufreq_can_do_remote_dvfs(sg_policy->policy)) return false; + /* +* Continue computing the new frequency. In case of work_in_progress, +* the kthread will resched a change once the current transition is +* finished. 
+*/ + if (sg_policy->urgent_freq_update) + return true; + if (sg_policy->work_in_progress) return false; @@ -121,6 +130,9 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time, sg_policy->next_freq = next_freq; sg_policy->last_freq_update_time = time; + if (sg_policy->work_in_progress) + return; + if (policy->fast_switch_enabled) { next_freq = cpufreq_driver_fast_switch(policy, next_freq); if (!next_freq) @@ -274,7 +286,7 @@ static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; } static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu, struct sugov_policy *sg_policy) { if (cpu_util_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->util_dl) - sg_policy->need_freq_update = true; + sg_policy->urgent_freq_update = true; } static void sugov_update_single(struct update_util_data *hook, u64 time, @@ -383,8 +395,11 @@ static void sugov_work(struct kthread_work *work) struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work); mutex_lock(&sg_policy->work_lock); - __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq, + do { + sg_policy->urgent_freq_update = false; + __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq, CPUFREQ_RELATION_L); If we are going to solve this problem, then maybe instead of the added complexity and a new flag we can look for need_freq_update flag at this location and re-calculate the next frequency if required. I agree. Indeed, I've been in doubt if adding a new flag or relying on the existing need_freq_update flag (whose name, however, didn't seem to reflect any sense of urgency). Maybe we can use need_freq_update but change its name to a more meaningful string ? Thanks, Claudio
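For reference, a rough sketch of the direction Viresh suggests (reusing need_freq_update instead of adding urgent_freq_update); this only illustrates the control flow, glosses over memory ordering, and, as noted above, the flag would probably deserve a more meaningful name:

static void sugov_work(struct kthread_work *work)
{
	struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);

	mutex_lock(&sg_policy->work_lock);
	do {
		/*
		 * Clear the flag before reading next_freq: if an urgent
		 * request lands while the driver is switching, the flag is
		 * set again and the loop performs one more transition with
		 * the newer next_freq.
		 */
		sg_policy->need_freq_update = false;
		__cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
					CPUFREQ_RELATION_L);
	} while (sg_policy->need_freq_update);
	mutex_unlock(&sg_policy->work_lock);

	sg_policy->work_in_progress = false;
}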
[RFC PATCH] sched/cpufreq/schedutil: handling urgent frequency requests
At OSPM, it was mentioned the issue about urgent CPU frequency requests arriving when a frequency switch is already in progress. Besides the various issues (physical time for switching frequency, on-going kthread activity, etc.) one (minor) issue is the kernel "forgetting" such request, thus waiting the next switch time for recomputing the needed frequency and behaving accordingly. This patch makes the kthread serve any urgent request occurred during the previous frequency switch. It introduces a specific flag, only set when the SCHED_DEADLINE scheduling class increases the CPU utilization, aiming at decreasing the likelihood of a deadline miss. Indeed, some preliminary tests in critical conditions (i.e. SCHED_DEADLINE tasks with short periods) have shown reductions of more than 10% of the average number of deadline misses. On the other hand, the increase in terms of energy consumption when running SCHED_DEADLINE tasks (not yet measured) is likely to be not negligible (especially in case of critical scenarios like "ramp up" utilizations). The patch is meant as follow-up discussion after OSPM. Signed-off-by: Claudio Scordino CC: Viresh Kumar CC: Rafael J. Wysocki CC: Peter Zijlstra CC: Ingo Molnar CC: Patrick Bellasi CC: Juri Lelli Cc: Luca Abeni CC: Joel Fernandes CC: linux...@vger.kernel.org --- kernel/sched/cpufreq_schedutil.c | 20 ++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index d2c6083..4de06b0 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -41,6 +41,7 @@ struct sugov_policy { boolwork_in_progress; boolneed_freq_update; + boolurgent_freq_update; }; struct sugov_cpu { @@ -92,6 +93,14 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time) !cpufreq_can_do_remote_dvfs(sg_policy->policy)) return false; + /* +* Continue computing the new frequency. In case of work_in_progress, +* the kthread will resched a change once the current transition is +* finished. 
+*/ + if (sg_policy->urgent_freq_update) + return true; + if (sg_policy->work_in_progress) return false; @@ -121,6 +130,9 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time, sg_policy->next_freq = next_freq; sg_policy->last_freq_update_time = time; + if (sg_policy->work_in_progress) + return; + if (policy->fast_switch_enabled) { next_freq = cpufreq_driver_fast_switch(policy, next_freq); if (!next_freq) @@ -274,7 +286,7 @@ static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; } static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu, struct sugov_policy *sg_policy) { if (cpu_util_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->util_dl) - sg_policy->need_freq_update = true; + sg_policy->urgent_freq_update = true; } static void sugov_update_single(struct update_util_data *hook, u64 time, @@ -383,8 +395,11 @@ static void sugov_work(struct kthread_work *work) struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work); mutex_lock(&sg_policy->work_lock); - __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq, + do { + sg_policy->urgent_freq_update = false; + __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq, CPUFREQ_RELATION_L); + } while (sg_policy->urgent_freq_update); mutex_unlock(&sg_policy->work_lock); sg_policy->work_in_progress = false; @@ -673,6 +688,7 @@ static int sugov_start(struct cpufreq_policy *policy) sg_policy->next_freq= UINT_MAX; sg_policy->work_in_progress = false; sg_policy->need_freq_update = false; + sg_policy->urgent_freq_update = false; sg_policy->cached_raw_freq = 0; for_each_cpu(cpu, policy->cpus) { -- 2.7.4
[PATCH] sched/deadline: add overrun signal and GRUB-PA in the documentation
Signed-off-by: Claudio Scordino CC: Juri Lelli CC: Luca Abeni CC: linux-kernel@vger.kernel.org CC: linux-...@vger.kernel.org --- Documentation/scheduler/sched-deadline.txt | 25 - 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt index 8ce78f8..b14e03f 100644 --- a/Documentation/scheduler/sched-deadline.txt +++ b/Documentation/scheduler/sched-deadline.txt @@ -49,7 +49,7 @@ CONTENTS 2.1 Main algorithm -- - SCHED_DEADLINE uses three parameters, named "runtime", "period", and + SCHED_DEADLINE [18] uses three parameters, named "runtime", "period", and "deadline", to schedule tasks. A SCHED_DEADLINE task should receive "runtime" microseconds of execution time every "period" microseconds, and these "runtime" microseconds are available within "deadline" microseconds @@ -117,6 +117,10 @@ CONTENTS scheduling deadline = scheduling deadline + period remaining runtime = remaining runtime + runtime + The SCHED_FLAG_DL_OVERRUN flag in sched_attr's sched_flags field allows a task + to get informed about runtime overruns through the delivery of SIGXCPU + signals. + 2.2 Bandwidth reclaiming @@ -279,6 +283,19 @@ CONTENTS running_bw is incremented. +2.3 Energy-aware scheduling + + + When cpufreq's schedutil governor is selected, SCHED_DEADLINE implements the + GRUB-PA [19] algorithm, reducing the CPU operating frequency to the minimum + value that still allows to meet the deadlines. This behavior is currently + implemented only for ARM architectures. + + A particular care must be taken in case the time needed for changing frequency + is of the same order of magnitude of the reservation period. In such cases, + setting a fixed CPU frequency results in a lower amount of deadline misses. + + 3. Scheduling Real-Time Tasks = @@ -505,6 +522,12 @@ CONTENTS 17 - L. Abeni, G. Lipari, A. Parri, Y. Sun, Multicore CPU reclaiming: parallel or sequential?. In Proceedings of the 31st Annual ACM Symposium on Applied Computing, 2016. + 18 - J. Lelli, C. Scordino, L. Abeni, D. Faggioli, Deadline scheduling in the + Linux kernel, Software: Practice and Experience, 46(6): 821-839, June + 2016. + 19 - C. Scordino, L. Abeni, J. Lelli, Energy-Aware Real-Time Scheduling in + the Linux Kernel, 33rd ACM/SIGAPP Symposium On Applied Computing (SAC + 2018), Pau, France, April 2018. 4. Bandwidth management -- 2.7.4
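A back-of-the-envelope illustration of the GRUB-PA behaviour described in section 2.3, with hypothetical numbers and ignoring schedutil's headroom factor and hardware transition latency:

    running_bw      = runtime / period = 2.5 ms / 10 ms = 0.25
    requested freq ~= running_bw * f_max = 0.25 * 2.0 GHz = 500 MHz

When the task blocks, its bandwidth is removed from running_bw at the 0-lag time and the requested frequency drops accordingly; a request based on this_bw would instead stay at 500 MHz for as long as the reservation is admitted.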
[tip:sched/core] sched/cpufreq: Rate limits for SCHED_DEADLINE
Commit-ID: e97a90f7069b740575bcb1dae86596e0484b8957 Gitweb: https://git.kernel.org/tip/e97a90f7069b740575bcb1dae86596e0484b8957 Author: Claudio Scordino AuthorDate: Tue, 13 Mar 2018 11:35:40 +0100 Committer: Thomas Gleixner CommitDate: Fri, 23 Mar 2018 22:48:22 +0100 sched/cpufreq: Rate limits for SCHED_DEADLINE When the SCHED_DEADLINE scheduling class increases the CPU utilization, it should not wait for the rate limit, otherwise it may miss some deadline. Tests using rt-app on Exynos5422 with up to 10 SCHED_DEADLINE tasks have shown reductions of even 10% of deadline misses with a negligible increase of energy consumption (measured through Baylibre Cape). Signed-off-by: Claudio Scordino Signed-off-by: Thomas Gleixner Reviewed-by: Rafael J. Wysocki Acked-by: Viresh Kumar Cc: Juri Lelli Cc: Joel Fernandes Cc: Vincent Guittot Cc: linux...@vger.kernel.org Cc: Peter Zijlstra Cc: Morten Rasmussen Cc: Patrick Bellasi Cc: Todd Kjos Cc: Dietmar Eggemann Link: https://lkml.kernel.org/r/1520937340-2755-1-git-send-email-clau...@evidence.eu.com --- kernel/sched/cpufreq_schedutil.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index 89fe78ecb88c..2b124811947d 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -267,6 +267,16 @@ static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; } #endif /* CONFIG_NO_HZ_COMMON */ +/* + * Make sugov_should_update_freq() ignore the rate limit when DL + * has increased the utilization. + */ +static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu, struct sugov_policy *sg_policy) +{ + if (cpu_util_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->util_dl) + sg_policy->need_freq_update = true; +} + static void sugov_update_single(struct update_util_data *hook, u64 time, unsigned int flags) { @@ -279,6 +289,8 @@ static void sugov_update_single(struct update_util_data *hook, u64 time, sugov_set_iowait_boost(sg_cpu, time, flags); sg_cpu->last_update = time; + ignore_dl_rate_limit(sg_cpu, sg_policy); + if (!sugov_should_update_freq(sg_policy, time)) return; @@ -356,6 +368,8 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags) sugov_set_iowait_boost(sg_cpu, time, flags); sg_cpu->last_update = time; + ignore_dl_rate_limit(sg_cpu, sg_policy); + if (sugov_should_update_freq(sg_policy, time)) { next_f = sugov_next_freq_shared(sg_cpu, time); sugov_update_commit(sg_policy, time, next_f);
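For context on why setting need_freq_update is enough to bypass the rate limit, here is a simplified sketch of sugov_should_update_freq() as it looked around this release (not part of the commit above, and details differ slightly between kernel versions):

static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
{
	s64 delta_ns;

	if (!cpufreq_can_do_remote_dvfs(sg_policy->policy))
		return false;

	if (sg_policy->work_in_progress)
		return false;

	if (unlikely(sg_policy->need_freq_update)) {
		sg_policy->need_freq_update = false;
		/* Forget the cached value and force a re-evaluation now. */
		sg_policy->next_freq = UINT_MAX;
		return true;
	}

	/* Only here does the rate limit come into play. */
	delta_ns = time - sg_policy->last_freq_update_time;
	return delta_ns >= sg_policy->freq_update_delay_ns;
}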
[PATCH v4] cpufreq: schedutil: rate limits for SCHED_DEADLINE
When the SCHED_DEADLINE scheduling class increases the CPU utilization, we should not wait for the rate limit, otherwise we may miss some deadline. Tests using rt-app on Exynos5422 with up to 10 SCHED_DEADLINE tasks have shown reductions of even 10% of deadline misses with a negligible increase of energy consumption (measured through Baylibre Cape). Signed-off-by: Claudio Scordino Acked-by: Viresh Kumar Reviewed-by: Rafael J. Wysocki CC: Rafael J. Wysocki CC: Viresh Kumar CC: Patrick Bellasi CC: Dietmar Eggemann CC: Morten Rasmussen CC: Juri Lelli CC: Vincent Guittot CC: Todd Kjos CC: Joel Fernandes CC: linux...@vger.kernel.org CC: linux-kernel@vger.kernel.org --- Changes from v3: - Specific routine renamed as ignore_dl_rate_limit() --- Changes from v2: - Rate limit ignored also in case of "fast switch" - Specific routine added --- Changes from v1: - Logic moved from sugov_should_update_freq() to sugov_update_single()/_shared() to not duplicate data structures - Rate limit not ignored in case of "fast switch" --- kernel/sched/cpufreq_schedutil.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index feb5f89..2aeb1ca 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -257,6 +257,16 @@ static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; } #endif /* CONFIG_NO_HZ_COMMON */ +/* + * Make sugov_should_update_freq() ignore the rate limit when DL + * has increased the utilization. + */ +static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu, struct sugov_policy *sg_policy) +{ + if (cpu_util_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->util_dl) + sg_policy->need_freq_update = true; +} + static void sugov_update_single(struct update_util_data *hook, u64 time, unsigned int flags) { @@ -270,6 +280,8 @@ static void sugov_update_single(struct update_util_data *hook, u64 time, sugov_set_iowait_boost(sg_cpu, time); sg_cpu->last_update = time; + ignore_dl_rate_limit(sg_cpu, sg_policy); + if (!sugov_should_update_freq(sg_policy, time)) return; @@ -351,6 +363,8 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags) raw_spin_lock(&sg_policy->update_lock); + ignore_dl_rate_limit(sg_cpu, sg_policy); + sugov_get_util(sg_cpu); sg_cpu->flags = flags; -- 2.7.4
[PATCH v3] cpufreq: schedutil: rate limits for SCHED_DEADLINE
When the SCHED_DEADLINE scheduling class increases the CPU utilization, we should not wait for the rate limit, otherwise we may miss some deadline. Tests using rt-app on Exynos5422 with up to 10 SCHED_DEADLINE tasks have shown reductions of even 10% of deadline misses with a negligible increase of energy consumption (measured through Baylibre Cape). Signed-off-by: Claudio Scordino CC: Ingo Molnar CC: Patrick Bellasi CC: Dietmar Eggemann CC: Morten Rasmussen CC: Juri Lelli CC: Viresh Kumar CC: Vincent Guittot CC: Todd Kjos CC: Joel Fernandes CC: linux...@vger.kernel.org CC: linux-kernel@vger.kernel.org --- Changes from v2: - Rate limit ignored also in case of "fast switch" - Specific routine added --- Changes from v1: - Logic moved from sugov_should_update_freq() to sugov_update_single()/_shared() to not duplicate data structures - Rate limit not ignored in case of "fast switch" --- kernel/sched/cpufreq_schedutil.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index 7936f54..13f9cce 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -260,6 +260,17 @@ static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; } #endif /* CONFIG_NO_HZ_COMMON */ +/* + * Make sugov_should_update_freq() ignore the rate limit when DL + * has increased the utilization. + */ +static inline +void set_dl_rate_limit(struct sugov_cpu *sg_cpu, struct sugov_policy *sg_policy) +{ + if (cpu_util_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->util_dl) + sg_policy->need_freq_update = true; +} + static void sugov_update_single(struct update_util_data *hook, u64 time, unsigned int flags) { @@ -273,6 +284,8 @@ static void sugov_update_single(struct update_util_data *hook, u64 time, sugov_set_iowait_boost(sg_cpu, time); sg_cpu->last_update = time; + set_dl_rate_limit(sg_cpu, sg_policy); + if (!sugov_should_update_freq(sg_policy, time)) return; @@ -354,6 +367,8 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time, raw_spin_lock(&sg_policy->update_lock); + set_dl_rate_limit(sg_cpu, sg_policy); + sugov_get_util(sg_cpu); sg_cpu->flags = flags; -- 2.7.4
Re: [PATCH v2] cpufreq: schedutil: rate limits for SCHED_DEADLINE
Dear Rafael, dear Viresh, On 28/02/2018 12:06, Claudio Scordino wrote: When the SCHED_DEADLINE scheduling class increases the CPU utilization, we should not wait for the rate limit, otherwise we may miss some deadline. Tests using rt-app on Exynos5422 with up to 10 SCHED_DEADLINE tasks have shown reductions of even 10% of deadline misses with a negligible increase of energy consumption (measured through Baylibre Cape). As a follow-up to the previous thread, I've put some figures here: https://gist.github.com/claudioscordino/d4a10e8b3ceac419fb0c8b552db19806 In some cases, I've noticed that the patch even reduces the energy consumption (due to a mix of factors, plus DL tasks entering the inactive state sooner). I've also tried to create the "ramp-up" scenario by allocating 10 DL tasks on the same core, but it didn't produce any significant increase of consumption. IMHO, the overall behavior looks better. Best regards, Claudio
[PATCH v2] cpufreq: schedutil: rate limits for SCHED_DEADLINE
When the SCHED_DEADLINE scheduling class increases the CPU utilization, we should not wait for the rate limit, otherwise we may miss some deadline. Tests using rt-app on Exynos5422 with up to 10 SCHED_DEADLINE tasks have shown reductions of even 10% of deadline misses with a negligible increase of energy consumption (measured through Baylibre Cape). Signed-off-by: Claudio Scordino CC: Ingo Molnar CC: Patrick Bellasi CC: Dietmar Eggemann CC: Morten Rasmussen CC: Juri Lelli CC: Viresh Kumar CC: Vincent Guittot CC: Todd Kjos CC: Joel Fernandes CC: linux...@vger.kernel.org CC: linux-kernel@vger.kernel.org --- Changes from v1: - Logic moved from sugov_should_update_freq() to sugov_update_single()/_shared() to not duplicate data structures - Rate limit not ignored in case of "fast switch" --- kernel/sched/cpufreq_schedutil.c | 16 1 file changed, 16 insertions(+) diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index 7936f54..ca6ce72 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -273,6 +273,14 @@ static void sugov_update_single(struct update_util_data *hook, u64 time, sugov_set_iowait_boost(sg_cpu, time); sg_cpu->last_update = time; + /* +* Make sugov_should_update_freq() ignore the rate limit when DL +* has increased the utilization. +*/ + if ((cpu_util_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->util_dl) && + !(sg_policy->policy->fast_switch_enabled)) + sg_policy->need_freq_update = true; + if (!sugov_should_update_freq(sg_policy, time)) return; @@ -354,6 +362,14 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time, raw_spin_lock(&sg_policy->update_lock); + /* +* Make sugov_should_update_freq() ignore the rate limit when DL +* has increased the utilization. +*/ + if ((cpu_util_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->util_dl) && + !(sg_policy->policy->fast_switch_enabled)) + sg_policy->need_freq_update = true; + sugov_get_util(sg_cpu); sg_cpu->flags = flags; -- 2.7.4
Re: [PATCH] cpufreq: schedutil: rate limits for SCHED_DEADLINE
Il 09/02/2018 13:56, Rafael J. Wysocki ha scritto: On Fri, Feb 9, 2018 at 1:52 PM, Juri Lelli wrote: On 09/02/18 13:08, Rafael J. Wysocki wrote: On Fri, Feb 9, 2018 at 12:51 PM, Juri Lelli wrote: On 09/02/18 12:37, Rafael J. Wysocki wrote: On Fri, Feb 9, 2018 at 12:26 PM, Juri Lelli wrote: On 09/02/18 12:04, Rafael J. Wysocki wrote: On Fri, Feb 9, 2018 at 11:53 AM, Juri Lelli wrote: Hi, On 09/02/18 11:36, Rafael J. Wysocki wrote: On Friday, February 9, 2018 9:02:34 AM CET Claudio Scordino wrote: Hi Viresh, Il 09/02/2018 04:51, Viresh Kumar ha scritto: On 08-02-18, 18:01, Claudio Scordino wrote: When the SCHED_DEADLINE scheduling class increases the CPU utilization, we should not wait for the rate limit, otherwise we may miss some deadline. Tests using rt-app on Exynos5422 have shown reductions of about 10% of deadline misses for tasks with low RT periods. The patch applies on top of the one recently proposed by Peter to drop the SCHED_CPUFREQ_* flags. [cut] Is it possible to (somehow) check here if the DL tasks will miss deadline if we continue to run at current frequency? And only ignore rate-limit if that is the case ? Isn't it always the case? Utilization associated to DL tasks is given by what the user said it's needed to meet a task deadlines (admission control). If that task wakes up and we realize that adding its utilization contribution is going to require a frequency change, we should _theoretically_ always do it, or it will be too late. Now, user might have asked for a bit more than what strictly required (this is usually the case to compensate for discrepancies between theory and real world, e.g. hw transition limits), but I don't think there is a way to know "how much". :/ You are right. I'm somewhat concerned about "fast switch" cases when the rate limit is used to reduce overhead. Mmm, right. I'm thinking that in those cases we could leave rate limit as is. The user should then be aware of it and consider it as proper overhead when designing her/his system. But then, isn't it the same for "non fast switch" platforms? I mean, even in the latter case we can't go faster than hw limits.. mmm, maybe the difference is that in the former case we could go as fast as theory would expect.. but we shouldn't. :) Well, in practical terms that means "no difference" IMO. :-) I can imagine that in some cases this approach may lead to better results than reducing the rate limit overall, but the general case I'm not sure about. I mean, if overriding the rate limit doesn't take place very often, then it really should make no difference overhead-wise. Now, of course, how to define "not very often" is a good question as that leads to rate-limiting the overriding of the original rate limit and that scheme may continue indefinitely ... :) My impression is that rate limit helps a lot for CFS, where the "true" utilization is not known in advance, and being too responsive might actually be counterproductive. For DEADLINE (and RT, with differences) we should always respond as quick as we can (and probably remember that a frequency transition was requested if hw was already performing one, but that's another patch) because, if we don't, a task belonging to a lower priority class might induce deadline misses in highest priority activities. E.g., a CFS task that happens to trigger a freq switch right before a DEADLINE task wakes up and needs an higher frequency to meet its deadline: if we wait for the rate limit of the CFS originated transition.. deadline miss! 
Fair enough, but if there's too much overhead as a result of this, you can't guarantee the deadlines to be met anyway. Indeed. I guess this only works if corner cases as the one above don't happen too often. Well, that's the point. So there is a tradeoff: do we want to allow deadlines to be missed because of excessive overhead or do we want to allow deadlines to be missed because of the rate limit. For a very few tasks, the tests have indeed shown that the approach pays off: we get a significant reduction of misses with a negligible increase of energy consumption. I still need to check what happens for a high amount of tasks, trying to reproduce the "ramp up" pattern (in which DL keeps increasing the utilization, ignoring the rate limit and adding overhead) Thanks, Claudio
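To put numbers on the corner case mentioned above (all values invented for illustration):

  - schedutil rate limit: 10 ms; at t = 0 a CFS-originated update leaves the CPU at 500 MHz (half of f_max);
  - at t = 1 ms a SCHED_DEADLINE task wakes up needing 3 ms of execution at f_max within a 5 ms relative deadline;
  - honoring the rate limit, no frequency change can happen before t = 10 ms, so the job needs about 6 ms at 500 MHz and misses its deadline (t = 6 ms) even with zero interference;
  - letting the DL-induced utilization increase bypass the rate limit raises the frequency at wakeup, and the 3 ms of work completes well before the deadline.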
Re: [PATCH] cpufreq: schedutil: rate limits for SCHED_DEADLINE
Hi Viresh, Il 09/02/2018 04:51, Viresh Kumar ha scritto: On 08-02-18, 18:01, Claudio Scordino wrote: When the SCHED_DEADLINE scheduling class increases the CPU utilization, we should not wait for the rate limit, otherwise we may miss some deadline. Tests using rt-app on Exynos5422 have shown reductions of about 10% of deadline misses for tasks with low RT periods. The patch applies on top of the one recently proposed by Peter to drop the SCHED_CPUFREQ_* flags. Signed-off-by: Claudio Scordino CC: Rafael J . Wysocki CC: Patrick Bellasi CC: Dietmar Eggemann CC: Morten Rasmussen CC: Juri Lelli CC: Viresh Kumar CC: Vincent Guittot CC: Todd Kjos CC: Joel Fernandes CC: linux...@vger.kernel.org CC: linux-kernel@vger.kernel.org --- kernel/sched/cpufreq_schedutil.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) So the previous commit was surely incorrect as it relied on comparing frequencies instead of dl-util, and freq requirements could have even changed due to CFS. You're right. The very original patch (not posted) added a specific SCHED_CPUFREQ flag to let the scheduling class ask for ignoring the rate limit. However, polluting the API with further flags is not such a good approach. The next patches didn't introduce such flag, but were incorrect. diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index b0bd77d..d8dcba2 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -74,7 +74,10 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu); / Governor internals ***/ -static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time) +static bool sugov_should_update_freq(struct sugov_policy *sg_policy, +u64 time, +struct sugov_cpu *sg_cpu_old, +struct sugov_cpu *sg_cpu_new) { s64 delta_ns; @@ -111,6 +114,10 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time) return true; } + /* Ignore rate limit when DL increased utilization. */ + if (sg_cpu_new->util_dl > sg_cpu_old->util_dl) + return true; + Changing the frequency has a penalty, specially in the ARM world (and that's where you are testing your stuff). I am worried that we will have (corner) cases where we will waste a lot of time changing the frequencies. For example (I may be wrong here), what if 10 small DL tasks are queued one after the other? The util will keep on changing and so will the frequency ? There may be more similar cases ? I forgot to say that I've not observed any relevant increase of the energy consumption (measured through a Baylibre Cape). However, the tests had a very small number of RT tasks. If I'm not wrong, at the hardware level we do have a physical rate limit (as we cannot trigger a frequency update when there is one already on-going). Don't know if this could somehow mitigate such effect. Anyway, I'll repeat the tests with a considerable amount of RT tasks to check if I can reproduce such "ramp up" situation. Depending on the energy results, we may have to choose between meeting more RT deadlines and consuming less energy. Is it possible to (somehow) check here if the DL tasks will miss deadline if we continue to run at current frequency? And only ignore rate-limit if that is the case ? I need to think further about it. 
delta_ns = time - sg_policy->last_freq_update_time; return delta_ns >= sg_policy->freq_update_delay_ns; } @@ -271,6 +278,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time, unsigned int flags) { struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util); + struct sugov_cpu sg_cpu_old = *sg_cpu; Not really a big deal, but this structure is 80 bytes on ARM64, why copy everything when what we need is just 8 bytes ? I didn't want to add deadline-specific code into the sugov_should_update_freq() signature as it should remain independent from the scheduling classes. In my opinion, the best approach would be to group util_cfs and util_dl in a struct within sugov_cpu and pass that struct to sugov_should_update_freq(). Thanks for your comments. Claudio
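A minimal sketch of that last suggestion (struct and parameter names are invented here, not an actual proposal from the thread):

/* Hypothetical grouping of the utilization snapshot inside sugov_cpu. */
struct sugov_util {
	unsigned long cfs;
	unsigned long dl;
};

struct sugov_cpu {
	struct update_util_data	update_util;
	struct sugov_policy	*sg_policy;
	unsigned int		cpu;

	struct sugov_util	util;	/* filled by sugov_get_util() */
	unsigned long		max;

	/* ... iowait boost state, last_update, etc. ... */
};

static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time,
				     const struct sugov_util *old,
				     const struct sugov_util *new)
{
	/* Scheduling-class agnostic check: did the DL contribution grow? */
	if (new->dl > old->dl)
		return true;

	/* ... usual checks and rate limit as before ... */
	return false;
}

Only the small snapshot would be copied in the update paths (e.g. struct sugov_util old = sg_cpu->util;), instead of the whole ~80-byte sugov_cpu.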
[PATCH] cpufreq: schedutil: rate limits for SCHED_DEADLINE
When the SCHED_DEADLINE scheduling class increases the CPU utilization, we should not wait for the rate limit, otherwise we may miss some deadline. Tests using rt-app on Exynos5422 have shown reductions of about 10% of deadline misses for tasks with low RT periods. The patch applies on top of the one recently proposed by Peter to drop the SCHED_CPUFREQ_* flags. Signed-off-by: Claudio Scordino CC: Rafael J . Wysocki CC: Patrick Bellasi CC: Dietmar Eggemann CC: Morten Rasmussen CC: Juri Lelli CC: Viresh Kumar CC: Vincent Guittot CC: Todd Kjos CC: Joel Fernandes CC: linux...@vger.kernel.org CC: linux-kernel@vger.kernel.org --- kernel/sched/cpufreq_schedutil.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index b0bd77d..d8dcba2 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -74,7 +74,10 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu); / Governor internals ***/ -static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time) +static bool sugov_should_update_freq(struct sugov_policy *sg_policy, +u64 time, +struct sugov_cpu *sg_cpu_old, +struct sugov_cpu *sg_cpu_new) { s64 delta_ns; @@ -111,6 +114,10 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time) return true; } + /* Ignore rate limit when DL increased utilization. */ + if (sg_cpu_new->util_dl > sg_cpu_old->util_dl) + return true; + delta_ns = time - sg_policy->last_freq_update_time; return delta_ns >= sg_policy->freq_update_delay_ns; } @@ -271,6 +278,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time, unsigned int flags) { struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util); + struct sugov_cpu sg_cpu_old = *sg_cpu; struct sugov_policy *sg_policy = sg_cpu->sg_policy; unsigned long util, max; unsigned int next_f; @@ -279,7 +287,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time, sugov_set_iowait_boost(sg_cpu, time, flags); sg_cpu->last_update = time; - if (!sugov_should_update_freq(sg_policy, time)) + if (!sugov_should_update_freq(sg_policy, time, &sg_cpu_old, sg_cpu)) return; busy = sugov_cpu_is_busy(sg_cpu); @@ -350,6 +358,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags) { struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util); + struct sugov_cpu sg_cpu_old = *sg_cpu; struct sugov_policy *sg_policy = sg_cpu->sg_policy; unsigned int next_f; @@ -359,7 +368,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time, sugov_set_iowait_boost(sg_cpu, time, flags); sg_cpu->last_update = time; - if (sugov_should_update_freq(sg_policy, time)) { + if (sugov_should_update_freq(sg_policy, time, &sg_cpu_old, sg_cpu)) { next_f = sugov_next_freq_shared(sg_cpu, time); sugov_update_commit(sg_policy, time, next_f); } -- 2.7.4
Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
Hi Patrick, On 06/02/2018 19:36, Patrick Bellasi wrote: On 06-Feb 19:14, Claudio Scordino wrote: Hi Patrick, At first glance, your proposal below seems to make sense. However, I'm wondering if we cannot get it working using rq->dl's provided information instead of flags? Yes, we can use the value of rq->dl to check if there has been an increase of the deadline utilization. Even if schedutil might have been triggered by a different scheduling class, the effect should be almost the same. Below is a potential patch. I've kept all frequency update decisions in a single point (i.e. sugov_should_update_freq). Not yet tested (waiting for further comments). I have a todo list entry to backport/test Peter's series on Android... will add this patch too. Thanks. Please discard my latest patch, as the tests on Odroid have shown performance regressions (likely due to sugov_next_freq_shared being called more often). Never mind. I have already tested another (even simpler) patch. Sending soon as a new thread on LKML. Many thanks and best regards, Claudio
Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
Hi Patrick, Il 06/02/2018 16:43, Patrick Bellasi ha scritto: Hi Claudio, On 06-Feb 11:55, Claudio Scordino wrote: Hi Peter, Il 20/12/2017 16:30, Peter Zijlstra ha scritto: So I ended up with the below (on top of Juri's cpufreq-dl patches). It compiles, but that's about all the testing it had. --- a/include/linux/sched/cpufreq.h +++ b/include/linux/sched/cpufreq.h [..] @@ -188,17 +187,23 @@ static void sugov_get_util(struct sugov_ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu) { + unsigned long util = sg_cpu->util_cfs + sg_cpu->util_dl; + struct rq *rq = cpu_rq(sg_cpu->cpu); + + if (rq->rt.rt_nr_running) + util = sg_cpu->max; + /* * Ideally we would like to set util_dl as min/guaranteed freq and * util_cfs + util_dl as requested freq. However, cpufreq is not yet * ready for such an interface. So, we only do the latter for now. */ - return min(sg_cpu->util_cfs + sg_cpu->util_dl, sg_cpu->max); + return min(util, sg_cpu->max); } [...] What is the status of this patch ? I couldn't find it on the tip/queue repositories. BTW, I wonder if we actually want to remove also the information about the scheduling class who triggered the frequency change. Removing flags was the main goal of the patch, since they represents mainly duplicated information which scheduling classes already know. This was making flags update error prone and difficult to keep aligned with existing scheduling classes info. This prevents us from adopting class-specific behaviors. In Peter's proposal he replaces flags with checks like: if (rq->rt.rt_nr_running) For example, we might want to skip the rate limits when deadline asks for an increase of frequency, as shown in the patch below. In this case, we could just remove the flags from sugov_cpu, but leave the defines and the argument for sugov_update_*() At first glance, your proposal below makes to make sense. However, I'm wondering if we cannot get it working using rq->dl's provided information instead of flags? Yes, we can use the value of rq->dl to check if there has been an increase of the deadline utilization. Even if schedutil might have been triggered by a different scheduling class, the effect should be almost the same. Below a potential patch. I've kept all frequency update decisions in a single point (i.e. sugov_should_update_freq). Not yet tested (waiting for further comments). Thanks, Claudio From 49a6eec60574ae93297406d40155e6ce4113e442 Mon Sep 17 00:00:00 2001 From: Claudio Scordino Date: Tue, 6 Feb 2018 18:42:23 +0100 Subject: [PATCH] cpufreq: schedutil: rate limits for SCHED_DEADLINE When the SCHED_DEADLINE scheduling class asks to increase the CPU frequency, we should not wait the rate limit, otherwise we may miss some deadline. This patch moves all frequency update decisions to a single point: sugov_should_update_freq(). In addition, it ignores the rate limit whenever there is an increase of the CPU frequency given by an increase of the deadline utilization. 
Signed-off-by: Claudio Scordino --- kernel/sched/cpufreq_schedutil.c | 27 +++ 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index b0bd77d..e8504f5 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -74,7 +74,11 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu); / Governor internals ***/ -static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time) +static bool sugov_should_update_freq(struct sugov_policy *sg_policy, +u64 time, +struct sugov_cpu *sg_cpu_old, +struct sugov_cpu *sg_cpu_new, +unsigned int next_freq) { s64 delta_ns; @@ -111,6 +115,14 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time) return true; } + /* +* Ignore rate limit when DL asked to increase the CPU frequency, +* otherwise we may miss some deadline. +*/ + if ((next_freq > sg_policy->next_freq) && + (sg_cpu_new->util_dl > sg_cpu_old->util_dl)) + return true; + delta_ns = time - sg_policy->last_freq_update_time; return delta_ns >= sg_policy->freq_update_delay_ns; } @@ -271,6 +283,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time, unsigned int flags) { struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util); + struct sugov_cpu sg_cpu_old = *sg_cpu; struct su
Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
gt;flags & SCHED_CPUFREQ_RT) - return policy->cpuinfo.max_freq; j_max = j_sg_cpu->max; j_util = sugov_aggregate_util(j_sg_cpu); @@ -357,17 +356,11 @@ static void sugov_update_shared(struct u raw_spin_lock(&sg_policy->update_lock); sugov_get_util(sg_cpu); - sg_cpu->flags = flags; - - sugov_set_iowait_boost(sg_cpu, time); + sugov_set_iowait_boost(sg_cpu, time, flags); sg_cpu->last_update = time; if (sugov_should_update_freq(sg_policy, time)) { - if (flags & SCHED_CPUFREQ_RT) - next_f = sg_policy->policy->cpuinfo.max_freq; - else - next_f = sugov_next_freq_shared(sg_cpu, time); - + next_f = sugov_next_freq_shared(sg_cpu, time); sugov_update_commit(sg_policy, time, next_f); } @@ -678,7 +671,6 @@ static int sugov_start(struct cpufreq_po memset(sg_cpu, 0, sizeof(*sg_cpu)); sg_cpu->cpu = cpu; sg_cpu->sg_policy = sg_policy; - sg_cpu->flags = 0; sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq; } --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -87,7 +87,7 @@ void __add_running_bw(u64 dl_bw, struct SCHED_WARN_ON(dl_rq->running_bw < old); /* overflow */ SCHED_WARN_ON(dl_rq->running_bw > dl_rq->this_bw); /* kick cpufreq (see the comment in kernel/sched/sched.h). */ - cpufreq_update_util(rq_of_dl_rq(dl_rq), SCHED_CPUFREQ_DL); + cpufreq_update_util(rq_of_dl_rq(dl_rq), 0); } static inline @@ -101,7 +101,7 @@ void __sub_running_bw(u64 dl_bw, struct if (dl_rq->running_bw > old) dl_rq->running_bw = 0; /* kick cpufreq (see the comment in kernel/sched/sched.h). */ - cpufreq_update_util(rq_of_dl_rq(dl_rq), SCHED_CPUFREQ_DL); + cpufreq_update_util(rq_of_dl_rq(dl_rq), 0); } static inline --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -959,9 +959,6 @@ static void update_curr_rt(struct rq *rq if (unlikely((s64)delta_exec <= 0)) return; - /* Kick cpufreq (see the comment in kernel/sched/sched.h). */ - cpufreq_update_util(rq, SCHED_CPUFREQ_RT); - schedstat_set(curr->se.statistics.exec_max, max(curr->se.statistics.exec_max, delta_exec)); @@ -1003,6 +1000,9 @@ dequeue_top_rt_rq(struct rt_rq *rt_rq) sub_nr_running(rq, rt_rq->rt_nr_running); rt_rq->rt_queued = 0; + + /* Kick cpufreq (see the comment in kernel/sched/sched.h). */ + cpufreq_update_util(rq, 0); } static void @@ -1019,6 +1019,9 @@ enqueue_top_rt_rq(struct rt_rq *rt_rq) add_nr_running(rq, rt_rq->rt_nr_running); rt_rq->rt_queued = 1; + + /* Kick cpufreq (see the comment in kernel/sched/sched.h). */ + cpufreq_update_util(rq, 0); } #if defined CONFIG_SMP What is the status of this patch ? I couldn't find it on the tip/queue repositories. BTW, I wonder if we actually want to remove also the information about the scheduling class who triggered the frequency change. This prevents us from adopting class-specific behaviors. For example, we might want to skip the rate limits when deadline asks for an increase of frequency, as shown in the patch below. In this case, we could just remove the flags from sugov_cpu, but leave the defines and the argument for sugov_update_*() Best regards, Claudio From ed13fa5a8f93a43f8ff8f7d354b18c0031df482c Mon Sep 17 00:00:00 2001 From: Claudio Scordino Date: Wed, 27 Sep 2017 17:16:36 +0200 Subject: [PATCH RFC] cpufreq: schedutil: rate limits for SCHED_DEADLINE When the SCHED_DEADLINE scheduling class asks to increase CPU frequency, we should not wait the rate limit, otherwise we may miss some deadline. The patch just ignores the limit whenever SCHED_DEADLINE asks for a higher CPU frequency. 
Signed-off-by: Claudio Scordino --- kernel/sched/cpufreq_schedutil.c | 24 +--- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index dd062a1..5027ab1 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -75,7 +75,8 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu); / Governor internals ***/ -static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time) +static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time, +unsigned int next_freq, unsigned int flags) { s64 delta_ns; @@ -112,6 +113,10 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time) return true; } +
Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
Hi Peter, On 31/12/2017 10:43, Claudio Scordino wrote: Hi all, On 20/12/2017 16:56, Peter Zijlstra wrote: On Wed, Dec 20, 2017 at 04:30:29PM +0100, Peter Zijlstra wrote: So I ended up with the below (on top of Juri's cpufreq-dl patches). It compiles, but that's about all the testing it had. Should all be available at: git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/core compile tested only, I suppose the 0day bot will let me know soon enough :-) I'm going to do some testing from Tuesday. The rebase looks good to me. Tested on Odroid XU4 (ARM big.LITTLE), no regressions found. Many thanks and best regards, Claudio
Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
Hi all, On 20/12/2017 16:56, Peter Zijlstra wrote: On Wed, Dec 20, 2017 at 04:30:29PM +0100, Peter Zijlstra wrote: So I ended up with the below (on top of Juri's cpufreq-dl patches). It compiles, but that's about all the testing it had. Should all be available at: git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/core compile tested only, I suppose the 0day bot will let me know soon enough :-) I'm going to do some testing from Tuesday. Happy new year, Claudio
[PATCH v3] sched/deadline: runtime overrun signal
From: Juri Lelli This patch adds the possibility of getting the delivery of a SIGXCPU signal whenever there is a runtime overrun. The request is done through the sched_flags field within the sched_attr structure. Forward port of https://lkml.org/lkml/2009/10/16/170 Signed-off-by: Juri Lelli Signed-off-by: Claudio Scordino Signed-off-by: Luca Abeni Tested-by: Mathieu Poirier Cc: Tommaso Cucinotta CC: Peter Zijlstra CC: Ingo Molnar CC: Thomas Gleixner CC: LKML CC: linux-rt-users --- Changes from v2: - do not use a dubious signed bitfield for the dl_overrun flag Changes from v1: - fix: do not send the signal in case of yield (reported by Mathieu Poirier) - removed the deadline miss signal because not needed in most of the cases --- include/linux/sched.h | 4 include/uapi/linux/sched.h | 1 + kernel/sched/core.c| 3 ++- kernel/sched/deadline.c| 7 +++ kernel/time/posix-cpu-timers.c | 20 5 files changed, 34 insertions(+), 1 deletion(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 21991d6..a62a2f1 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -472,11 +472,15 @@ struct sched_dl_entity { * has not been executed yet. This flag is useful to avoid race * conditions between the inactive timer handler and the wakeup * code. +* +* @dl_overrun tells if the task asked to be informed about runtime +* overruns. */ unsigned intdl_throttled : 1; unsigned intdl_boosted: 1; unsigned intdl_yielded: 1; unsigned intdl_non_contending : 1; + unsigned intdl_overrun: 1; /* * Bandwidth enforcement timer. Each -deadline task has its diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h index 30a9e51..beb8ecd 100644 --- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -49,5 +49,6 @@ */ #define SCHED_FLAG_RESET_ON_FORK 0x01 #define SCHED_FLAG_RECLAIM 0x02 +#define SCHED_FLAG_DL_OVERRUN 0x04 #endif /* _UAPI_LINUX_SCHED_H */ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 75554f3..5bb67af 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4041,7 +4041,8 @@ static int __sched_setscheduler(struct task_struct *p, } if (attr->sched_flags & - ~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM)) + ~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM | + SCHED_FLAG_DL_OVERRUN)) return -EINVAL; /* diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 2473736..72554d1 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1155,6 +1155,12 @@ static void update_curr_dl(struct rq *rq) throttle: if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) { dl_se->dl_throttled = 1; + + /* If requested, inform the user about runtime overruns. 
*/ + if (dl_runtime_exceeded(dl_se) && + (dl_se->flags & SCHED_FLAG_DL_OVERRUN)) + dl_se->dl_overrun = 1; + __dequeue_task_dl(rq, curr, 0); if (unlikely(dl_se->dl_boosted || !start_dl_timer(curr))) enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH); @@ -2566,6 +2572,7 @@ void __dl_clear_params(struct task_struct *p) dl_se->dl_throttled = 0; dl_se->dl_yielded = 0; dl_se->dl_non_contending = 0; + dl_se->dl_overrun = 0; } bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr) diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c index 1f27887a..48281d9 100644 --- a/kernel/time/posix-cpu-timers.c +++ b/kernel/time/posix-cpu-timers.c @@ -14,6 +14,7 @@ #include #include #include +#include #include "posix-timers.h" @@ -791,6 +792,16 @@ check_timers_list(struct list_head *timers, return 0; } +static inline void check_dl_overrun(struct task_struct *tsk) +{ + if (tsk->dl.dl_overrun) { + tsk->dl.dl_overrun = 0; + pr_info("runtime overrun: %s[%d]\n", + tsk->comm, task_pid_nr(tsk)); + __group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk); + } +} + /* * Check for any per-thread CPU timers that have fired and move them off * the tsk->cpu_timers[N] list onto the firing list. Here we update the @@ -804,6 +815,9 @@ static void check_thread_timers(struct task_struct *tsk, u64 expires; unsigned long soft; + if (dl_task(tsk)) + check_dl_overrun(tsk); + /* * If cputime_expires is zero, then there a
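A minimal user-space sketch of how the new flag is meant to be consumed: the task requests the notification through sched_attr::sched_flags and then simply counts the SIGXCPU signals. The raw syscall invocation and the struct layout are the usual sched_setattr() boilerplate (there is no glibc wrapper; <sys/syscall.h> is assumed to expose SYS_sched_setattr). The reservation numbers are made up, and the program needs the privileges required to enter SCHED_DEADLINE (typically root).

#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/types.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE		6
#endif
#define SCHED_FLAG_DL_OVERRUN	0x04	/* introduced by the patch above */

struct sched_attr {
	__u32 size;
	__u32 sched_policy;
	__u64 sched_flags;
	__s32 sched_nice;
	__u32 sched_priority;
	__u64 sched_runtime;	/* ns */
	__u64 sched_deadline;	/* ns */
	__u64 sched_period;	/* ns */
};

static volatile sig_atomic_t overruns;

static void on_sigxcpu(int sig)
{
	(void)sig;
	overruns++;			/* one runtime overrun notified */
}

int main(void)
{
	struct sched_attr attr;

	signal(SIGXCPU, on_sigxcpu);

	memset(&attr, 0, sizeof(attr));
	attr.size           = sizeof(attr);
	attr.sched_policy   = SCHED_DEADLINE;
	attr.sched_flags    = SCHED_FLAG_DL_OVERRUN;
	attr.sched_runtime  =  2 * 1000 * 1000;		/*  2 ms budget   */
	attr.sched_deadline = 10 * 1000 * 1000;		/* 10 ms deadline */
	attr.sched_period   = 10 * 1000 * 1000;		/* 10 ms period   */

	if (syscall(SYS_sched_setattr, 0, &attr, 0)) {
		perror("sched_setattr");
		return 1;
	}

	/* Burn CPU: the 2 ms budget is exhausted every period, the task gets
	 * throttled and, thanks to the flag, receives SIGXCPU. */
	while (overruns < 5)
		;

	printf("got %d overrun notifications\n", (int)overruns);
	return 0;
}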
[tip:sched/urgent] sched/deadline: Fix the description of runtime accounting in the documentation
Commit-ID: 5c0342ca7ef17220d8dd2da68d0d349c26ab19df Gitweb: https://git.kernel.org/tip/5c0342ca7ef17220d8dd2da68d0d349c26ab19df Author: Claudio Scordino AuthorDate: Tue, 14 Nov 2017 12:19:26 +0100 Committer: Ingo Molnar CommitDate: Thu, 16 Nov 2017 09:00:35 +0100 sched/deadline: Fix the description of runtime accounting in the documentation Signed-off-by: Claudio Scordino Signed-off-by: Luca Abeni Acked-by: Daniel Bristot de Oliveira Acked-by: Peter Zijlstra Cc: Jonathan Corbet Cc: Linus Torvalds Cc: Mathieu Poirier Cc: Thomas Gleixner Cc: Tommaso Cucinotta Cc: linux-...@vger.kernel.org Link: http://lkml.kernel.org/r/1510658366-28995-1-git-send-email-clau...@evidence.eu.com Signed-off-by: Ingo Molnar --- Documentation/scheduler/sched-deadline.txt | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt index e89e36e..8ce78f8 100644 --- a/Documentation/scheduler/sched-deadline.txt +++ b/Documentation/scheduler/sched-deadline.txt @@ -204,10 +204,17 @@ CONTENTS It does so by decrementing the runtime of the executing task Ti at a pace equal to - dq = -max{ Ui, (1 - Uinact) } dt + dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt - where Uinact is the inactive utilization, computed as (this_bq - running_bw), - and Ui is the bandwidth of task Ti. + where: + + - Ui is the bandwidth of task Ti; + - Umax is the maximum reclaimable utilization (subjected to RT throttling +limits); + - Uinact is the (per runqueue) inactive utilization, computed as +(this_bq - running_bw); + - Uextra is the (per runqueue) extra reclaimable utilization +(subjected to RT throttling limits). Let's now see a trivial example of two deadline tasks with runtime equal
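To make the updated formula concrete, here is a toy user-space computation with made-up utilizations; it only shows how the two terms inside the max{} compete, it is not kernel code.

#include <stdio.h>

/*
 * Toy example of the documented reclaiming rule
 *     dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt
 * A factor below 1 means the running task consumes its budget more slowly
 * than wall-clock time, i.e. it reclaims bandwidth left unused by others.
 */
int main(void)
{
	double Ui     = 0.25;	/* task bandwidth: 2.5 ms every 10 ms        */
	double Umax   = 0.95;	/* max reclaimable utilization (RT throttle) */
	double Uinact = 0.10;	/* inactive utilization on this runqueue     */
	double Uextra = 0.20;	/* extra reclaimable utilization             */

	double a = Ui / Umax;			/* ~0.26 */
	double b = 1.0 - Uinact - Uextra;	/*  0.70 */
	double factor = (a > b) ? a : b;

	printf("dq = -%.2f dt: 1 ms of execution charges %.2f ms of budget\n",
	       factor, factor);
	return 0;
}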
[PATCH v2] sched/deadline: runtime overrun signal
From: Juri Lelli This patch adds the possibility of getting the delivery of a SIGXCPU signal whenever there is a runtime overrun. The request is done through the sched_flags field within the sched_attr structure. Forward port of https://lkml.org/lkml/2009/10/16/170 Signed-off-by: Juri Lelli Signed-off-by: Claudio Scordino Signed-off-by: Luca Abeni Tested-by: Mathieu Poirier Cc: Tommaso Cucinotta CC: Peter Zijlstra CC: Ingo Molnar CC: Thomas Gleixner --- Changes from v1: - fix: do not send the signal in case of yield (reported by Mathieu) - removed the deadline miss signal because not needed in most of the cases --- include/linux/sched.h | 4 include/uapi/linux/sched.h | 1 + kernel/sched/core.c| 3 ++- kernel/sched/deadline.c| 7 +++ kernel/time/posix-cpu-timers.c | 20 5 files changed, 34 insertions(+), 1 deletion(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index a5dc7c9..2ec6014 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -472,11 +472,15 @@ struct sched_dl_entity { * has not been executed yet. This flag is useful to avoid race * conditions between the inactive timer handler and the wakeup * code. +* +* @dl_overrun tells if the task asked to be informed about runtime +* overruns. */ int dl_throttled : 1; int dl_boosted: 1; int dl_yielded: 1; int dl_non_contending : 1; + int dl_overrun: 1; /* * Bandwidth enforcement timer. Each -deadline task has its diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h index 30a9e51..beb8ecd 100644 --- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -49,5 +49,6 @@ */ #define SCHED_FLAG_RESET_ON_FORK 0x01 #define SCHED_FLAG_RECLAIM 0x02 +#define SCHED_FLAG_DL_OVERRUN 0x04 #endif /* _UAPI_LINUX_SCHED_H */ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 5b82a00..5bd2498f 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4040,7 +4040,8 @@ static int __sched_setscheduler(struct task_struct *p, } if (attr->sched_flags & - ~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM)) + ~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM | + SCHED_FLAG_DL_OVERRUN)) return -EINVAL; /* diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index f349f7e..304087c 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1155,6 +1155,12 @@ static void update_curr_dl(struct rq *rq) throttle: if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) { dl_se->dl_throttled = 1; + + /* If requested, inform the user about runtime overruns. 
*/ + if (dl_runtime_exceeded(dl_se) && + (dl_se->flags & SCHED_FLAG_DL_OVERRUN)) + dl_se->dl_overrun = 1; + __dequeue_task_dl(rq, curr, 0); if (unlikely(dl_se->dl_boosted || !start_dl_timer(curr))) enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH); @@ -2566,6 +2572,7 @@ void __dl_clear_params(struct task_struct *p) dl_se->dl_throttled = 0; dl_se->dl_yielded = 0; dl_se->dl_non_contending = 0; + dl_se->dl_overrun = 0; } bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr) diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c index 1f27887..48281d9 100644 --- a/kernel/time/posix-cpu-timers.c +++ b/kernel/time/posix-cpu-timers.c @@ -14,6 +14,7 @@ #include #include #include +#include #include "posix-timers.h" @@ -791,6 +792,16 @@ check_timers_list(struct list_head *timers, return 0; } +static inline void check_dl_overrun(struct task_struct *tsk) +{ + if (tsk->dl.dl_overrun) { + tsk->dl.dl_overrun = 0; + pr_info("runtime overrun: %s[%d]\n", + tsk->comm, task_pid_nr(tsk)); + __group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk); + } +} + /* * Check for any per-thread CPU timers that have fired and move them off * the tsk->cpu_timers[N] list onto the firing list. Here we update the @@ -804,6 +815,9 @@ static void check_thread_timers(struct task_struct *tsk, u64 expires; unsigned long soft; + if (dl_task(tsk)) + check_dl_overrun(tsk); + /* * If cputime_expires is zero, then there are no active * per thread CPU timers. @@ -906,6 +920,9 @@ static void check_process_timers(struct task_struct *tsk,
[PATCH v2] sched/deadline: fix runtime accounting in documentation
Signed-off-by: Claudio Scordino Signed-off-by: Luca Abeni Acked-by: Daniel Bristot de Oliveira CC: Jonathan Corbet CC: "Peter Zijlstra (Intel)" CC: Ingo Molnar CC: linux-...@vger.kernel.org Cc: Tommaso Cucinotta Cc: Mathieu Poirier --- Documentation/scheduler/sched-deadline.txt | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt index e89e36e..8ce78f8 100644 --- a/Documentation/scheduler/sched-deadline.txt +++ b/Documentation/scheduler/sched-deadline.txt @@ -204,10 +204,17 @@ CONTENTS It does so by decrementing the runtime of the executing task Ti at a pace equal to - dq = -max{ Ui, (1 - Uinact) } dt + dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt - where Uinact is the inactive utilization, computed as (this_bq - running_bw), - and Ui is the bandwidth of task Ti. + where: + + - Ui is the bandwidth of task Ti; + - Umax is the maximum reclaimable utilization (subjected to RT throttling +limits); + - Uinact is the (per runqueue) inactive utilization, computed as +(this_bq - running_bw); + - Uextra is the (per runqueue) extra reclaimable utilization +(subjected to RT throttling limits). Let's now see a trivial example of two deadline tasks with runtime equal -- 2.7.4
[PATCH] sched/deadline: runtime overrun and deadline miss signals
From: Juri Lelli This patch adds the possibility to get the delivery of two signals whenever there is a runtime overrun or a deadline miss, respectively. The request is done through the sched_flags field within the sched_attr structure. Forward port of https://lkml.org/lkml/2009/10/16/170 Signed-off-by: Juri Lelli Signed-off-by: Claudio Scordino Signed-off-by: Luca Abeni Cc: Tommaso Cucinotta CC: Peter Zijlstra CC: Ingo Molnar CC: Thomas Gleixner Cc: Mathieu Poirier --- include/linux/sched.h | 8 include/uapi/linux/sched.h | 2 ++ kernel/sched/core.c| 3 ++- kernel/sched/deadline.c| 13 + kernel/time/posix-cpu-timers.c | 26 ++ 5 files changed, 51 insertions(+), 1 deletion(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 0f897df..285d1b4 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -473,11 +473,19 @@ struct sched_dl_entity { * has not been executed yet. This flag is useful to avoid race * conditions between the inactive timer handler and the wakeup * code. +* +* @dl_overrun tells if the task asked to be informed about budget +* overruns. +* +* @dl_dmiss tells if the task asked to be informed about deadline +* misses. */ int dl_throttled : 1; int dl_boosted: 1; int dl_yielded: 1; int dl_non_contending : 1; + int dl_overrun: 1; + int dl_dmiss : 1; /* * Bandwidth enforcement timer. Each -deadline task has its diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h index e2a6c7b..544be0c 100644 --- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -48,5 +48,7 @@ */ #define SCHED_FLAG_RESET_ON_FORK 0x01 #define SCHED_FLAG_RECLAIM 0x02 +#define SCHED_FLAG_DL_OVERRUN 0x04 +#define SCHED_FLAG_DL_DMISS0x08 #endif /* _UAPI_LINUX_SCHED_H */ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 97227df..d79df7a 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4041,7 +4041,8 @@ static int __sched_setscheduler(struct task_struct *p, } if (attr->sched_flags & - ~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM)) + ~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM | + SCHED_FLAG_DL_OVERRUN | SCHED_FLAG_DL_DMISS)) return -EINVAL; /* diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 8d1b946..8c1aa61 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1154,6 +1154,17 @@ static void update_curr_dl(struct rq *rq) throttle: if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) { dl_se->dl_throttled = 1; + + /* +* If requested, inform the user about deadline misses and/or +* runtime overruns. 
+*/ + if (unlikely(dl_se->flags & SCHED_FLAG_DL_DMISS && +dl_time_before(dl_se->deadline, rq_clock(rq + dl_se->dl_dmiss = 1; + if (dl_se->flags & SCHED_FLAG_DL_OVERRUN) + dl_se->dl_overrun = 1; + __dequeue_task_dl(rq, curr, 0); if (unlikely(dl_se->dl_boosted || !start_dl_timer(curr))) enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH); @@ -2565,6 +2576,8 @@ void __dl_clear_params(struct task_struct *p) dl_se->dl_throttled = 0; dl_se->dl_yielded = 0; dl_se->dl_non_contending = 0; + dl_se->dl_overrun = 0; + dl_se->dl_dmiss = 0; } bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr) diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c index 8585ad6..f3616c5 100644 --- a/kernel/time/posix-cpu-timers.c +++ b/kernel/time/posix-cpu-timers.c @@ -13,6 +13,7 @@ #include #include #include +#include #include "posix-timers.h" @@ -790,6 +791,22 @@ check_timers_list(struct list_head *timers, return 0; } +static inline void check_dl_overrun(struct task_struct *tsk) +{ + if (tsk->dl.dl_overrun) { + tsk->dl.dl_overrun = 0; + pr_info("runtime overrun: %s[%d]\n", + tsk->comm, task_pid_nr(tsk)); + __group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk); + } + if (tsk->dl.dl_dmiss) { + tsk->dl.dl_dmiss = 0; + pr_info("scheduling deadline miss: %s[%d]\n", + tsk->comm, task_
[PATCH] sched/deadline: fix runtime accounting in documentation
Signed-off-by: Claudio Scordino Signed-off-by: Luca Abeni --- Documentation/scheduler/sched-deadline.txt | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt index e89e36e..79f40c6 100644 --- a/Documentation/scheduler/sched-deadline.txt +++ b/Documentation/scheduler/sched-deadline.txt @@ -204,10 +204,16 @@ CONTENTS It does so by decrementing the runtime of the executing task Ti at a pace equal to - dq = -max{ Ui, (1 - Uinact) } dt + dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt - where Uinact is the inactive utilization, computed as (this_bq - running_bw), - and Ui is the bandwidth of task Ti. + where: + + - Ui is the bandwidth of task Ti; + - Umax is the maximum reclaimable utilization (subject to RT limits); + - Uinact is the (per runqueue) inactive utilization, computed as +(this_bq - running_bw); + - Uextra is the (per runqueue) extra reclaimable utilization +(subject to RT limits). Let's now see a trivial example of two deadline tasks with runtime equal -- 2.7.4
Re: [PATCH] sched/deadline: add GRUB bw change tracepoints
Ouch, you're right. Now that I recall, you provided the first draft for internal debugging, but it was Luca who agreed to add it to the patchset. Sorry for the mistake, guys.

Best,

Claudio

On 04/07/2017 14:49, "Juri Lelli" wrote:
> Hi Claudio,
>
> On 04/07/17 14:29, Claudio Scordino wrote:
>> This patch adds a tracepoint for tracing the total and running
>> bandwidths of GRUB's runqueue.
>>
>> Signed-off-by: Claudio Scordino
>> Signed-off-by: Juri Lelli
>
> Don't remember signing this off.
> Or if I did it was maybe for internal debugging?
>
> Best,
>
> - Juri

--
Claudio Scordino, Ph.D.
Project Manager - Funded research projects
Evidence Srl
Via Carducci 56
56010 S.Giuliano Terme - Pisa - Italy
Phone: +39 050 99 11 122
Mobile: +39 393 811 7491
Fax: +39 050 99 10 812
http://www.evidence.eu.com
[PATCH] sched/deadline: add GRUB bw change tracepoints
This patch adds a tracepoint for tracing the total and running bandwidths of GRUB's runqueue. Signed-off-by: Claudio Scordino Signed-off-by: Juri Lelli Signed-off-by: Luca Abeni --- include/trace/events/sched.h | 32 kernel/sched/deadline.c | 11 +++ 2 files changed, 43 insertions(+) diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index ae1409f..050fcb2 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -545,6 +545,38 @@ TRACE_EVENT(sched_swap_numa, __entry->dst_cpu, __entry->dst_nid) ); +DECLARE_EVENT_CLASS(sched_dl_grub_template, + + TP_PROTO(u64 this_bw, u64 running_bw, unsigned int cpu_id), + + TP_ARGS(this_bw, running_bw, cpu_id), + + TP_STRUCT__entry( + __field(u64,this_bw) + __field(u64,running_bw) + __field(u32,cpu_id) + ), + + TP_fast_assign( + __entry->this_bw = this_bw; + __entry->running_bw = running_bw; + __entry->cpu_id = cpu_id; + ), + + TP_printk("total_bw=%llu running_bw=%llu cpu_id=%lu", + (unsigned long long)__entry->this_bw, + (unsigned long long)__entry->running_bw, + (unsigned long)__entry->cpu_id) +); + + +DEFINE_EVENT(sched_dl_grub_template, sched_dl_grub, + + TP_PROTO(u64 this_bw, u64 running_bw, unsigned int cpu_id), + + TP_ARGS(this_bw, running_bw, cpu_id) +); + /* * Tracepoint for waking a polling cpu without an IPI. */ diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index a84299f..ae5c7ef 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -17,6 +17,7 @@ #include "sched.h" #include +#include #include struct dl_bandwidth def_dl_bandwidth; @@ -85,6 +86,8 @@ void add_running_bw(u64 dl_bw, struct dl_rq *dl_rq) dl_rq->running_bw += dl_bw; SCHED_WARN_ON(dl_rq->running_bw < old); /* overflow */ SCHED_WARN_ON(dl_rq->running_bw > dl_rq->this_bw); + trace_sched_dl_grub(dl_rq->this_bw, dl_rq->running_bw, + rq_of_dl_rq(dl_rq)->cpu); } static inline @@ -97,6 +100,8 @@ void sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq) SCHED_WARN_ON(dl_rq->running_bw > old); /* underflow */ if (dl_rq->running_bw > old) dl_rq->running_bw = 0; + trace_sched_dl_grub(dl_rq->this_bw, dl_rq->running_bw, + rq_of_dl_rq(dl_rq)->cpu); } static inline @@ -107,6 +112,9 @@ void add_rq_bw(u64 dl_bw, struct dl_rq *dl_rq) lockdep_assert_held(&(rq_of_dl_rq(dl_rq))->lock); dl_rq->this_bw += dl_bw; SCHED_WARN_ON(dl_rq->this_bw < old); /* overflow */ + trace_sched_dl_grub(dl_rq->this_bw, dl_rq->running_bw, + rq_of_dl_rq(dl_rq)->cpu); + } static inline @@ -120,6 +128,9 @@ void sub_rq_bw(u64 dl_bw, struct dl_rq *dl_rq) if (dl_rq->this_bw > old) dl_rq->this_bw = 0; SCHED_WARN_ON(dl_rq->running_bw > dl_rq->this_bw); + trace_sched_dl_grub(dl_rq->this_bw, dl_rq->running_bw, + rq_of_dl_rq(dl_rq)->cpu); + } void dl_change_utilization(struct task_struct *p, u64 new_bw) -- 2.7.4
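If this tracepoint is used, note that the traced this_bw/running_bw values are in the scheduler's fixed-point bandwidth format (runtime << BW_SHIFT / period; BW_SHIFT is assumed here to be 20, as in the kernels of this period, and is worth re-checking against sched.h). A small helper such as the sketch below turns a traced value back into a utilization percentage; the sample number corresponds to a 10 ms / 100 ms reservation.

#include <stdio.h>

#define BW_SHIFT	20	/* assumed to match the kernel's scaling */

/* Convert a bandwidth value captured from the sched_dl_grub trace output
 * back into a percentage of one CPU. */
static double bw_to_percent(unsigned long long bw)
{
	return 100.0 * (double)bw / (double)(1ULL << BW_SHIFT);
}

int main(void)
{
	unsigned long long running_bw = 104857;	/* ~10 ms every 100 ms */

	printf("running_bw = %llu -> %.1f%% of one CPU\n",
	       running_bw, bw_to_percent(running_bw));
	return 0;
}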
[tip:sched/core] sched/deadline: Add documentation about GRUB reclaiming
Commit-ID: ccc9d651a7d26fec88d2375c4dd4e097cc0f8de7 Gitweb: http://git.kernel.org/tip/ccc9d651a7d26fec88d2375c4dd4e097cc0f8de7 Author: Claudio Scordino AuthorDate: Thu, 18 May 2017 22:13:37 +0200 Committer: Ingo Molnar CommitDate: Thu, 8 Jun 2017 10:31:56 +0200 sched/deadline: Add documentation about GRUB reclaiming This patch adds the documentation about the GRUB reclaiming algorithm, adding a few details discussed in list. Signed-off-by: Claudio Scordino Signed-off-by: Luca Abeni Signed-off-by: Peter Zijlstra (Intel) Cc: Daniel Bristot de Oliveira Cc: Joel Fernandes Cc: Juri Lelli Cc: Linus Torvalds Cc: Mathieu Poirier Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Steven Rostedt Cc: Thomas Gleixner Cc: Tommaso Cucinotta Link: http://lkml.kernel.org/r/1495138417-6203-11-git-send-email-luca.ab...@santannapisa.it Signed-off-by: Ingo Molnar --- Documentation/scheduler/sched-deadline.txt | 168 + 1 file changed, 168 insertions(+) diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt index cbc1b46..e89e36e 100644 --- a/Documentation/scheduler/sched-deadline.txt +++ b/Documentation/scheduler/sched-deadline.txt @@ -7,6 +7,8 @@ CONTENTS 0. WARNING 1. Overview 2. Scheduling algorithm + 2.1 Main algorithm + 2.2 Bandwidth reclaiming 3. Scheduling Real-Time Tasks 3.1 Definitions 3.2 Schedulability Analysis for Uniprocessor Systems @@ -44,6 +46,9 @@ CONTENTS 2. Scheduling algorithm == +2.1 Main algorithm +-- + SCHED_DEADLINE uses three parameters, named "runtime", "period", and "deadline", to schedule tasks. A SCHED_DEADLINE task should receive "runtime" microseconds of execution time every "period" microseconds, and @@ -113,6 +118,160 @@ CONTENTS remaining runtime = remaining runtime + runtime +2.2 Bandwidth reclaiming + + + Bandwidth reclaiming for deadline tasks is based on the GRUB (Greedy + Reclamation of Unused Bandwidth) algorithm [15, 16, 17] and it is enabled + when flag SCHED_FLAG_RECLAIM is set. + + The following diagram illustrates the state names for tasks handled by GRUB: + + + (d)| Active | + ->|| + | | Contending | + | + |A | + -- | | + | | | | + | Inactive | |(b) | (a) + | | | | + -- | | + A| V + | + | | Active | + --| Non| + (c)| Contending | + + + A task can be in one of the following states: + + - ActiveContending: if it is ready for execution (or executing); + + - ActiveNonContending: if it just blocked and has not yet surpassed the 0-lag +time; + + - Inactive: if it is blocked and has surpassed the 0-lag time. + + State transitions: + + (a) When a task blocks, it does not become immediately inactive since its + bandwidth cannot be immediately reclaimed without breaking the + real-time guarantees. It therefore enters a transitional state called + ActiveNonContending. The scheduler arms the "inactive timer" to fire at + the 0-lag time, when the task's bandwidth can be reclaimed without + breaking the real-time guarantees. + + The 0-lag time for a task entering the ActiveNonContending state is + computed as + +(runtime * dl_period) + deadline - - + dl_runtime + + where runtime is the remaining runtime, while dl_runtime and dl_period + are the reservation parameters. + + (b) If the task wakes up before the inactive timer fires, the task re-enters + the ActiveContending state and the "inactive timer" is canceled. + In addition, if the task wakes up on a different runqueue, then + the task's utilization must be removed from the previous runqueue's active + utilization and must be added to the new runqueue's active utilization. 
+ In order to avoid races between a task waking up on a runqueue while the + "inactive timer" is running on a different CPU, the "dl_non_contending" + flag is used to indicate that a task is not on a runqueue but is active + (so, the flag is set when the task blocks and is cleared when the + "inactive timer" fires or when the task wakes up). + + (c) When the "inactive timer" fires, the task enters the Inactive state
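The fraction in the quoted documentation reads more easily on one line: a task entering ActiveNonContending becomes Inactive at the 0-lag time, i.e. deadline - (remaining runtime * dl_period) / dl_runtime. The sketch below just evaluates that expression for a made-up reservation.

#include <stdio.h>

/*
 * 0-lag time as described above:
 *     t_0lag = deadline - (remaining_runtime * dl_period) / dl_runtime
 * Made-up numbers: a 10 ms / 100 ms reservation blocking with 3 ms of
 * budget left becomes Inactive 30 ms before its current deadline.
 */
int main(void)
{
	unsigned long long deadline_ns   = 500ULL * 1000 * 1000;  /* absolute  */
	unsigned long long runtime_ns    =   3ULL * 1000 * 1000;  /* remaining */
	unsigned long long dl_runtime_ns =  10ULL * 1000 * 1000;
	unsigned long long dl_period_ns  = 100ULL * 1000 * 1000;

	unsigned long long zero_lag = deadline_ns -
		runtime_ns * dl_period_ns / dl_runtime_ns;

	printf("0-lag time: %llu ns (%llu ns before the deadline)\n",
	       zero_lag, deadline_ns - zero_lag);
	return 0;
}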
Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization
Hi guys, 2017-03-27 10:20 GMT+02:00 Luca Abeni : > On Fri, 24 Mar 2017 22:31:46 -0400 > Steven Rostedt wrote: > >> On Fri, 24 Mar 2017 22:47:15 +0100 >> luca abeni wrote: >> >> > Ok... Since I am not good at ascii art, would it be ok to add a >> > textual description? If yes, I'll add a comment like: >> > " >> > The utilization of a task is added to the runqueue's active >> > utilization when the task becomes active (is enqueued in the >> > runqueue), and is removed when the task becomes inactive. A task >> > does not become immediately inactive when it blocks, but becomes >> > inactive at the so called "0 lag time"; so, we setup the "inactive >> > timer" to fire at the "0 lag time". When the "inactive timer" >> > fires, the task utilization is removed from the runqueue's active >> > utilization. If the task wakes up again on the same runqueue before >> > the "0 lag time", the active utilization must not be changed and >> > the "inactive timer" must be cancelled. If the task wakes up again >> > on a different runqueue before the "0 lag time", then the task's >> > utilization must be removed from the previous runqueue's active >> > utilization and must be added to the new runqueue's active >> > utilization. In order to avoid races between a task waking up on a >> > runqueue while the "inactive timer" is running on a different CPU, >> > the "dl_non_contending" flag is used to indicate that a task is not >> > on a runqueue but is active (so, the flag is set when the task >> > blocks and is cleared when the "inactive timer" fires or when the >> > task wakes up). >> >> Sure, the above is great if you never want anyone to read it ;) >> >> Can you please break it up a little. My head starts to spin by the >> third line down. > > Ok... Maybe finding a clean and understandable way to explain the > above sentence is something that can be done at the OSPM summit? What about adding the following text to Documentation/ ? Does it make things clearer ? Cheers, Claudio The following figure illustrates the state names of the GRUB algorithm: (d)| Active | ->|| | | Contending | | |A | -- | | | | | | | Inactive | |(b) | (a) | | | | -- | | A| V | | | Active | --| Non| (c)| Contending | A task can be in one of the following states: - ActiveContending: if it is ready for execution (or executing); - ActiveNonContending: if it just blocked and has not yet surpassed the 0-lag time; - Inactive: if it is blocked and has surpassed the 0-lag time. For each runqueue, the algorithm GRUB keeps track of two different bandwidths: - Active bandwidth (running_bw): this is the sum of the bandwidths of all tasks in active state (i.e., ActiveContending or ActiveNonContending); - Total bandwidth (this_bw): this is the sum of all tasks "belonging" to the runqueue, including the tasks in Inactive state. State transitions: (a) When a task blocks, it does not become immediately inactive since its bandwidth cannot be immediately reclaimed without breaking the real-time guarantees. It therefore enters a transitional state called ActiveNonContending. The scheduler arms the "inactive timer" to fire at the 0-lag time, when the task's bandwidth can be reclaimed without breaking the real-time guarantees. (b) If the task wakes up before the inactive timer fires, the task re-enters the ActiveContending state and the "inactive timer" is canceled. 
In addition, if the task wakes up on a different runqueue, then the task's utilization must be removed from the previous runqueue's active utilization and must be added to the new runqueue's active utilization. In order to avoid races between a task waking up on a runqueue while the "inactive timer" is running on a different CPU, the "dl_non_contending" flag is used to indicate that a task is not on a runqueue but is active (so, the flag is set when the task blocks and is cleared when the "inactive timer" fires or when the task wakes up). (c) When the "inactive timer" fires, the task enters the Inactive state and its utilization is removed from the runqueue's active utilization. (d) When an inactive task wakes up, it enters the ActiveContending state and its utilization is added to the active utilization of the runqueue where it has been enqueued. The algorithm reclaims the bandwidth of the tasks in Inactive state. It does so by decrementing the runtime of the executing task Ti at a pace equal to
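A compact restatement of the bookkeeping described above, as a user-space toy model with invented names (not kernel code): running_bw only follows the Active states, so only transitions (c) and (d) touch it, while this_bw also keeps tasks that are Inactive but still belong to the runqueue.

#include <stdio.h>

struct toy_dl_rq {
	unsigned long long running_bw;	/* ActiveContending + ActiveNonContending */
	unsigned long long this_bw;	/* all tasks belonging to the runqueue,
					 * including the Inactive ones */
};

/* (a)/(b): blocking, or waking up again before the 0-lag time on the same
 * runqueue, changes neither counter; waking on a different runqueue moves
 * the task's share between the two runqueues' active utilization. */

/* (c): the "inactive timer" fires at the 0-lag time, the task becomes
 * Inactive and its bandwidth can finally be reclaimed. */
static void toy_inactive_timer_fired(struct toy_dl_rq *rq, unsigned long long bw)
{
	rq->running_bw -= bw;
}

/* (d): an Inactive task wakes up and becomes ActiveContending again. */
static void toy_wakeup_inactive(struct toy_dl_rq *rq, unsigned long long bw)
{
	rq->running_bw += bw;
}

/* this_bw only moves when a task starts or stops belonging to the
 * runqueue (e.g. migration, or leaving SCHED_DEADLINE). */
static void toy_task_joins_rq(struct toy_dl_rq *rq, unsigned long long bw)
{
	rq->this_bw += bw;
}

int main(void)
{
	struct toy_dl_rq rq = { 0, 0 };
	unsigned long long bw = 104857;		/* a ~10% task, fixed point  */

	toy_task_joins_rq(&rq, bw);		/* admitted on this runqueue */
	toy_wakeup_inactive(&rq, bw);		/* (d) becomes active        */
	toy_inactive_timer_fired(&rq, bw);	/* (c) 0-lag time elapsed    */

	printf("running_bw=%llu this_bw=%llu\n", rq.running_bw, rq.this_bw);
	return 0;
}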
Re: sched_{set,get}attr() manpage
Il 10/04/2014 09:47, Juri Lelli ha scritto: Hi all, On Wed, 9 Apr 2014 17:42:04 +0200 Peter Zijlstra wrote: On Wed, Apr 09, 2014 at 05:19:11PM +0200, Henrik Austad wrote: The following "real-time" policies are also supported, for why the "'s? I borrowed those from SCHED_SETSCHEDULER(2). sched_attr::sched_flags additional flags that can influence scheduling behaviour. Currently as per Linux kernel 3.14: SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy to: (struct sched_attr){ .sched_policy = SCHED_OTHER, } on fork(). is the only supported flag. ... The flags argument should be 0. What about SCHED_FLAG_RESET_ON_FOR? Different flags. The one is sched_attr::flags the other is sched_setattr(.flags). The other sched_attr fields are filled out as described in sched_setattr(). Scheduling Policies The scheduler is the kernel component that decides which runnable process will be executed by the CPU next. Each process has an associ‐ ated scheduling policy and a static scheduling priority, sched_prior‐ ity; these are the settings that are modified by sched_setscheduler(). The scheduler makes it decisions based on knowledge of the scheduling policy and static priority of all processes on the system. Isn't this last sentence redundant/sliglhtly repetitive? I borrowed that from SCHED_SETSCHEDULER(2) again. SCHED_DEADLINE: Sporadic task model deadline scheduling SCHED_DEADLINE is an implementation of GEDF (Global Earliest Deadline First) with additional CBS (Constant Bandwidth Server). The CBS guarantees that tasks that over-run their specified budget are throttled and do not affect the correct performance of other SCHED_DEADLINE tasks. SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN Setting SCHED_DEADLINE can fail with -EINVAL when admission control tests fail. Perhaps add a note about the deadline-class having higher priority than the other classes; i.e. if a deadline-task is runnable, it will preempt any other SCHED_(RR|FIFO) regardless of priority? Yes, good point, will do. SCHED_FIFO: First In-First Out scheduling SCHED_FIFO can only be used with static priorities higher than 0, which means that when a SCHED_FIFO processes becomes runnable, it will always immediately preempt any currently running SCHED_OTHER, SCHED_BATCH, or SCHED_IDLE process. SCHED_FIFO is a simple scheduling algorithm with‐ out time slicing. For processes scheduled under the SCHED_FIFO policy, the following rules apply: * A SCHED_FIFO process that has been preempted by another process of higher priority will stay at the head of the list for its priority and will resume execution as soon as all processes of higher prior‐ ity are blocked again. * When a SCHED_FIFO process becomes runnable, it will be inserted at the end of the list for its priority. * A call to sched_setscheduler() or sched_setparam(2) will put the SCHED_FIFO (or SCHED_RR) process identified by pid at the start of the list if it was runnable. As a consequence, it may preempt the currently running process if it has the same priority. (POSIX.1-2001 specifies that the process should go to the end of the list.) * A process calling sched_yield(2) will be put at the end of the list. How about the recent discussion regarding sched_yield(). Is this correct? lkml.kernel.org/r/alpine.deb.2.02.1403312333100.14...@ionos.tec.linutronix.de Is this the correct place to add a note explaining te potentional pitfalls using sched_yield? I'm not sure; there's a SCHED_YIELD(2) manpage to fill with that nonsense. 
Also; I realized I have not described the DEADLINE sched_yield() behaviour. So, for SCHED_DEADLINE we currently have this behaviour: /* * Yield task semantic for -deadline tasks is: * * get off from the CPU until our next instance, with * a new runtime. This is of little use now, since we * don't have a bandwidth reclaiming mechanism. Anyway, * bandwidth reclaiming is planned for the future, and * yield_task_dl will indicate that some spare budget * is available for other task instances to use it. */ But, considering also the discussion above, I'm less sure now that's what we want. Still, I think we will want some way in the future to be able to say "I'm finished with my current job, give this remaining runtime to someone else", like another syscall or something. Hi Juri, hi Peter, my two cents: A syscall to block the task until its next instance is definitely useful. This way, a periodic task doesn't have to sleep anymore: the kernel takes care of unblocki
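For reference, this is the usage pattern the quoted yield_task_dl() comment implies for a periodic job body; whether the leftover budget actually benefits other instances depends on the reclaiming support discussed in the thread, so the loop is a sketch of the intended API, not a guarantee.

#include <sched.h>

/* Hypothetical job body of a periodic SCHED_DEADLINE task (assumed to have
 * been switched to SCHED_DEADLINE already, e.g. via sched_setattr()).  For
 * this class sched_yield() means "I am done with the current instance":
 * the task is throttled until the next period and then resumes with a
 * fresh runtime. */
static void periodic_loop(void (*do_job)(void))
{
	for (;;) {
		do_job();	/* one instance of work            */
		sched_yield();	/* suspend until the next instance */
	}
}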
[PATCH RESEND] umc-bus.c: fix usage of device_trylock
Hi all.

I've not received any feedback about this patch, so I'm resending just to check if someone is going to take care of it.

The patch fixes the usage of device_trylock inside the driver umc-bus.c: device_trylock has the same semantics as mutex_trylock, so it returns 1 if the lock has been acquired successfully.

Best regards,

Claudio

Subject: umc-bus.c: fix usage of device_trylock
From: Claudio Scordino

Fix usage of device_trylock. It has the same semantics as mutex_trylock, so it returns 1 if the lock has been acquired successfully.

Signed-off-by: Claudio Scordino
Signed-off-by: Bruno Morelli
---
 drivers/uwb/umc-bus.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/uwb/umc-bus.c b/drivers/uwb/umc-bus.c
index 82a84d5..5c5b3fc 100644
--- a/drivers/uwb/umc-bus.c
+++ b/drivers/uwb/umc-bus.c
@@ -63,7 +63,7 @@ int umc_controller_reset(struct umc_dev *umc)
 	struct device *parent = umc->dev.parent;
 	int ret = 0;
 
-	if (device_trylock(parent))
+	if (!device_trylock(parent))
 		return -EAGAIN;
 	ret = device_for_each_child(parent, parent, umc_bus_pre_reset_helper);
 	if (ret >= 0)
--
1.7.1
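For context, the semantics the fix relies on, in the same shape as umc_controller_reset() above: device_trylock() follows mutex_trylock() and returns 1 when the lock was taken, 0 when it is contended, so the early-exit path must test the negated value. Kernel-style fragment with a hypothetical helper name, for illustration only.

#include <linux/device.h>

/* Illustration of the corrected pattern: bail out with -EAGAIN only when
 * device_trylock() reports contention (i.e. returns 0). */
static int do_reset_if_unlocked(struct device *parent)
{
	if (!device_trylock(parent))	/* 0 means the lock is busy */
		return -EAGAIN;

	/* ... reset work done while holding the parent device lock ... */

	device_unlock(parent);
	return 0;
}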
[PATCH] umc-bus.c: fix usage of device_trylock
Hi all.

I've probably found a wrong usage of device_trylock inside the driver umc-bus.c: device_trylock has the same semantics as mutex_trylock, so it returns 1 if the lock has been acquired successfully.

Please find below a patch.

Best regards,

Claudio

Subject: umc-bus.c: fix usage of device_trylock
From: Claudio Scordino

device_trylock has the same semantics as mutex_trylock, so it returns 1 if the lock has been acquired successfully.

Signed-off-by: Claudio Scordino
---
 drivers/uwb/umc-bus.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/uwb/umc-bus.c b/drivers/uwb/umc-bus.c
index 82a84d5..5c5b3fc 100644
--- a/drivers/uwb/umc-bus.c
+++ b/drivers/uwb/umc-bus.c
@@ -63,7 +63,7 @@ int umc_controller_reset(struct umc_dev *umc)
 	struct device *parent = umc->dev.parent;
 	int ret = 0;
 
-	if (device_trylock(parent))
+	if (!device_trylock(parent))
 		return -EAGAIN;
 	ret = device_for_each_child(parent, parent, umc_bus_pre_reset_helper);
 	if (ret >= 0)
--
1.7.1
Re: [PATCH] isp1362-hcd.c: usb message always saved in case of underrun
Il 20/07/2012 17:15, Greg KH ha scritto: On Fri, Jul 20, 2012 at 12:26:10PM +0200, Claudio Scordino wrote: Il 20/07/2012 00:58, Greg KH ha scritto: On Wed, Jul 18, 2012 at 10:53:09AM +0200, Claudio Scordino wrote: Il 06/07/2012 19:41, Greg KH ha scritto: On Wed, Jun 27, 2012 at 06:01:39PM +0200, Claudio Scordino wrote: Hi Olav, please find below a patch for the isp1362-hcd.c driver to always save the message in case of underrun. More information is provided inside the patch comment. Let us know if you need any further information. Best regards, Claudio Subject: isp1362-hcd.c: usb message always saved in case of underrun From: Bruno Morelli The usb message must be saved also in case the USB endpoint is not a control endpoint (i.e., "endpoint 0"), otherwise in some circumstances we don't have a payload in case of error. The patch has been created by tracing with usbmon the different error messages generated by this driver with respect to the ehci-hcd driver. Signed-off-by: Bruno Morelli Signed-off-by: Claudio Scordino Tested-by: Bruno Morelli --- drivers/usb/host/isp1362-hcd.c | 11 ++- 1 files changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/usb/host/isp1362-hcd.c b/drivers/usb/host/isp1362-hcd.c index 2ed112d..61bf1b2 100644 --- a/drivers/usb/host/isp1362-hcd.c +++ b/drivers/usb/host/isp1362-hcd.c @@ -543,13 +543,14 @@ static void postproc_ep(struct isp1362_hcd *isp1362_hcd, struct isp1362_ep *ep) usb_pipein(urb->pipe) ? "IN" : "OUT", ep->nextpid, short_ok ? "" : "not_", PTD_GET_COUNT(ptd), ep->maxpacket, len); + /* save the data underrun error code for later and +* proceed with the status stage +*/ + urb->actual_length += PTD_GET_COUNT(ptd); + BUG_ON(urb->actual_length> + urb->transfer_buffer_length); Please NEVER crash the machine in a driver like this, it's bad and can cause problems. Yes, I know you are just moving it from the lines below: if (usb_pipecontrol(urb->pipe)) { ep->nextpid = USB_PID_ACK; - /* save the data underrun error code for later and -* proceed with the status stage -*/ - urb->actual_length += PTD_GET_COUNT(ptd); - BUG_ON(urb->actual_length> urb->transfer_buffer_length); But really, it should not be in the driver at all. Please remove it, at the most, do a WARN_ON() so that someone can see the problem and at least report it. Actually, what is this checking? How can someone recover from it? Who is this check for? The developer of this driver? Another driver? Hardware developer? End user? Who would be able to fix the problem if this happens? As it is, I can't take it, sorry. Hi Greg. I understand. As you have already said, this driver is full of BUG_ON statements. So, we can shift just the assignment as in the patch below, to have a correct behavior, leaving the BUG_ON where it already is. Then, we may propose a different patch to change BUG_ONs to WARN_ONs. Your updated patch is much better, thanks. But it doesn't apply to my tree right now. Can you please refresh it against the usb-next tree and resend it? Actually, I did. So, this means that I'm using the wrong tree... I'm using the "usb-next" branch available on your tree at git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git Is this the wrong one ? That is the correct one. It didn't work for me, so try refreshing your patch and resending it. Hi Greg. I've just refreshed the patch by cloning the repo from scratch and creating the patch against the "usb-next" branch. You can find the patch below, but the content is still the same. Can you please check it again ? 
Is it possible that you have local changes not yet committed ? Many thanks, Claudio Subject: isp1362-hcd.c: usb message always saved in case of underrun From: Bruno Morelli The usb message must be saved also in case the USB endpoint is not a control endpoint (i.e., "endpoint 0"), otherwise in some circumstances we don't have a payload in case of error. The patch has been created by tracing with usbmon the different error messages generated by this driver with respect to the ehci-hcd driver. Signed-off-by: Bruno Morelli Signed-off-by: Claudio Scordino Tested-by: Bruno Morelli --- drivers/usb/host/isp1362-hcd.c |8 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/usb/host/isp1362-hcd.c b/drivers/usb/host/isp136
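If the separate BUG_ON-to-WARN_ON conversion mentioned in this thread is ever posted, the spot touched here would presumably end up looking like the fragment below; this is a sketch of that idea, not a submitted patch.

/* Hypothetical follow-up: keep the length accounting from the fix above,
 * but report the inconsistency with a warning instead of halting the
 * machine, as suggested in the review. */
urb->actual_length += PTD_GET_COUNT(ptd);
if (WARN_ON(urb->actual_length > urb->transfer_buffer_length))
	urb->actual_length = urb->transfer_buffer_length;	/* clamp and continue */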
Re: [PATCH] isp1362-hcd.c: usb message always saved in case of underrun
Il 20/07/2012 00:58, Greg KH ha scritto: On Wed, Jul 18, 2012 at 10:53:09AM +0200, Claudio Scordino wrote: Il 06/07/2012 19:41, Greg KH ha scritto: On Wed, Jun 27, 2012 at 06:01:39PM +0200, Claudio Scordino wrote: Hi Olav, please find below a patch for the isp1362-hcd.c driver to always save the message in case of underrun. More information is provided inside the patch comment. Let us know if you need any further information. Best regards, Claudio Subject: isp1362-hcd.c: usb message always saved in case of underrun From: Bruno Morelli The usb message must be saved also in case the USB endpoint is not a control endpoint (i.e., "endpoint 0"), otherwise in some circumstances we don't have a payload in case of error. The patch has been created by tracing with usbmon the different error messages generated by this driver with respect to the ehci-hcd driver. Signed-off-by: Bruno Morelli Signed-off-by: Claudio Scordino Tested-by: Bruno Morelli --- drivers/usb/host/isp1362-hcd.c | 11 ++- 1 files changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/usb/host/isp1362-hcd.c b/drivers/usb/host/isp1362-hcd.c index 2ed112d..61bf1b2 100644 --- a/drivers/usb/host/isp1362-hcd.c +++ b/drivers/usb/host/isp1362-hcd.c @@ -543,13 +543,14 @@ static void postproc_ep(struct isp1362_hcd *isp1362_hcd, struct isp1362_ep *ep) usb_pipein(urb->pipe) ? "IN" : "OUT", ep->nextpid, short_ok ? "" : "not_", PTD_GET_COUNT(ptd), ep->maxpacket, len); + /* save the data underrun error code for later and +* proceed with the status stage +*/ + urb->actual_length += PTD_GET_COUNT(ptd); + BUG_ON(urb->actual_length> + urb->transfer_buffer_length); Please NEVER crash the machine in a driver like this, it's bad and can cause problems. Yes, I know you are just moving it from the lines below: if (usb_pipecontrol(urb->pipe)) { ep->nextpid = USB_PID_ACK; - /* save the data underrun error code for later and -* proceed with the status stage -*/ - urb->actual_length += PTD_GET_COUNT(ptd); - BUG_ON(urb->actual_length> urb->transfer_buffer_length); But really, it should not be in the driver at all. Please remove it, at the most, do a WARN_ON() so that someone can see the problem and at least report it. Actually, what is this checking? How can someone recover from it? Who is this check for? The developer of this driver? Another driver? Hardware developer? End user? Who would be able to fix the problem if this happens? As it is, I can't take it, sorry. Hi Greg. I understand. As you have already said, this driver is full of BUG_ON statements. So, we can shift just the assignment as in the patch below, to have a correct behavior, leaving the BUG_ON where it already is. Then, we may propose a different patch to change BUG_ONs to WARN_ONs. Your updated patch is much better, thanks. But it doesn't apply to my tree right now. Can you please refresh it against the usb-next tree and resend it? Actually, I did. So, this means that I'm using the wrong tree... I'm using the "usb-next" branch available on your tree at git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git Is this the wrong one ? Many thanks, Claudio -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] isp1362-hcd.c: usb message always saved in case of underrun
Il 06/07/2012 19:41, Greg KH ha scritto: On Wed, Jun 27, 2012 at 06:01:39PM +0200, Claudio Scordino wrote: Hi Olav, please find below a patch for the isp1362-hcd.c driver to always save the message in case of underrun. More information is provided inside the patch comment. Let us know if you need any further information. Best regards, Claudio Subject: isp1362-hcd.c: usb message always saved in case of underrun From: Bruno Morelli The usb message must be saved also in case the USB endpoint is not a control endpoint (i.e., "endpoint 0"), otherwise in some circumstances we don't have a payload in case of error. The patch has been created by tracing with usbmon the different error messages generated by this driver with respect to the ehci-hcd driver. Signed-off-by: Bruno Morelli Signed-off-by: Claudio Scordino Tested-by: Bruno Morelli --- drivers/usb/host/isp1362-hcd.c | 11 ++- 1 files changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/usb/host/isp1362-hcd.c b/drivers/usb/host/isp1362-hcd.c index 2ed112d..61bf1b2 100644 --- a/drivers/usb/host/isp1362-hcd.c +++ b/drivers/usb/host/isp1362-hcd.c @@ -543,13 +543,14 @@ static void postproc_ep(struct isp1362_hcd *isp1362_hcd, struct isp1362_ep *ep) usb_pipein(urb->pipe) ? "IN" : "OUT", ep->nextpid, short_ok ? "" : "not_", PTD_GET_COUNT(ptd), ep->maxpacket, len); + /* save the data underrun error code for later and +* proceed with the status stage +*/ + urb->actual_length += PTD_GET_COUNT(ptd); + BUG_ON(urb->actual_length> + urb->transfer_buffer_length); Please NEVER crash the machine in a driver like this, it's bad and can cause problems. Yes, I know you are just moving it from the lines below: if (usb_pipecontrol(urb->pipe)) { ep->nextpid = USB_PID_ACK; - /* save the data underrun error code for later and -* proceed with the status stage -*/ - urb->actual_length += PTD_GET_COUNT(ptd); - BUG_ON(urb->actual_length> urb->transfer_buffer_length); But really, it should not be in the driver at all. Please remove it, at the most, do a WARN_ON() so that someone can see the problem and at least report it. Actually, what is this checking? How can someone recover from it? Who is this check for? The developer of this driver? Another driver? Hardware developer? End user? Who would be able to fix the problem if this happens? As it is, I can't take it, sorry. Hi Greg. I understand. As you have already said, this driver is full of BUG_ON statements. So, we can shift just the assignment as in the patch below, to have a correct behavior, leaving the BUG_ON where it already is. Then, we may propose a different patch to change BUG_ONs to WARN_ONs. Best regards, Claudio Subject: isp1362-hcd.c: usb message always saved in case of underrun From: Bruno Morelli The usb message must be saved also in case the USB endpoint is not a control endpoint (i.e., "endpoint 0"), otherwise in some circumstances we don't have a payload in case of error. The patch has been created by tracing with usbmon the different error messages generated by this driver with respect to the ehci-hcd driver. Signed-off-by: Bruno Morelli Signed-off-by: Claudio Scordino Tested-by: Bruno Morelli --- drivers/usb/host/isp1362-hcd.c |8 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/usb/host/isp1362-hcd.c b/drivers/usb/host/isp1362-hcd.c index 2ed112d..2563263 100644 --- a/drivers/usb/host/isp1362-hcd.c +++ b/drivers/usb/host/isp1362-hcd.c @@ -543,12 +543,12 @@ static void postproc_ep(struct isp1362_hcd *isp1362_hcd, struct isp1362_ep *ep) usb_pipein(urb->pipe) ? 
"IN" : "OUT", ep->nextpid, short_ok ? "" : "not_", PTD_GET_COUNT(ptd), ep->maxpacket, len); + /* save the data underrun error code for later and +* proceed with the status stage +*/ + urb->actual_length += PTD_GET_COUNT(ptd); if (usb_pipecontrol(urb->pipe)) { ep->nextpid = USB_PID_ACK; - /* save the data underrun error code for later and -* proceed with the status stage -*/ -