Re: [PATCH 3/2] sched/deadline: Use deadline instead of period when calculating overflow
On 15/02/2017 14:33, Daniel Bristot de Oliveira wrote:
> [1] For the sake of completeness:
>  - dl_se->deadline    = Absolute deadline
>  - dl_se->dl_deadline = Relative deadline

Daniel's note [1] makes me wonder: would a rename of these be worth it,
for the sake of clarity, e.g.:

-) abs_deadline vs rel_deadline
-) runtime_left vs runtime_tot

or similar :-) ?

	T.
Re: [PATCH V2 2/2] sched/deadline: Throttle a constrained deadline task activated after the deadline
On 13/02/2017 20:05, Daniel Bristot de Oliveira wrote:
> To avoid this problem, in the activation of a constrained deadline
> task after the deadline but before the next period, throttle the task
> and set the replenishing timer to the begin of the next period, unless
> it is boosted.

my only comment is that, by throttling on (dl < wakeuptime < period), we
force the app to sync its activation time with the kernel, and the CBS
doesn't self-sync anymore with the app's own periodicity, which is what
normally happens with dl=period. With dl=period, we lose the CBS
self-sync and the app is forced to sync with the kernel periodic timer
only if we explicitly use yield(); but now this also becomes implicit
just by setting the dl attrs:

  attr.sched_policy   = SCHED_DEADLINE;
  attr.sched_runtime  = 2 * 1000 * 1000;        /* 2 ms */
  attr.sched_deadline = 2 * 1000 * 1000;        /* 2 ms */
  attr.sched_period   = 2 * 1000 * 1000 * 1000; /* 2 s */
  ...

> On my box, this reproducer uses almost 50% of the CPU time, which is
> obviously wrong for a task with 2/2000 reservation.

just a note here: in this example of runtime=deadline=2ms, if we rely on
a utilization-based test, then we should assume the task is taking 100%.
More precise tests for EDF with deadline different from period do exist.

	T.
http://retis.sssup.it/people/tommaso
[tip:sched/core] sched/deadline: Show leftover runtime and abs deadline in /proc/*/sched
Commit-ID:  59f8c2989283bbd3df9fcfb22494d84f4852e536
Gitweb:     http://git.kernel.org/tip/59f8c2989283bbd3df9fcfb22494d84f4852e536
Author:     Tommaso Cucinotta
AuthorDate: Wed, 26 Oct 2016 11:17:17 +0200
Committer:  Ingo Molnar
CommitDate: Sat, 14 Jan 2017 11:30:00 +0100

sched/deadline: Show leftover runtime and abs deadline in /proc/*/sched

This patch allows for reading the current (leftover) runtime and
absolute deadline of a SCHED_DEADLINE task through /proc/*/sched
(entries dl.runtime and dl.deadline), while debugging/testing.

Signed-off-by: Tommaso Cucinotta
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Juri Lelli
Reviewed-by: Luca Abeni
Acked-by: Daniel Bristot de Oliveira
Cc: Juri Lelli
Cc: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1477473437-10346-2-git-send-email-tommaso.cucino...@sssup.it
Signed-off-by: Ingo Molnar
---
 Documentation/scheduler/sched-deadline.txt | 6 ++++++
 kernel/sched/debug.c                       | 4 ++++
 2 files changed, 10 insertions(+)

diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt
index 8e37b0b..cbc1b46 100644
--- a/Documentation/scheduler/sched-deadline.txt
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -408,6 +408,11 @@ CONTENTS
  * the new scheduling related syscalls that manipulate it, i.e.,
    sched_setattr() and sched_getattr() are implemented.

+ For debugging purposes, the leftover runtime and absolute deadline of a
+ SCHED_DEADLINE task can be retrieved through /proc/<pid>/sched (entries
+ dl.runtime and dl.deadline, both values in ns). A programmatic way to
+ retrieve these values from production code is under discussion.
+

 4.3 Default behavior
 ---------------------
@@ -476,6 +481,7 @@ CONTENTS

  Still missing:

+  - programmatic way to retrieve current runtime and absolute deadline
   - refinements to deadline inheritance, especially regarding the possibility
     of retaining bandwidth isolation among non-interacting tasks. This is
     being studied from both theoretical and practical points of view, and

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index fa178b6..109adc0 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -953,6 +953,10 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
 #endif
 	P(policy);
 	P(prio);
+	if (p->policy == SCHED_DEADLINE) {
+		P(dl.runtime);
+		P(dl.deadline);
+	}
 #undef PN_SCHEDSTAT
 #undef PN
 #undef __PN
Re: [RFD] sched/deadline: Support single CPU affinity
Hi Peter,

On 13/12/2016 11:21, Peter Zijlstra wrote:
> On Thu, Nov 10, 2016 at 11:01:59AM +0100, Tommaso Cucinotta wrote:
>> Just a note: if you want to recover arbitrary task affinities, you
>> can re-cast your above test like this:
>>
>>   for_each_processor(cpu)
>>     \sum U[t]/A[t] \leq 1 (or U_max),
>>   for each task t on cpu, with utilization U[t]
>>   and A[t] tasks overall in its affinity mask
>
> Do I read it correct when I interpret A[t] as the number of CPUs in
> its affinity mask?

yes, exactly, A[t] is the number of CPUs in the task affinity mask
(sorry for my bad write-up)

> Also, does recoverable mean a bound tardiness, or is that something
> weaker still?

nope, nothing exact -- it just meant providing flexible but simple &
consistent (ie, towards recovering affinity masks) options from the
kernel/scheduler side, leaving more complex & exact tests to user-space,
or future add-ons to the kernel.

Thanks,

	T.
Re: [PATCH 2/3] cpufreq: schedutil: move slow path from workqueue to SCHED_FIFO task
Hi,

On 11/11/2016 11:22, Viresh Kumar wrote:
> If slow path frequency changes are conducted in a SCHED_OTHER context
> then they may be delayed for some amount of time, including
> indefinitely, when real time or deadline activity is taking place.
>
> Move the slow path to a real time kernel thread. In the future the
> thread should be made SCHED_DEADLINE.

would you have an insight, as to what runtime/deadline/period to set,
and/or what specific timing constraints you'd like to set, just for this
cpufreq slowpath work?

> The RT priority is arbitrarily set to 50 for now.
[...]
> +	struct sched_param param = { .sched_priority = 50 };

won't you have a tunable here? (sysctl?)

Thanks,

	T.

-- 
Tommaso Cucinotta, Computer Engineering PhD
Associate Professor at the Real-Time Systems Laboratory (ReTiS)
Scuola Superiore Sant'Anna, Pisa, Italy
http://retis.sssup.it/people/tommaso
Re: [RFD] sched/deadline: Support single CPU affinity
On 10/11/2016 10:06, luca abeni wrote:
> is equivalent to the "least laxity first" (LLF) algorithm. Giving
> precedence to tasks with 0 laxity is a technique that is often used to
> improve the schedulability on multi-processor systems.

EDZL (EDF / Zero Laxity first), right? AFAICR, there's quite a lot of
analysis on EDZL for multi-cores... eg, Insik Shin et al:

  http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6374195

But, before going the EDZL way, isn't it worthwhile to consider just
splitting tasks among 2 cpus?

  https://people.mpi-sws.org/~bbb/papers/pdf/rtss16b.pdf

... we're working at ReTiS on simpler ways to make the AC for these
split-task cases (cc-ing Alessandro) that don't need complex
demand-bound analysis...

My2c,

	T.
Re: [RFD] sched/deadline: Support single CPU affinity
Hi,

On 10/11/2016 09:08, Peter Zijlstra wrote:
> Add support for single CPU affinity to SCHED_DEADLINE; the supposed
> reason for wanting single CPU affinity is better QoS than provided by
> G-EDF.
>
> Therefore the aim is to provide harder guarantees, similar to UP, for
> single CPU affine tasks. This then leads to a mixed criticality
> scheduling requirement for the CPU scheduler. G-EDF like for the
> non-affine (global) tasks and UP like for the single CPU tasks.
>
> ADMISSION CONTROL
>
> Do simple UP admission control on the CPU local tasks, and subtract
> the admitted bandwidth from the global total when doing global
> admission control.
>
>   single cpu: U[n] := \Sum tl_u,n <= 1
>   global:     \Sum tg_u <= N - \Sum U[n]

+1, even with the current G-EDF: we need in the kernel a minimum
permissive admission control, simple enough to just avoid ill-formed
workloads that pile up forever without hope of recovering (as opposed to
an AC that runs a complex test and doesn't allow you to deploy a task
unless it's absolutely guaranteed to schedule its runtime by its
deadline), even though it won't be perfect in terms of hard RT
guarantees; the latter would anyway require more complex analysis
techniques, considering also the frequency of interrupts & the like, and
can be done in user-space by proper middleware, libraries, or even at
design time for static embedded systems where everything is known
upfront and doesn't change often.

That said, it's good if in addition the mechanism behaves well from an
analysis viewpoint (and we have a tardiness bound), the only problem
being that there's a zillion proposals in research (see upcoming reply
to Luca's).
Just a note: if you want to recover arbitrary task affinities, you can
re-cast your above test like this:

  for_each_processor(cpu)
    \sum U[t]/A[t] \leq 1 (or U_max),
  for each task t on cpu, with utilization U[t]
  and A[t] tasks overall in its affinity mask

(I'm not claiming we need scenarios with overlapping cpusets and G-EDF
tasks, it's just in case it simplifies code)

	T.
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
On 07/11/2016 14:51, Daniel Bristot de Oliveira wrote:
> Hi Tommaso,

Hi,

I'm cc-ing Luca for GRUB et al., pls find a few further notes below...

> On 11/07/2016 11:31 AM, Tommaso Cucinotta wrote:
>> as anticipated live to Daniel:
>> -) +1 for the general concept, we'd need something similar also for
>>    SCHED_DEADLINE
>
> Resumed: the sum of the runtime of deadline tasks will not be greater
> than "to_ratio(global_rt_period(), global_rt_runtime())" - see
> init_dl_bw(). Therefore, the DL rq will not be throttled by the RT
> throttling mechanism.
>
> Extended: RT tasks' throttling aims to bound - for all CPUs of a
> domain, when RT_RUNTIME_SHARING is enabled; or per-rq, when
> RT_RUNTIME_SHARING is disabled - the amount of time that RT tasks can
> run continuously, in such a way as to provide some CPU time for
> non-real-time tasks to run. RT tasks need this global/local throttling
> mechanism to avoid the starvation of non-rt tasks, because RT tasks do
> not have a limited runtime - an RT task (or taskset) can run for an
> infinite runtime.
>
> DL tasks' throttling has another meaning. DL tasks' throttling aims to
> avoid *a* DL task running for more than *its own* pre-allocated
> runtime.

sure, and having an option to let it run for longer, if there's nothing
else running in the system, is still interesting, for pretty much
similar reasons to those being discussed in this thread ...

> The sum of allocated runtime for all DL tasks will not be greater than
> the RT throttling enforcement runtime. The DL scheduler admission
> control already avoids this by limiting the amount of CPU time all DL
> tasks can consume (see init_dl_bw()). So, DL tasks avoid the "global"
> throttling beforehand - in the admission control.
>
> GRUB might implement something <> for the DEADLINE scheduler. With
> GRUB, a deadline task will have more runtime than previously
> set/granted.

yes, the main difference being: GRUB will let a DL task run for longer
than its own runtime, but still let it starve anything below (RT as well
as OTHER tasks); perhaps Luca (cc) has some further comment on this...

Thanks,

	T.

> But I am quite sure it will still be bounded by the sum of the already
> allocated DL runtime, that will continue being smaller than
> "to_ratio(global_rt_period(), global_rt_runtime())".
>
> Am I missing something?
>
>> -) only issue might be that, if a non-RT task wakes up after the
>>    unthrottle, it will have to wait, but worst-case it will have a
>>    chance in the next throttling window
>
> In the current default behavior (RT_RUNTIME_SHARING), in a domain with
> more than two CPUs, the worst case easily becomes "infinity," because
> a CPU can borrow runtime from another CPU. There is no guarantee of
> minimum latency for non-rt tasks. Anyway, if the user wants to provide
> such a guarantee, they just need not enable this feature, while
> disabling RT_RUNTIME_SHARING (or run the non-rt task as a deadline
> task ;-))
>
>> -) an alternative to unthrottling might be temporary class downgrade
>>    to sched_other, but that might be much more complex, instead this
>>    Daniel's one looks quite simple
>
> Yeah, decreasing the priority of the task would be something way more
> complicated and prone to errors. RT tasks would need to reduce their
> priority to a level higher than the IDLE task, but lower than
> SCHED_IDLE...
>
>> -) when considering also DEADLINE tasks, it might be good to think
>>    about how we'd like the throttling of DEADLINE and RT tasks to
>>    inter-relate, e.g.:
>
> Currently, DL tasks are limited (in the bw control) to the global RT
> throttling limit... I think that this might be an extension to GRUB...
> that is extending the current behavior... so... things for the future
> - and IMHO it is another topic - way more challenging.
>
> Comments are welcome :-)
>
> -- Daniel
Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature
as anticipated live to Daniel:

-) +1 for the general concept, we'd need something similar also for
   SCHED_DEADLINE
-) only issue might be that, if a non-RT task wakes up after the
   unthrottle, it will have to wait, but worst-case it will have a
   chance in the next throttling window
-) an alternative to unthrottling might be temporary class downgrade to
   sched_other, but that might be much more complex, instead this
   Daniel's one looks quite simple
-) when considering also DEADLINE tasks, it might be good to think about
   how we'd like the throttling of DEADLINE and RT tasks to
   inter-relate, e.g.:
   a) DEADLINE unthrottles if there's no RT nor OTHER tasks? what if
      there's an unthrottled RT?
   b) DEADLINE throttles by downgrading to OTHER?
   c) DEADLINE throttles by downgrading to RT (RR/FIFO and what prio?)

My2c, thanks!

	T.

On 07/11/2016 09:17, Daniel Bristot de Oliveira wrote:
> The rt throttling mechanism prevents the starvation of non-real-time
> tasks by CPU intensive real-time tasks. In terms of percentage, the
> default behavior allows real-time tasks to run up to 95% of a given
> period, leaving the other 5% of the period for non-real-time tasks.
> In the absence of non-rt tasks, the system goes idle for 5% of the
> period.
>
> Although this behavior works fine for the purpose of avoiding bad
> real-time tasks that can hang the system, some greedy users want to
> allow the real-time task to continue running in the absence of
> starving non-real-time tasks. In other words, they do not want to see
> the system going idle.
>
> This patch implements the RT_RUNTIME_GREED scheduler feature for
> greedy users (TM). When enabled, this feature will check if non-rt
> tasks are starving before throttling the real-time task. If the
> real-time task becomes throttled, it will be unthrottled as soon as
> the system goes idle, or when the next period starts, whichever comes
> first.
>
> This feature is enabled with the following command:
>
>   # echo RT_RUNTIME_GREED > /sys/kernel/debug/sched_features
>
> The user might also want to disable the RT_RUNTIME_SHARE logic, to
> keep all CPUs with the same rt_runtime:
>
>   # echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features
>
> With these two options set, the user will guarantee some runtime for
> non-rt tasks on all CPUs, while keeping real-time tasks running as
> much as possible.
>
> The feature is disabled by default, keeping the current behavior.
>
> Signed-off-by: Daniel Bristot de Oliveira
> Reviewed-by: Steven Rostedt
> Cc: Ingo Molnar
> Cc: Peter Zijlstra
> Cc: Steven Rostedt
> Cc: Christoph Lameter
> Cc: linux-rt-users
> Cc: LKML
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 42d4027..c4c62ee 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3275,7 +3275,8 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie
>  		if (unlikely(!p))
>  			p = idle_sched_class.pick_next_task(rq, prev, cookie);
>
> -		return p;
> +		if (likely(p != RETRY_TASK))
> +			return p;
>  	}
>
>  again:
> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> index 69631fa..3bd7a6d 100644
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -66,6 +66,7 @@ SCHED_FEAT(RT_PUSH_IPI, true)
>  SCHED_FEAT(FORCE_SD_OVERLAP, false)
>  SCHED_FEAT(RT_RUNTIME_SHARE, true)
> +SCHED_FEAT(RT_RUNTIME_GREED, false)
>  SCHED_FEAT(LB_MIN, false)
>  SCHED_FEAT(ATTACH_AGE_LOAD, true)
>
> diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
> index 5405d3f..0f23e06 100644
> --- a/kernel/sched/idle_task.c
> +++ b/kernel/sched/idle_task.c
> @@ -26,6 +26,10 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl
>  static struct task_struct *
>  pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
>  {
> +	if (sched_feat(RT_RUNTIME_GREED))
> +		if (try_to_unthrottle_rt_rq(&rq->rt))
> +			return RETRY_TASK;
> +
>  	put_prev_task(rq, prev);
>  	update_idle_core(rq);
>  	schedstat_inc(rq->sched_goidle);
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 2516b8d..a6961a5 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -631,6 +631,22 @@ static inline struct rt_bandwidth *sched_rt_bandwidth(struct rt_rq *rt_rq)
>  #endif /* CONFIG_RT_GROUP_SCHED */
>
> +static inline void unthrottle_rt_rq(struct rt_rq *rt_rq)
> +{
> +	rt_rq->rt_time = 0;
> +	rt_rq->rt_throttled = 0;
> +	sched_rt_rq_enqueue(rt_rq);
> +}
> +
> +int try_to_unthrottle_rt_rq(struct rt_rq *rt_rq)
> +{
> +	if (rt_rq_throttled(rt_rq)) {
> +		unthrottle_rt_rq(rt_rq);
> +		return 1;
> +	}
> +	return 0;
> +}
> +
>  bool sched_rt_bandwidth_account(struct rt_rq *rt_rq)
>  {
>  	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
> @@ -920,6 +936,18 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
>  	 * but accrue some time due to boosting.
>  	 */
>  	if (likely(rt_b->rt_runt
[PATCH] sched/deadline: show leftover runtime and abs deadline in /proc/*/sched
This patch allows for reading the current (leftover) runtime and
absolute deadline in /proc/*/sched, for debugging purposes.

This revised patch:
1) has a fixed format, Signed-off-by, etc., as suggested by Juri
2) now includes related docs in Documentation/
3) is now sent with the correct From:

Thanks,

	T.

--

Hi all,

this is a tiny patch providing readings of the current (leftover)
runtime and absolute deadline in /proc/*/sched. Mostly useful for
debugging, I heard others playing with SCHED_DEADLINE had some need for
similar patches as well.

In addition to debugging, reading the leftover runtime is generally
useful for adaptive/incremental RT logics that need to check whether
there's enough runtime left for refining the computed result, or just
use what we've computed so far and block till the next instance. Also,
knowing what the absolute scheduling deadline is (along with what clock
it refers to) might be useful for synchronization purposes.

(albeit, for real production code, I wouldn't like to parse /proc
anyway, rather I'd prefer to retrieve those params via e.g.
sched_getscheduler()?)
[PATCH] sched/deadline: show leftover runtime and abs deadline in /proc/*/sched
This patch allows for reading the current (leftover) runtime and
absolute deadline of a SCHED_DEADLINE task through /proc/*/sched
(entries dl.runtime and dl.deadline), while debugging/testing.

Signed-off-by: Tommaso Cucinotta
Reviewed-by: Juri Lelli
Reviewed-by: Luca Abeni
Acked-by: Daniel Bristot de Oliveira
Cc: Peter Zijlstra
Cc: Ingo Molnar
---
 Documentation/scheduler/sched-deadline.txt | 6 ++++++
 kernel/sched/debug.c                       | 4 ++++
 2 files changed, 10 insertions(+)

diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt
index 8e37b0b..cbc1b46 100644
--- a/Documentation/scheduler/sched-deadline.txt
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -408,6 +408,11 @@ CONTENTS
  * the new scheduling related syscalls that manipulate it, i.e.,
    sched_setattr() and sched_getattr() are implemented.

+ For debugging purposes, the leftover runtime and absolute deadline of a
+ SCHED_DEADLINE task can be retrieved through /proc/<pid>/sched (entries
+ dl.runtime and dl.deadline, both values in ns). A programmatic way to
+ retrieve these values from production code is under discussion.
+

 4.3 Default behavior
 ---------------------
@@ -476,6 +481,7 @@ CONTENTS

  Still missing:

+  - programmatic way to retrieve current runtime and absolute deadline
   - refinements to deadline inheritance, especially regarding the possibility
     of retaining bandwidth isolation among non-interacting tasks. This is
     being studied from both theoretical and practical points of view, and

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index fa178b6..109adc0 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -953,6 +953,10 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
 #endif
 	P(policy);
 	P(prio);
+	if (p->policy == SCHED_DEADLINE) {
+		P(dl.runtime);
+		P(dl.deadline);
+	}
 #undef PN_SCHEDSTAT
 #undef PN
 #undef __PN
-- 
2.7.4
[RFC PATCH] sched/deadline: show leftover runtime and abs deadline in /proc/*/sched
Hi all,

this is a tiny patch providing readings of the current (leftover)
runtime and absolute deadline in /proc/*/sched. Mostly useful for
debugging, I heard others playing with SCHED_DEADLINE had some need for
similar patches as well.

In addition to debugging, reading the leftover runtime is generally
useful for adaptive/incremental RT logics that need to check whether
there's enough runtime left for refining the computed result, or just
use what we've computed so far and block till the next instance. Also,
knowing what the absolute scheduling deadline is (along with what clock
it refers to) might be useful for synchronization purposes.

(albeit, for real production code, I wouldn't like to parse /proc
anyway, rather I'd prefer to retrieve those params via e.g.
sched_getscheduler()?)

Please, share your thoughts, thanks.

	Tommaso
[RFC PATCH] sched/deadline: show leftover runtime and abs deadline in /proc/*/sched
From: Tommaso Cucinotta

---
 kernel/sched/debug.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index fa178b6..109adc0 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -953,6 +953,10 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
 #endif
 	P(policy);
 	P(prio);
+	if (p->policy == SCHED_DEADLINE) {
+		P(dl.runtime);
+		P(dl.deadline);
+	}
 #undef PN_SCHEDSTAT
 #undef PN
 #undef __PN
-- 
2.7.4
[tip:sched/core] sched/deadline: Document behavior of sched_yield()
Commit-ID:  b95202a3b6bb8715a716dbdb15cdb82bf622260b
Gitweb:     http://git.kernel.org/tip/b95202a3b6bb8715a716dbdb15cdb82bf622260b
Author:     Tommaso Cucinotta
AuthorDate: Fri, 9 Sep 2016 19:45:17 +0200
Committer:  Ingo Molnar
CommitDate: Sat, 10 Sep 2016 11:17:41 +0200

sched/deadline: Document behavior of sched_yield()

This is a documentation only patch, explaining the behavior of
sched_yield() when a SCHED_DEADLINE task calls it (give up remaining
runtime and be throttled until next period begins).

Signed-off-by: Tommaso Cucinotta
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Juri Lelli
Reviewed-by: Luca Abeni
Reviewed-by: Daniel Bristot de Oliveira
Cc: Juri Lelli
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux...@retis.sssup.it
Link: http://lkml.kernel.org/r/1473443117-11794-2-git-send-email-tommaso.cucino...@sssup.it
Signed-off-by: Ingo Molnar
---
 Documentation/scheduler/sched-deadline.txt | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt
index 53a2fe1..8e37b0b 100644
--- a/Documentation/scheduler/sched-deadline.txt
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -16,6 +16,7 @@ CONTENTS
    4.1 System-wide settings
    4.2 Task interface
    4.3 Default behavior
+   4.4 Behavior of sched_yield()
  5. Tasks CPU affinity
    5.1 SCHED_DEADLINE and cpusets HOWTO
  6. Future plans
@@ -426,6 +427,23 @@ CONTENTS
  Finally, notice that in order not to jeopardize the admission control a
  -deadline task cannot fork.

+
+4.4 Behavior of sched_yield()
+-----------------------------
+
+ When a SCHED_DEADLINE task calls sched_yield(), it gives up its
+ remaining runtime and is immediately throttled, until the next
+ period, when its runtime will be replenished (a special flag
+ dl_yielded is set and used to handle correctly throttling and runtime
+ replenishment after a call to sched_yield()).
+
+ This behavior of sched_yield() allows the task to wake-up exactly at
+ the beginning of the next period. Also, this may be useful in the
+ future with bandwidth reclaiming mechanisms, where sched_yield() will
+ make the leftover runtime available for reclamation by other
+ SCHED_DEADLINE tasks.
+
+
 5. Tasks CPU affinity
 =====================
[PATCH] sched/deadline: document behavior of sched_yield()
This is a documentation only patch, explaining the behavior of
sched_yield() when a SCHED_DEADLINE task calls it (give up remaining
runtime and be throttled until next period begins).

Signed-off-by: Tommaso Cucinotta
Reviewed-by: Juri Lelli
Reviewed-by: Luca Abeni
Reviewed-by: Daniel Bristot de Oliveira
---
 Documentation/scheduler/sched-deadline.txt | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt
index 53a2fe1..8e37b0b 100644
--- a/Documentation/scheduler/sched-deadline.txt
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -16,6 +16,7 @@ CONTENTS
    4.1 System-wide settings
    4.2 Task interface
    4.3 Default behavior
+   4.4 Behavior of sched_yield()
  5. Tasks CPU affinity
    5.1 SCHED_DEADLINE and cpusets HOWTO
  6. Future plans
@@ -426,6 +427,23 @@ CONTENTS
  Finally, notice that in order not to jeopardize the admission control a
  -deadline task cannot fork.

+
+4.4 Behavior of sched_yield()
+-----------------------------
+
+ When a SCHED_DEADLINE task calls sched_yield(), it gives up its
+ remaining runtime and is immediately throttled, until the next
+ period, when its runtime will be replenished (a special flag
+ dl_yielded is set and used to handle correctly throttling and runtime
+ replenishment after a call to sched_yield()).
+
+ This behavior of sched_yield() allows the task to wake-up exactly at
+ the beginning of the next period. Also, this may be useful in the
+ future with bandwidth reclaiming mechanisms, where sched_yield() will
+ make the leftover runtime available for reclamation by other
+ SCHED_DEADLINE tasks.
+
+
 5. Tasks CPU affinity
 =====================
-- 
2.7.4
[PATCH] sched/deadline: document behavior of sched_yield()
Hi again,

this is the reworked text following the comments by Luca, Juri and
Daniel (thanks everybody), namely:

-) "throttled" instead of "suspended"
-) no more "reservation period" is mentioned -- just "period"

Please, let me know if this sounds better now, thanks!

	T.
[PATCH] sched/deadline: document behavior of sched_yield()
This is a documentation only patch, explaining the behavior of
sched_yield() when a SCHED_DEADLINE task calls it (give up remaining
runtime and suspend till next period).

Signed-off-by: Tommaso Cucinotta
Reviewed-by: Juri Lelli
---
 Documentation/scheduler/sched-deadline.txt | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt
index 53a2fe1..cb43421 100644
--- a/Documentation/scheduler/sched-deadline.txt
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -16,6 +16,7 @@ CONTENTS
    4.1 System-wide settings
    4.2 Task interface
    4.3 Default behavior
+   4.4 Behavior of sched_yield()
  5. Tasks CPU affinity
    5.1 SCHED_DEADLINE and cpusets HOWTO
  6. Future plans
@@ -426,6 +427,18 @@ CONTENTS
  Finally, notice that in order not to jeopardize the admission control a
  -deadline task cannot fork.

+4.4 Behavior of sched_yield()
+-----------------------------
+
+ When a SCHED_DEADLINE task calls sched_yield(), it gives up its
+ remaining runtime and is suspended till the next reservation period,
+ when its runtime will be replenished. This allows the task to
+ wake-up exactly at the beginning of the next period. Also, this may
+ be useful in the future with bandwidth reclaiming mechanisms, where
+ sched_yield() will make the leftover runtime available for
+ reclamation by other SCHED_DEADLINE tasks.
+
+
 5. Tasks CPU affinity
 =====================
-- 
2.7.4
[PATCH] sched/deadline: document behavior of sched_yield()
Added SoB, as per Juri's comment (thanks).

This is a documentation only patch, explaining the behavior of
sched_yield() when a SCHED_DEADLINE task calls it (give up remaining
runtime and suspend till next period).

	T.
[PATCH] sched/deadline: document behavior of sched_yield()
This is a documentation only patch, explaining the behavior of
sched_yield() when a SCHED_DEADLINE task calls it (give up remaining
runtime and suspend till next period).

---
 Documentation/scheduler/sched-deadline.txt | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt
index 53a2fe1..cb43421 100644
--- a/Documentation/scheduler/sched-deadline.txt
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -16,6 +16,7 @@ CONTENTS
    4.1 System-wide settings
    4.2 Task interface
    4.3 Default behavior
+   4.4 Behavior of sched_yield()
  5. Tasks CPU affinity
    5.1 SCHED_DEADLINE and cpusets HOWTO
  6. Future plans
@@ -426,6 +427,18 @@ CONTENTS
  Finally, notice that in order not to jeopardize the admission control a
  -deadline task cannot fork.

+4.4 Behavior of sched_yield()
+-----------------------------
+
+ When a SCHED_DEADLINE task calls sched_yield(), it gives up its
+ remaining runtime and is suspended till the next reservation period,
+ when its runtime will be replenished. This allows the task to
+ wake-up exactly at the beginning of the next period. Also, this may
+ be useful in the future with bandwidth reclaiming mechanisms, where
+ sched_yield() will make the leftover runtime available for
+ reclamation by other SCHED_DEADLINE tasks.
+
+
 5. Tasks CPU affinity
 =====================
-- 
2.7.4
[tip:sched/core] sched/deadline: Split cpudl_set() into cpudl_set() and cpudl_clear()
Commit-ID: d8206bb3ffe0eaee03abfad46fd44d8b17142e88 Gitweb: http://git.kernel.org/tip/d8206bb3ffe0eaee03abfad46fd44d8b17142e88 Author: Tommaso Cucinotta AuthorDate: Sun, 14 Aug 2016 16:27:08 +0200 Committer: Ingo Molnar CommitDate: Mon, 5 Sep 2016 13:29:43 +0200 sched/deadline: Split cpudl_set() into cpudl_set() and cpudl_clear() These 2 exercise independent code paths and need different arguments. After this change, you call: cpudl_clear(cp, cpu); cpudl_set(cp, cpu, dl); instead of: cpudl_set(cp, cpu, 0 /* dl */, 0 /* is_valid */); cpudl_set(cp, cpu, dl, 1 /* is_valid */); Signed-off-by: Tommaso Cucinotta Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Luca Abeni Reviewed-by: Juri Lelli Cc: Juri Lelli Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux...@retis.sssup.it Link: http://lkml.kernel.org/r/1471184828-12644-4-git-send-email-tommaso.cucino...@sssup.it Signed-off-by: Ingo Molnar --- kernel/sched/cpudeadline.c | 49 +++--- kernel/sched/cpudeadline.h | 3 ++- kernel/sched/deadline.c| 10 +- 3 files changed, 40 insertions(+), 22 deletions(-) diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c index 0ace75a..e731190 100644 --- a/kernel/sched/cpudeadline.c +++ b/kernel/sched/cpudeadline.c @@ -145,16 +145,15 @@ out: } /* - * cpudl_set - update the cpudl max-heap + * cpudl_clear - remove a cpu from the cpudl max-heap * @cp: the cpudl max-heap context * @cpu: the target cpu - * @dl: the new earliest deadline for this cpu * * Notes: assumes cpu_rq(cpu)->lock is locked * * Returns: (void) */ -void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) +void cpudl_clear(struct cpudl *cp, int cpu) { int old_idx, new_cpu; unsigned long flags; @@ -162,17 +161,15 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) WARN_ON(!cpu_present(cpu)); raw_spin_lock_irqsave(&cp->lock, flags); + old_idx = cp->elements[cpu].idx; - if (!is_valid) { - /* remove item */ - if (old_idx == IDX_INVALID) { - /* -* Nothing to remove if 
old_idx was invalid. -* This could happen if a rq_offline_dl is -* called for a CPU without -dl tasks running. -*/ - goto out; - } + if (old_idx == IDX_INVALID) { + /* +* Nothing to remove if old_idx was invalid. +* This could happen if a rq_offline_dl is +* called for a CPU without -dl tasks running. +*/ + } else { new_cpu = cp->elements[cp->size - 1].cpu; cp->elements[old_idx].dl = cp->elements[cp->size - 1].dl; cp->elements[old_idx].cpu = new_cpu; @@ -180,11 +177,32 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) cp->elements[new_cpu].idx = old_idx; cp->elements[cpu].idx = IDX_INVALID; cpudl_heapify(cp, old_idx); - cpumask_set_cpu(cpu, cp->free_cpus); - goto out; + cpumask_set_cpu(cpu, cp->free_cpus); } + raw_spin_unlock_irqrestore(&cp->lock, flags); +} + +/* + * cpudl_set - update the cpudl max-heap + * @cp: the cpudl max-heap context + * @cpu: the target cpu + * @dl: the new earliest deadline for this cpu + * + * Notes: assumes cpu_rq(cpu)->lock is locked + * + * Returns: (void) + */ +void cpudl_set(struct cpudl *cp, int cpu, u64 dl) +{ + int old_idx; + unsigned long flags; + + WARN_ON(!cpu_present(cpu)); + raw_spin_lock_irqsave(&cp->lock, flags); + + old_idx = cp->elements[cpu].idx; if (old_idx == IDX_INVALID) { int new_idx = cp->size++; cp->elements[new_idx].dl = dl; @@ -197,7 +215,6 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) cpudl_heapify(cp, old_idx); } -out: raw_spin_unlock_irqrestore(&cp->lock, flags); } diff --git a/kernel/sched/cpudeadline.h b/kernel/sched/cpudeadline.h index fcbdf83..f7da8c5 100644 --- a/kernel/sched/cpudeadline.h +++ b/kernel/sched/cpudeadline.h @@ -23,7 +23,8 @@ struct cpudl { #ifdef CONFIG_SMP int cpudl_find(struct cpudl *cp, struct task_struct *p, struct cpumask *later_mask); -void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid); +void cpudl_set(struct cpudl *cp, int cpu, u64 dl); +void cpudl_clear(struct cpudl *cp, int cpu); int cpudl_init(struct cpudl *cp); void 
cpudl_set_freecpu(struct cpudl *cp, int cpu); void cpudl_clear_freecpu(struct cpudl *cp, int cpu); diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index d091f4a..18fb0b8 100644 --- a/kernel/sc
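As a behavioral illustration of the split interface (a hedged toy sketch, NOT the kernel code: a linear scan over a validity mask stands in for the max-heap, and all toy_* names are invented here), what the new API preserves is the contract: set = insert-or-update a cpu's deadline, clear = remove it idempotently, maximum = the cpu currently holding the latest deadline:

```c
#include <stdint.h>

#define TOY_NCPU 8

static uint64_t toy_dl[TOY_NCPU];
static int toy_valid[TOY_NCPU];

/* insert-or-update: analogue of cpudl_set(cp, cpu, dl) */
static void toy_cpudl_set(int cpu, uint64_t dl)
{
	toy_dl[cpu] = dl;
	toy_valid[cpu] = 1;
}

/* remove: analogue of cpudl_clear(cp, cpu); a no-op if already
 * cleared, like the old_idx == IDX_INVALID case in the patch */
static void toy_cpudl_clear(int cpu)
{
	toy_valid[cpu] = 0;
}

/* analogue of cpudl_maximum(): cpu with the latest deadline, or -1;
 * the signed subtraction mirrors dl_time_before() */
static int toy_cpudl_maximum(void)
{
	int best = -1;

	for (int cpu = 0; cpu < TOY_NCPU; cpu++)
		if (toy_valid[cpu] &&
		    (best < 0 || (int64_t)(toy_dl[best] - toy_dl[cpu]) < 0))
			best = cpu;
	return best;
}
```

The point of the split is visible even in the toy: the remove path needs no deadline argument at all, so folding both into one function with an is_valid flag only obscured two independent operations.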
[tip:sched/core] sched/deadline: Make CPU heap faster avoiding real swaps on heapify
Commit-ID: 8e1bc301aaf9f9a2d731bf8d50d549ac2dcfdab2 Gitweb: http://git.kernel.org/tip/8e1bc301aaf9f9a2d731bf8d50d549ac2dcfdab2 Author: Tommaso Cucinotta AuthorDate: Sun, 14 Aug 2016 16:27:07 +0200 Committer: Ingo Molnar CommitDate: Mon, 5 Sep 2016 13:29:43 +0200 sched/deadline: Make CPU heap faster avoiding real swaps on heapify This change goes from heapify() ops done by swapping with parent/child so that the item to fix moves along, to heapify() ops done by just pulling the parent/child chain by 1 pos, then storing the item to fix just at the end. On a non-trivial heapify(), this performs roughly half stores wrt swaps. This has been measured to achieve up to 10% of speed-up for cpudl_set() calls, with a randomly generated workload of 1K,10K,100K random heap insertions and deletions (75% cpudl_set() calls with is_valid=1 and 25% with is_valid=0), and randomly generated cpu IDs, with up to 256 CPUs, as measured on an Intel Core2 Duo. Signed-off-by: Tommaso Cucinotta Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Luca Abeni Reviewed-by: Juri Lelli Cc: Juri Lelli Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux...@retis.sssup.it Link: http://lkml.kernel.org/r/1471184828-12644-3-git-send-email-tommaso.cucino...@sssup.it Signed-off-by: Ingo Molnar --- kernel/sched/cpudeadline.c | 66 +++--- 1 file changed, 45 insertions(+), 21 deletions(-) diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c index 0acb0d4..0ace75a 100644 --- a/kernel/sched/cpudeadline.c +++ b/kernel/sched/cpudeadline.c @@ -31,48 +31,72 @@ static inline int right_child(int i) return (i << 1) + 2; } -static void cpudl_exchange(struct cpudl *cp, int a, int b) -{ - int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu; - - swap(cp->elements[a].cpu, cp->elements[b].cpu); - swap(cp->elements[a].dl , cp->elements[b].dl ); - - swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx); -} - static void cpudl_heapify_down(struct cpudl *cp, int idx) { int l, r, largest; + 
int orig_cpu = cp->elements[idx].cpu; + u64 orig_dl = cp->elements[idx].dl; + + if (left_child(idx) >= cp->size) + return; + /* adapted from lib/prio_heap.c */ while(1) { + u64 largest_dl; l = left_child(idx); r = right_child(idx); largest = idx; + largest_dl = orig_dl; - if ((l < cp->size) && dl_time_before(cp->elements[idx].dl, - cp->elements[l].dl)) + if ((l < cp->size) && dl_time_before(orig_dl, + cp->elements[l].dl)) { largest = l; - if ((r < cp->size) && dl_time_before(cp->elements[largest].dl, - cp->elements[r].dl)) + largest_dl = cp->elements[l].dl; + } + if ((r < cp->size) && dl_time_before(largest_dl, + cp->elements[r].dl)) largest = r; + if (largest == idx) break; - /* Push idx down the heap one level and bump one up */ - cpudl_exchange(cp, largest, idx); + /* pull largest child onto idx */ + cp->elements[idx].cpu = cp->elements[largest].cpu; + cp->elements[idx].dl = cp->elements[largest].dl; + cp->elements[cp->elements[idx].cpu].idx = idx; idx = largest; } + /* actual push down of saved original values orig_* */ + cp->elements[idx].cpu = orig_cpu; + cp->elements[idx].dl = orig_dl; + cp->elements[cp->elements[idx].cpu].idx = idx; } static void cpudl_heapify_up(struct cpudl *cp, int idx) { - while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, - cp->elements[idx].dl)) { - cpudl_exchange(cp, idx, parent(idx)); - idx = parent(idx); - } + int p; + + int orig_cpu = cp->elements[idx].cpu; + u64 orig_dl = cp->elements[idx].dl; + + if (idx == 0) + return; + + do { + p = parent(idx); + if (dl_time_before(orig_dl, cp->elements[p].dl)) + break; + /* pull parent onto idx */ + cp->elements[idx].cpu = cp->elements[p].cpu; + cp->elements[idx].dl = cp->elements[p].dl; + cp->elements[cp->elements[idx].cpu].idx = idx; + idx = p; + } while (idx != 0); + /* actual push
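The pulled-chain technique above can be sketched outside the kernel on a plain u64 max-heap (a hedged illustration: the real code also maintains the cpu-to-heap-index reverse map, which is omitted here). The element being fixed is saved once, the larger-child chain is pulled up one slot per level with a single store each, and the saved element is written back once at the end:

```c
#include <stdint.h>

#define HEAP_CAP 256

struct u64_heap {
	uint64_t key[HEAP_CAP];
	int size;
};

static void heap_sift_down(struct u64_heap *h, int idx)
{
	uint64_t orig = h->key[idx];	/* save the element being fixed */

	for (;;) {
		int l = 2 * idx + 1, r = 2 * idx + 2, largest = idx;
		uint64_t largest_key = orig;

		if (l < h->size && h->key[l] > largest_key) {
			largest = l;
			largest_key = h->key[l];
		}
		if (r < h->size && h->key[r] > largest_key)
			largest = r;
		if (largest == idx)
			break;
		/* pull the larger child onto idx: one store, no swap */
		h->key[idx] = h->key[largest];
		idx = largest;
	}
	h->key[idx] = orig;	/* single final store of the saved element */
}
```

Compared with the swap-based version, each level of a non-trivial sift costs one store instead of a three-field swap, which is where the measured speed-up comes from.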
[tip:sched/core] sched/deadline: Refactor CPU heap code
Commit-ID: 126b3b6842cc848fc9880e7816e0a8d743be51f1 Gitweb: http://git.kernel.org/tip/126b3b6842cc848fc9880e7816e0a8d743be51f1 Author: Tommaso Cucinotta AuthorDate: Sun, 14 Aug 2016 16:27:06 +0200 Committer: Ingo Molnar CommitDate: Mon, 5 Sep 2016 13:29:42 +0200 sched/deadline: Refactor CPU heap code 1. heapify up factored out in new dedicated function heapify_up() (avoids repetition of same code) 2. call to cpudl_change_key() replaced with heapify_up() when cpudl_set actually inserts a new node in the heap 3. cpudl_change_key() replaced with heapify() that heapifies up or down as needed. Signed-off-by: Tommaso Cucinotta Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Luca Abeni Reviewed-by: Juri Lelli Cc: Juri Lelli Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux...@retis.sssup.it Link: http://lkml.kernel.org/r/1471184828-12644-2-git-send-email-tommaso.cucino...@sssup.it Signed-off-by: Ingo Molnar --- kernel/sched/cpudeadline.c | 50 +- 1 file changed, 23 insertions(+), 27 deletions(-) diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c index d418449..0acb0d4 100644 --- a/kernel/sched/cpudeadline.c +++ b/kernel/sched/cpudeadline.c @@ -41,7 +41,7 @@ static void cpudl_exchange(struct cpudl *cp, int a, int b) swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx); } -static void cpudl_heapify(struct cpudl *cp, int idx) +static void cpudl_heapify_down(struct cpudl *cp, int idx) { int l, r, largest; @@ -66,23 +66,24 @@ static void cpudl_heapify(struct cpudl *cp, int idx) } } -static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl) +static void cpudl_heapify_up(struct cpudl *cp, int idx) { - WARN_ON(idx == IDX_INVALID || !cpu_present(idx)); - - if (dl_time_before(new_dl, cp->elements[idx].dl)) { - cp->elements[idx].dl = new_dl; - cpudl_heapify(cp, idx); - } else { - cp->elements[idx].dl = new_dl; - while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, - cp->elements[idx].dl)) { - cpudl_exchange(cp, idx, 
parent(idx)); - idx = parent(idx); - } + while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, + cp->elements[idx].dl)) { + cpudl_exchange(cp, idx, parent(idx)); + idx = parent(idx); } } +static void cpudl_heapify(struct cpudl *cp, int idx) +{ + if (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, + cp->elements[idx].dl)) + cpudl_heapify_up(cp, idx); + else + cpudl_heapify_down(cp, idx); +} + static inline int cpudl_maximum(struct cpudl *cp) { return cp->elements[0].cpu; @@ -154,27 +155,22 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) cp->size--; cp->elements[new_cpu].idx = old_idx; cp->elements[cpu].idx = IDX_INVALID; - while (old_idx > 0 && dl_time_before( - cp->elements[parent(old_idx)].dl, - cp->elements[old_idx].dl)) { - cpudl_exchange(cp, old_idx, parent(old_idx)); - old_idx = parent(old_idx); - } + cpudl_heapify(cp, old_idx); cpumask_set_cpu(cpu, cp->free_cpus); -cpudl_heapify(cp, old_idx); goto out; } if (old_idx == IDX_INVALID) { - cp->size++; - cp->elements[cp->size - 1].dl = dl; - cp->elements[cp->size - 1].cpu = cpu; - cp->elements[cpu].idx = cp->size - 1; - cpudl_change_key(cp, cp->size - 1, dl); + int new_idx = cp->size++; + cp->elements[new_idx].dl = dl; + cp->elements[new_idx].cpu = cpu; + cp->elements[cpu].idx = new_idx; + cpudl_heapify_up(cp, new_idx); cpumask_clear_cpu(cpu, cp->free_cpus); } else { - cpudl_change_key(cp, old_idx, dl); + cp->elements[old_idx].dl = dl; + cpudl_heapify(cp, old_idx); } out:
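A minimal out-of-kernel sketch of where this refactoring lands (illustrative kheap_* names on a plain u64 max-heap, standing in for the cpudl elements array): cpudl_change_key() reduces to "store the new key, then heapify", and heapify picks the direction with a single parent comparison:

```c
#include <stdint.h>

#define KHEAP_CAP 64

struct kheap {
	uint64_t key[KHEAP_CAP];
	int size;
};

static void kheap_swap(uint64_t *a, uint64_t *b)
{
	uint64_t t = *a; *a = *b; *b = t;
}

static void kheap_sift_up(struct kheap *h, int idx)
{
	while (idx > 0 && h->key[(idx - 1) / 2] < h->key[idx]) {
		kheap_swap(&h->key[(idx - 1) / 2], &h->key[idx]);
		idx = (idx - 1) / 2;
	}
}

static void kheap_sift_down(struct kheap *h, int idx)
{
	for (;;) {
		int l = 2 * idx + 1, r = 2 * idx + 2, largest = idx;

		if (l < h->size && h->key[l] > h->key[largest])
			largest = l;
		if (r < h->size && h->key[r] > h->key[largest])
			largest = r;
		if (largest == idx)
			break;
		kheap_swap(&h->key[idx], &h->key[largest]);
		idx = largest;
	}
}

/* the refactored change_key: write the key, then let heapify decide
 * the direction with one comparison against the parent */
static void kheap_change_key(struct kheap *h, int idx, uint64_t new_key)
{
	h->key[idx] = new_key;
	if (idx > 0 && h->key[(idx - 1) / 2] < h->key[idx])
		kheap_sift_up(h, idx);
	else
		kheap_sift_down(h, idx);
}
```

With this shape, a fresh insert at the end of the heap only ever needs sift_up, which is why the patch can call heapify_up() directly there instead of the general change_key path.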
SCHED_DEADLINE cpudeadline.{h,c} fixup
Hi, this is a rework of the cpudeadline bugfix and speed-up patch set, integrating all comments received so far from Luca, Juri and Peter. Compared with the previous post:

-) I'm keeping out the minimally invasive bugfix, as it has already been merged in tip/sched/core
-) I moved some small code refactoring around change_key_dl() out of the (now) 2nd patch, into the 1st one. Now the 2nd (speed-up) patch just changes the heapify_up/down() functions
-) I rebased on top of commit f0b22e39
-) I repeated an extensive set of tests through the framework published separately at: https://github.com/tomcucinotta/cpudl-bench repeating new no-behavior-change tests, new heap-consistency tests, and new A/B benchmarks (I'm working on a new i5 laptop now), results at: https://github.com/tomcucinotta/cpudl-bench/blob/master/cpudl-10.pdf highlighting up to a 14% speed-up when averaging over 100K ops. See the enclosed README in that repo for more info.

I'm leaving below the original description of all 4 patches.

--

The first patch is a minimally invasive (1-line) fix for the deadline wrap-around bug. This leaves some weirdness in how cpudl_change_key() is called, so the second patch does a minimum of refactoring to make things more explicit and clear. The 3rd patch now contains the actual performance enhancement (avoiding unneeded swaps during heapify operations), which has been measured to achieve a speed-up of up to 14% for cpudl_set() calls. This has been measured with a randomly generated workload of 1K,10K,100K random heap insertions and deletions (75% cpudl_set() calls with is_valid=1 and 25% with is_valid=0), and randomly generated cpu IDs, with up to 256 CPUs. Benchmarking code at: https://github.com/tomcucinotta/cpudl-bench

Finally, the 4th patch is another clean-up patch touching cpudeadline.{h,c} and deadline.c.
Now you call cpudl_clear(cp, cpu) and cpudl_set(cp, cpu, dl) instead of cpudl_set(cp, cpu, 0 /* dl */, 0 /* is_valid */) and cpudl_set(cp, cpu, dl, 1 /* is_valid */). Any further comment is welcome, thanks! Tommaso --
[RFC PATCH 2/3] sched/deadline: make cpu heap faster avoiding real swaps on heapify
This change goes from heapify() ops done by swapping with parent/child so that the item to fix moves along, to heapify() ops done by just pulling the parent/child chain by 1 pos, then storing the item to fix just at the end. On a non-trivial heapify(), this performs roughly half stores wrt swaps. This has been measured to achieve up to 10% of speed-up for cpudl_set() calls, with a randomly generated workload of 1K,10K,100K random heap insertions and deletions (75% cpudl_set() calls with is_valid=1 and 25% with is_valid=0), and randomly generated cpu IDs, with up to 256 CPUs, as measured on an Intel Core2 Duo. Cc: Peter Zijlstra Cc: Juri Lelli Cc: Luca Abeni Reviewed-by: Luca Abeni Reviewed-by: Juri Lelli Signed-off-by: Tommaso Cucinotta --- kernel/sched/cpudeadline.c | 66 +++--- 1 file changed, 45 insertions(+), 21 deletions(-) diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c index 0acb0d4..0ace75a 100644 --- a/kernel/sched/cpudeadline.c +++ b/kernel/sched/cpudeadline.c @@ -31,48 +31,72 @@ static inline int right_child(int i) return (i << 1) + 2; } -static void cpudl_exchange(struct cpudl *cp, int a, int b) -{ - int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu; - - swap(cp->elements[a].cpu, cp->elements[b].cpu); - swap(cp->elements[a].dl , cp->elements[b].dl ); - - swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx); -} - static void cpudl_heapify_down(struct cpudl *cp, int idx) { int l, r, largest; + int orig_cpu = cp->elements[idx].cpu; + u64 orig_dl = cp->elements[idx].dl; + + if (left_child(idx) >= cp->size) + return; + /* adapted from lib/prio_heap.c */ while(1) { + u64 largest_dl; l = left_child(idx); r = right_child(idx); largest = idx; + largest_dl = orig_dl; - if ((l < cp->size) && dl_time_before(cp->elements[idx].dl, - cp->elements[l].dl)) + if ((l < cp->size) && dl_time_before(orig_dl, + cp->elements[l].dl)) { largest = l; - if ((r < cp->size) && dl_time_before(cp->elements[largest].dl, - cp->elements[r].dl)) + 
largest_dl = cp->elements[l].dl; + } + if ((r < cp->size) && dl_time_before(largest_dl, + cp->elements[r].dl)) largest = r; + if (largest == idx) break; - /* Push idx down the heap one level and bump one up */ - cpudl_exchange(cp, largest, idx); + /* pull largest child onto idx */ + cp->elements[idx].cpu = cp->elements[largest].cpu; + cp->elements[idx].dl = cp->elements[largest].dl; + cp->elements[cp->elements[idx].cpu].idx = idx; idx = largest; } + /* actual push down of saved original values orig_* */ + cp->elements[idx].cpu = orig_cpu; + cp->elements[idx].dl = orig_dl; + cp->elements[cp->elements[idx].cpu].idx = idx; } static void cpudl_heapify_up(struct cpudl *cp, int idx) { - while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, - cp->elements[idx].dl)) { - cpudl_exchange(cp, idx, parent(idx)); - idx = parent(idx); - } + int p; + + int orig_cpu = cp->elements[idx].cpu; + u64 orig_dl = cp->elements[idx].dl; + + if (idx == 0) + return; + + do { + p = parent(idx); + if (dl_time_before(orig_dl, cp->elements[p].dl)) + break; + /* pull parent onto idx */ + cp->elements[idx].cpu = cp->elements[p].cpu; + cp->elements[idx].dl = cp->elements[p].dl; + cp->elements[cp->elements[idx].cpu].idx = idx; + idx = p; + } while (idx != 0); + /* actual push up of saved original values orig_* */ + cp->elements[idx].cpu = orig_cpu; + cp->elements[idx].dl = orig_dl; + cp->elements[cp->elements[idx].cpu].idx = idx; } static void cpudl_heapify(struct cpudl *cp, int idx) -- 2.7.4
[RFC PATCH 3/3] sched/deadline: split cpudl_set() into cpudl_set() and cpudl_clear()
These 2 exercise independent code paths and need different arguments. After this change, you call cpudl_clear(cp, cpu) cpudl_set(cp, cpu, dl) instead of cpudl_set(cp, cpu, 0 /* dl */, 0 /* is_valid */) cpudl_set(cp, cpu, dl, 1 /* is_valid */) Cc: Peter Zijlstra Cc: Juri Lelli Cc: Luca Abeni Reviewed-by: Luca Abeni Reviewed-by: Juri Lelli Signed-off-by: Tommaso Cucinotta --- kernel/sched/cpudeadline.c | 49 +++--- kernel/sched/cpudeadline.h | 3 ++- kernel/sched/deadline.c| 10 +- 3 files changed, 40 insertions(+), 22 deletions(-) diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c index 0ace75a..e731190 100644 --- a/kernel/sched/cpudeadline.c +++ b/kernel/sched/cpudeadline.c @@ -145,16 +145,15 @@ out: } /* - * cpudl_set - update the cpudl max-heap + * cpudl_clear - remove a cpu from the cpudl max-heap * @cp: the cpudl max-heap context * @cpu: the target cpu - * @dl: the new earliest deadline for this cpu * * Notes: assumes cpu_rq(cpu)->lock is locked * * Returns: (void) */ -void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) +void cpudl_clear(struct cpudl *cp, int cpu) { int old_idx, new_cpu; unsigned long flags; @@ -162,17 +161,15 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) WARN_ON(!cpu_present(cpu)); raw_spin_lock_irqsave(&cp->lock, flags); + old_idx = cp->elements[cpu].idx; - if (!is_valid) { - /* remove item */ - if (old_idx == IDX_INVALID) { - /* -* Nothing to remove if old_idx was invalid. -* This could happen if a rq_offline_dl is -* called for a CPU without -dl tasks running. -*/ - goto out; - } + if (old_idx == IDX_INVALID) { + /* +* Nothing to remove if old_idx was invalid. +* This could happen if a rq_offline_dl is +* called for a CPU without -dl tasks running. 
+*/ + } else { new_cpu = cp->elements[cp->size - 1].cpu; cp->elements[old_idx].dl = cp->elements[cp->size - 1].dl; cp->elements[old_idx].cpu = new_cpu; @@ -180,11 +177,32 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) cp->elements[new_cpu].idx = old_idx; cp->elements[cpu].idx = IDX_INVALID; cpudl_heapify(cp, old_idx); - cpumask_set_cpu(cpu, cp->free_cpus); - goto out; + cpumask_set_cpu(cpu, cp->free_cpus); } + raw_spin_unlock_irqrestore(&cp->lock, flags); +} + +/* + * cpudl_set - update the cpudl max-heap + * @cp: the cpudl max-heap context + * @cpu: the target cpu + * @dl: the new earliest deadline for this cpu + * + * Notes: assumes cpu_rq(cpu)->lock is locked + * + * Returns: (void) + */ +void cpudl_set(struct cpudl *cp, int cpu, u64 dl) +{ + int old_idx; + unsigned long flags; + + WARN_ON(!cpu_present(cpu)); + raw_spin_lock_irqsave(&cp->lock, flags); + + old_idx = cp->elements[cpu].idx; if (old_idx == IDX_INVALID) { int new_idx = cp->size++; cp->elements[new_idx].dl = dl; @@ -197,7 +215,6 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) cpudl_heapify(cp, old_idx); } -out: raw_spin_unlock_irqrestore(&cp->lock, flags); } diff --git a/kernel/sched/cpudeadline.h b/kernel/sched/cpudeadline.h index fcbdf83..f7da8c5 100644 --- a/kernel/sched/cpudeadline.h +++ b/kernel/sched/cpudeadline.h @@ -23,7 +23,8 @@ struct cpudl { #ifdef CONFIG_SMP int cpudl_find(struct cpudl *cp, struct task_struct *p, struct cpumask *later_mask); -void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid); +void cpudl_set(struct cpudl *cp, int cpu, u64 dl); +void cpudl_clear(struct cpudl *cp, int cpu); int cpudl_init(struct cpudl *cp); void cpudl_set_freecpu(struct cpudl *cp, int cpu); void cpudl_clear_freecpu(struct cpudl *cp, int cpu); diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index d091f4a..18fb0b8 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -798,7 +798,7 @@ static void inc_dl_deadline(struct dl_rq 
*dl_rq, u64 deadline) if (dl_rq->earliest_dl.curr == 0 || dl_time_before(deadline, dl_rq->earliest_dl.curr)) { dl_rq->earliest_dl.curr = deadline; - cpudl_set(&rq->rd->cpudl, rq->cpu, deadline, 1); + cpudl_set(&rq->rd->cpudl, rq->cpu, deadline); } } @@ -813,14 +813,14 @@ static void dec_dl_deadline(struct dl_rq *dl_rq, u64 deadline) if (!d
[RFC PATCH 1/3] sched/deadline: refactor cpu heap code
1. heapify up factored out in new dedicated function heapify_up() (avoids repetition of same code) 2. call to cpudl_change_key() replaced with heapify_up() when cpudl_set actually inserts a new node in the heap 3. cpudl_change_key() replaced with heapify() that heapifies up or down as needed. Cc: Peter Zijlstra Cc: Juri Lelli Cc: Luca Abeni Reviewed-by: Luca Abeni Reviewed-by: Juri Lelli Signed-off-by: Tommaso Cucinotta --- kernel/sched/cpudeadline.c | 50 +- 1 file changed, 23 insertions(+), 27 deletions(-) diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c index d418449..0acb0d4 100644 --- a/kernel/sched/cpudeadline.c +++ b/kernel/sched/cpudeadline.c @@ -41,7 +41,7 @@ static void cpudl_exchange(struct cpudl *cp, int a, int b) swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx); } -static void cpudl_heapify(struct cpudl *cp, int idx) +static void cpudl_heapify_down(struct cpudl *cp, int idx) { int l, r, largest; @@ -66,23 +66,24 @@ static void cpudl_heapify(struct cpudl *cp, int idx) } } -static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl) +static void cpudl_heapify_up(struct cpudl *cp, int idx) { - WARN_ON(idx == IDX_INVALID || !cpu_present(idx)); - - if (dl_time_before(new_dl, cp->elements[idx].dl)) { - cp->elements[idx].dl = new_dl; - cpudl_heapify(cp, idx); - } else { - cp->elements[idx].dl = new_dl; - while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, - cp->elements[idx].dl)) { - cpudl_exchange(cp, idx, parent(idx)); - idx = parent(idx); - } + while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, + cp->elements[idx].dl)) { + cpudl_exchange(cp, idx, parent(idx)); + idx = parent(idx); } } +static void cpudl_heapify(struct cpudl *cp, int idx) +{ + if (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, + cp->elements[idx].dl)) + cpudl_heapify_up(cp, idx); + else + cpudl_heapify_down(cp, idx); +} + static inline int cpudl_maximum(struct cpudl *cp) { return cp->elements[0].cpu; @@ -154,27 +155,22 @@ void 
cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) cp->size--; cp->elements[new_cpu].idx = old_idx; cp->elements[cpu].idx = IDX_INVALID; - while (old_idx > 0 && dl_time_before( - cp->elements[parent(old_idx)].dl, - cp->elements[old_idx].dl)) { - cpudl_exchange(cp, old_idx, parent(old_idx)); - old_idx = parent(old_idx); - } + cpudl_heapify(cp, old_idx); cpumask_set_cpu(cpu, cp->free_cpus); -cpudl_heapify(cp, old_idx); goto out; } if (old_idx == IDX_INVALID) { - cp->size++; - cp->elements[cp->size - 1].dl = dl; - cp->elements[cp->size - 1].cpu = cpu; - cp->elements[cpu].idx = cp->size - 1; - cpudl_change_key(cp, cp->size - 1, dl); + int new_idx = cp->size++; + cp->elements[new_idx].dl = dl; + cp->elements[new_idx].cpu = cpu; + cp->elements[cpu].idx = new_idx; + cpudl_heapify_up(cp, new_idx); cpumask_clear_cpu(cpu, cp->free_cpus); } else { - cpudl_change_key(cp, old_idx, dl); + cp->elements[old_idx].dl = dl; + cpudl_heapify(cp, old_idx); } out: -- 2.7.4
[tip:sched/core] sched/deadline: Fix wrap-around in DL heap
Commit-ID: a23eadfae2fd45536a355b785d5a1533e1955c22 Gitweb: http://git.kernel.org/tip/a23eadfae2fd45536a355b785d5a1533e1955c22 Author: Tommaso Cucinotta AuthorDate: Tue, 19 Jul 2016 11:44:50 +0200 Committer: Ingo Molnar CommitDate: Wed, 10 Aug 2016 13:32:55 +0200 sched/deadline: Fix wrap-around in DL heap Current code in cpudeadline.c has a bug in re-heapifying when adding a new element at the end of the heap, because a deadline value of 0 is temporarily set in the new elem, then cpudl_change_key() is called with the actual elem deadline as param. However, the function compares the new deadline to set with the one previously in the elem, which is 0. So, if current absolute deadlines grew so much to have negative values as s64, the comparison in cpudl_change_key() makes the wrong decision. Instead, as from dl_time_before(), the kernel should handle correctly abs deadlines wrap-arounds. This patch fixes the problem with a minimally invasive change that forces cpudl_change_key() to heapify up in this case. Signed-off-by: Tommaso Cucinotta Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Luca Abeni Cc: Juri Lelli Cc: Juri Lelli Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1468921493-10054-2-git-send-email-tommaso.cucino...@sssup.it Signed-off-by: Ingo Molnar --- kernel/sched/cpudeadline.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c index 5be5882..d418449 100644 --- a/kernel/sched/cpudeadline.c +++ b/kernel/sched/cpudeadline.c @@ -168,7 +168,7 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) if (old_idx == IDX_INVALID) { cp->size++; - cp->elements[cp->size - 1].dl = 0; + cp->elements[cp->size - 1].dl = dl; cp->elements[cp->size - 1].cpu = cpu; cp->elements[cpu].idx = cp->size - 1; cpudl_change_key(cp, cp->size - 1, dl);
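To see why the temporary 0 misbehaved, here is a standalone sketch of the dl_time_before() comparison (mirroring its kernel definition): absolute deadlines are ordered by signed subtraction, so the ordering survives u64 wrap-around, but it also means that once a deadline lands in the s64-negative range it compares as "before" 0, sending cpudl_change_key() down the wrong heapify branch:

```c
#include <stdint.h>
#include <stdbool.h>

/* same idea as the kernel's dl_time_before(): wrap-safe ordering of
 * absolute deadlines via signed subtraction */
static bool dl_time_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}
```

The last case below is exactly the bug scenario: a real deadline with the top bit set compares as earlier than the placeholder 0, so the old code chose to heapify down for a freshly appended leaf that actually needed to move up.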
[RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops.
This change achieves up to 10% of speed-up for cpudl_set() calls, as measured with a randomly generated workload of 1K,10K,100K random heap insertions and deletions (75% cpudl_set() calls with is_valid=1 and 25% with is_valid=0), and randomly generated cpu IDs, with up to 256 CPUs, as measured on an Intel Core2 Duo. Cc: Peter Zijlstra Cc: Juri Lelli Cc: Luca Abeni Reviewed-by: Luca Abeni Signed-off-by: Tommaso Cucinotta --- kernel/sched/cpudeadline.c | 114 +++-- 1 file changed, 69 insertions(+), 45 deletions(-) diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c index 3c42702..60f933a 100644 --- a/kernel/sched/cpudeadline.c +++ b/kernel/sched/cpudeadline.c @@ -31,60 +31,82 @@ static inline int right_child(int i) return (i << 1) + 2; } -static void cpudl_exchange(struct cpudl *cp, int a, int b) -{ - int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu; - - swap(cp->elements[a].cpu, cp->elements[b].cpu); - swap(cp->elements[a].dl , cp->elements[b].dl ); - - swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx); -} - static void cpudl_heapify_down(struct cpudl *cp, int idx) { int l, r, largest; + int orig_cpu = cp->elements[idx].cpu; + u64 orig_dl = cp->elements[idx].dl; + + if (left_child(idx) >= cp->size) + return; + /* adapted from lib/prio_heap.c */ while(1) { + u64 largest_dl; l = left_child(idx); r = right_child(idx); largest = idx; + largest_dl = orig_dl; - if ((l < cp->size) && dl_time_before(cp->elements[idx].dl, - cp->elements[l].dl)) + if ((l < cp->size) && dl_time_before(orig_dl, cp->elements[l].dl)) { largest = l; - if ((r < cp->size) && dl_time_before(cp->elements[largest].dl, - cp->elements[r].dl)) + largest_dl = cp->elements[l].dl; + } + if ((r < cp->size) && dl_time_before(largest_dl, cp->elements[r].dl)) largest = r; + if (largest == idx) break; - /* Push idx down the heap one level and bump one up */ - cpudl_exchange(cp, largest, idx); + /* pull largest child onto idx */ + cp->elements[idx].cpu = cp->elements[largest].cpu; + 
cp->elements[idx].dl = cp->elements[largest].dl; + cp->elements[cp->elements[idx].cpu].idx = idx; idx = largest; } + /* actual push down of saved original values orig_* */ + cp->elements[idx].cpu = orig_cpu; + cp->elements[idx].dl = orig_dl; + cp->elements[cp->elements[idx].cpu].idx = idx; } static void cpudl_heapify_up(struct cpudl *cp, int idx) { - while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, - cp->elements[idx].dl)) { - cpudl_exchange(cp, idx, parent(idx)); - idx = parent(idx); - } + int p; + + int orig_cpu = cp->elements[idx].cpu; + u64 orig_dl = cp->elements[idx].dl; + + if (idx == 0) + return; + + do { + p = parent(idx); + if (dl_time_before(cp->elements[idx].dl, cp->elements[p].dl)) + break; + /* pull parent onto idx */ + cp->elements[idx].cpu = cp->elements[p].cpu; + cp->elements[idx].dl = cp->elements[p].dl; + cp->elements[cp->elements[idx].cpu].idx = idx; + idx = p; + } while (idx != 0); + /* actual push up of saved original values orig_* */ + cp->elements[idx].cpu = orig_cpu; + cp->elements[idx].dl = orig_dl; + cp->elements[cp->elements[idx].cpu].idx = idx; } -static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl) +static void cpudl_heapify(struct cpudl *cp, int idx) { WARN_ON(idx == IDX_INVALID || !cpu_present(idx)); + if (idx == IDX_INVALID) + return; - if (dl_time_before(new_dl, cp->elements[idx].dl)) { - cp->elements[idx].dl = new_dl; - cpudl_heapify_down(cp, idx); - } else { - cp->elements[idx].dl = new_dl; + if (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, cp->elements[idx].dl)) { cpudl_heapify_up(cp, idx); + } else { + cpudl_heapify_down(cp, idx); } } @@ -153,28 +175,30 @@ void cpudl_set(struct cpudl *cp, int cpu,
[RFC PATCH 4/4] Split cpudl_set() into cpudl_set() and cpudl_clear().
These 2 exercise independent code paths and need different arguments. Now you call cpudl_clear(cp, cpu) cpudl_set(cp, cpu, dl) instead of cpudl_set(cp, cpu, 0 /* dl */, 0 /* is_valid */) cpudl_set(cp, cpu, dl, 1 /* is_valid */) Cc: Peter Zijlstra Cc: Juri Lelli Cc: Luca Abeni Reviewed-by: Luca Abeni Signed-off-by: Tommaso Cucinotta --- kernel/sched/cpudeadline.c | 71 +- kernel/sched/cpudeadline.h | 3 +- kernel/sched/deadline.c| 10 +++ 3 files changed, 52 insertions(+), 32 deletions(-) diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c index 60f933a..0f276bf 100644 --- a/kernel/sched/cpudeadline.c +++ b/kernel/sched/cpudeadline.c @@ -147,16 +147,15 @@ out: } /* - * cpudl_set - update the cpudl max-heap + * cpudl_clear - remove a cpu from the cpudl max-heap * @cp: the cpudl max-heap context * @cpu: the target cpu - * @dl: the new earliest deadline for this cpu * * Notes: assumes cpu_rq(cpu)->lock is locked * * Returns: (void) */ -void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) +void cpudl_clear(struct cpudl *cp, int cpu) { int old_idx, new_cpu; unsigned long flags; @@ -164,17 +163,16 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) WARN_ON(!cpu_present(cpu)); raw_spin_lock_irqsave(&cp->lock, flags); + old_idx = cp->elements[cpu].idx; - if (!is_valid) { + if (old_idx == IDX_INVALID) { + /* +* Nothing to remove if old_idx was invalid. +* This could happen if a rq_offline_dl is +* called for a CPU without -dl tasks running. +*/ + } else { /* remove item */ - if (old_idx == IDX_INVALID) { - /* -* Nothing to remove if old_idx was invalid. -* This could happen if a rq_offline_dl is -* called for a CPU without -dl tasks running. 
-*/ - goto out; - } cp->size--; cp->elements[cpu].idx = IDX_INVALID; if (old_idx != cp->size) { @@ -184,24 +182,45 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) cp->elements[new_cpu].idx = old_idx; cpudl_heapify(cp, old_idx); } - cpumask_set_cpu(cpu, cp->free_cpus); + } + + raw_spin_unlock_irqrestore(&cp->lock, flags); +} + +/* + * cpudl_set - update the cpudl max-heap + * @cp: the cpudl max-heap context + * @cpu: the target cpu + * @dl: the new earliest deadline for this cpu + * + * Notes: assumes cpu_rq(cpu)->lock is locked + * + * Returns: (void) + */ +void cpudl_set(struct cpudl *cp, int cpu, u64 dl) +{ + int old_idx; + unsigned long flags; + + WARN_ON(!cpu_present(cpu)); + + raw_spin_lock_irqsave(&cp->lock, flags); + + old_idx = cp->elements[cpu].idx; + if (old_idx == IDX_INVALID) { + int sz1 = cp->size++; + cp->elements[sz1].dl = dl; + cp->elements[sz1].cpu = cpu; + cp->elements[cpu].idx = sz1; + cpudl_heapify_up(cp, sz1); + + cpumask_clear_cpu(cpu, cp->free_cpus); } else { - if (old_idx == IDX_INVALID) { - int size1 = cp->size++; - cp->elements[size1].dl = dl; - cp->elements[size1].cpu = cpu; - cp->elements[cpu].idx = size1; - cpudl_heapify_up(cp, size1); - - cpumask_clear_cpu(cpu, cp->free_cpus); - } else { - cp->elements[old_idx].dl = dl; - cpudl_heapify(cp, old_idx); - } + cp->elements[old_idx].dl = dl; + cpudl_heapify(cp, old_idx); } -out: raw_spin_unlock_irqrestore(&cp->lock, flags); } diff --git a/kernel/sched/cpudeadline.h b/kernel/sched/cpudeadline.h index fcbdf83..f7da8c5 100644 --- a/kernel/sched/cpudeadline.h +++ b/kernel/sched/cpudeadline.h @@ -23,7 +23,8 @@ struct cpudl { #ifdef CONFIG_SMP int cpudl_find(struct cpudl *cp, struct task_struct *p, struct cpumask *later_mask); -void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid); +void cpudl_set(struct cpudl *cp, int cpu, u64 dl); +void cpudl_clear(struct cpudl *cp, int cpu); int cpudl_init(struct cpudl *cp); void cpudl_set_freecpu(struct cpudl *cp, int cpu); void 
cpudl_clear_freecpu(struct cpudl *cp, int cpu); diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index fcb7f02..f2e8f47 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -795,7 +795,7 @@ static void i
SCHED_DEADLINE cpudeadline.{h,c} fixup
Hi, this is a rework of the cpudeadline bugfix and speed-up patch-set, integrating all comments received so far from Luca and Juri. The first patch is a minimally invasive (1-line) fix for the deadline wrap-around bug. This leaves some weirdness in how cpudl_change_key() is called, so the second patch does a minimum of refactoring to make things more explicit and clear. The 3rd patch now contains the actual performance enhancement (avoiding unneeded swaps during heapify operations), which has been measured to achieve up to 10% of speed-up for cpudl_set() calls. This has been measured with a randomly generated workload of 1K, 10K and 100K random heap insertions and deletions (75% cpudl_set() calls with is_valid=1 and 25% with is_valid=0), and randomly generated cpu IDs, with up to 256 CPUs. Benchmarking code is available at: https://github.com/tomcucinotta/cpudl-bench Obtained speed-up plot: https://github.com/tomcucinotta/cpudl-bench/blob/master/cpudl.pdf Finally, the 4th patch is another clean-up patch touching cpudeadline.{h,c} and deadline.c. Now you call cpudl_clear(cp, cpu) and cpudl_set(cp, cpu, dl) instead of cpudl_set(cp, cpu, 0 /* dl */, 0 /* is_valid */) and cpudl_set(cp, cpu, dl, 1 /* is_valid */). Please share your comments, thanks! Tommaso
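For concreteness, the randomly generated workload described above can be sketched like this (a hypothetical user-space reconstruction of the setup, not the actual code in the cpudl-bench repo; names are made up):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_CPUS 256
#define NOPS    100000

/* one heap operation of the random workload: ~75% "set" ops carrying a
 * random 64-bit absolute deadline, ~25% "clear" ops, on random cpu ids */
struct cpudl_op {
	int is_set;
	int cpu;
	uint64_t dl;
};

static void gen_workload(struct cpudl_op *ops, int nops, unsigned int seed)
{
	int i;

	srand(seed);
	for (i = 0; i < nops; i++) {
		ops[i].is_set = (rand() % 4) != 0;	/* 3 out of 4: set */
		ops[i].cpu = rand() % NR_CPUS;
		/* random u64 deadline, deliberately allowed to wrap as s64 */
		ops[i].dl = ((uint64_t)rand() << 32) ^ (uint64_t)rand();
	}
}
```

Each generated op would then be replayed against the heap, timing consecutive calls with rdtsc as described in the original post.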
[RFC PATCH 1/4] Minimally invasive deadline wrap-around bugfix for SCHED_DEADLINE cpu heap
Current code in cpudeadline.c has a bug in re-heapifying when adding a new element at the end of the heap: a deadline value of 0 is temporarily set in the new element, then cpudl_change_key() is called with the actual element deadline as a parameter. However, the function compares the new deadline to set against the one previously in the element, which is 0. So, if current absolute deadlines have grown large enough to be negative as s64 values, the comparison in cpudl_change_key() makes the wrong decision. Instead, as per dl_time_before(), the kernel should correctly handle abs deadline wrap-arounds. This patch fixes the problem with a minimally invasive change that forces cpudl_change_key() to heapify up in this case.

Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Luca Abeni
Reviewed-by: Luca Abeni
Signed-off-by: Tommaso Cucinotta
---
 kernel/sched/cpudeadline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 5be5882..d418449 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -168,7 +168,7 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 	if (old_idx == IDX_INVALID) {
 		cp->size++;
-		cp->elements[cp->size - 1].dl = 0;
+		cp->elements[cp->size - 1].dl = dl;
 		cp->elements[cp->size - 1].cpu = cpu;
 		cp->elements[cpu].idx = cp->size - 1;
 		cpudl_change_key(cp, cp->size - 1, dl);
--
2.7.4
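To see why seeding the new element with 0 breaks for wrapped deadlines, consider the kernel's wrap-around-safe comparison in isolation (a minimal user-space sketch; dl_time_before() mirrors the kernel's definition, and the deadline value used below is a made-up one just past the s64 sign boundary):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;
typedef int64_t s64;

/* same wrap-around-safe comparison the kernel uses for abs deadlines */
static bool dl_time_before(u64 a, u64 b)
{
	return (s64)(a - b) < 0;
}
```

With wrapped = ((u64)1 << 63) + 1, dl_time_before(wrapped - 1, wrapped) is still true, i.e. ordering survives the wrap; but dl_time_before(wrapped, 0) is also true, so relative to the temporary 0 key the freshly appended element looks like a key *decrease* and the old cpudl_change_key() takes the heapify-down branch, while the new leaf actually needs to move up. Seeding the slot with the real dl, as the patch does, avoids the bogus comparison entirely.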
[RFC PATCH 2/4] SCHED_DEADLINE cpu heap code clarification/refactoring.
1. heapify up factored out in new dedicated function heapify_up() (avoids repetition of same code) 2. call to cpudl_change_key() replaced with heapify_up() when cpudl_set actually inserts a new node in the heap Cc: Peter Zijlstra Cc: Juri Lelli Cc: Luca Abeni Reviewed-by: Luca Abeni Signed-off-by: Tommaso Cucinotta --- kernel/sched/cpudeadline.c | 38 +++--- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c index d418449..3c42702 100644 --- a/kernel/sched/cpudeadline.c +++ b/kernel/sched/cpudeadline.c @@ -41,7 +41,7 @@ static void cpudl_exchange(struct cpudl *cp, int a, int b) swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx); } -static void cpudl_heapify(struct cpudl *cp, int idx) +static void cpudl_heapify_down(struct cpudl *cp, int idx) { int l, r, largest; @@ -66,20 +66,25 @@ static void cpudl_heapify(struct cpudl *cp, int idx) } } +static void cpudl_heapify_up(struct cpudl *cp, int idx) +{ + while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, + cp->elements[idx].dl)) { + cpudl_exchange(cp, idx, parent(idx)); + idx = parent(idx); + } +} + static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl) { WARN_ON(idx == IDX_INVALID || !cpu_present(idx)); if (dl_time_before(new_dl, cp->elements[idx].dl)) { cp->elements[idx].dl = new_dl; - cpudl_heapify(cp, idx); + cpudl_heapify_down(cp, idx); } else { cp->elements[idx].dl = new_dl; - while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, - cp->elements[idx].dl)) { - cpudl_exchange(cp, idx, parent(idx)); - idx = parent(idx); - } + cpudl_heapify_up(cp, idx); } } @@ -154,24 +159,19 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) cp->size--; cp->elements[new_cpu].idx = old_idx; cp->elements[cpu].idx = IDX_INVALID; - while (old_idx > 0 && dl_time_before( - cp->elements[parent(old_idx)].dl, - cp->elements[old_idx].dl)) { - cpudl_exchange(cp, old_idx, parent(old_idx)); - old_idx = parent(old_idx); - } + 
cpudl_heapify_up(cp, old_idx); cpumask_set_cpu(cpu, cp->free_cpus); -cpudl_heapify(cp, old_idx); +cpudl_heapify_down(cp, old_idx); goto out; } if (old_idx == IDX_INVALID) { - cp->size++; - cp->elements[cp->size - 1].dl = dl; - cp->elements[cp->size - 1].cpu = cpu; - cp->elements[cpu].idx = cp->size - 1; - cpudl_change_key(cp, cp->size - 1, dl); + int size1 = cp->size++; + cp->elements[size1].dl = dl; + cp->elements[size1].cpu = cpu; + cp->elements[cpu].idx = size1; + cpudl_heapify_up(cp, size1); cpumask_clear_cpu(cpu, cp->free_cpus); } else { cpudl_change_key(cp, old_idx, dl); -- 2.7.4
SCHED_DEADLINE cpudeadline.{h,c} fixup
Hi all, I took Luca's advice to isolate the deadline wrap-around bugfix in a first minimally invasive patch (1-line). This leaves some weirdness in how cpudl_change_key() is called, so the second patch does a minimum of refactoring to make things more explicit and clear. The 3rd patch now contains the actual performance enhancement (avoiding unneeded swaps during heapify operations), which, as said in the previous post, achieves up to 6% of speed-up for cpudl_set() calls. Finally, the 4th patch is another clean-up patch touching cpudeadline.{h,c} and deadline.c. Now you call cpudl_clear(cp, cpu) and cpudl_set(cp, cpu, dl) instead of cpudl_set(cp, cpu, 0 /* dl */, 0 /* is_valid */) and cpudl_set(cp, cpu, dl, 1 /* is_valid */). Please let me know how this looks now. Thanks, Tommaso
[RFC PATCH 1/4] Minimally invasive deadline wrap-around bugfix for the SCHED_DEADLINE cpu heap.
---
 kernel/sched/cpudeadline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 5be5882..d418449 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -168,7 +168,7 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 	if (old_idx == IDX_INVALID) {
 		cp->size++;
-		cp->elements[cp->size - 1].dl = 0;
+		cp->elements[cp->size - 1].dl = dl;
 		cp->elements[cp->size - 1].cpu = cpu;
 		cp->elements[cpu].idx = cp->size - 1;
 		cpudl_change_key(cp, cp->size - 1, dl);
--
2.7.4
[RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops.
--- kernel/sched/cpudeadline.c | 114 +++-- 1 file changed, 69 insertions(+), 45 deletions(-) diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c index 3c42702..60f933a 100644 --- a/kernel/sched/cpudeadline.c +++ b/kernel/sched/cpudeadline.c @@ -31,60 +31,82 @@ static inline int right_child(int i) return (i << 1) + 2; } -static void cpudl_exchange(struct cpudl *cp, int a, int b) -{ - int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu; - - swap(cp->elements[a].cpu, cp->elements[b].cpu); - swap(cp->elements[a].dl , cp->elements[b].dl ); - - swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx); -} - static void cpudl_heapify_down(struct cpudl *cp, int idx) { int l, r, largest; + int orig_cpu = cp->elements[idx].cpu; + u64 orig_dl = cp->elements[idx].dl; + + if (left_child(idx) >= cp->size) + return; + /* adapted from lib/prio_heap.c */ while(1) { + u64 largest_dl; l = left_child(idx); r = right_child(idx); largest = idx; + largest_dl = orig_dl; - if ((l < cp->size) && dl_time_before(cp->elements[idx].dl, - cp->elements[l].dl)) + if ((l < cp->size) && dl_time_before(orig_dl, cp->elements[l].dl)) { largest = l; - if ((r < cp->size) && dl_time_before(cp->elements[largest].dl, - cp->elements[r].dl)) + largest_dl = cp->elements[l].dl; + } + if ((r < cp->size) && dl_time_before(largest_dl, cp->elements[r].dl)) largest = r; + if (largest == idx) break; - /* Push idx down the heap one level and bump one up */ - cpudl_exchange(cp, largest, idx); + /* pull largest child onto idx */ + cp->elements[idx].cpu = cp->elements[largest].cpu; + cp->elements[idx].dl = cp->elements[largest].dl; + cp->elements[cp->elements[idx].cpu].idx = idx; idx = largest; } + /* actual push down of saved original values orig_* */ + cp->elements[idx].cpu = orig_cpu; + cp->elements[idx].dl = orig_dl; + cp->elements[cp->elements[idx].cpu].idx = idx; } static void cpudl_heapify_up(struct cpudl *cp, int idx) { - while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, - 
cp->elements[idx].dl)) { - cpudl_exchange(cp, idx, parent(idx)); - idx = parent(idx); - } + int p; + + int orig_cpu = cp->elements[idx].cpu; + u64 orig_dl = cp->elements[idx].dl; + + if (idx == 0) + return; + + do { + p = parent(idx); + if (dl_time_before(cp->elements[idx].dl, cp->elements[p].dl)) + break; + /* pull parent onto idx */ + cp->elements[idx].cpu = cp->elements[p].cpu; + cp->elements[idx].dl = cp->elements[p].dl; + cp->elements[cp->elements[idx].cpu].idx = idx; + idx = p; + } while (idx != 0); + /* actual push up of saved original values orig_* */ + cp->elements[idx].cpu = orig_cpu; + cp->elements[idx].dl = orig_dl; + cp->elements[cp->elements[idx].cpu].idx = idx; } -static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl) +static void cpudl_heapify(struct cpudl *cp, int idx) { WARN_ON(idx == IDX_INVALID || !cpu_present(idx)); + if (idx == IDX_INVALID) + return; - if (dl_time_before(new_dl, cp->elements[idx].dl)) { - cp->elements[idx].dl = new_dl; - cpudl_heapify_down(cp, idx); - } else { - cp->elements[idx].dl = new_dl; + if (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, cp->elements[idx].dl)) { cpudl_heapify_up(cp, idx); + } else { + cpudl_heapify_down(cp, idx); } } @@ -153,28 +175,30 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) */ goto out; } - new_cpu = cp->elements[cp->size - 1].cpu; - cp->elements[old_idx].dl = cp->elements[cp->size - 1].dl; - cp->elements[old_idx].cpu = new_cpu; cp->size--; - cp->elements[new_cpu].idx = old_idx; cp->elements[cpu].idx = IDX_INVALID; - cpudl_heapify_up(cp, old_idx); - cpumask_set_cpu(cpu, cp->free_cpus); -cpudl_heapify_down(cp, old_idx); - - goto out; - } + if (old_idx != cp->size) { +
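The swap-free heapify of patch 3/4 can be illustrated on a plain array max-heap (a user-space sketch of the same idea, without the per-cpu idx back-pointers the kernel code also has to maintain): the original element is saved once, larger children are pulled up into the hole, and a single write-back happens at the final position, replacing the three swaps per level that cpudl_exchange() performed.

```c
#include <assert.h>
#include <stddef.h>

/* sift-down without swaps: one store per level plus a final write-back */
static void sift_down(unsigned long long *heap, size_t size, size_t idx)
{
	unsigned long long orig = heap[idx];	/* saved once, like orig_dl */

	for (;;) {
		size_t l = 2 * idx + 1;
		size_t r = 2 * idx + 2;
		size_t largest = idx;
		unsigned long long largest_key = orig;

		if (l < size && largest_key < heap[l]) {
			largest = l;
			largest_key = heap[l];
		}
		if (r < size && largest_key < heap[r])
			largest = r;
		if (largest == idx)
			break;

		heap[idx] = heap[largest];	/* pull child up into the hole */
		idx = largest;
	}
	heap[idx] = orig;			/* single final write-back */
}
```

Since these heap operations run under a domain spinlock, saving the redundant loads and stores of a full swap at every level is where the measured cpudl_set() speed-up comes from.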
[RFC PATCH 2/4] SCHED_DEADLINE cpu heap code clarification/refactoring: - heapify_up factored out in new dedicated function (avoids repetition of same code) - call to cpudl_change_key replaced with h
--- kernel/sched/cpudeadline.c | 38 +++--- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c index d418449..3c42702 100644 --- a/kernel/sched/cpudeadline.c +++ b/kernel/sched/cpudeadline.c @@ -41,7 +41,7 @@ static void cpudl_exchange(struct cpudl *cp, int a, int b) swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx); } -static void cpudl_heapify(struct cpudl *cp, int idx) +static void cpudl_heapify_down(struct cpudl *cp, int idx) { int l, r, largest; @@ -66,20 +66,25 @@ static void cpudl_heapify(struct cpudl *cp, int idx) } } +static void cpudl_heapify_up(struct cpudl *cp, int idx) +{ + while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, + cp->elements[idx].dl)) { + cpudl_exchange(cp, idx, parent(idx)); + idx = parent(idx); + } +} + static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl) { WARN_ON(idx == IDX_INVALID || !cpu_present(idx)); if (dl_time_before(new_dl, cp->elements[idx].dl)) { cp->elements[idx].dl = new_dl; - cpudl_heapify(cp, idx); + cpudl_heapify_down(cp, idx); } else { cp->elements[idx].dl = new_dl; - while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, - cp->elements[idx].dl)) { - cpudl_exchange(cp, idx, parent(idx)); - idx = parent(idx); - } + cpudl_heapify_up(cp, idx); } } @@ -154,24 +159,19 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) cp->size--; cp->elements[new_cpu].idx = old_idx; cp->elements[cpu].idx = IDX_INVALID; - while (old_idx > 0 && dl_time_before( - cp->elements[parent(old_idx)].dl, - cp->elements[old_idx].dl)) { - cpudl_exchange(cp, old_idx, parent(old_idx)); - old_idx = parent(old_idx); - } + cpudl_heapify_up(cp, old_idx); cpumask_set_cpu(cpu, cp->free_cpus); -cpudl_heapify(cp, old_idx); +cpudl_heapify_down(cp, old_idx); goto out; } if (old_idx == IDX_INVALID) { - cp->size++; - cp->elements[cp->size - 1].dl = dl; - cp->elements[cp->size - 1].cpu = cpu; - cp->elements[cpu].idx = cp->size - 1; - 
cpudl_change_key(cp, cp->size - 1, dl); + int size1 = cp->size++; + cp->elements[size1].dl = dl; + cp->elements[size1].cpu = cpu; + cp->elements[cpu].idx = size1; + cpudl_heapify_up(cp, size1); cpumask_clear_cpu(cpu, cp->free_cpus); } else { cpudl_change_key(cp, old_idx, dl); -- 2.7.4
[RFC PATCH 4/4] Split cpudl_set() into cpudl_set() and cpudl_clear(). These 2 exercise independent code paths and need different arguments.
--- kernel/sched/cpudeadline.c | 71 +- kernel/sched/cpudeadline.h | 3 +- kernel/sched/deadline.c| 10 +++ 3 files changed, 52 insertions(+), 32 deletions(-) diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c index 60f933a..0f276bf 100644 --- a/kernel/sched/cpudeadline.c +++ b/kernel/sched/cpudeadline.c @@ -147,16 +147,15 @@ out: } /* - * cpudl_set - update the cpudl max-heap + * cpudl_clear - remove a cpu from the cpudl max-heap * @cp: the cpudl max-heap context * @cpu: the target cpu - * @dl: the new earliest deadline for this cpu * * Notes: assumes cpu_rq(cpu)->lock is locked * * Returns: (void) */ -void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) +void cpudl_clear(struct cpudl *cp, int cpu) { int old_idx, new_cpu; unsigned long flags; @@ -164,17 +163,16 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) WARN_ON(!cpu_present(cpu)); raw_spin_lock_irqsave(&cp->lock, flags); + old_idx = cp->elements[cpu].idx; - if (!is_valid) { + if (old_idx == IDX_INVALID) { + /* +* Nothing to remove if old_idx was invalid. +* This could happen if a rq_offline_dl is +* called for a CPU without -dl tasks running. +*/ + } else { /* remove item */ - if (old_idx == IDX_INVALID) { - /* -* Nothing to remove if old_idx was invalid. -* This could happen if a rq_offline_dl is -* called for a CPU without -dl tasks running. 
-*/ - goto out; - } cp->size--; cp->elements[cpu].idx = IDX_INVALID; if (old_idx != cp->size) { @@ -184,24 +182,45 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) cp->elements[new_cpu].idx = old_idx; cpudl_heapify(cp, old_idx); } - cpumask_set_cpu(cpu, cp->free_cpus); + } + + raw_spin_unlock_irqrestore(&cp->lock, flags); +} + +/* + * cpudl_set - update the cpudl max-heap + * @cp: the cpudl max-heap context + * @cpu: the target cpu + * @dl: the new earliest deadline for this cpu + * + * Notes: assumes cpu_rq(cpu)->lock is locked + * + * Returns: (void) + */ +void cpudl_set(struct cpudl *cp, int cpu, u64 dl) +{ + int old_idx; + unsigned long flags; + + WARN_ON(!cpu_present(cpu)); + + raw_spin_lock_irqsave(&cp->lock, flags); + + old_idx = cp->elements[cpu].idx; + if (old_idx == IDX_INVALID) { + int sz1 = cp->size++; + cp->elements[sz1].dl = dl; + cp->elements[sz1].cpu = cpu; + cp->elements[cpu].idx = sz1; + cpudl_heapify_up(cp, sz1); + + cpumask_clear_cpu(cpu, cp->free_cpus); } else { - if (old_idx == IDX_INVALID) { - int size1 = cp->size++; - cp->elements[size1].dl = dl; - cp->elements[size1].cpu = cpu; - cp->elements[cpu].idx = size1; - cpudl_heapify_up(cp, size1); - - cpumask_clear_cpu(cpu, cp->free_cpus); - } else { - cp->elements[old_idx].dl = dl; - cpudl_heapify(cp, old_idx); - } + cp->elements[old_idx].dl = dl; + cpudl_heapify(cp, old_idx); } -out: raw_spin_unlock_irqrestore(&cp->lock, flags); } diff --git a/kernel/sched/cpudeadline.h b/kernel/sched/cpudeadline.h index fcbdf83..f7da8c5 100644 --- a/kernel/sched/cpudeadline.h +++ b/kernel/sched/cpudeadline.h @@ -23,7 +23,8 @@ struct cpudl { #ifdef CONFIG_SMP int cpudl_find(struct cpudl *cp, struct task_struct *p, struct cpumask *later_mask); -void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid); +void cpudl_set(struct cpudl *cp, int cpu, u64 dl); +void cpudl_clear(struct cpudl *cp, int cpu); int cpudl_init(struct cpudl *cp); void cpudl_set_freecpu(struct cpudl *cp, int cpu); void 
cpudl_clear_freecpu(struct cpudl *cp, int cpu); diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index fcb7f02..f2e8f47 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -795,7 +795,7 @@ static void inc_dl_deadline(struct dl_rq *dl_rq, u64 deadline) if (dl_rq->earliest_dl.curr == 0 || dl_time_before(deadline, dl_rq->earliest_dl.curr)) { dl_rq->earliest_dl.curr = deadline; - cpudl_set(&rq->rd->cpudl, rq->cpu, deadline, 1); + cpudl_set(&rq->rd->cpudl, rq->cpu, deadline); } } @@ -810,14 +810,14 @@ static void dec_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
Re: SCHED_DEADLINE cpudeadline.{h,c} fixup
On 17/05/2016 13:46, luca abeni wrote: Maybe the ... change can be split in a separate patch, which is a bugfix (and IMHO uncontroversial)? Ok, the bugfix alone might look like the attached. Couldn't avoid the little refactoring of the multiple occurrences of the same loop up the heap into the heapify_up(), mirroring the heapify() that was already there (renamed heapify_down() for clarity). I'll rebase the speed-up patch on top of this, if it's a better approach. Anyone with further comments? Thanks again! T. -- Tommaso Cucinotta, Computer Engineering PhD Associate Professor at the Real-Time Systems Laboratory (ReTiS) Scuola Superiore Sant'Anna, Pisa, Italy http://retis.sssup.it/people/tommaso >From cfaa75eb77843f7da875a54c7e6631b271bf0663 Mon Sep 17 00:00:00 2001 From: Tommaso Cucinotta Date: Tue, 17 May 2016 15:54:11 +0200 Subject: [PATCH] Deadline wrap-around bugfix for the SCHED_DEADLINE cpu heap. --- kernel/sched/cpudeadline.c | 38 +++--- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c index 5be5882..3c42702 100644 --- a/kernel/sched/cpudeadline.c +++ b/kernel/sched/cpudeadline.c @@ -41,7 +41,7 @@ static void cpudl_exchange(struct cpudl *cp, int a, int b) swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx); } -static void cpudl_heapify(struct cpudl *cp, int idx) +static void cpudl_heapify_down(struct cpudl *cp, int idx) { int l, r, largest; @@ -66,20 +66,25 @@ static void cpudl_heapify(struct cpudl *cp, int idx) } } +static void cpudl_heapify_up(struct cpudl *cp, int idx) +{ + while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, + cp->elements[idx].dl)) { + cpudl_exchange(cp, idx, parent(idx)); + idx = parent(idx); + } +} + static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl) { WARN_ON(idx == IDX_INVALID || !cpu_present(idx)); if (dl_time_before(new_dl, cp->elements[idx].dl)) { cp->elements[idx].dl = new_dl; - cpudl_heapify(cp, idx); + cpudl_heapify_down(cp, 
idx); } else { cp->elements[idx].dl = new_dl; - while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, - cp->elements[idx].dl)) { - cpudl_exchange(cp, idx, parent(idx)); - idx = parent(idx); - } + cpudl_heapify_up(cp, idx); } } @@ -154,24 +159,19 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid) cp->size--; cp->elements[new_cpu].idx = old_idx; cp->elements[cpu].idx = IDX_INVALID; - while (old_idx > 0 && dl_time_before( -cp->elements[parent(old_idx)].dl, -cp->elements[old_idx].dl)) { - cpudl_exchange(cp, old_idx, parent(old_idx)); - old_idx = parent(old_idx); - } + cpudl_heapify_up(cp, old_idx); cpumask_set_cpu(cpu, cp->free_cpus); -cpudl_heapify(cp, old_idx); +cpudl_heapify_down(cp, old_idx); goto out; } if (old_idx == IDX_INVALID) { - cp->size++; - cp->elements[cp->size - 1].dl = 0; - cp->elements[cp->size - 1].cpu = cpu; - cp->elements[cpu].idx = cp->size - 1; - cpudl_change_key(cp, cp->size - 1, dl); + int size1 = cp->size++; + cp->elements[size1].dl = dl; + cp->elements[size1].cpu = cpu; + cp->elements[cpu].idx = size1; + cpudl_heapify_up(cp, size1); cpumask_clear_cpu(cpu, cp->free_cpus); } else { cpudl_change_key(cp, old_idx, dl); -- 2.7.4
SCHED_DEADLINE cpudeadline.{h,c} fixup
Hi, looking at the SCHED_DEADLINE code, I spotted an opportunity to make cpudeadline.c faster, in that we can skip real swaps during re-heapification of items after addition/removal. As such ops are done under a domain spinlock, it sounded like an interesting try. Indeed, I've got a speed-up of up to ~6% for the cpudl_set() calls on a randomly generated workload of 1K, 10K, 100K random insertions and deletions (75% cpudl_set() calls with is_valid=1 and 25% with is_valid=0), and randomly generated cpu IDs with 2, 4, ..., 256 CPUs. Details in the attached plot. The attached patch does this, along with a minimum of rework of cpudeadline.c internals, and a final clean-up of the cpudeadline.h interface (second patch). The measurements have been made on an Intel Core2 Duo with the CPU frequency fixed at max, by letting cpudeadline.c be initialized with various numbers of CPUs, then making many calls sequentially, taking rdtsc between calls, then dumping all numbers through printk(); I'm plotting the average number of clock ticks between consecutive calls. [ I can share the benchmarking code as well if needed ] Also, this fixes what seems to me a bug, which I noticed by comparing the whole heap contents as handled by the modified code vs the original one, insertion by insertion. The problem is in this code:

  cp->elements[cp->size - 1].dl = 0;
  cp->elements[cp->size - 1].cpu = cpu;
  cp->elements[cpu].idx = cp->size - 1;
  mycpudl_change_key(cp, cp->size - 1, dl);

when fed an absolute deadline so large that it has a negative value as an s64. In such a case, as per dl_time_before(), the kernel should correctly handle the abs deadline wrap-around, but the current code in cpudeadline.c doesn't correctly re-heapify the just-inserted element... that said, if these are ns, such a bug should be hit only after ~292 years of uptime :-D... I'd be happy to hear comments from others. I can provide additional info / make additional experiments as needed.
Please, reply-all to this e-mail, I'm not subscribed to linux-kernel@. Thanks, Tommaso -- Tommaso Cucinotta, Computer Engineering PhD Associate Professor at the Real-Time Systems Laboratory (ReTiS) Scuola Superiore Sant'Anna, Pisa, Italy http://retis.sssup.it/people/tommaso >From ee54c2849f1d9d7f7f8faeb474a61074cae868b9 Mon Sep 17 00:00:00 2001 From: Tommaso Cucinotta Date: Thu, 12 May 2016 19:06:37 +0200 Subject: [PATCH 1/2] Make deadline max-heap faster and fix deadline wrap-around bug. --- kernel/sched/cpudeadline.c | 122 - 1 file changed, 77 insertions(+), 45 deletions(-) diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c index 5a75b08..245d929 100644 --- a/kernel/sched/cpudeadline.c +++ b/kernel/sched/cpudeadline.c @@ -31,55 +31,91 @@ static inline int right_child(int i) return (i << 1) + 2; } -static void cpudl_exchange(struct cpudl *cp, int a, int b) -{ - int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu; - - swap(cp->elements[a].cpu, cp->elements[b].cpu); - swap(cp->elements[a].dl , cp->elements[b].dl ); - - swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx); -} - -static void cpudl_heapify(struct cpudl *cp, int idx) +static void cpudl_heapify_down(struct cpudl *cp, int idx) { int l, r, largest; + int orig_cpu = cp->elements[idx].cpu; + u64 orig_dl = cp->elements[idx].dl; + /* adapted from lib/prio_heap.c */ while(1) { + u64 largest_dl; l = left_child(idx); r = right_child(idx); largest = idx; + largest_dl = orig_dl; - if ((l < cp->size) && dl_time_before(cp->elements[idx].dl, - cp->elements[l].dl)) + if ((l < cp->size) && dl_time_before(orig_dl, cp->elements[l].dl)) { largest = l; - if ((r < cp->size) && dl_time_before(cp->elements[largest].dl, - cp->elements[r].dl)) + largest_dl = cp->elements[l].dl; + } + if ((r < cp->size) && dl_time_before(largest_dl, cp->elements[r].dl)) largest = r; + if (largest == idx) break; - /* Push idx down the heap one level and bump one up */ - cpudl_exchange(cp, largest, idx); + /* pull 
largest child onto idx */ + cp->elements[idx].cpu = cp->elements[largest].cpu; + cp->elements[idx].dl = cp->elements[largest].dl; + cp->elements[cp->elements[idx].cpu].idx = idx; idx = largest; } + /* actual push down of saved original values orig_* */ + cp->elements[idx].cpu = orig_cpu; + cp->elements[idx].dl = orig_dl; + cp->elements[cp->elements[idx].cpu].idx = idx; +} + +static void cpudl_heapify_up(struct cpudl *cp, int idx) +{ + int p; + + int orig_cpu = cp->elements[idx].cpu; + u64 orig_dl = cp->elements[idx].dl; + + while (idx != 0) { + p = parent(idx); + if (dl_time_before(cp->elements[idx].dl
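As a back-of-the-envelope check of the "~292 years of uptime" remark in the cover mail above (simple arithmetic, assuming absolute deadlines are nanoseconds since boot, so the s64 view of the clock goes negative once it passes 2^63 ns):

```c
#include <assert.h>

/* years until a ns-since-boot counter exceeds 2^63 and turns negative
 * when read as an s64, triggering the wrap-around path */
static double years_until_s64_ns_wrap(void)
{
	double ns_to_wrap = 9223372036854775808.0;	/* 2^63 ns */
	double seconds_per_year = 365.25 * 24 * 3600;	/* Julian year */

	return ns_to_wrap / 1e9 / seconds_per_year;
}
```

This evaluates to roughly 292 years, matching the estimate in the mail; the bug is still worth fixing, since dl_time_before() exists precisely to make the comparison wrap-safe.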
Re: [PATCH] sched/deadline: Add sched_dl documentation
On 28/01/14 09:31, Peter Zijlstra wrote:
>> Exactly. I can remember also a huge period might be harmful, because with it
>> even a tiny utilization may lead to a big runtime that starves the whole non
>> RT tasks for a significant time.
>
> Another way to solve that is with explicit slack time scheduling, where
> slack here means the time/utilization not allocated to dl tasks (a quick
> google shows various different usages of slack, including laxity).
>
> So suppose we have 50% utilization but with a huge period, we could
> schedule the other 50% at a fixed but smaller period and have that run
> the rt/fair/etc tasks.
>
> So effectively we'll always fill up the utilization to 100% and use the
> 'slack' time as a server for the other classes.

Yesss :-)! Indeed, AFAICR in the AQuoSA API there was the QOS_F_DEFAULT flag http://aquosa.sourceforge.net/aquosa-docs/aquosa-qosres/html/qos__types_8h_source.html (that could only be set by a privileged process) just there to allow for creating a server serving "default" tasks (namely, every non-aquosa task). This way, you could create for example at system boot time a default reservation with a precise (runtime, period) for Linux, so that non-aquosa tasks could have a chance to run in that reservation, even in presence of a RT reservation with a significantly long runtime.

T.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
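The slack-server idea discussed above can be put in numbers with a toy helper (purely illustrative; the function and its parameters are made up, not an existing kernel API): a deadline task with runtime 1 s over a 2 s period uses 50%, and serving the remaining 50% through a server with a 10 ms period yields a 5 ms server runtime, so non-deadline tasks wait milliseconds instead of starving for up to a second.

```c
#include <assert.h>

/* runtime to give a hypothetical slack server of period srv_period so
 * that it soaks up all utilization left over by a (dl_runtime,
 * dl_period) deadline reservation, filling the CPU up to 100% */
static double slack_server_runtime(double dl_runtime, double dl_period,
				   double srv_period)
{
	double slack_util = 1.0 - dl_runtime / dl_period;

	return slack_util * srv_period;
}
```

For the 2 ms / 2 s reproducer mentioned earlier in the thread, the same helper gives the slack server almost the whole of each small period, which is exactly the behavior one would want for the other scheduling classes.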
Re: [PATCH] sched/deadline: Add sched_dl documentation
On 22/01/14 13:51, Peter Zijlstra wrote:
>>> Another possibly extension; one proposed by Ingo; is to demote tasks to
>>> SCHED_OTHER once they exceed their budget instead of the full block they
>>> get now -- we could possibly call this SCHED_FLAG_DL_CBS_SOFT or such.

Soft reservations are also very useful, as in multimedia and adaptive RT apps we expect these budgets to be roughly estimated, rather than being carefully computed WCETs. For example, in the experimentation with adaptive budget tuning made with Jack, http://retis.sssup.it/~tommaso/papers/lac11.php the best results were achieved by just setting the QOS_F_SOFT flag of the old AQuoSA (apologies for mentioning that old crappy code), which implemented exactly the downgrade of the task to the regular CFS scheduler for the time in which the budget was exhausted. That Jack-related scenario is quite challenging, due to changes in the required budget as new Jack clients join, or as the workloads of existing clients vary (e.g., timidity). At least, as far as the intention of being seamless to applications is kept (apps required no change at all -- only the Jack server and client library were changed). FYI, I wish we could replay those modifications on top of SCHED_DEADLINE one day :-)... http://repo.or.cz/w/jack2.git/shortlog/refs/heads/aquosa

> We should also again talk about what it would take to allow
> unprivileged access to SCHED_DEADLINE. The things I can remember are the
> obvious cap on utilization and a minimum period -- the latter so that we
> don't DoS the system with a metric ton of tiny tasks.
>
> But I seem to remember there were a few other issues.

Exactly. I can remember also a huge period might be harmful, because with it even a tiny utilization may lead to a big runtime that starves the whole non RT tasks for a significant time. Just in case, I had this paper http://retis.sssup.it/~tommaso/papers/rtas08.php discussing the issues I had found, and how they were tackled in the AQuoSA supervisor.
See Section 3.1. Note that the AQuoSA model was probably more complex, due to the possibility of hosting multiple tasks in the same deadline-driven reservation, the possibility for some of the tasks in a user reservation to be re-wrapped within another app-specific reservation, etc. I'd expect the situation to be quite a bit simpler with SCHED_DEADLINE. My2c, Tommaso