Re: [PATCH 3/2] sched/deadline: Use deadline instead of period when calculating overflow

2017-02-16 Thread Tommaso Cucinotta

On 15/02/2017 14:33, Daniel Bristot de Oliveira wrote:

[1] For the sake of completeness:
- dl_se->deadline = Absolute deadline
- dl_se->dl_deadline = Relative deadline

Daniel's note [1] makes me wonder: would it be worth renaming these, for the
sake of clarity, e.g.:
-) abs_deadline vs rel_deadline
-) runtime_left vs runtime_tot
or similar :-) ?

T.



Re: [PATCH V2 2/2] sched/deadline: Throttle a constrained deadline task activated after the deadline

2017-02-14 Thread Tommaso Cucinotta

On 13/02/2017 20:05, Daniel Bristot de Oliveira wrote:

To avoid this problem, in the activation of a constrained deadline
task after the deadline but before the next period, throttle the
task and set the replenishing timer to the begin of the next period,
unless it is boosted.


my only comment is that, by throttling on (dl < wakeuptime < period), we force the
app to sync its activation time with the kernel, and the cbs doesn't self-sync anymore
with the app's own periodicity, which is what normally happens with dl=period. With
dl=period, we lose the cbs self-sync and force the app to sync with the kernel
periodic timer only if we explicitly use yield(), but now this also becomes implicit
just if we set dl < period.

attr.sched_policy   = SCHED_DEADLINE;
attr.sched_runtime  = 2 * 1000 * 1000;  /* 2 ms */
attr.sched_deadline = 2 * 1000 * 1000;  /* 2 ms */
attr.sched_period   = 2 * 1000 * 1000 * 1000;   /* 2 s */

...

On my box, this reproducer uses almost 50% of the CPU time, which is
obviously wrong for a task with 2/2000 reservation.


just a note here: in this example with runtime=deadline=2ms, if we relied on a
utilization-based admission test, we would have to assume the task is taking 100%.
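
Worked out for the reproducer above (my numbers; C = runtime, D = relative
deadline, T = period), this is the difference between charging the runtime
over the deadline and over the period:

  C/D = 2 ms / 2 ms    = 100%   (utilization computed over the deadline, i.e. density)
  C/T = 2 ms / 2000 ms = 0.1%   (utilization computed over the period)
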
More precise tests exist for EDF with deadline < period.

--
http://retis.sssup.it/people/tommaso


[tip:sched/core] sched/deadline: Show leftover runtime and abs deadline in /proc/*/sched

2017-01-14 Thread tip-bot for Tommaso Cucinotta
Commit-ID:  59f8c2989283bbd3df9fcfb22494d84f4852e536
Gitweb: http://git.kernel.org/tip/59f8c2989283bbd3df9fcfb22494d84f4852e536
Author: Tommaso Cucinotta 
AuthorDate: Wed, 26 Oct 2016 11:17:17 +0200
Committer:  Ingo Molnar 
CommitDate: Sat, 14 Jan 2017 11:30:00 +0100

sched/deadline: Show leftover runtime and abs deadline in /proc/*/sched

This patch allows for reading the current (leftover) runtime and
absolute deadline of a SCHED_DEADLINE task through /proc/*/sched
(entries dl.runtime and dl.deadline), while debugging/testing.

Signed-off-by: Tommaso Cucinotta 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Juri Lelli 
Reviewed-by: Luca Abeni 
Acked-by: Daniel Bistrot de Oliveira 
Cc: Juri Lelli 
Cc: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1477473437-10346-2-git-send-email-tommaso.cucino...@sssup.it
Signed-off-by: Ingo Molnar 
---
 Documentation/scheduler/sched-deadline.txt | 6 ++
 kernel/sched/debug.c   | 4 
 2 files changed, 10 insertions(+)

diff --git a/Documentation/scheduler/sched-deadline.txt 
b/Documentation/scheduler/sched-deadline.txt
index 8e37b0b..cbc1b46 100644
--- a/Documentation/scheduler/sched-deadline.txt
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -408,6 +408,11 @@ CONTENTS
   * the new scheduling related syscalls that manipulate it, i.e.,
 sched_setattr() and sched_getattr() are implemented.
 
+ For debugging purposes, the leftover runtime and absolute deadline of a
+ SCHED_DEADLINE task can be retrieved through /proc//sched (entries
+ dl.runtime and dl.deadline, both values in ns). A programmatic way to
+ retrieve these values from production code is under discussion.
+
 
 4.3 Default behavior
 -
@@ -476,6 +481,7 @@ CONTENTS
 
  Still missing:
 
+  - programmatic way to retrieve current runtime and absolute deadline
   - refinements to deadline inheritance, especially regarding the possibility
 of retaining bandwidth isolation among non-interacting tasks. This is
 being studied from both theoretical and practical points of view, and
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index fa178b6..109adc0 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -953,6 +953,10 @@ void proc_sched_show_task(struct task_struct *p, struct 
seq_file *m)
 #endif
P(policy);
P(prio);
+   if (p->policy == SCHED_DEADLINE) {
+   P(dl.runtime);
+   P(dl.deadline);
+   }
 #undef PN_SCHEDSTAT
 #undef PN
 #undef __PN


Re: [RFD] sched/deadline: Support single CPU affinity

2016-12-15 Thread Tommaso Cucinotta

Hi Peter,

On 13/12/2016 11:21, Peter Zijlstra wrote:

On Thu, Nov 10, 2016 at 11:01:59AM +0100, Tommaso Cucinotta wrote:

Just a note: if you want to recover arbitrary task affinities, you can re-cast 
your above test like this:

for_each_processor(cpu)
   \sum U[t]/A[t] \leq 1 (or U_max), for each task t on cpu, with utilization 
U[t] and A[t] tasks overall in its affinity mask


Do I read it correct when I interpret A[t] as the number of CPUs in its
affinity mask?


yes, exactly: A[t] is the number of CPUs in the task's affinity mask (sorry for
my bad write-up)


Also, does recoverable mean a bound tardiness, or is that something
weaker still?


nope, nothing exact -- it just meant providing flexible but simple & consistent
options (i.e., towards recovering affinity masks) from the kernel/scheduler side,
leaving more complex & exact tests to user-space, or to future add-ons to the kernel.

Thanks,

T.


Re: [PATCH 2/3] cpufreq: schedutil: move slow path from workqueue to SCHED_FIFO task

2016-11-11 Thread Tommaso Cucinotta

Hi,

On 11/11/2016 11:22, Viresh Kumar wrote:

If slow path frequency changes are conducted in a SCHED_OTHER context
then they may be delayed for some amount of time, including
indefinitely, when real time or deadline activity is taking place.

Move the slow path to a real time kernel thread. In the future the
thread should be made SCHED_DEADLINE.


would you have any insight as to what runtime/deadline/period to set, and/or
what specific timing constraints you'd like to meet, just for this cpufreq
slowpath work?


The RT priority is arbitrarily set
to 50 for now.

[...]

+   struct sched_param param = { .sched_priority = 50 };


won't you have a tunable here? (sysctl?)

Thanks,

T.
--
Tommaso Cucinotta, Computer Engineering PhD
Associate Professor at the Real-Time Systems Laboratory (ReTiS)
Scuola Superiore Sant'Anna, Pisa, Italy
http://retis.sssup.it/people/tommaso


Re: [RFD] sched/deadline: Support single CPU affinity

2016-11-10 Thread Tommaso Cucinotta

On 10/11/2016 10:06, luca abeni wrote:

is equivalent to the "least laxity first" (LLF) algorithm.
Giving precedence to tasks with 0 laxity is a technique that is often
used to improve the schedulability on multi-processor systems.


EDZL (EDF / Zero Laxity first), right? AFAICR, there's quite a lot of
analysis on EDZL for multi-cores... eg, Insik Shin et al

  http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6374195

But, before going the EDZL way, isn't it worthwhile to consider
just splitting tasks among 2 cpus

  https://people.mpi-sws.org/~bbb/papers/pdf/rtss16b.pdf

? ... at RETIS we're working on simpler ways to do the AC for
these split-task cases (cc-ing Alessandro), which don't need
complex demand-bound analysis...

My2c,

    T.
--
Tommaso Cucinotta, Computer Engineering PhD
Associate Professor at the Real-Time Systems Laboratory (ReTiS)
Scuola Superiore Sant'Anna, Pisa, Italy
http://retis.sssup.it/people/tommaso


Re: [RFD] sched/deadline: Support single CPU affinity

2016-11-10 Thread Tommaso Cucinotta

Hi,

On 10/11/2016 09:08, Peter Zijlstra wrote:

Add support for single CPU affinity to SCHED_DEADLINE; the supposed reason for
wanting single CPU affinity is better QoS than provided by G-EDF.

Therefore the aim is to provide harder guarantees, similar to UP, for single
CPU affine tasks. This then leads to a mixed criticality scheduling
requirement for the CPU scheduler. G-EDF like for the non-affine (global)
tasks and UP like for the single CPU tasks.



ADMISSION CONTROL

Do simple UP admission control on the CPU local tasks, and subtract the
admitted bandwidth from the global total when doing global admission control.

  single cpu:   U[n] := \Sum tl_u,n <= 1
  global:   \Sum tg_u <= N - \Sum U[n]


+1, even with the current G-EDF: in the kernel we need a minimum, permissive
admission control, simple enough to just reject ill-formed workloads that would
pile up forever without hope of recovering (as opposed to an AC that runs a complex
test and doesn't allow you to deploy a task unless it's absolutely guaranteed to
schedule its runtime by its deadline). Such an AC won't be perfect in terms of
hard RT guarantees; those would require more complex analysis techniques anyway,
considering also the frequency of interrupts and the like, and can be done in
user-space by proper middleware or libraries, or even at design time for
static embedded systems where everything is known upfront and doesn't change often.

That said, it's good if in addition the mechanism behaves well from an analysis 
viewpoint (and we have a tardiness bound), the only problem being that there's 
a zillion proposals in research (see upcoming reply to Luca's).

Just a note: if you want to recover arbitrary task affinities, you can re-cast 
your above test like this:

for_each_processor(cpu)
  \sum U[t]/A[t] \leq 1 (or U_max), for each task t on cpu, with utilization 
U[t] and A[t] tasks overall in its affinity mask

(I'm not claiming we need scenarios with overlapping cpusets and G-EDF tasks, 
it's just in case it simplifies code)
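
For concreteness, here is a minimal user-space sketch of this per-CPU check
(mine, just to make the test concrete; the task descriptor and the affinity
bitmask representation are made up for illustration, not kernel structures):

#include <stdio.h>

/* Illustrative task descriptor (not a kernel structure). */
struct task {
    double   util;  /* U[t] = runtime / period */
    unsigned mask;  /* affinity bitmask: bit i set => task may run on CPU i */
};

/*
 * For every CPU, check  \sum U[t]/A[t] <= bound  over the tasks t whose
 * affinity mask contains that CPU, with A[t] = number of CPUs in the mask.
 */
static int admission_ok(const struct task *tasks, int ntasks, int ncpus,
                        double bound)
{
    for (int cpu = 0; cpu < ncpus; cpu++) {
        double load = 0.0;

        for (int i = 0; i < ntasks; i++) {
            int a = __builtin_popcount(tasks[i].mask);

            if (a && (tasks[i].mask & (1u << cpu)))
                load += tasks[i].util / a;
        }
        if (load > bound)
            return 0;   /* some CPU would be over-committed */
    }
    return 1;
}

int main(void)
{
    /* Two tasks spanning CPUs {0,1} plus one task pinned to CPU 0. */
    struct task set[] = {
        { .util = 0.4, .mask = 0x3 },   /* counts as 0.2 on each CPU */
        { .util = 0.4, .mask = 0x3 },
        { .util = 0.5, .mask = 0x1 },   /* counts fully on CPU 0     */
    };

    printf("admitted: %d\n", admission_ok(set, 3, 2, 1.0));
    return 0;
}

Here the pinned task is charged fully on CPU 0, while each 2-CPU task is
charged for half of its utilization on each CPU, so CPU 0 sees 0.9, CPU 1
sees 0.4, and the set is admitted.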

T.
--
Tommaso Cucinotta, Computer Engineering PhD
Associate Professor at the Real-Time Systems Laboratory (ReTiS)
Scuola Superiore Sant'Anna, Pisa, Italy
http://retis.sssup.it/people/tommaso


Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature

2016-11-07 Thread Tommaso Cucinotta

On 07/11/2016 14:51, Daniel Bristot de Oliveira wrote:

Hi Tommaso,


Hi,

I'm cc-ing Luca for GRUB et al., pls find a few further notes below...


On 11/07/2016 11:31 AM, Tommaso Cucinotta wrote:

as anticipated live to Daniel:
-) +1 for the general concept, we'd need something similar also for
SCHED_DEADLINE


In short: the sum of the runtimes of deadline tasks will not be greater
than "to_ratio(global_rt_period(), global_rt_runtime())" - see
init_dl_bw(). Therefore, the DL rq will not be throttled by the RT throttling
mechanism.
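
Spelling out the bound Daniel refers to (my paraphrase; the kernel stores it
as a to_ratio() fixed-point value and, on SMP, scales it by the number of CPUs
in the root domain):

  \sum_i runtime_i / period_i  <=  nr_cpus * global_rt_runtime() / global_rt_period()

With the default sysctls (950000 us over 1000000 us), DL tasks can never be
admitted beyond 95% of the available capacity, so they stay below the RT
throttling limit by construction.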

In more detail: RT throttling aims to bound - for all CPUs of a domain when
RT_RUNTIME_SHARE is enabled, or per-rq when it is disabled - the amount of
time that RT tasks can run continuously, in such a way as to provide some CPU
time for non-real-time tasks to run. RT tasks need this global/local throttling
mechanism to avoid starving non-rt tasks, because RT tasks do not have a
limited runtime - an RT task (or task set) can run for an unbounded amount of time.

DL tasks' throttling has another meaning: it aims to prevent *a* DL task
from running for more than *its own* pre-allocated runtime.


sure, and having an option to let it run for longer, if there's nothing else
running in the system, is still interesting for pretty much similar reasons
to those being discussed in this thread ...


The sum of the allocated runtimes of all DL tasks will not be greater
than the RT throttling enforcement runtime. The DL scheduler admission
control already guarantees this by limiting the amount of CPU time all DL
tasks can consume (see init_dl_bw()). So, DL tasks avoid the
"global" throttling beforehand - in the admission control.

GRUB might implement something similar for the DEADLINE scheduler.
With GRUB, a deadline task can get more runtime than previously
set/granted.


yes, the main difference being: GRUB will let a DL task run for longer
than its own runtime, but still let it starve anything below (RT as well
as OTHER tasks); perhaps Luca (cc) has some further comment on this...

Thanks,

T.


But I am quite sure it will still be bounded by the sum
of the already allocated DL runtimes, which will remain smaller
than "to_ratio(global_rt_period(), global_rt_runtime())".

Am I missing something?


-) only issue might be that, if a non-RT task wakes up after the
unthrottle, it will have to wait, but worst-case it will have a chance
in the next throttling window


In the current default behavior (RT_RUNTIME_SHARE enabled), in a domain with
more than two CPUs the worst case easily becomes "infinity," because a
CPU can borrow runtime from another CPU. There is no latency guarantee
for non-rt tasks. Anyway, if the user wants to provide such a guarantee,
they just need to leave this feature disabled and to disable
RT_RUNTIME_SHARE as well (or run the non-rt task as a deadline task ;-))


-) an alternative to unthrottling might be a temporary class downgrade to
sched_other, but that might be much more complex; this proposal of Daniel's,
instead, looks quite simple


Yeah, decreasing the priority of the task would be way more
complicated and error prone. RT tasks would need to reduce their
priority to a level higher than the idle task, but lower than SCHED_IDLE...


-) when considering also DEADLINE tasks, it might be good to think about
how we'd like the throttling of DEADLINE and RT tasks to inter-relate,
e.g.:


Currently, DL tasks are limited (in the bw control) to the global RT
throttling limit...

I think that this might be an extension to GRUB... that is extending the
current behavior... so... things for the future - and IMHO it is another
topic - way more challenging.

Comments are welcome :-)

-- Daniel




--
Tommaso Cucinotta, Computer Engineering PhD
Associate Professor at the Real-Time Systems Laboratory (ReTiS)
Scuola Superiore Sant'Anna, Pisa, Italy
http://retis.sssup.it/people/tommaso


Re: [PATCH] sched/rt: RT_RUNTIME_GREED sched feature

2016-11-07 Thread Tommaso Cucinotta

as anticipated live to Daniel:
-) +1 for the general concept, we'd need something similar also for 
SCHED_DEADLINE
-) only issue might be that, if a non-RT task wakes up after the unthrottle, it 
will have to wait, but worst-case it will have a chance in the next throttling 
window
-) an alternative to unthrottling might be a temporary class downgrade to
sched_other, but that might be much more complex; this proposal of Daniel's,
instead, looks quite simple
-) when considering also DEADLINE tasks, it might be good to think about how 
we'd like the throttling of DEADLINE and RT tasks to inter-relate, e.g.:
  a) DEADLINE unthrottles if there's no RT nor OTHER tasks? what if there's an 
unthrottled RT?
  b) DEADLINE throttles by downgrading to OTHER?
  c) DEADLINE throttles by downgrading to RT (RR/FIFO and what prio?)

My2c, thanks!

T.

On 07/11/2016 09:17, Daniel Bristot de Oliveira wrote:

The rt throttling mechanism prevents the starvation of non-real-time
tasks by CPU intensive real-time tasks. In terms of percentage,
the default behavior allows real-time tasks to run up to 95% of a
given period, leaving the other 5% of the period for non-real-time
tasks. In the absence of non-rt tasks, the system goes idle for 5%
of the period.

Although this behavior works fine for the purpose of avoiding
bad real-time tasks that can hang the system, some greedy users
want to allow the real-time task to keep running when no
non-real-time task is starving. In other words, they do not want to
see the system going idle.

This patch implements the RT_RUNTIME_GREED scheduler feature for greedy
users (TM). When enabled, this feature will check if non-rt tasks are
starving before throttling the real-time task. If the real-time task
becomes throttled, it will be unthrottled as soon as the system goes
idle, or when the next period starts, whichever comes first.

This feature is enabled with the following command:
   # echo RT_RUNTIME_GREED > /sys/kernel/debug/sched_features

The user might also want to disable the RT_RUNTIME_SHARE logic,
to keep all CPUs with the same rt_runtime.
   # echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features

With these two options set, the user will guarantee some runtime
for non-rt-tasks on all CPUs, while keeping real-time tasks running
as much as possible.

The feature is disabled by default, keeping the current behavior.

Signed-off-by: Daniel Bristot de Oliveira 
Reviewed-by: Steven Rostedt 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Christoph Lameter 
Cc: linux-rt-users 
Cc: LKML 

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 42d4027..c4c62ee 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3275,7 +3275,8 @@ pick_next_task(struct rq *rq, struct task_struct *prev, 
struct pin_cookie cookie
if (unlikely(!p))
p = idle_sched_class.pick_next_task(rq, prev, cookie);
  
-		return p;

+   if (likely(p != RETRY_TASK))
+   return p;
}
  
  again:

diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 69631fa..3bd7a6d 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -66,6 +66,7 @@ SCHED_FEAT(RT_PUSH_IPI, true)
  
  SCHED_FEAT(FORCE_SD_OVERLAP, false)

  SCHED_FEAT(RT_RUNTIME_SHARE, true)
+SCHED_FEAT(RT_RUNTIME_GREED, false)
  SCHED_FEAT(LB_MIN, false)
  SCHED_FEAT(ATTACH_AGE_LOAD, true)
  
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c

index 5405d3f..0f23e06 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -26,6 +26,10 @@ static void check_preempt_curr_idle(struct rq *rq, struct 
task_struct *p, int fl
  static struct task_struct *
  pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct 
pin_cookie cookie)
  {
+   if (sched_feat(RT_RUNTIME_GREED))
+   if (try_to_unthrottle_rt_rq(&rq->rt))
+   return RETRY_TASK;
+
put_prev_task(rq, prev);
update_idle_core(rq);
schedstat_inc(rq->sched_goidle);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 2516b8d..a6961a5 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -631,6 +631,22 @@ static inline struct rt_bandwidth 
*sched_rt_bandwidth(struct rt_rq *rt_rq)
  
  #endif /* CONFIG_RT_GROUP_SCHED */
  
+static inline void unthrottle_rt_rq(struct rt_rq *rt_rq)

+{
+   rt_rq->rt_time = 0;
+   rt_rq->rt_throttled = 0;
+   sched_rt_rq_enqueue(rt_rq);
+}
+
+int try_to_unthrottle_rt_rq(struct rt_rq *rt_rq)
+{
+   if (rt_rq_throttled(rt_rq)) {
+   unthrottle_rt_rq(rt_rq);
+   return 1;
+   }
+   return 0;
+}
+
  bool sched_rt_bandwidth_account(struct rt_rq *rt_rq)
  {
struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
@@ -920,6 +936,18 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
 * but accrue some time due to boosting.
 */
if (likely(rt_b->rt_runt

[PATCH] sched/deadline: show leftover runtime and abs deadline in /proc/-/sched

2016-10-26 Thread Tommaso Cucinotta
This patch allows for reading the current (leftover) runtime and
absolute deadline in /proc/*/sched, for debugging purposes.

This revised patch:
1) has a fixed format, Signed-off-by, etc., as suggested by Juri
2) now includes related docs in Documentation/
3) now I'm sending these e-mails with the correct From:

[PATCH] sched/deadline: show leftover runtime and abs deadline in /proc/*/sched

Thanks,

  T.
--
Hi all,

this is a tiny patch providing readings of the current (leftover)
runtime and absolute deadline in /proc/*/sched. Mostly useful for
debugging, I heard others playing with SCHED_DEADLINE had some need
for similar patches as well.

In addition to debugging, reading the leftover runtime is generally
useful for adaptive/incremental RT logics that need to check whether
there's enough runtime left for refining the computed result, or just
use what we've computed so far and block till the next instance.
Also, knowing what the absolute scheduling deadline is (along with
what clock it refers to) might be useful for synchronization purposes.
(albeit, for real production code, I wouldn't like to parse /proc anyway,
rather I'd prefer to retrieve those params via eg sched_getscheduler()?)
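
As a rough illustration of that use case, here is a user-space sketch (my
code, not part of the patch) that parses the two entries added by this patch
from /proc/<pid>/sched; the field names come from the patch, everything else
is illustrative:

#include <stdio.h>
#include <unistd.h>

/*
 * Read the dl.runtime and dl.deadline entries (both in ns) exposed in
 * /proc/<pid>/sched by this patch.  Error handling is kept minimal.
 */
static int read_dl_params(pid_t pid, long long *runtime, long long *deadline)
{
    char path[64], line[256];
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/sched", (int)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;

    while (fgets(line, sizeof(line), f)) {
        /* a literal space in the format matches the padding before ':' */
        sscanf(line, "dl.runtime : %lld", runtime);
        sscanf(line, "dl.deadline : %lld", deadline);
    }
    fclose(f);
    return 0;
}

int main(void)
{
    long long runtime = -1, deadline = -1;

    if (read_dl_params(getpid(), &runtime, &deadline) == 0)
        printf("leftover runtime: %lld ns, abs deadline: %lld ns\n",
               runtime, deadline);
    return 0;
}

If the calling task is not SCHED_DEADLINE, the two entries are absent and the
values simply stay at their -1 defaults.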


[PATCH] sched/deadline: show leftover runtime and abs deadline in /proc/*/sched

2016-10-26 Thread Tommaso Cucinotta
This patch allows for reading the current (leftover) runtime and
absolute deadline of a SCHED_DEADLINE task through /proc/*/sched
(entries dl.runtime and dl.deadline), while debugging/testing.

Signed-off-by: Tommaso Cucinotta 
Reviewed-by: Juri Lelli 
Reviewed-by: Luca Abeni 
Acked-by: Daniel Bistrot de Oliveira 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
---
 Documentation/scheduler/sched-deadline.txt | 6 ++
 kernel/sched/debug.c   | 4 
 2 files changed, 10 insertions(+)

diff --git a/Documentation/scheduler/sched-deadline.txt 
b/Documentation/scheduler/sched-deadline.txt
index 8e37b0b..cbc1b46 100644
--- a/Documentation/scheduler/sched-deadline.txt
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -408,6 +408,11 @@ CONTENTS
   * the new scheduling related syscalls that manipulate it, i.e.,
 sched_setattr() and sched_getattr() are implemented.
 
+ For debugging purposes, the leftover runtime and absolute deadline of a
+ SCHED_DEADLINE task can be retrieved through /proc//sched (entries
+ dl.runtime and dl.deadline, both values in ns). A programmatic way to
+ retrieve these values from production code is under discussion.
+
 
 4.3 Default behavior
 -
@@ -476,6 +481,7 @@ CONTENTS
 
  Still missing:
 
+  - programmatic way to retrieve current runtime and absolute deadline
   - refinements to deadline inheritance, especially regarding the possibility
 of retaining bandwidth isolation among non-interacting tasks. This is
 being studied from both theoretical and practical points of view, and
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index fa178b6..109adc0 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -953,6 +953,10 @@ void proc_sched_show_task(struct task_struct *p, struct 
seq_file *m)
 #endif
P(policy);
P(prio);
+   if (p->policy == SCHED_DEADLINE) {
+   P(dl.runtime);
+   P(dl.deadline);
+   }
 #undef PN_SCHEDSTAT
 #undef PN
 #undef __PN
-- 
2.7.4



[RFC PATCH] sched/deadline: show leftover runtime and abs deadline in /proc/-/sched

2016-10-24 Thread Tommaso Cucinotta
Hi all,

this is a tiny patch providing readings of the current (leftover)
runtime and absolute deadline in /proc/*/sched. Mostly useful for
debugging, I heard others playing with SCHED_DEADLINE had some need
for similar patches as well.

In addition to debugging, reading the leftover runtime is generally
useful for adaptive/incremental RT logics that need to check whether
there's enough runtime left for refining the computed result, or just
use what we've computed so far and block till the next instance.
Also, knowing what the absolute scheduling deadline is (along with
what clock it refers to) might be useful for synchronization purposes.
(albeit, for real production code, I wouldn't like to parse /proc anyway,
rather I'd prefer to retrieve those params via eg sched_getscheduler()?)

Please, share your thoughts, thanks.

  Tommaso



[RFC PATCH] sched/deadline: show leftover runtime and abs deadline in /proc/*/sched

2016-10-24 Thread Tommaso Cucinotta
From: Tommaso Cucinotta 

---
 kernel/sched/debug.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index fa178b6..109adc0 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -953,6 +953,10 @@ void proc_sched_show_task(struct task_struct *p, struct 
seq_file *m)
 #endif
P(policy);
P(prio);
+   if (p->policy == SCHED_DEADLINE) {
+   P(dl.runtime);
+   P(dl.deadline);
+   }
 #undef PN_SCHEDSTAT
 #undef PN
 #undef __PN
-- 
2.7.4



[tip:sched/core] sched/deadline: Document behavior of sched_yield()

2016-09-10 Thread tip-bot for Tommaso Cucinotta
Commit-ID:  b95202a3b6bb8715a716dbdb15cdb82bf622260b
Gitweb: http://git.kernel.org/tip/b95202a3b6bb8715a716dbdb15cdb82bf622260b
Author: Tommaso Cucinotta 
AuthorDate: Fri, 9 Sep 2016 19:45:17 +0200
Committer:  Ingo Molnar 
CommitDate: Sat, 10 Sep 2016 11:17:41 +0200

sched/deadline: Document behavior of sched_yield()

This is a documentation only patch, explaining the
behavior of sched_yield() when a SCHED_DEADLINE
task calls it (give up remaining runtime and be
throttled until next period begins).
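
As a usage sketch (mine, not from the patch): a periodic job loop for a task
that has already been set up as SCHED_DEADLINE via sched_setattr() (setup not
shown), relying on the behavior documented here to sleep until the next period:

#include <sched.h>

static void do_one_job(void)
{
    /* per-instance work, expected to fit within the reserved runtime */
}

int main(void)
{
    /* assumes SCHED_DEADLINE was already set up via sched_setattr() */
    for (int i = 0; i < 1000; i++) {
        do_one_job();
        sched_yield();  /* give up leftover runtime; throttled until next period */
    }
    return 0;
}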

Signed-off-by: Tommaso Cucinotta 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Juri Lelli 
Reviewed-by: Luca Abeni 
Reviewed-by: Daniel Bristot de Oliveira 
Cc: Juri Lelli 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux...@retis.sssup.it
Link: 
http://lkml.kernel.org/r/1473443117-11794-2-git-send-email-tommaso.cucino...@sssup.it
Signed-off-by: Ingo Molnar 
---
 Documentation/scheduler/sched-deadline.txt | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/scheduler/sched-deadline.txt 
b/Documentation/scheduler/sched-deadline.txt
index 53a2fe1..8e37b0b 100644
--- a/Documentation/scheduler/sched-deadline.txt
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -16,6 +16,7 @@ CONTENTS
4.1 System-wide settings
4.2 Task interface
4.3 Default behavior
+   4.4 Behavior of sched_yield()
  5. Tasks CPU affinity
5.1 SCHED_DEADLINE and cpusets HOWTO
  6. Future plans
@@ -426,6 +427,23 @@ CONTENTS
  Finally, notice that in order not to jeopardize the admission control a
  -deadline task cannot fork.
 
+
+4.4 Behavior of sched_yield()
+-
+
+ When a SCHED_DEADLINE task calls sched_yield(), it gives up its
+ remaining runtime and is immediately throttled, until the next
+ period, when its runtime will be replenished (a special flag
+ dl_yielded is set and used to handle correctly throttling and runtime
+ replenishment after a call to sched_yield()).
+
+ This behavior of sched_yield() allows the task to wake-up exactly at
+ the beginning of the next period. Also, this may be useful in the
+ future with bandwidth reclaiming mechanisms, where sched_yield() will
+ make the leftoever runtime available for reclamation by other
+ SCHED_DEADLINE tasks.
+
+
 5. Tasks CPU affinity
 =
 


[PATCH] sched/deadline: document behavior of sched_yield()

2016-09-09 Thread Tommaso Cucinotta
This is a documentation only patch, explaining the
behavior of sched_yield() when a SCHED_DEADLINE
task calls it (give up remaining runtime and be
throttled until next period begins).

Signed-off-by: Tommaso Cucinotta 
Reviewed-by: Juri Lelli 
Reviewed-by: Luca Abeni 
Reviewed-by: Daniel Bristot de Oliveira 
---
 Documentation/scheduler/sched-deadline.txt | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/scheduler/sched-deadline.txt 
b/Documentation/scheduler/sched-deadline.txt
index 53a2fe1..8e37b0b 100644
--- a/Documentation/scheduler/sched-deadline.txt
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -16,6 +16,7 @@ CONTENTS
4.1 System-wide settings
4.2 Task interface
4.3 Default behavior
+   4.4 Behavior of sched_yield()
  5. Tasks CPU affinity
5.1 SCHED_DEADLINE and cpusets HOWTO
  6. Future plans
@@ -426,6 +427,23 @@ CONTENTS
  Finally, notice that in order not to jeopardize the admission control a
  -deadline task cannot fork.
 
+
+4.4 Behavior of sched_yield()
+-
+
+ When a SCHED_DEADLINE task calls sched_yield(), it gives up its
+ remaining runtime and is immediately throttled, until the next
+ period, when its runtime will be replenished (a special flag
+ dl_yielded is set and used to handle correctly throttling and runtime
+ replenishment after a call to sched_yield()).
+
+ This behavior of sched_yield() allows the task to wake-up exactly at
+ the beginning of the next period. Also, this may be useful in the
+ future with bandwidth reclaiming mechanisms, where sched_yield() will
+ make the leftoever runtime available for reclamation by other
+ SCHED_DEADLINE tasks.
+
+
 5. Tasks CPU affinity
 =
 
-- 
2.7.4



[PATCH] sched/deadline: document behavior of sched_yield()

2016-09-09 Thread Tommaso Cucinotta
Hi again,

this is the reworked text following the comments by Luca, Juri and Daniel
(thanks everybody), namely:
-) "throttled" instead of "suspended"
-) no more "reservation period" is mentioned -- just "period"

Please, let me know if this sounds better now, thanks!

T.



[PATCH] sched/deadline: document behavior of sched_yield()

2016-09-09 Thread Tommaso Cucinotta
This is a documentation only patch, explaining the
behavior of sched_yield() when a SCHED_DEADLINE
task calls it (give up remaining runtime and
suspend till next period).

Signed-off-by: Tommaso Cucinotta 
Reviewed-by: Juri Lelli 
---
 Documentation/scheduler/sched-deadline.txt | 13 +
 1 file changed, 13 insertions(+)

diff --git a/Documentation/scheduler/sched-deadline.txt 
b/Documentation/scheduler/sched-deadline.txt
index 53a2fe1..cb43421 100644
--- a/Documentation/scheduler/sched-deadline.txt
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -16,6 +16,7 @@ CONTENTS
4.1 System-wide settings
4.2 Task interface
4.3 Default behavior
+   4.4 Behavior of sched_yield()
  5. Tasks CPU affinity
5.1 SCHED_DEADLINE and cpusets HOWTO
  6. Future plans
@@ -426,6 +427,18 @@ CONTENTS
  Finally, notice that in order not to jeopardize the admission control a
  -deadline task cannot fork.
 
+4.4 Behavior of sched_yield()
+-
+
+ When a SCHED_DEADLINE task calls sched_yield(), it gives up its
+ remaining runtime and is suspended till the next reservation period,
+ when its runtime will be replenished.  This allows the task to
+ wake-up exactly at the beginning of the next period. Also, this may
+ be useful in the future with bandwidth reclaiming mechanisms, where
+ sched_yield() will make the leftoever runtime available for
+ reclamation by other SCHED_DEADLINE tasks.
+
+
 5. Tasks CPU affinity
 =
 
-- 
2.7.4



[PATCH] sched/deadline: document behavior of sched_yield()

2016-09-09 Thread Tommaso Cucinotta
Added SoB, as per Juri's comment (thanks).

This is a documentation only patch, explaining the
behavior of sched_yield() when a SCHED_DEADLINE
task calls it (give up remaining runtime and
suspend till next period).

  T.


[PATCH] sched/deadline: document behavior of sched_yield()

2016-09-08 Thread Tommaso Cucinotta
This is a documentation only patch, explaining the
behavior of sched_yield() when a SCHED_DEADLINE
task calls it (give up remaining runtime and
suspend till next period).
---
 Documentation/scheduler/sched-deadline.txt | 13 +
 1 file changed, 13 insertions(+)

diff --git a/Documentation/scheduler/sched-deadline.txt 
b/Documentation/scheduler/sched-deadline.txt
index 53a2fe1..cb43421 100644
--- a/Documentation/scheduler/sched-deadline.txt
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -16,6 +16,7 @@ CONTENTS
4.1 System-wide settings
4.2 Task interface
4.3 Default behavior
+   4.4 Behavior of sched_yield()
  5. Tasks CPU affinity
5.1 SCHED_DEADLINE and cpusets HOWTO
  6. Future plans
@@ -426,6 +427,18 @@ CONTENTS
  Finally, notice that in order not to jeopardize the admission control a
  -deadline task cannot fork.
 
+4.4 Behavior of sched_yield()
+-
+
+ When a SCHED_DEADLINE task calls sched_yield(), it gives up its
+ remaining runtime and is suspended till the next reservation period,
+ when its runtime will be replenished.  This allows the task to
+ wake-up exactly at the beginning of the next period. Also, this may
+ be useful in the future with bandwidth reclaiming mechanisms, where
+ sched_yield() will make the leftoever runtime available for
+ reclamation by other SCHED_DEADLINE tasks.
+
+
 5. Tasks CPU affinity
 =
 
-- 
2.7.4



[tip:sched/core] sched/deadline: Split cpudl_set() into cpudl_set() and cpudl_clear()

2016-09-05 Thread tip-bot for Tommaso Cucinotta
Commit-ID:  d8206bb3ffe0eaee03abfad46fd44d8b17142e88
Gitweb: http://git.kernel.org/tip/d8206bb3ffe0eaee03abfad46fd44d8b17142e88
Author: Tommaso Cucinotta 
AuthorDate: Sun, 14 Aug 2016 16:27:08 +0200
Committer:  Ingo Molnar 
CommitDate: Mon, 5 Sep 2016 13:29:43 +0200

sched/deadline: Split cpudl_set() into cpudl_set() and cpudl_clear()

These 2 exercise independent code paths and need different arguments.

After this change, you call:

  cpudl_clear(cp, cpu);
  cpudl_set(cp, cpu, dl);

instead of:

  cpudl_set(cp, cpu, 0 /* dl */, 0 /* is_valid */);
  cpudl_set(cp, cpu, dl, 1 /* is_valid */);

Signed-off-by: Tommaso Cucinotta 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Luca Abeni 
Reviewed-by: Juri Lelli 
Cc: Juri Lelli 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux...@retis.sssup.it
Link: 
http://lkml.kernel.org/r/1471184828-12644-4-git-send-email-tommaso.cucino...@sssup.it
Signed-off-by: Ingo Molnar 
---
 kernel/sched/cpudeadline.c | 49 +++---
 kernel/sched/cpudeadline.h |  3 ++-
 kernel/sched/deadline.c| 10 +-
 3 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 0ace75a..e731190 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -145,16 +145,15 @@ out:
 }
 
 /*
- * cpudl_set - update the cpudl max-heap
+ * cpudl_clear - remove a cpu from the cpudl max-heap
  * @cp: the cpudl max-heap context
  * @cpu: the target cpu
- * @dl: the new earliest deadline for this cpu
  *
  * Notes: assumes cpu_rq(cpu)->lock is locked
  *
  * Returns: (void)
  */
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
+void cpudl_clear(struct cpudl *cp, int cpu)
 {
int old_idx, new_cpu;
unsigned long flags;
@@ -162,17 +161,15 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int 
is_valid)
WARN_ON(!cpu_present(cpu));
 
raw_spin_lock_irqsave(&cp->lock, flags);
+
old_idx = cp->elements[cpu].idx;
-   if (!is_valid) {
-   /* remove item */
-   if (old_idx == IDX_INVALID) {
-   /*
-* Nothing to remove if old_idx was invalid.
-* This could happen if a rq_offline_dl is
-* called for a CPU without -dl tasks running.
-*/
-   goto out;
-   }
+   if (old_idx == IDX_INVALID) {
+   /*
+* Nothing to remove if old_idx was invalid.
+* This could happen if a rq_offline_dl is
+* called for a CPU without -dl tasks running.
+*/
+   } else {
new_cpu = cp->elements[cp->size - 1].cpu;
cp->elements[old_idx].dl = cp->elements[cp->size - 1].dl;
cp->elements[old_idx].cpu = new_cpu;
@@ -180,11 +177,32 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int 
is_valid)
cp->elements[new_cpu].idx = old_idx;
cp->elements[cpu].idx = IDX_INVALID;
cpudl_heapify(cp, old_idx);
-   cpumask_set_cpu(cpu, cp->free_cpus);
 
-   goto out;
+   cpumask_set_cpu(cpu, cp->free_cpus);
}
+   raw_spin_unlock_irqrestore(&cp->lock, flags);
+}
+
+/*
+ * cpudl_set - update the cpudl max-heap
+ * @cp: the cpudl max-heap context
+ * @cpu: the target cpu
+ * @dl: the new earliest deadline for this cpu
+ *
+ * Notes: assumes cpu_rq(cpu)->lock is locked
+ *
+ * Returns: (void)
+ */
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl)
+{
+   int old_idx;
+   unsigned long flags;
+
+   WARN_ON(!cpu_present(cpu));
 
+   raw_spin_lock_irqsave(&cp->lock, flags);
+
+   old_idx = cp->elements[cpu].idx;
if (old_idx == IDX_INVALID) {
int new_idx = cp->size++;
cp->elements[new_idx].dl = dl;
@@ -197,7 +215,6 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int 
is_valid)
cpudl_heapify(cp, old_idx);
}
 
-out:
raw_spin_unlock_irqrestore(&cp->lock, flags);
 }
 
diff --git a/kernel/sched/cpudeadline.h b/kernel/sched/cpudeadline.h
index fcbdf83..f7da8c5 100644
--- a/kernel/sched/cpudeadline.h
+++ b/kernel/sched/cpudeadline.h
@@ -23,7 +23,8 @@ struct cpudl {
 #ifdef CONFIG_SMP
 int cpudl_find(struct cpudl *cp, struct task_struct *p,
   struct cpumask *later_mask);
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid);
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl);
+void cpudl_clear(struct cpudl *cp, int cpu);
 int cpudl_init(struct cpudl *cp);
 void cpudl_set_freecpu(struct cpudl *cp, int cpu);
 void cpudl_clear_freecpu(struct cpudl *cp, int cpu);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index d091f4a..18fb0b8 100644
--- a/kernel/sc

[tip:sched/core] sched/deadline: Make CPU heap faster avoiding real swaps on heapify

2016-09-05 Thread tip-bot for Tommaso Cucinotta
Commit-ID:  8e1bc301aaf9f9a2d731bf8d50d549ac2dcfdab2
Gitweb: http://git.kernel.org/tip/8e1bc301aaf9f9a2d731bf8d50d549ac2dcfdab2
Author: Tommaso Cucinotta 
AuthorDate: Sun, 14 Aug 2016 16:27:07 +0200
Committer:  Ingo Molnar 
CommitDate: Mon, 5 Sep 2016 13:29:43 +0200

sched/deadline: Make CPU heap faster avoiding real swaps on heapify

This change goes from heapify() ops done by swapping with parent/child
so that the item to fix moves along, to heapify() ops done by just
pulling the parent/child chain by 1 pos, then storing the item to fix
just at the end. On a non-trivial heapify(), this performs roughly half
stores wrt swaps.

This has been measured to achieve up to 10% of speed-up for cpudl_set()
calls, with a randomly generated workload of 1K,10K,100K random heap
insertions and deletions (75% cpudl_set() calls with is_valid=1 and
25% with is_valid=0), and randomly generated cpu IDs, with up to 256
CPUs, as measured on an Intel Core2 Duo.

Signed-off-by: Tommaso Cucinotta 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Luca Abeni 
Reviewed-by: Juri Lelli 
Cc: Juri Lelli 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux...@retis.sssup.it
Link: 
http://lkml.kernel.org/r/1471184828-12644-3-git-send-email-tommaso.cucino...@sssup.it
Signed-off-by: Ingo Molnar 
---
 kernel/sched/cpudeadline.c | 66 +++---
 1 file changed, 45 insertions(+), 21 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 0acb0d4..0ace75a 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -31,48 +31,72 @@ static inline int right_child(int i)
return (i << 1) + 2;
 }
 
-static void cpudl_exchange(struct cpudl *cp, int a, int b)
-{
-   int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu;
-
-   swap(cp->elements[a].cpu, cp->elements[b].cpu);
-   swap(cp->elements[a].dl , cp->elements[b].dl );
-
-   swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
-}
-
 static void cpudl_heapify_down(struct cpudl *cp, int idx)
 {
int l, r, largest;
 
+   int orig_cpu = cp->elements[idx].cpu;
+   u64 orig_dl = cp->elements[idx].dl;
+
+   if (left_child(idx) >= cp->size)
+   return;
+
/* adapted from lib/prio_heap.c */
while(1) {
+   u64 largest_dl;
l = left_child(idx);
r = right_child(idx);
largest = idx;
+   largest_dl = orig_dl;
 
-   if ((l < cp->size) && dl_time_before(cp->elements[idx].dl,
-   cp->elements[l].dl))
+   if ((l < cp->size) && dl_time_before(orig_dl,
+   cp->elements[l].dl)) {
largest = l;
-   if ((r < cp->size) && dl_time_before(cp->elements[largest].dl,
-   cp->elements[r].dl))
+   largest_dl = cp->elements[l].dl;
+   }
+   if ((r < cp->size) && dl_time_before(largest_dl,
+   cp->elements[r].dl))
largest = r;
+
if (largest == idx)
break;
 
-   /* Push idx down the heap one level and bump one up */
-   cpudl_exchange(cp, largest, idx);
+   /* pull largest child onto idx */
+   cp->elements[idx].cpu = cp->elements[largest].cpu;
+   cp->elements[idx].dl = cp->elements[largest].dl;
+   cp->elements[cp->elements[idx].cpu].idx = idx;
idx = largest;
}
+   /* actual push down of saved original values orig_* */
+   cp->elements[idx].cpu = orig_cpu;
+   cp->elements[idx].dl = orig_dl;
+   cp->elements[cp->elements[idx].cpu].idx = idx;
 }
 
 static void cpudl_heapify_up(struct cpudl *cp, int idx)
 {
-   while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
-   cp->elements[idx].dl)) {
-   cpudl_exchange(cp, idx, parent(idx));
-   idx = parent(idx);
-   }
+   int p;
+
+   int orig_cpu = cp->elements[idx].cpu;
+   u64 orig_dl = cp->elements[idx].dl;
+
+   if (idx == 0)
+   return;
+
+   do {
+   p = parent(idx);
+   if (dl_time_before(orig_dl, cp->elements[p].dl))
+   break;
+   /* pull parent onto idx */
+   cp->elements[idx].cpu = cp->elements[p].cpu;
+   cp->elements[idx].dl = cp->elements[p].dl;
+   cp->elements[cp->elements[idx].cpu].idx = idx;
+   idx = p;
+   } while (idx != 0);
+   /* actual push

[tip:sched/core] sched/deadline: Refactor CPU heap code

2016-09-05 Thread tip-bot for Tommaso Cucinotta
Commit-ID:  126b3b6842cc848fc9880e7816e0a8d743be51f1
Gitweb: http://git.kernel.org/tip/126b3b6842cc848fc9880e7816e0a8d743be51f1
Author: Tommaso Cucinotta 
AuthorDate: Sun, 14 Aug 2016 16:27:06 +0200
Committer:  Ingo Molnar 
CommitDate: Mon, 5 Sep 2016 13:29:42 +0200

sched/deadline: Refactor CPU heap code

 1. heapify up factored out in new dedicated function heapify_up()
(avoids repetition of same code)

 2. call to cpudl_change_key() replaced with heapify_up() when
cpudl_set actually inserts a new node in the heap

 3. cpudl_change_key() replaced with heapify() that heapifies up
or down as needed.

Signed-off-by: Tommaso Cucinotta 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Luca Abeni 
Reviewed-by: Juri Lelli 
Cc: Juri Lelli 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux...@retis.sssup.it
Link: 
http://lkml.kernel.org/r/1471184828-12644-2-git-send-email-tommaso.cucino...@sssup.it
Signed-off-by: Ingo Molnar 
---
 kernel/sched/cpudeadline.c | 50 +-
 1 file changed, 23 insertions(+), 27 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index d418449..0acb0d4 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -41,7 +41,7 @@ static void cpudl_exchange(struct cpudl *cp, int a, int b)
swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
 }
 
-static void cpudl_heapify(struct cpudl *cp, int idx)
+static void cpudl_heapify_down(struct cpudl *cp, int idx)
 {
int l, r, largest;
 
@@ -66,23 +66,24 @@ static void cpudl_heapify(struct cpudl *cp, int idx)
}
 }
 
-static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
+static void cpudl_heapify_up(struct cpudl *cp, int idx)
 {
-   WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
-
-   if (dl_time_before(new_dl, cp->elements[idx].dl)) {
-   cp->elements[idx].dl = new_dl;
-   cpudl_heapify(cp, idx);
-   } else {
-   cp->elements[idx].dl = new_dl;
-   while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
-   cp->elements[idx].dl)) {
-   cpudl_exchange(cp, idx, parent(idx));
-   idx = parent(idx);
-   }
+   while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
+   cp->elements[idx].dl)) {
+   cpudl_exchange(cp, idx, parent(idx));
+   idx = parent(idx);
}
 }
 
+static void cpudl_heapify(struct cpudl *cp, int idx)
+{
+   if (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
+   cp->elements[idx].dl))
+   cpudl_heapify_up(cp, idx);
+   else
+   cpudl_heapify_down(cp, idx);
+}
+
 static inline int cpudl_maximum(struct cpudl *cp)
 {
return cp->elements[0].cpu;
@@ -154,27 +155,22 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int 
is_valid)
cp->size--;
cp->elements[new_cpu].idx = old_idx;
cp->elements[cpu].idx = IDX_INVALID;
-   while (old_idx > 0 && dl_time_before(
-   cp->elements[parent(old_idx)].dl,
-   cp->elements[old_idx].dl)) {
-   cpudl_exchange(cp, old_idx, parent(old_idx));
-   old_idx = parent(old_idx);
-   }
+   cpudl_heapify(cp, old_idx);
cpumask_set_cpu(cpu, cp->free_cpus);
-cpudl_heapify(cp, old_idx);
 
goto out;
}
 
if (old_idx == IDX_INVALID) {
-   cp->size++;
-   cp->elements[cp->size - 1].dl = dl;
-   cp->elements[cp->size - 1].cpu = cpu;
-   cp->elements[cpu].idx = cp->size - 1;
-   cpudl_change_key(cp, cp->size - 1, dl);
+   int new_idx = cp->size++;
+   cp->elements[new_idx].dl = dl;
+   cp->elements[new_idx].cpu = cpu;
+   cp->elements[cpu].idx = new_idx;
+   cpudl_heapify_up(cp, new_idx);
cpumask_clear_cpu(cpu, cp->free_cpus);
} else {
-   cpudl_change_key(cp, old_idx, dl);
+   cp->elements[old_idx].dl = dl;
+   cpudl_heapify(cp, old_idx);
}
 
 out:


SCHED_DEADLINE cpudeadline.{h,c} fixup

2016-08-14 Thread Tommaso Cucinotta
Hi,

this is a rework of the cpudeadline bugfix and speed-up patch-set, that
integrates all comments received so far from Luca, Juri and Peter.

Compared with the previous post, here:
-) I'm keeping out the minimally invasive bugfix, as it's already been
   merged in tip/sched/core
-) I moved some little code refactoring around change_key_dl() out of the
  (now) 2nd patch, to the 1st one. Now the 2nd (speed-up) patch just
  changes the heapify_up/down() functions
-) I rebased on top of commit f0b22e39
-) I repeated an extensive set of tests through the framework published
   separately at: https://github.com/tomcucinotta/cpudl-bench
   repeating new no-behavior-change tests, new heap-consistency tests,
   and new a/b benchmarks (I'm working on a new i5 laptop now), results at:
 https://github.com/tomcucinotta/cpudl-bench/blob/master/cpudl-10.pdf
   highlighting up to a 14% speed-up when averaging over 100K ops. See the
   enclosed README in that repo for more info.

I'm leaving below the original description of all 4 patches.

--
The first patch is a minimally invasive (1-line) fix for the deadline
wrap-around bug. This leaves some weirdness in how cpudl_change_key() is
called. Therefore, the second patch does a minimum of refactory to make
things more explicit and clear.

The 3rd patch contains now the actual performance enhancement (avoiding
unneeded swaps during heapify operations), which has been measured to
achieve up to 14% of speed-up for cpudl_set() calls.
This has been measured with a randomly generated workload of 1K,10K,100K
random heap insertions and deletions (75% cpudl_set() calls with is_valid=1
and 25% with is_valid=0), and randomly generated cpu IDs, with up to 256
CPUs. Benchmarking code at: https://github.com/tomcucinotta/cpudl-bench

Finally, the 4th patch is another clear-up patch touching cpudeadline.{h,c}
and deadline.c. Now you call cpudl_clear(cp, cpu) and cpudl_set(cp, cpu, dl)
instead of cpudl_set(cp, cpu, 0 /* dl */, 0  /* is_valid */) and
cpudl_set(cp, cpu, dl, 1 /* is_valid */).

Any further comment is welcome, thanks!

  Tommaso
--


[RFC PATCH 2/3] sched/deadline: make cpu heap faster avoiding real swaps on heapify

2016-08-14 Thread Tommaso Cucinotta
This change goes from heapify() ops done by swapping with parent/child
so that the item to fix moves along, to heapify() ops done by just
pulling the parent/child chain by 1 pos, then storing the item to fix
just at the end. On a non-trivial heapify(), this performs roughly half
stores wrt swaps.

This has been measured to achieve up to 10% of speed-up for cpudl_set()
calls, with a randomly generated workload of 1K,10K,100K random heap
insertions and deletions (75% cpudl_set() calls with is_valid=1 and
25% with is_valid=0), and randomly generated cpu IDs, with up to 256
CPUs, as measured on an Intel Core2 Duo.

Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Luca Abeni 
Reviewed-by: Luca Abeni 
Reviewed-by: Juri Lelli 
Signed-off-by: Tommaso Cucinotta 
---
 kernel/sched/cpudeadline.c | 66 +++---
 1 file changed, 45 insertions(+), 21 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 0acb0d4..0ace75a 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -31,48 +31,72 @@ static inline int right_child(int i)
return (i << 1) + 2;
 }
 
-static void cpudl_exchange(struct cpudl *cp, int a, int b)
-{
-   int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu;
-
-   swap(cp->elements[a].cpu, cp->elements[b].cpu);
-   swap(cp->elements[a].dl , cp->elements[b].dl );
-
-   swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
-}
-
 static void cpudl_heapify_down(struct cpudl *cp, int idx)
 {
int l, r, largest;
 
+   int orig_cpu = cp->elements[idx].cpu;
+   u64 orig_dl = cp->elements[idx].dl;
+
+   if (left_child(idx) >= cp->size)
+   return;
+
/* adapted from lib/prio_heap.c */
while(1) {
+   u64 largest_dl;
l = left_child(idx);
r = right_child(idx);
largest = idx;
+   largest_dl = orig_dl;
 
-   if ((l < cp->size) && dl_time_before(cp->elements[idx].dl,
-   cp->elements[l].dl))
+   if ((l < cp->size) && dl_time_before(orig_dl,
+   cp->elements[l].dl)) {
largest = l;
-   if ((r < cp->size) && dl_time_before(cp->elements[largest].dl,
-   cp->elements[r].dl))
+   largest_dl = cp->elements[l].dl;
+   }
+   if ((r < cp->size) && dl_time_before(largest_dl,
+   cp->elements[r].dl))
largest = r;
+
if (largest == idx)
break;
 
-   /* Push idx down the heap one level and bump one up */
-   cpudl_exchange(cp, largest, idx);
+   /* pull largest child onto idx */
+   cp->elements[idx].cpu = cp->elements[largest].cpu;
+   cp->elements[idx].dl = cp->elements[largest].dl;
+   cp->elements[cp->elements[idx].cpu].idx = idx;
idx = largest;
}
+   /* actual push down of saved original values orig_* */
+   cp->elements[idx].cpu = orig_cpu;
+   cp->elements[idx].dl = orig_dl;
+   cp->elements[cp->elements[idx].cpu].idx = idx;
 }
 
 static void cpudl_heapify_up(struct cpudl *cp, int idx)
 {
-   while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
-   cp->elements[idx].dl)) {
-   cpudl_exchange(cp, idx, parent(idx));
-   idx = parent(idx);
-   }
+   int p;
+
+   int orig_cpu = cp->elements[idx].cpu;
+   u64 orig_dl = cp->elements[idx].dl;
+
+   if (idx == 0)
+   return;
+
+   do {
+   p = parent(idx);
+   if (dl_time_before(orig_dl, cp->elements[p].dl))
+   break;
+   /* pull parent onto idx */
+   cp->elements[idx].cpu = cp->elements[p].cpu;
+   cp->elements[idx].dl = cp->elements[p].dl;
+   cp->elements[cp->elements[idx].cpu].idx = idx;
+   idx = p;
+   } while (idx != 0);
+   /* actual push up of saved original values orig_* */
+   cp->elements[idx].cpu = orig_cpu;
+   cp->elements[idx].dl = orig_dl;
+   cp->elements[cp->elements[idx].cpu].idx = idx;
 }
 
 static void cpudl_heapify(struct cpudl *cp, int idx)
-- 
2.7.4



[RFC PATCH 3/3] sched/deadline: split cpudl_set() into cpudl_set() and cpudl_clear()

2016-08-14 Thread Tommaso Cucinotta
These 2 exercise independent code paths and need different arguments.
After this change, you call
  cpudl_clear(cp, cpu)
  cpudl_set(cp, cpu, dl)
instead of
  cpudl_set(cp, cpu, 0 /* dl */, 0 /* is_valid */)
  cpudl_set(cp, cpu, dl, 1 /* is_valid */)
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Luca Abeni 
Reviewed-by: Luca Abeni 
Reviewed-by: Juri Lelli 
Signed-off-by: Tommaso Cucinotta 
---
 kernel/sched/cpudeadline.c | 49 +++---
 kernel/sched/cpudeadline.h |  3 ++-
 kernel/sched/deadline.c| 10 +-
 3 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 0ace75a..e731190 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -145,16 +145,15 @@ out:
 }
 
 /*
- * cpudl_set - update the cpudl max-heap
+ * cpudl_clear - remove a cpu from the cpudl max-heap
  * @cp: the cpudl max-heap context
  * @cpu: the target cpu
- * @dl: the new earliest deadline for this cpu
  *
  * Notes: assumes cpu_rq(cpu)->lock is locked
  *
  * Returns: (void)
  */
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
+void cpudl_clear(struct cpudl *cp, int cpu)
 {
int old_idx, new_cpu;
unsigned long flags;
@@ -162,17 +161,15 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int 
is_valid)
WARN_ON(!cpu_present(cpu));
 
raw_spin_lock_irqsave(&cp->lock, flags);
+
old_idx = cp->elements[cpu].idx;
-   if (!is_valid) {
-   /* remove item */
-   if (old_idx == IDX_INVALID) {
-   /*
-* Nothing to remove if old_idx was invalid.
-* This could happen if a rq_offline_dl is
-* called for a CPU without -dl tasks running.
-*/
-   goto out;
-   }
+   if (old_idx == IDX_INVALID) {
+   /*
+* Nothing to remove if old_idx was invalid.
+* This could happen if a rq_offline_dl is
+* called for a CPU without -dl tasks running.
+*/
+   } else {
new_cpu = cp->elements[cp->size - 1].cpu;
cp->elements[old_idx].dl = cp->elements[cp->size - 1].dl;
cp->elements[old_idx].cpu = new_cpu;
@@ -180,11 +177,32 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int 
is_valid)
cp->elements[new_cpu].idx = old_idx;
cp->elements[cpu].idx = IDX_INVALID;
cpudl_heapify(cp, old_idx);
-   cpumask_set_cpu(cpu, cp->free_cpus);
 
-   goto out;
+   cpumask_set_cpu(cpu, cp->free_cpus);
}
+   raw_spin_unlock_irqrestore(&cp->lock, flags);
+}
+
+/*
+ * cpudl_set - update the cpudl max-heap
+ * @cp: the cpudl max-heap context
+ * @cpu: the target cpu
+ * @dl: the new earliest deadline for this cpu
+ *
+ * Notes: assumes cpu_rq(cpu)->lock is locked
+ *
+ * Returns: (void)
+ */
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl)
+{
+   int old_idx;
+   unsigned long flags;
+
+   WARN_ON(!cpu_present(cpu));
 
+   raw_spin_lock_irqsave(&cp->lock, flags);
+
+   old_idx = cp->elements[cpu].idx;
if (old_idx == IDX_INVALID) {
int new_idx = cp->size++;
cp->elements[new_idx].dl = dl;
@@ -197,7 +215,6 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int 
is_valid)
cpudl_heapify(cp, old_idx);
}
 
-out:
raw_spin_unlock_irqrestore(&cp->lock, flags);
 }
 
diff --git a/kernel/sched/cpudeadline.h b/kernel/sched/cpudeadline.h
index fcbdf83..f7da8c5 100644
--- a/kernel/sched/cpudeadline.h
+++ b/kernel/sched/cpudeadline.h
@@ -23,7 +23,8 @@ struct cpudl {
 #ifdef CONFIG_SMP
 int cpudl_find(struct cpudl *cp, struct task_struct *p,
   struct cpumask *later_mask);
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid);
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl);
+void cpudl_clear(struct cpudl *cp, int cpu);
 int cpudl_init(struct cpudl *cp);
 void cpudl_set_freecpu(struct cpudl *cp, int cpu);
 void cpudl_clear_freecpu(struct cpudl *cp, int cpu);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index d091f4a..18fb0b8 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -798,7 +798,7 @@ static void inc_dl_deadline(struct dl_rq *dl_rq, u64 
deadline)
if (dl_rq->earliest_dl.curr == 0 ||
dl_time_before(deadline, dl_rq->earliest_dl.curr)) {
dl_rq->earliest_dl.curr = deadline;
-   cpudl_set(&rq->rd->cpudl, rq->cpu, deadline, 1);
+   cpudl_set(&rq->rd->cpudl, rq->cpu, deadline);
}
 }
 
@@ -813,14 +813,14 @@ static void dec_dl_deadline(struct dl_rq *dl_rq, u64 
deadline)
if (!d

[RFC PATCH 1/3] sched/deadline: refactor cpu heap code

2016-08-14 Thread Tommaso Cucinotta
1. heapify up factored out in new dedicated function heapify_up()
   (avoids repetition of same code)
2. call to cpudl_change_key() replaced with heapify_up() when
   cpudl_set actually inserts a new node in the heap
3. cpudl_change_key() replaced with heapify() that heapifies up
   or down as needed.

Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Luca Abeni 
Reviewed-by: Luca Abeni 
Reviewed-by: Juri Lelli 
Signed-off-by: Tommaso Cucinotta 
---
 kernel/sched/cpudeadline.c | 50 +-
 1 file changed, 23 insertions(+), 27 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index d418449..0acb0d4 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -41,7 +41,7 @@ static void cpudl_exchange(struct cpudl *cp, int a, int b)
swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
 }
 
-static void cpudl_heapify(struct cpudl *cp, int idx)
+static void cpudl_heapify_down(struct cpudl *cp, int idx)
 {
int l, r, largest;
 
@@ -66,23 +66,24 @@ static void cpudl_heapify(struct cpudl *cp, int idx)
}
 }
 
-static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
+static void cpudl_heapify_up(struct cpudl *cp, int idx)
 {
-   WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
-
-   if (dl_time_before(new_dl, cp->elements[idx].dl)) {
-   cp->elements[idx].dl = new_dl;
-   cpudl_heapify(cp, idx);
-   } else {
-   cp->elements[idx].dl = new_dl;
-   while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
-   cp->elements[idx].dl)) {
-   cpudl_exchange(cp, idx, parent(idx));
-   idx = parent(idx);
-   }
+   while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
+   cp->elements[idx].dl)) {
+   cpudl_exchange(cp, idx, parent(idx));
+   idx = parent(idx);
}
 }
 
+static void cpudl_heapify(struct cpudl *cp, int idx)
+{
+   if (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
+   cp->elements[idx].dl))
+   cpudl_heapify_up(cp, idx);
+   else
+   cpudl_heapify_down(cp, idx);
+}
+
 static inline int cpudl_maximum(struct cpudl *cp)
 {
return cp->elements[0].cpu;
@@ -154,27 +155,22 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int 
is_valid)
cp->size--;
cp->elements[new_cpu].idx = old_idx;
cp->elements[cpu].idx = IDX_INVALID;
-   while (old_idx > 0 && dl_time_before(
-   cp->elements[parent(old_idx)].dl,
-   cp->elements[old_idx].dl)) {
-   cpudl_exchange(cp, old_idx, parent(old_idx));
-   old_idx = parent(old_idx);
-   }
+   cpudl_heapify(cp, old_idx);
cpumask_set_cpu(cpu, cp->free_cpus);
-cpudl_heapify(cp, old_idx);
 
goto out;
}
 
if (old_idx == IDX_INVALID) {
-   cp->size++;
-   cp->elements[cp->size - 1].dl = dl;
-   cp->elements[cp->size - 1].cpu = cpu;
-   cp->elements[cpu].idx = cp->size - 1;
-   cpudl_change_key(cp, cp->size - 1, dl);
+   int new_idx = cp->size++;
+   cp->elements[new_idx].dl = dl;
+   cp->elements[new_idx].cpu = cpu;
+   cp->elements[cpu].idx = new_idx;
+   cpudl_heapify_up(cp, new_idx);
cpumask_clear_cpu(cpu, cp->free_cpus);
} else {
-   cpudl_change_key(cp, old_idx, dl);
+   cp->elements[old_idx].dl = dl;
+   cpudl_heapify(cp, old_idx);
}
 
 out:
-- 
2.7.4



[tip:sched/core] sched/deadline: Fix wrap-around in DL heap

2016-08-10 Thread tip-bot for Tommaso Cucinotta
Commit-ID:  a23eadfae2fd45536a355b785d5a1533e1955c22
Gitweb: http://git.kernel.org/tip/a23eadfae2fd45536a355b785d5a1533e1955c22
Author: Tommaso Cucinotta 
AuthorDate: Tue, 19 Jul 2016 11:44:50 +0200
Committer:  Ingo Molnar 
CommitDate: Wed, 10 Aug 2016 13:32:55 +0200

sched/deadline: Fix wrap-around in DL heap

Current code in cpudeadline.c has a bug in re-heapifying when adding a
new element at the end of the heap, because a deadline value of 0 is
temporarily set in the new elem, then cpudl_change_key() is called
with the actual elem deadline as param.

However, the function compares the new deadline to set with the one
previously in the elem, which is 0.  So, if current absolute deadlines
grew so much to have negative values as s64, the comparison in
cpudl_change_key() makes the wrong decision.  Instead, as from
dl_time_before(), the kernel should handle correctly abs deadlines
wrap-arounds.

This patch fixes the problem with a minimally invasive change that
forces cpudl_change_key() to heapify up in this case.

Signed-off-by: Tommaso Cucinotta 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Luca Abeni 
Cc: Juri Lelli 
Cc: Juri Lelli 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1468921493-10054-2-git-send-email-tommaso.cucino...@sssup.it
Signed-off-by: Ingo Molnar 
---
 kernel/sched/cpudeadline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 5be5882..d418449 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -168,7 +168,7 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 
if (old_idx == IDX_INVALID) {
cp->size++;
-   cp->elements[cp->size - 1].dl = 0;
+   cp->elements[cp->size - 1].dl = dl;
cp->elements[cp->size - 1].cpu = cpu;
cp->elements[cpu].idx = cp->size - 1;
cpudl_change_key(cp, cp->size - 1, dl);


[RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops.

2016-07-19 Thread Tommaso Cucinotta
This change achieves up to a 10% speed-up for cpudl_set() calls, as
measured on an Intel Core2 Duo with a randomly generated workload of
1K, 10K and 100K random heap insertions and deletions (75% cpudl_set()
calls with is_valid=1 and 25% with is_valid=0), and randomly generated
cpu IDs, with up to 256 CPUs.

Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Luca Abeni 
Reviewed-by: Luca Abeni 
Signed-off-by: Tommaso Cucinotta 
---
 kernel/sched/cpudeadline.c | 114 +++--
 1 file changed, 69 insertions(+), 45 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 3c42702..60f933a 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -31,60 +31,82 @@ static inline int right_child(int i)
return (i << 1) + 2;
 }
 
-static void cpudl_exchange(struct cpudl *cp, int a, int b)
-{
-   int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu;
-
-   swap(cp->elements[a].cpu, cp->elements[b].cpu);
-   swap(cp->elements[a].dl , cp->elements[b].dl );
-
-   swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
-}
-
 static void cpudl_heapify_down(struct cpudl *cp, int idx)
 {
int l, r, largest;
 
+   int orig_cpu = cp->elements[idx].cpu;
+   u64 orig_dl = cp->elements[idx].dl;
+
+   if (left_child(idx) >= cp->size)
+   return;
+
/* adapted from lib/prio_heap.c */
while(1) {
+   u64 largest_dl;
l = left_child(idx);
r = right_child(idx);
largest = idx;
+   largest_dl = orig_dl;
 
-   if ((l < cp->size) && dl_time_before(cp->elements[idx].dl,
-   cp->elements[l].dl))
+   if ((l < cp->size) && dl_time_before(orig_dl, cp->elements[l].dl)) {
largest = l;
-   if ((r < cp->size) && dl_time_before(cp->elements[largest].dl,
-   cp->elements[r].dl))
+   largest_dl = cp->elements[l].dl;
+   }
+   if ((r < cp->size) && dl_time_before(largest_dl, cp->elements[r].dl))
largest = r;
+
if (largest == idx)
break;
 
-   /* Push idx down the heap one level and bump one up */
-   cpudl_exchange(cp, largest, idx);
+   /* pull largest child onto idx */
+   cp->elements[idx].cpu = cp->elements[largest].cpu;
+   cp->elements[idx].dl = cp->elements[largest].dl;
+   cp->elements[cp->elements[idx].cpu].idx = idx;
idx = largest;
}
+   /* actual push down of saved original values orig_* */
+   cp->elements[idx].cpu = orig_cpu;
+   cp->elements[idx].dl = orig_dl;
+   cp->elements[cp->elements[idx].cpu].idx = idx;
 }
 
 static void cpudl_heapify_up(struct cpudl *cp, int idx)
 {
-   while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
-   cp->elements[idx].dl)) {
-   cpudl_exchange(cp, idx, parent(idx));
-   idx = parent(idx);
-   }
+   int p;
+
+   int orig_cpu = cp->elements[idx].cpu;
+   u64 orig_dl = cp->elements[idx].dl;
+
+   if (idx == 0)
+   return;
+
+   do {
+   p = parent(idx);
+   if (dl_time_before(cp->elements[idx].dl, cp->elements[p].dl))
+   break;
+   /* pull parent onto idx */
+   cp->elements[idx].cpu = cp->elements[p].cpu;
+   cp->elements[idx].dl = cp->elements[p].dl;
+   cp->elements[cp->elements[idx].cpu].idx = idx;
+   idx = p;
+   } while (idx != 0);
+   /* actual push up of saved original values orig_* */
+   cp->elements[idx].cpu = orig_cpu;
+   cp->elements[idx].dl = orig_dl;
+   cp->elements[cp->elements[idx].cpu].idx = idx;
 }
 
-static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
+static void cpudl_heapify(struct cpudl *cp, int idx)
 {
WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
+   if (idx == IDX_INVALID)
+   return;
 
-   if (dl_time_before(new_dl, cp->elements[idx].dl)) {
-   cp->elements[idx].dl = new_dl;
-   cpudl_heapify_down(cp, idx);
-   } else {
-   cp->elements[idx].dl = new_dl;
+   if (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, cp->elements[idx].dl)) {
cpudl_heapify_up(cp, idx);
+   } else {
+   cpudl_heapify_down(cp, idx);
}
 }
 
@@ -153,28 +175,30 @@ void cpudl_set(struct cpudl *cp, int cpu,

[RFC PATCH 4/4] Split cpudl_set() into cpudl_set() and cpudl_clear().

2016-07-19 Thread Tommaso Cucinotta
These 2 exercise independent code paths and need different arguments.
Now you call
  cpudl_clear(cp, cpu)
  cpudl_set(cp, cpu, dl)
instead of
  cpudl_set(cp, cpu, 0 /* dl */, 0 /* is_valid */)
  cpudl_set(cp, cpu, dl, 1 /* is_valid */)

Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Luca Abeni 
Reviewed-by: Luca Abeni 
Signed-off-by: Tommaso Cucinotta 
---
 kernel/sched/cpudeadline.c | 71 +-
 kernel/sched/cpudeadline.h |  3 +-
 kernel/sched/deadline.c| 10 +++
 3 files changed, 52 insertions(+), 32 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 60f933a..0f276bf 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -147,16 +147,15 @@ out:
 }
 
 /*
- * cpudl_set - update the cpudl max-heap
+ * cpudl_clear - remove a cpu from the cpudl max-heap
  * @cp: the cpudl max-heap context
  * @cpu: the target cpu
- * @dl: the new earliest deadline for this cpu
  *
  * Notes: assumes cpu_rq(cpu)->lock is locked
  *
  * Returns: (void)
  */
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
+void cpudl_clear(struct cpudl *cp, int cpu)
 {
int old_idx, new_cpu;
unsigned long flags;
@@ -164,17 +163,16 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
WARN_ON(!cpu_present(cpu));
 
raw_spin_lock_irqsave(&cp->lock, flags);
+
old_idx = cp->elements[cpu].idx;
-   if (!is_valid) {
+   if (old_idx == IDX_INVALID) {
+   /*
+* Nothing to remove if old_idx was invalid.
+* This could happen if a rq_offline_dl is
+* called for a CPU without -dl tasks running.
+*/
+   } else {
/* remove item */
-   if (old_idx == IDX_INVALID) {
-   /*
-* Nothing to remove if old_idx was invalid.
-* This could happen if a rq_offline_dl is
-* called for a CPU without -dl tasks running.
-*/
-   goto out;
-   }
cp->size--;
cp->elements[cpu].idx = IDX_INVALID;
if (old_idx != cp->size) {
@@ -184,24 +182,45 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
cp->elements[new_cpu].idx = old_idx;
cpudl_heapify(cp, old_idx);
}
-
cpumask_set_cpu(cpu, cp->free_cpus);
+   }
+
+   raw_spin_unlock_irqrestore(&cp->lock, flags);
+}
+
+/*
+ * cpudl_set - update the cpudl max-heap
+ * @cp: the cpudl max-heap context
+ * @cpu: the target cpu
+ * @dl: the new earliest deadline for this cpu
+ *
+ * Notes: assumes cpu_rq(cpu)->lock is locked
+ *
+ * Returns: (void)
+ */
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl)
+{
+   int old_idx;
+   unsigned long flags;
+
+   WARN_ON(!cpu_present(cpu));
+
+   raw_spin_lock_irqsave(&cp->lock, flags);
+
+   old_idx = cp->elements[cpu].idx;
+   if (old_idx == IDX_INVALID) {
+   int sz1 = cp->size++;
+   cp->elements[sz1].dl = dl;
+   cp->elements[sz1].cpu = cpu;
+   cp->elements[cpu].idx = sz1;
+   cpudl_heapify_up(cp, sz1);
+
+   cpumask_clear_cpu(cpu, cp->free_cpus);
} else {
-   if (old_idx == IDX_INVALID) {
-   int size1 = cp->size++;
-   cp->elements[size1].dl = dl;
-   cp->elements[size1].cpu = cpu;
-   cp->elements[cpu].idx = size1;
-   cpudl_heapify_up(cp, size1);
-
-   cpumask_clear_cpu(cpu, cp->free_cpus);
-   } else {
-   cp->elements[old_idx].dl = dl;
-   cpudl_heapify(cp, old_idx);
-   }
+   cp->elements[old_idx].dl = dl;
+   cpudl_heapify(cp, old_idx);
}
 
-out:
raw_spin_unlock_irqrestore(&cp->lock, flags);
 }
 
diff --git a/kernel/sched/cpudeadline.h b/kernel/sched/cpudeadline.h
index fcbdf83..f7da8c5 100644
--- a/kernel/sched/cpudeadline.h
+++ b/kernel/sched/cpudeadline.h
@@ -23,7 +23,8 @@ struct cpudl {
 #ifdef CONFIG_SMP
 int cpudl_find(struct cpudl *cp, struct task_struct *p,
   struct cpumask *later_mask);
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid);
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl);
+void cpudl_clear(struct cpudl *cp, int cpu);
 int cpudl_init(struct cpudl *cp);
 void cpudl_set_freecpu(struct cpudl *cp, int cpu);
 void cpudl_clear_freecpu(struct cpudl *cp, int cpu);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index fcb7f02..f2e8f47 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -795,7 +795,7 @@ static void i

SCHED_DEADLINE cpudeadline.{h,c} fixup

2016-07-19 Thread Tommaso Cucinotta
Hi,

this is a rework of the cpudeadline bugfix and speed-up patch-set, which
integrates all comments received so far from Luca and Juri.

The first patch is a minimally invasive (1-line) fix for the deadline
wrap-around bug. This leaves some weirdness in how cpudl_change_key() is
called. Therefore, the second patch does a minimum of refactoring to make
things more explicit and clear.
The 3rd patch now contains the actual performance enhancement (avoiding
unneeded swaps during heapify operations), which has been measured to
achieve up to a 10% speed-up for cpudl_set() calls.
This has been measured with a randomly generated workload of 1K, 10K and
100K random heap insertions and deletions (75% cpudl_set() calls with
is_valid=1 and 25% with is_valid=0), and randomly generated cpu IDs, with
up to 256 CPUs.

Benchmarking code is available at:
  https://github.com/tomcucinotta/cpudl-bench
Obtained speed-up plot:
  https://github.com/tomcucinotta/cpudl-bench/blob/master/cpudl.pdf
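For reference, this is roughly the shape of the driver loop behind those
numbers (an illustrative sketch only, assuming <linux/random.h> and
<linux/timex.h>; cpudl_bench() and nr_ops are made-up names here, the real
benchmarking code is in the repository linked above):

	/* Sketch: random mix of cpudl_set() calls (75% with is_valid=1,
	 * 25% with is_valid=0) on random cpu ids, timed with the TSC.
	 * This targets the pre-split API cpudl_set(cp, cpu, dl, is_valid). */
	static void cpudl_bench(struct cpudl *cp, int nr_cpus, int nr_ops)
	{
		int i;

		for (i = 0; i < nr_ops; i++) {
			int cpu = prandom_u32() % nr_cpus;
			u64 dl = ((u64)prandom_u32() << 32) | prandom_u32();
			cycles_t t0, t1;

			t0 = get_cycles();
			if (prandom_u32() % 4)		/* ~75%: insert/update */
				cpudl_set(cp, cpu, dl, 1);
			else				/* ~25%: remove */
				cpudl_set(cp, cpu, 0, 0);
			t1 = get_cycles();
			printk(KERN_INFO "cpudl_set: %llu ticks\n",
			       (unsigned long long)(t1 - t0));
		}
	}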

Finally, the 4th patch is another clean-up patch touching cpudeadline.{h,c}
and deadline.c. Now you call cpudl_clear(cp, cpu) and cpudl_set(cp, cpu, dl)
instead of cpudl_set(cp, cpu, 0 /* dl */, 0 /* is_valid */) and
cpudl_set(cp, cpu, dl, 1 /* is_valid */).
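Just to give an idea of the effect at the call sites (a sketch only, not the
actual hunks; the 4th patch carries the authoritative diff of deadline.c):

	/* inc_dl_deadline(): a new earliest deadline shows up on this rq */
	cpudl_set(&rq->rd->cpudl, rq->cpu, deadline);

	/* dec_dl_deadline()/rq_offline_dl(): no -dl deadline to publish */
	cpudl_clear(&rq->rd->cpudl, rq->cpu);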

Please, share your comments, thanks!

  Tommaso



[RFC PATCH 1/4] Minimally invasive deadline wrap-around bugfix for SCHED_DEADLINE cpu heap

2016-07-19 Thread Tommaso Cucinotta
Current code in cpudeadline.c has a bug in re-heapifying when adding a
new element at the end of the heap, because a deadline value of 0 is
temporarily set in the new elem, then cpudl_change_key() is called
with the actual elem deadline as param. However, the function compares
the new deadline to set with the one previously in the elem, which is
0.  So, if current absolute deadlines grew so much to have negative
values as s64, the comparison in cpudl_change_key() makes the wrong
decision.  Instead, as from dl_time_before(), the kernel should handle
correctly abs deadlines wrap-arounds.

This patch fixes the problem with a minimally invasive change that
forces cpudl_change_key() to heapify up in this case.

Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Luca Abeni 
Reviewed-by: Luca Abeni 
Signed-off-by: Tommaso Cucinotta 
---
 kernel/sched/cpudeadline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 5be5882..d418449 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -168,7 +168,7 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 
if (old_idx == IDX_INVALID) {
cp->size++;
-   cp->elements[cp->size - 1].dl = 0;
+   cp->elements[cp->size - 1].dl = dl;
cp->elements[cp->size - 1].cpu = cpu;
cp->elements[cpu].idx = cp->size - 1;
cpudl_change_key(cp, cp->size - 1, dl);
-- 
2.7.4



[RFC PATCH 2/4] SCHED_DEADLINE cpu heap code clarification/refactory.

2016-07-19 Thread Tommaso Cucinotta
1. heapify up factored out in new dedicated function heapify_up()
   (avoids repetition of same code)
2. call to cpudl_change_key() replaced with heapify_up() when
   cpudl_set actually inserts a new node in the heap

Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Luca Abeni 
Reviewed-by: Luca Abeni 
Signed-off-by: Tommaso Cucinotta 
---
 kernel/sched/cpudeadline.c | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index d418449..3c42702 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -41,7 +41,7 @@ static void cpudl_exchange(struct cpudl *cp, int a, int b)
swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
 }
 
-static void cpudl_heapify(struct cpudl *cp, int idx)
+static void cpudl_heapify_down(struct cpudl *cp, int idx)
 {
int l, r, largest;
 
@@ -66,20 +66,25 @@ static void cpudl_heapify(struct cpudl *cp, int idx)
}
 }
 
+static void cpudl_heapify_up(struct cpudl *cp, int idx)
+{
+   while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
+   cp->elements[idx].dl)) {
+   cpudl_exchange(cp, idx, parent(idx));
+   idx = parent(idx);
+   }
+}
+
 static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
 {
WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
 
if (dl_time_before(new_dl, cp->elements[idx].dl)) {
cp->elements[idx].dl = new_dl;
-   cpudl_heapify(cp, idx);
+   cpudl_heapify_down(cp, idx);
} else {
cp->elements[idx].dl = new_dl;
-   while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
-   cp->elements[idx].dl)) {
-   cpudl_exchange(cp, idx, parent(idx));
-   idx = parent(idx);
-   }
+   cpudl_heapify_up(cp, idx);
}
 }
 
@@ -154,24 +159,19 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
cp->size--;
cp->elements[new_cpu].idx = old_idx;
cp->elements[cpu].idx = IDX_INVALID;
-   while (old_idx > 0 && dl_time_before(
-   cp->elements[parent(old_idx)].dl,
-   cp->elements[old_idx].dl)) {
-   cpudl_exchange(cp, old_idx, parent(old_idx));
-   old_idx = parent(old_idx);
-   }
+   cpudl_heapify_up(cp, old_idx);
cpumask_set_cpu(cpu, cp->free_cpus);
-cpudl_heapify(cp, old_idx);
+cpudl_heapify_down(cp, old_idx);
 
goto out;
}
 
if (old_idx == IDX_INVALID) {
-   cp->size++;
-   cp->elements[cp->size - 1].dl = dl;
-   cp->elements[cp->size - 1].cpu = cpu;
-   cp->elements[cpu].idx = cp->size - 1;
-   cpudl_change_key(cp, cp->size - 1, dl);
+   int size1 = cp->size++;
+   cp->elements[size1].dl = dl;
+   cp->elements[size1].cpu = cpu;
+   cp->elements[cpu].idx = size1;
+   cpudl_heapify_up(cp, size1);
cpumask_clear_cpu(cpu, cp->free_cpus);
} else {
cpudl_change_key(cp, old_idx, dl);
-- 
2.7.4



SCHED_DEADLINE cpudeadline.{h,c} fixup

2016-05-19 Thread Tommaso Cucinotta
Hi all,

I took Luca's advice to isolate the deadline wrap-around bugfix with a
first minimally invasive patch (1-line). This leaves some weirdness in
how cpudl_change_key() is called.
Therefore, the second patch does a minimum of refactoring to make things
more explicit and clear.
The 3rd patch now contains the actual performance enhancement (avoiding
unneeded swaps during heapify operations), which, as said in the previous
post, achieves up to a 6% speed-up for cpudl_set() calls.
Finally, the 4th patch is another clean-up patch touching cpudeadline.{h,c}
and deadline.c. Now you call cpudl_clear(cp, cpu) and cpudl_set(cp, cpu, dl)
instead of cpudl_set(cp, cpu, 0 /* dl */, 0 /* is_valid */) and
cpudl_set(cp, cpu, dl, 1 /* is_valid */).

Please, let me know how this looks like now.

Thanks,

  Tommaso



[RFC PATCH 1/4] Minimally invasive deadline wrap-around bugfix for the SCHED_DEADLINE cpu heap.

2016-05-19 Thread Tommaso Cucinotta
---
 kernel/sched/cpudeadline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 5be5882..d418449 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -168,7 +168,7 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 
if (old_idx == IDX_INVALID) {
cp->size++;
-   cp->elements[cp->size - 1].dl = 0;
+   cp->elements[cp->size - 1].dl = dl;
cp->elements[cp->size - 1].cpu = cpu;
cp->elements[cpu].idx = cp->size - 1;
cpudl_change_key(cp, cp->size - 1, dl);
-- 
2.7.4



[RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops.

2016-05-19 Thread Tommaso Cucinotta
---
 kernel/sched/cpudeadline.c | 114 +++--
 1 file changed, 69 insertions(+), 45 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 3c42702..60f933a 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -31,60 +31,82 @@ static inline int right_child(int i)
return (i << 1) + 2;
 }
 
-static void cpudl_exchange(struct cpudl *cp, int a, int b)
-{
-   int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu;
-
-   swap(cp->elements[a].cpu, cp->elements[b].cpu);
-   swap(cp->elements[a].dl , cp->elements[b].dl );
-
-   swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
-}
-
 static void cpudl_heapify_down(struct cpudl *cp, int idx)
 {
int l, r, largest;
 
+   int orig_cpu = cp->elements[idx].cpu;
+   u64 orig_dl = cp->elements[idx].dl;
+
+   if (left_child(idx) >= cp->size)
+   return;
+
/* adapted from lib/prio_heap.c */
while(1) {
+   u64 largest_dl;
l = left_child(idx);
r = right_child(idx);
largest = idx;
+   largest_dl = orig_dl;
 
-   if ((l < cp->size) && dl_time_before(cp->elements[idx].dl,
-   cp->elements[l].dl))
+   if ((l < cp->size) && dl_time_before(orig_dl, cp->elements[l].dl)) {
largest = l;
-   if ((r < cp->size) && dl_time_before(cp->elements[largest].dl,
-   cp->elements[r].dl))
+   largest_dl = cp->elements[l].dl;
+   }
+   if ((r < cp->size) && dl_time_before(largest_dl, cp->elements[r].dl))
largest = r;
+
if (largest == idx)
break;
 
-   /* Push idx down the heap one level and bump one up */
-   cpudl_exchange(cp, largest, idx);
+   /* pull largest child onto idx */
+   cp->elements[idx].cpu = cp->elements[largest].cpu;
+   cp->elements[idx].dl = cp->elements[largest].dl;
+   cp->elements[cp->elements[idx].cpu].idx = idx;
idx = largest;
}
+   /* actual push down of saved original values orig_* */
+   cp->elements[idx].cpu = orig_cpu;
+   cp->elements[idx].dl = orig_dl;
+   cp->elements[cp->elements[idx].cpu].idx = idx;
 }
 
 static void cpudl_heapify_up(struct cpudl *cp, int idx)
 {
-   while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
-   cp->elements[idx].dl)) {
-   cpudl_exchange(cp, idx, parent(idx));
-   idx = parent(idx);
-   }
+   int p;
+
+   int orig_cpu = cp->elements[idx].cpu;
+   u64 orig_dl = cp->elements[idx].dl;
+
+   if (idx == 0)
+   return;
+
+   do {
+   p = parent(idx);
+   if (dl_time_before(cp->elements[idx].dl, cp->elements[p].dl))
+   break;
+   /* pull parent onto idx */
+   cp->elements[idx].cpu = cp->elements[p].cpu;
+   cp->elements[idx].dl = cp->elements[p].dl;
+   cp->elements[cp->elements[idx].cpu].idx = idx;
+   idx = p;
+   } while (idx != 0);
+   /* actual push up of saved original values orig_* */
+   cp->elements[idx].cpu = orig_cpu;
+   cp->elements[idx].dl = orig_dl;
+   cp->elements[cp->elements[idx].cpu].idx = idx;
 }
 
-static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
+static void cpudl_heapify(struct cpudl *cp, int idx)
 {
WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
+   if (idx == IDX_INVALID)
+   return;
 
-   if (dl_time_before(new_dl, cp->elements[idx].dl)) {
-   cp->elements[idx].dl = new_dl;
-   cpudl_heapify_down(cp, idx);
-   } else {
-   cp->elements[idx].dl = new_dl;
+   if (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, cp->elements[idx].dl)) {
cpudl_heapify_up(cp, idx);
+   } else {
+   cpudl_heapify_down(cp, idx);
}
 }
 
@@ -153,28 +175,30 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 */
goto out;
}
-   new_cpu = cp->elements[cp->size - 1].cpu;
-   cp->elements[old_idx].dl = cp->elements[cp->size - 1].dl;
-   cp->elements[old_idx].cpu = new_cpu;
cp->size--;
-   cp->elements[new_cpu].idx = old_idx;
cp->elements[cpu].idx = IDX_INVALID;
-   cpudl_heapify_up(cp, old_idx);
-   cpumask_set_cpu(cpu, cp->free_cpus);
-cpudl_heapify_down(cp, old_idx);
-
-   goto out;
-   }
+   if (old_idx != cp->size) {
+   

[RFC PATCH 2/4] SCHED_DEADLINE cpu heap code clarification/refactoring: - heapify_up factored out in new dedicated function (avoids repetition of same code) - call to cpudl_change_key replaced with h

2016-05-19 Thread Tommaso Cucinotta
---
 kernel/sched/cpudeadline.c | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index d418449..3c42702 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -41,7 +41,7 @@ static void cpudl_exchange(struct cpudl *cp, int a, int b)
swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
 }
 
-static void cpudl_heapify(struct cpudl *cp, int idx)
+static void cpudl_heapify_down(struct cpudl *cp, int idx)
 {
int l, r, largest;
 
@@ -66,20 +66,25 @@ static void cpudl_heapify(struct cpudl *cp, int idx)
}
 }
 
+static void cpudl_heapify_up(struct cpudl *cp, int idx)
+{
+   while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
+   cp->elements[idx].dl)) {
+   cpudl_exchange(cp, idx, parent(idx));
+   idx = parent(idx);
+   }
+}
+
 static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
 {
WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
 
if (dl_time_before(new_dl, cp->elements[idx].dl)) {
cp->elements[idx].dl = new_dl;
-   cpudl_heapify(cp, idx);
+   cpudl_heapify_down(cp, idx);
} else {
cp->elements[idx].dl = new_dl;
-   while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
-   cp->elements[idx].dl)) {
-   cpudl_exchange(cp, idx, parent(idx));
-   idx = parent(idx);
-   }
+   cpudl_heapify_up(cp, idx);
}
 }
 
@@ -154,24 +159,19 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
cp->size--;
cp->elements[new_cpu].idx = old_idx;
cp->elements[cpu].idx = IDX_INVALID;
-   while (old_idx > 0 && dl_time_before(
-   cp->elements[parent(old_idx)].dl,
-   cp->elements[old_idx].dl)) {
-   cpudl_exchange(cp, old_idx, parent(old_idx));
-   old_idx = parent(old_idx);
-   }
+   cpudl_heapify_up(cp, old_idx);
cpumask_set_cpu(cpu, cp->free_cpus);
-cpudl_heapify(cp, old_idx);
+cpudl_heapify_down(cp, old_idx);
 
goto out;
}
 
if (old_idx == IDX_INVALID) {
-   cp->size++;
-   cp->elements[cp->size - 1].dl = dl;
-   cp->elements[cp->size - 1].cpu = cpu;
-   cp->elements[cpu].idx = cp->size - 1;
-   cpudl_change_key(cp, cp->size - 1, dl);
+   int size1 = cp->size++;
+   cp->elements[size1].dl = dl;
+   cp->elements[size1].cpu = cpu;
+   cp->elements[cpu].idx = size1;
+   cpudl_heapify_up(cp, size1);
cpumask_clear_cpu(cpu, cp->free_cpus);
} else {
cpudl_change_key(cp, old_idx, dl);
-- 
2.7.4



[RFC PATCH 4/4] Split cpudl_set() into cpudl_set() and cpudl_clear(). These 2 exercise independent code paths and need different arguments.

2016-05-19 Thread Tommaso Cucinotta
---
 kernel/sched/cpudeadline.c | 71 +-
 kernel/sched/cpudeadline.h |  3 +-
 kernel/sched/deadline.c| 10 +++
 3 files changed, 52 insertions(+), 32 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 60f933a..0f276bf 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -147,16 +147,15 @@ out:
 }
 
 /*
- * cpudl_set - update the cpudl max-heap
+ * cpudl_clear - remove a cpu from the cpudl max-heap
  * @cp: the cpudl max-heap context
  * @cpu: the target cpu
- * @dl: the new earliest deadline for this cpu
  *
  * Notes: assumes cpu_rq(cpu)->lock is locked
  *
  * Returns: (void)
  */
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
+void cpudl_clear(struct cpudl *cp, int cpu)
 {
int old_idx, new_cpu;
unsigned long flags;
@@ -164,17 +163,16 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
WARN_ON(!cpu_present(cpu));
 
raw_spin_lock_irqsave(&cp->lock, flags);
+
old_idx = cp->elements[cpu].idx;
-   if (!is_valid) {
+   if (old_idx == IDX_INVALID) {
+   /*
+* Nothing to remove if old_idx was invalid.
+* This could happen if a rq_offline_dl is
+* called for a CPU without -dl tasks running.
+*/
+   } else {
/* remove item */
-   if (old_idx == IDX_INVALID) {
-   /*
-* Nothing to remove if old_idx was invalid.
-* This could happen if a rq_offline_dl is
-* called for a CPU without -dl tasks running.
-*/
-   goto out;
-   }
cp->size--;
cp->elements[cpu].idx = IDX_INVALID;
if (old_idx != cp->size) {
@@ -184,24 +182,45 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
cp->elements[new_cpu].idx = old_idx;
cpudl_heapify(cp, old_idx);
}
-
cpumask_set_cpu(cpu, cp->free_cpus);
+   }
+
+   raw_spin_unlock_irqrestore(&cp->lock, flags);
+}
+
+/*
+ * cpudl_set - update the cpudl max-heap
+ * @cp: the cpudl max-heap context
+ * @cpu: the target cpu
+ * @dl: the new earliest deadline for this cpu
+ *
+ * Notes: assumes cpu_rq(cpu)->lock is locked
+ *
+ * Returns: (void)
+ */
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl)
+{
+   int old_idx;
+   unsigned long flags;
+
+   WARN_ON(!cpu_present(cpu));
+
+   raw_spin_lock_irqsave(&cp->lock, flags);
+
+   old_idx = cp->elements[cpu].idx;
+   if (old_idx == IDX_INVALID) {
+   int sz1 = cp->size++;
+   cp->elements[sz1].dl = dl;
+   cp->elements[sz1].cpu = cpu;
+   cp->elements[cpu].idx = sz1;
+   cpudl_heapify_up(cp, sz1);
+
+   cpumask_clear_cpu(cpu, cp->free_cpus);
} else {
-   if (old_idx == IDX_INVALID) {
-   int size1 = cp->size++;
-   cp->elements[size1].dl = dl;
-   cp->elements[size1].cpu = cpu;
-   cp->elements[cpu].idx = size1;
-   cpudl_heapify_up(cp, size1);
-
-   cpumask_clear_cpu(cpu, cp->free_cpus);
-   } else {
-   cp->elements[old_idx].dl = dl;
-   cpudl_heapify(cp, old_idx);
-   }
+   cp->elements[old_idx].dl = dl;
+   cpudl_heapify(cp, old_idx);
}
 
-out:
raw_spin_unlock_irqrestore(&cp->lock, flags);
 }
 
diff --git a/kernel/sched/cpudeadline.h b/kernel/sched/cpudeadline.h
index fcbdf83..f7da8c5 100644
--- a/kernel/sched/cpudeadline.h
+++ b/kernel/sched/cpudeadline.h
@@ -23,7 +23,8 @@ struct cpudl {
 #ifdef CONFIG_SMP
 int cpudl_find(struct cpudl *cp, struct task_struct *p,
   struct cpumask *later_mask);
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid);
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl);
+void cpudl_clear(struct cpudl *cp, int cpu);
 int cpudl_init(struct cpudl *cp);
 void cpudl_set_freecpu(struct cpudl *cp, int cpu);
 void cpudl_clear_freecpu(struct cpudl *cp, int cpu);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index fcb7f02..f2e8f47 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -795,7 +795,7 @@ static void inc_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
if (dl_rq->earliest_dl.curr == 0 ||
dl_time_before(deadline, dl_rq->earliest_dl.curr)) {
dl_rq->earliest_dl.curr = deadline;
-   cpudl_set(&rq->rd->cpudl, rq->cpu, deadline, 1);
+   cpudl_set(&rq->rd->cpudl, rq->cpu, deadline);
}
 }
 
@@ -810,14 +810,14 @@ static void dec_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
  

Re: SCHED_DEADLINE cpudeadline.{h,c} fixup

2016-05-17 Thread Tommaso Cucinotta

On 17/05/2016 13:46, luca abeni wrote:

Maybe the ... change can be split in a separate
patch, which is a bugfix (and IMHO uncontroversial)?


Ok, the bugfix alone might look like the attached. I couldn't avoid a
little refactoring: the multiple occurrences of the same walk-up-the-heap
loop are factored into heapify_up(), mirroring the heapify() that was
already there (renamed heapify_down() for clarity).

I'll rebase the speed-up patch on top of this, if it's a better approach.

Anyone with further comments?

Thanks again!

    T.
--
Tommaso Cucinotta, Computer Engineering PhD
Associate Professor at the Real-Time Systems Laboratory (ReTiS)
Scuola Superiore Sant'Anna, Pisa, Italy
http://retis.sssup.it/people/tommaso
>From cfaa75eb77843f7da875a54c7e6631b271bf0663 Mon Sep 17 00:00:00 2001
From: Tommaso Cucinotta 
Date: Tue, 17 May 2016 15:54:11 +0200
Subject: [PATCH] Deadline wrap-around bugfix for the SCHED_DEADLINE cpu heap.

---
 kernel/sched/cpudeadline.c | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 5be5882..3c42702 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -41,7 +41,7 @@ static void cpudl_exchange(struct cpudl *cp, int a, int b)
 	swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
 }
 
-static void cpudl_heapify(struct cpudl *cp, int idx)
+static void cpudl_heapify_down(struct cpudl *cp, int idx)
 {
 	int l, r, largest;
 
@@ -66,20 +66,25 @@ static void cpudl_heapify(struct cpudl *cp, int idx)
 	}
 }
 
+static void cpudl_heapify_up(struct cpudl *cp, int idx)
+{
+	while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
+			cp->elements[idx].dl)) {
+		cpudl_exchange(cp, idx, parent(idx));
+		idx = parent(idx);
+	}
+}
+
 static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
 {
 	WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
 
 	if (dl_time_before(new_dl, cp->elements[idx].dl)) {
 		cp->elements[idx].dl = new_dl;
-		cpudl_heapify(cp, idx);
+		cpudl_heapify_down(cp, idx);
 	} else {
 		cp->elements[idx].dl = new_dl;
-		while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
-	cp->elements[idx].dl)) {
-			cpudl_exchange(cp, idx, parent(idx));
-			idx = parent(idx);
-		}
+		cpudl_heapify_up(cp, idx);
 	}
 }
 
@@ -154,24 +159,19 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 		cp->size--;
 		cp->elements[new_cpu].idx = old_idx;
 		cp->elements[cpu].idx = IDX_INVALID;
-		while (old_idx > 0 && dl_time_before(
-cp->elements[parent(old_idx)].dl,
-cp->elements[old_idx].dl)) {
-			cpudl_exchange(cp, old_idx, parent(old_idx));
-			old_idx = parent(old_idx);
-		}
+		cpudl_heapify_up(cp, old_idx);
 		cpumask_set_cpu(cpu, cp->free_cpus);
-cpudl_heapify(cp, old_idx);
+cpudl_heapify_down(cp, old_idx);
 
 		goto out;
 	}
 
 	if (old_idx == IDX_INVALID) {
-		cp->size++;
-		cp->elements[cp->size - 1].dl = 0;
-		cp->elements[cp->size - 1].cpu = cpu;
-		cp->elements[cpu].idx = cp->size - 1;
-		cpudl_change_key(cp, cp->size - 1, dl);
+		int size1 = cp->size++;
+		cp->elements[size1].dl = dl;
+		cp->elements[size1].cpu = cpu;
+		cp->elements[cpu].idx = size1;
+		cpudl_heapify_up(cp, size1);
 		cpumask_clear_cpu(cpu, cp->free_cpus);
 	} else {
 		cpudl_change_key(cp, old_idx, dl);
-- 
2.7.4



SCHED_DEADLINE cpudeadline.{h,c} fixup

2016-05-16 Thread Tommaso Cucinotta

Hi,

looking at the SCHED_DEADLINE code, I spotted an opportunity to
make cpudeadline.c faster: we can skip real swaps while re-heapifying
items after an addition or removal. As these operations are done under
a domain spinlock, it seemed worth a try.
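In a nutshell, the trick is the classic "hole" technique: keep the element
being sifted aside, pull children (or parents) into the hole one level at a
time, and write the saved element once at its final position. A generic
sketch on a plain array of u64 keys (illustrative only -- the real cpudl
code also has to keep the cpu<->heap-index maps in sync):

	/* Sift-down without per-level swaps: one store per level
	 * instead of a full three-move swap. */
	static void sift_down(u64 *key, int size, int idx)
	{
		u64 val = key[idx];			/* element being pushed down */

		while (2 * idx + 1 < size) {
			int child = 2 * idx + 1;	/* left child */

			if (child + 1 < size && key[child + 1] > key[child])
				child++;		/* right child is larger */
			if (key[child] <= val)
				break;			/* heap property restored */
			key[idx] = key[child];		/* pull child up into the hole */
			idx = child;
		}
		key[idx] = val;				/* single final store */
	}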

Indeed, I've got a speed-up of up to ~6% for the cpudl_set() calls
on a randomly generated workload of 1K,10K,100K random insertions
and deletions (75% cpudl_set() calls with is_valid=1 and 25% with
is_valid=0), and randomly generated cpu IDs with 2, 4, ..., 256 CPUs.
Details in the attached plot.

The attached patch does this, along with a minimum of rework of
cpudeadline.c internals, and a final clean-up of the cpudeadline.h
interface (second patch).

The measurements have been made on an Intel Core2 Duo with the CPU
frequency fixed at max, by letting cpudeadline.c be initialized with
various numbers of CPUs, then making many calls sequentially, reading
the TSC (rdtsc) between calls, then dumping all the numbers through
printk(); I'm plotting the average number of clock ticks between
consecutive calls.
[ I can share the benchmarking code as well if needed ]

Also, this fixes what seems to me to be a bug, which I noticed by
comparing the whole heap contents as handled by the modified code vs
the original one, insertion by insertion. The problem is in this code:

cp->elements[cp->size - 1].dl = 0;
cp->elements[cp->size - 1].cpu = cpu;
cp->elements[cpu].idx = cp->size - 1;
mycpudl_change_key(cp, cp->size - 1, dl);

when fed an absolute deadline so large that it has a negative value as
an s64. In such a case, as per dl_time_before(), the kernel should
correctly handle the absolute-deadline wrap-around; however, the current
code in cpudeadline.c goes mad and doesn't correctly re-heapify the
just-inserted element... that said, if these are ns, such a bug should
only be hit after ~292 years of uptime :-D...
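To make the wrap-around case concrete, here is a stand-alone user-space
illustration (not kernel code); dl_time_before() is reproduced with the
same (s64)(a - b) < 0 test used by the kernel:

	#include <stdint.h>
	#include <stdio.h>

	/* same comparison as the kernel's dl_time_before() */
	static int dl_time_before(uint64_t a, uint64_t b)
	{
		return (int64_t)(a - b) < 0;
	}

	int main(void)
	{
		uint64_t huge_dl = 0x8000000000000001ULL; /* negative as s64 */

		/*
		 * Old insertion path: the new leaf temporarily holds dl = 0,
		 * then cpudl_change_key() asks dl_time_before(huge_dl, 0).
		 * This prints 1, so the code decides the key "decreased" and
		 * heapifies down, although a freshly appended leaf can only
		 * ever need to move up the max-heap.
		 */
		printf("dl_time_before(huge_dl, 0) = %d\n",
		       dl_time_before(huge_dl, 0));
		return 0;
	}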

I'd be happy to hear comments from others. I can provide additional
info / make additional experiments as needed.

Please, reply-all to this e-mail, I'm not subscribed to linux-kernel@.

Thanks,

Tommaso
--
Tommaso Cucinotta, Computer Engineering PhD
Associate Professor at the Real-Time Systems Laboratory (ReTiS)
Scuola Superiore Sant'Anna, Pisa, Italy
http://retis.sssup.it/people/tommaso
>From ee54c2849f1d9d7f7f8faeb474a61074cae868b9 Mon Sep 17 00:00:00 2001
From: Tommaso Cucinotta 
Date: Thu, 12 May 2016 19:06:37 +0200
Subject: [PATCH 1/2] Make deadline max-heap faster and fix deadline
 wrap-around bug.

---
 kernel/sched/cpudeadline.c | 122 -
 1 file changed, 77 insertions(+), 45 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 5a75b08..245d929 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -31,55 +31,91 @@ static inline int right_child(int i)
 	return (i << 1) + 2;
 }
 
-static void cpudl_exchange(struct cpudl *cp, int a, int b)
-{
-	int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu;
-
-	swap(cp->elements[a].cpu, cp->elements[b].cpu);
-	swap(cp->elements[a].dl , cp->elements[b].dl );
-
-	swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
-}
-
-static void cpudl_heapify(struct cpudl *cp, int idx)
+static void cpudl_heapify_down(struct cpudl *cp, int idx)
 {
 	int l, r, largest;
 
+	int orig_cpu = cp->elements[idx].cpu;
+	u64 orig_dl = cp->elements[idx].dl;
+
 	/* adapted from lib/prio_heap.c */
 	while(1) {
+		u64 largest_dl;
 		l = left_child(idx);
 		r = right_child(idx);
 		largest = idx;
+		largest_dl = orig_dl;
 
-		if ((l < cp->size) && dl_time_before(cp->elements[idx].dl,
-			cp->elements[l].dl))
+		if ((l < cp->size) && dl_time_before(orig_dl, cp->elements[l].dl)) {
 			largest = l;
-		if ((r < cp->size) && dl_time_before(cp->elements[largest].dl,
-			cp->elements[r].dl))
+			largest_dl = cp->elements[l].dl;
+		}
+		if ((r < cp->size) && dl_time_before(largest_dl, cp->elements[r].dl))
 			largest = r;
+
 		if (largest == idx)
 			break;
 
-		/* Push idx down the heap one level and bump one up */
-		cpudl_exchange(cp, largest, idx);
+		/* pull largest child onto idx */
+		cp->elements[idx].cpu = cp->elements[largest].cpu;
+		cp->elements[idx].dl = cp->elements[largest].dl;
+		cp->elements[cp->elements[idx].cpu].idx = idx;
 		idx = largest;
 	}
+	/* actual push down of saved original values orig_* */
+	cp->elements[idx].cpu = orig_cpu;
+	cp->elements[idx].dl = orig_dl;
+	cp->elements[cp->elements[idx].cpu].idx = idx;
+}
+
+static void cpudl_heapify_up(struct cpudl *cp, int idx)
+{
+	int p;
+
+	int orig_cpu = cp->elements[idx].cpu;
+	u64 orig_dl = cp->elements[idx].dl;
+
+	while (idx != 0) {
+		p = parent(idx);
+		if (dl_time_before(cp->elements[idx].dl

Re: [PATCH] sched/deadline: Add sched_dl documentation

2014-01-28 Thread Tommaso Cucinotta
On 28/01/14 09:31, Peter Zijlstra wrote:
>> Exactly. I also remember that a huge period might be harmful, because with it
>> even a tiny utilization may lead to a big runtime that starves all non-RT
>> tasks for a significant time.
> 
> Another way to solve that is with explicit slack time scheduling, where
> slack here means the time/utilization not allocated to dl tasks (a quick
> google shows various different usages of slack, including laxity).
> 
> So suppose we have 50% utilization but with a huge period, we could
> schedule the other 50% at a fixed but smaller period and have that run
> the rt/fair/etc tasks.
> 
> So effectively we'll always fill up the utilization to 100% and use the
> 'slack' time as a server for the other classes.

Yesss :-)! Indeed, AFAICR in the AQuoSA API there was the QOS_F_DEFAULT flag

  
http://aquosa.sourceforge.net/aquosa-docs/aquosa-qosres/html/qos__types_8h_source.html

(which could only be set by a privileged process), there precisely to allow
creating a server serving "default" tasks (namely, every non-aquosa task).
This way, you could create, for example at system boot time, a default
reservation with a precise (runtime, period) for Linux, so that non-aquosa
tasks could have a chance to run in that reservation, even in the presence
of an RT reservation with a significantly long runtime.

T.



Re: [PATCH] sched/deadline: Add sched_dl documentation

2014-01-24 Thread Tommaso Cucinotta
On 22/01/14 13:51, Peter Zijlstra wrote:
>>> Another possibly extension; one proposed by Ingo; is to demote tasks to
>>> SCHED_OTHER once they exceed their budget instead of the full block they
>>> get now -- we could possibly call this SCHED_FLAG_DL_CBS_SOFT or such.

Soft reservations are also very useful, as in multimedia and adaptive RT  apps
we expect these budgets to be roughly estimated, rather than being carefully
computed WCETs.

For example, in the experimentation with adaptive budget tuning made with Jack,

  http://retis.sssup.it/~tommaso/papers/lac11.php

the best results were achieved just by setting the QOS_F_SOFT flag of the
old AQuoSA (apologies for mentioning that old crappy code), which implemented
exactly this: demoting the task to the regular CFS scheduler for the time in
which its budget was exhausted.
That Jack-related scenario is quite challenging, due to the changes in the
required budget as new Jack clients join, or as the workloads of existing
clients vary (e.g., timidity). At least, as far as the intention of being
seamless to applications is kept (apps required no change at all -- only the
Jack server and client library were changed). FYI, I wish we could replay
those modifications on top of SCHED_DEADLINE one day :-)...

  http://repo.or.cz/w/jack2.git/shortlog/refs/heads/aquosa

> We should also again talk about what it would take to allow
> unprivileged access to SCHED_DEADLINE. The things I can remember are the
> obvious cap on utilization and a minimum period -- the latter so that we
> don't DoS the system with a metric ton of tiny tasks.
> 
> But I seem to remember there were a few other issues.

Exactly. I also remember that a huge period might be harmful, because with it
even a tiny utilization may lead to a big runtime that starves all non-RT
tasks for a significant time. Just in case, I had this paper

  http://retis.sssup.it/~tommaso/papers/rtas08.php

discussing the issues I had found, and how they were tackled in the AQuoSA
supervisor. See Section 3.1. Note that the AQuoSA model was probably more
complex due to the possibility of hosting multiple tasks in the same
deadline-driven reservation, and the possibility for parts of the tasks
in a user reservation to be re-wrapped within another app-specific
reservation, etc. I'd expect the situation to be quite a bit simpler
with SCHED_DEADLINE.
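To make the earlier point about huge periods concrete, with purely
illustrative numbers:

  runtime     = 10 s
  period      = 1000 s
  utilization = runtime / period = 1%

A 1% utilization easily passes any utilization-based admission test, yet
once its budget is replenished such a task may legally run for 10
consecutive seconds, during which non-RT tasks make no progress at all.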

My2c,

Tommaso
