Re: [RFC v3 1/6] Track the active utilisation

2016-12-06 Thread luca abeni
Hi Peter,

On Tue, 6 Dec 2016 09:35:01 +0100
Peter Zijlstra  wrote:
[...]
> > This is because of the definition used when CONFIG_SCHED_DEBUG is
> > not defined (I noticed the issue when testing with random kernel
> > configurations).  
> 
> I'm fine changing the definition, just find something that works. The
> current ((void)(x)) thing was to avoid unused complaints -- although
> I'm not sure there were any.

Below is what I came up with... It fixes the build and seems to work
fine, generating no warnings (I tested with gcc 5.4.0). To write this
patch, I re-used some code from include/asm-generic/bug.h, which has no
copyright header, so I just added my Signed-off-by (but I am not sure
if this is the correct way to go).


Luca



From 74e67d61c4b98c2498880932b953c65e9653c121 Mon Sep 17 00:00:00 2001
From: Luca Abeni 
Date: Tue, 6 Dec 2016 10:02:28 +0100
Subject: [PATCH 7/7] sched.h: Improve SCHED_WARN_ON() when CONFIG_SCHED_DEBUG 
is not defined

With the current definition of SCHED_WARN_ON(), something like
if (SCHED_WARN_ON(condition)) ...
fails with
error: void value not ignored as it ought to be
 #define SCHED_WARN_ON(x) ((void)(x))
  ^

This patch fixes the issue by using the same code as WARN_ON().

Signed-off-by: Luca Abeni 
---
 kernel/sched/sched.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ef4bdaa..2e96aa4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -19,7 +19,10 @@
 #ifdef CONFIG_SCHED_DEBUG
 #define SCHED_WARN_ON(x)   WARN_ONCE(x, #x)
 #else
-#define SCHED_WARN_ON(x)   ((void)(x))
+#define SCHED_WARN_ON(x)   ({  \
+   int __ret_warn_on = !!(x);  \
+   unlikely(__ret_warn_on);\
+})
 #endif
 
 struct rq;
-- 
2.7.4





Re: [RFC v3 1/6] Track the active utilisation

2016-12-06 Thread luca abeni
On Tue, 6 Dec 2016 09:35:01 +0100
Peter Zijlstra  wrote:

> On Mon, Dec 05, 2016 at 11:30:05PM +0100, luca abeni wrote:
> > Hi Peter,
> > 
> > On Fri, 18 Nov 2016 15:23:59 +0100
> > Peter Zijlstra  wrote:
> > [...]  
> > >   u64 running_bw;
> > > 
> > > static void add_running_bw(struct sched_dl_entity *dl_se, struct
> > > dl_rq *dl_rq) {
> > >   u64 old = dl_rq->running_bw;
> > > 
> > >   dl_rq->running_bw += dl_se->dl_bw;
> > >   SCHED_WARN_ON(dl_rq->running_bw < old); /* overflow */
> > > }
> > > 
> > > static void sub_running_bw(struct sched_dl_entity *dl_se, struct
> > > dl_rq *dl_rq) {
> > >   u64 old = dl_rq->running_bw;
> > > 
> > >   dl_rq->running_bw -= dl_se->dl_bw;
> > >   SCHED_WARN_ON(dl_rq->running_bw > old); /* underflow */
> > > }  
> > 
> > I wanted to change "SCHED_WARN_ON(dl_rq->running_bw > old); /*
> > underflow */" into "if (SCHED_WARN_ON(...)) dl_rq->running_bw =
> > 0" (to avoid using nonsensical "running_bw" values), but I see that
> > "SCHED_WARN_ON()" cannot be used inside an if (this seems to be a
> > difference respect to "SCHED_WARN()").  
> 
> There's a SCHED_WARN? Did you mean to say WARN_ON()?

Sorry, I managed to confuse myself... I was thinking about WARN_ON()
(the one I used in the previous version of my patches).

> And yes, mostly by accident I think, I'm not a big user of that
> pattern and neglected it when I did SCHED_WARN_ON().

You mean the "if(WARN(...))" pattern? I think it was suggested in a
previous round of reviews.


> > This is because of the definition used when CONFIG_SCHED_DEBUG is
> > not defined (I noticed the issue when testing with random kernel
> > configurations).  
> 
> I'm fine changing the definition, just find something that works. The
> current ((void)(x)) thing was to avoid unused complaints -- although
> I'm not sure there were any.

Ok; I'll see if I manage to find a working definition.


Thanks,
Luca




Re: [RFC v3 1/6] Track the active utilisation

2016-12-06 Thread Peter Zijlstra
On Mon, Dec 05, 2016 at 11:30:05PM +0100, luca abeni wrote:
> Hi Peter,
> 
> On Fri, 18 Nov 2016 15:23:59 +0100
> Peter Zijlstra  wrote:
> [...]
> > u64 running_bw;
> > 
> > static void add_running_bw(struct sched_dl_entity *dl_se, struct
> > dl_rq *dl_rq) {
> > u64 old = dl_rq->running_bw;
> > 
> > dl_rq->running_bw += dl_se->dl_bw;
> > SCHED_WARN_ON(dl_rq->running_bw < old); /* overflow */
> > }
> > 
> > static void sub_running_bw(struct sched_dl_entity *dl_se, struct
> > dl_rq *dl_rq) {
> > u64 old = dl_rq->running_bw;
> > 
> > dl_rq->running_bw -= dl_se->dl_bw;
> > SCHED_WARN_ON(dl_rq->running_bw > old); /* underflow */
> > }
> 
> I wanted to change "SCHED_WARN_ON(dl_rq->running_bw > old); /*
> underflow */" into "if (SCHED_WARN_ON(...)) dl_rq->running_bw = 0" (to
> avoid using nonsensical "running_bw" values), but I see that
> "SCHED_WARN_ON()" cannot be used inside an if (this seems to be a
> difference respect to "SCHED_WARN()").

There's a SCHED_WARN? Did you mean to say WARN_ON()?

And yes, mostly by accident I think, I'm not a big user of that pattern
and neglected it when I did SCHED_WARN_ON().

> This is because of the definition used when CONFIG_SCHED_DEBUG is not
> defined (I noticed the issue when testing with random kernel
> configurations).

I'm fine changing the definition, just find something that works. The
current ((void)(x)) thing was to avoid unused complaints -- although I'm
not sure there were any.




Re: [RFC v3 1/6] Track the active utilisation

2016-12-05 Thread luca abeni
Hi Peter,

On Fri, 18 Nov 2016 15:23:59 +0100
Peter Zijlstra  wrote:
[...]
>   u64 running_bw;
> 
> static void add_running_bw(struct sched_dl_entity *dl_se, struct
> dl_rq *dl_rq) {
>   u64 old = dl_rq->running_bw;
> 
>   dl_rq->running_bw += dl_se->dl_bw;
>   SCHED_WARN_ON(dl_rq->running_bw < old); /* overflow */
> }
> 
> static void sub_running_bw(struct sched_dl_entity *dl_se, struct
> dl_rq *dl_rq) {
>   u64 old = dl_rq->running_bw;
> 
>   dl_rq->running_bw -= dl_se->dl_bw;
>   SCHED_WARN_ON(dl_rq->running_bw > old); /* underflow */
> }

I wanted to change "SCHED_WARN_ON(dl_rq->running_bw > old); /*
underflow */" into "if (SCHED_WARN_ON(...)) dl_rq->running_bw = 0" (to
avoid using nonsensical "running_bw" values), but I see that
"SCHED_WARN_ON()" cannot be used inside an if (this seems to be a
difference with respect to "SCHED_WARN()").
This is because of the definition used when CONFIG_SCHED_DEBUG is not
defined (I noticed the issue when testing with random kernel
configurations).

Is this expected? If yes, what should I do in this case? Something like
	SCHED_WARN_ON(dl_rq->running_bw > old); /* underflow */
	if (dl_rq->running_bw > old)
		dl_rq->running_bw = 0;
?
Or something else?


Thanks,
Luca




Re: [RFC v3 1/6] Track the active utilisation

2016-11-18 Thread Steven Rostedt
On Fri, 18 Nov 2016 15:23:59 +0100
Peter Zijlstra  wrote:

 
> That said; there's something to be said for:
> 
>   u64 running_bw;
> 
> static void add_running_bw(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
> {
>   u64 old = dl_rq->running_bw;
> 
>   dl_rq->running_bw += dl_se->dl_bw;
>   SCHED_WARN_ON(dl_rq->running_bw < old); /* overflow */
> }
> 
> static void sub_running_bw(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
> {
>   u64 old = dl_rq->running_bw;
> 
>   dl_rq->running_bw -= dl_se->dl_bw;
>   SCHED_WARN_ON(dl_rq->running_bw > old); /* underflow */
> }
> 

Acked-by: me

-- Steve






Re: [RFC v3 1/6] Track the active utilisation

2016-11-18 Thread Peter Zijlstra
On Fri, Nov 18, 2016 at 04:10:17PM +0100, luca abeni wrote:

> Ok; I originally made it signed because I wanted the "running_bw < 0"
> check, but I can change it to "dl_rq->running_bw > old" (I did not
> think about it).

Happy accident of me staring at unsigned over/under-flow for the past
several days :-)




Re: [RFC v3 1/6] Track the active utilisation

2016-11-18 Thread luca abeni
On Fri, 18 Nov 2016 15:23:59 +0100
Peter Zijlstra  wrote:

> On Tue, Oct 25, 2016 at 09:58:11AM -0400, Steven Rostedt wrote:
> 
> > I agree with Daniel, especially since I don't usually trust the
> > compiler. And the added variable is more of a distraction as it
> > doesn't seem to have any real purpose.  
> 
> I don't think there's anything here to trust the compiler on. Either
> it inlines or it doesn't, it should generate 'correct' code either
> way.
> 
> If it doesn't inline, its a dumb compiler and it will make these dumb
> decisions throughout the tree and your kernel will be slow, not my
> problem really ;-)
> 
> That said, I agree that the single line thing is actually easier to
> read.

Ok; I already made that change locally.


> 
> That said; there's something to be said for:
> 
>   u64 running_bw;

Ok; I originally made it signed because I wanted the "running_bw < 0"
check, but I can change it to "dl_rq->running_bw > old" (I did not
think about it).

I'll make these changes locally ASAP.


Thanks,
Luca

> 
> static void add_running_bw(struct sched_dl_entity *dl_se, struct
> dl_rq *dl_rq) {
>   u64 old = dl_rq->running_bw;
> 
>   dl_rq->running_bw += dl_se->dl_bw;
>   SCHED_WARN_ON(dl_rq->running_bw < old); /* overflow */
> }
> 
> static void sub_running_bw(struct sched_dl_entity *dl_se, struct
> dl_rq *dl_rq) {
>   u64 old = dl_rq->running_bw;
> 
>   dl_rq->running_bw -= dl_se->dl_bw;
>   SCHED_WARN_ON(dl_rq->running_bw > old); /* underflow */
> }
> 
> 






Re: [RFC v3 1/6] Track the active utilisation

2016-11-18 Thread luca abeni
Hi Peter,

On Fri, 18 Nov 2016 14:55:54 +0100
Peter Zijlstra  wrote:

> On Mon, Oct 24, 2016 at 04:06:33PM +0200, Luca Abeni wrote:
> 
> > @@ -498,6 +514,8 @@ static void update_dl_entity(struct sched_dl_entity *dl_se,
> > 	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> > 	struct rq *rq = rq_of_dl_rq(dl_rq);
> >  
> > +	add_running_bw(dl_se, dl_rq);
> > +
> > 	if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
> > 	    dl_entity_overflow(dl_se, pi_se, rq_clock(rq))) {
> > 		dl_se->deadline = rq_clock(rq) + pi_se->dl_deadline;
> > @@ -947,14 +965,19 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
> > 		return;
> > 	}
> >  
> > +	if (p->on_rq == TASK_ON_RQ_MIGRATING)
> > +		add_running_bw(&p->dl, &rq->dl);
> > +
> > 	/*
> > 	 * If p is throttled, we do nothing. In fact, if it exhausted
> > 	 * its budget it needs a replenishment and, since it now is on
> > 	 * its rq, the bandwidth timer callback (which clearly has not
> > 	 * run yet) will take care of this.
> > 	 */
> > -	if (p->dl.dl_throttled && !(flags & ENQUEUE_REPLENISH))
> > +	if (p->dl.dl_throttled && !(flags & ENQUEUE_REPLENISH)) {
> > +		add_running_bw(&p->dl, &rq->dl);
> > 		return;
> > +	}
> >  
> > 	enqueue_dl_entity(&p->dl, pi_se, flags);
> >
> 
> I realize the enqueue path is a bit of a maze, but this hurts my head.
> 
> Isn't there anything we can do to streamline this a bit?
> 
> Maybe move the add_running_bw() from update_dl_entity() to the
> ENQUEUE_WAKEUP branch in enqueue_dl_entity()? Because that's what you
> really want, isn't it? Its not actually related to recomputing the
> absolute deadline.

Right... I'll change the code in this way. I am currently modifying
this code (because after Juri's review I realized that there is an
issue - see last email exchange with Juri); I'll make this change as
soon as the code stabilizes.


Thanks,
Luca




Re: [RFC v3 1/6] Track the active utilisation

2016-11-18 Thread Peter Zijlstra
On Tue, Nov 08, 2016 at 05:56:35PM +, Juri Lelli wrote:
> Mmm. You explicitly check that TASK_ON_RQ_MIGRATING or DEQUEUE_SLEEP
> (which btw can be actually put together with an or condition), so I
> don't think that any of those turn out to be true when the task dies.
> Also, AFAIU, do_exit() works on current and the TASK_DEAD case is
> handled in finish_task_switch(), so I don't think we are taking care of
> the "task is dying" condition.
> 
> Peter, does what I'm saying make any sense? :)

do_task_dead():
	__set_current_state(TASK_DEAD);

schedule():
	if (prev->state)
		deactivate_task(DEQUEUE_SLEEP);








Re: [RFC v3 1/6] Track the active utilisation

2016-11-18 Thread Peter Zijlstra
On Tue, Oct 25, 2016 at 09:58:11AM -0400, Steven Rostedt wrote:

> I agree with Daniel, especially since I don't usually trust the
> compiler. And the added variable is more of a distraction as it doesn't
> seem to have any real purpose.

I don't think there's anything here to trust the compiler on. Either
it inlines or it doesn't, it should generate 'correct' code either way.

If it doesn't inline, it's a dumb compiler and it will make these dumb
decisions throughout the tree and your kernel will be slow, not my
problem really ;-)

That said, I agree that the single line thing is actually easier to
read.

That said; there's something to be said for:

u64 running_bw;

static void add_running_bw(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
{
u64 old = dl_rq->running_bw;

dl_rq->running_bw += dl_se->dl_bw;
SCHED_WARN_ON(dl_rq->running_bw < old); /* overflow */
}

static void sub_running_bw(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
{
u64 old = dl_rq->running_bw;

dl_rq->running_bw -= dl_se->dl_bw;
SCHED_WARN_ON(dl_rq->running_bw > old); /* underflow */
}








Re: [RFC v3 1/6] Track the active utilisation

2016-11-18 Thread Peter Zijlstra
On Mon, Oct 24, 2016 at 04:06:33PM +0200, Luca Abeni wrote:

> @@ -498,6 +514,8 @@ static void update_dl_entity(struct sched_dl_entity *dl_se,
> 	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> 	struct rq *rq = rq_of_dl_rq(dl_rq);
>  
> +	add_running_bw(dl_se, dl_rq);
> +
> 	if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
> 	    dl_entity_overflow(dl_se, pi_se, rq_clock(rq))) {
> 		dl_se->deadline = rq_clock(rq) + pi_se->dl_deadline;
> @@ -947,14 +965,19 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
> 		return;
> 	}
>  
> +	if (p->on_rq == TASK_ON_RQ_MIGRATING)
> +		add_running_bw(&p->dl, &rq->dl);
> +
> 	/*
> 	 * If p is throttled, we do nothing. In fact, if it exhausted
> 	 * its budget it needs a replenishment and, since it now is on
> 	 * its rq, the bandwidth timer callback (which clearly has not
> 	 * run yet) will take care of this.
> 	 */
> -	if (p->dl.dl_throttled && !(flags & ENQUEUE_REPLENISH))
> +	if (p->dl.dl_throttled && !(flags & ENQUEUE_REPLENISH)) {
> +		add_running_bw(&p->dl, &rq->dl);
> 		return;
> +	}
>  
> 	enqueue_dl_entity(&p->dl, pi_se, flags);
>  

I realize the enqueue path is a bit of a maze, but this hurts my head.

Isn't there anything we can do to streamline this a bit?

Maybe move the add_running_bw() from update_dl_entity() to the
ENQUEUE_WAKEUP branch in enqueue_dl_entity()? Because that's what you
really want, isn't it? Its not actually related to recomputing the
absolute deadline.

> @@ -972,6 +995,12 @@ static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
>  {
> 	update_curr_dl(rq);
> 	__dequeue_task_dl(rq, p, flags);
> +
> +	if (p->on_rq == TASK_ON_RQ_MIGRATING)
> +		sub_running_bw(&p->dl, &rq->dl);
> +
> +	if (flags & DEQUEUE_SLEEP)
> +		sub_running_bw(&p->dl, &rq->dl);
>  }

We could look at adding more enqueue/dequeue flags to streamline this a
bit, but let's not do that now.






Re: [RFC v3 1/6] Track the active utilisation

2016-11-09 Thread luca abeni
On Tue, 8 Nov 2016 17:56:35 +
Juri Lelli  wrote:
[...]
> > > > @@ -947,14 +965,19 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
> > > > 		return;
> > > > 	}
> > > >  
> > > > +	if (p->on_rq == TASK_ON_RQ_MIGRATING)
> > > > +		add_running_bw(&p->dl, &rq->dl);
> > > > +
> > > > 	/*
> > > > 	 * If p is throttled, we do nothing. In fact, if it exhausted
> > > > 	 * its budget it needs a replenishment and, since it now is on
> > > > 	 * its rq, the bandwidth timer callback (which clearly has not
> > > > 	 * run yet) will take care of this.
> > > > 	 */
> > > > -	if (p->dl.dl_throttled && !(flags & ENQUEUE_REPLENISH))
> > > > +	if (p->dl.dl_throttled && !(flags & ENQUEUE_REPLENISH)) {
> > > > +		add_running_bw(&p->dl, &rq->dl);
> > > 
> > > Don't remember if we discussed this already, but do we need to add the 
> > > bw here
> > > even if the task is not actually enqueued until after the replenishment 
> > > timer
> > > fires?  
> > I think yes... The active utilization does not depend on the fact that the 
> > task
> > is on the runqueue or not, but depends on the task's state (in GRUB 
> > parlance,
> > "inactive" vs "active contending"). In other words, even when a task is 
> > throttled
> > its utilization must be counted in the active utilization.
> >   
> 
> OK. Could you add a comment about this point please (so that I don't
> forget again :)?
So, I just changed the comment to:

/*
 * If p is throttled, we do not enqueue it. In fact, if it exhausted
 * its budget it needs a replenishment and, since it now is on
 * its rq, the bandwidth timer callback (which clearly has not
 * run yet) will take care of this.
 * However, the active utilization does not depend on the fact
 * that the task is on the runqueue or not (but depends on the
 * task's state - in GRUB parlance, "inactive" vs "active contending").
 * In other words, even if a task is throttled its utilization must
 * be counted in the active utilization; hence, we need to call
 * add_running_bw().
 */

Is this ok?


Thanks,
Luca


Re: [RFC v3 1/6] Track the active utilisation

2016-11-09 Thread luca abeni
On Tue, 8 Nov 2016 20:02:29 +
Juri Lelli  wrote:
[...]
> > > So, it actually matters for next patch,
> > > not here. But, maybe we want to do things clean from start?  
> > You mean, because patch 2/6 adds
> > +   if (hrtimer_active(&p->dl.inactive_timer)) {
> > +   raw_spin_lock_irq(&task_rq(p)->lock);
> > +   sub_running_bw(&p->dl, dl_rq_of_se(&p->dl));
> > +   raw_spin_unlock_irq(&task_rq(p)->lock);
> > +   }
> > in task_dead_dl()? I suspect this hunk is actually unneeded (worse, it
> > is wrong :). I am trying to remember why it is there, but I cannot find
> > any reason... In the next days, I'll run some tests to check if that
> > hunk is actually needed. If yes, then I'll modify patch 1/6 as you
> > suggest; if it is not needed, I'll remove it from patch 2/6 and I'll
> > not do this change to patch 1/6... Is this ok?
> >   
> 
> I guess yes, if we don't need to differentiate.
Ok; so, I ran some tests (and I found some old notes of mine). The
modifications to task_dead_dl() mentioned above are not actually needed;
I added them as a preparation for a change needed by patch 3... But I
now think this was an error; I am reworking this part of the code
(removing changes from task_dead_dl() and adding a "p->state == TASK_DEAD"
check in the inactive timer handler).

I'll post an update for patches 2 and 3 in few days, after I finish
some more tests.



Luca

> Maybe just add a comment as I am saying above?
> 
> Thanks,
> 
> - Juri



Re: [RFC v3 1/6] Track the active utilisation

2016-11-08 Thread Juri Lelli
On 08/11/16 20:09, Luca Abeni wrote:
> Hi again,
> 
> On Tue, 8 Nov 2016 18:53:09 +
> Juri Lelli  wrote:
> [...]
> > > > Also, AFAIU, do_exit() works on current and the TASK_DEAD case is
> > > > handled in finish_task_switch(), so I don't think we are taking
> > > > care of the "task is dying" condition.
> > > Ok, so I am missing something... The state is set to TASK_DEAD, and
> > > then schedule() is called... So, __schedule() sees the dying task as
> > > "prev" and invokes deactivate_task() with the DEQUEUE_SLEEP flag...
> > > After that, finish_task_switch() calls task_dead_dl(). Is this
> > > wrong? If not, why aren't we taking care of the "task is dying"
> > > condition?
> > > 
> > 
> > No, I think you are right. But, semantically this cleanup goes in
> > task_dead_dl(), IMHO.
> Just to be sure I understand correctly: you suggest to add a check for
> "state == TASK_DEAD" (skipping the cleanup if the condition is true) in
> dequeue_task_dl(), and to add a sub_running_bw() in task_dead_dl()...
> Is this understanding correct?

This is more ugly, I know. It probably makes sense though if we then need it in
the next patch. But you are saying the contrary (we don't actually need it), so
in that case we might just want to add a comment here explaining why we handle
the "task is dying" case together with "task is going to sleep" (so that I
don't forget? :).

> 
> > It's most probably moot if it complicates
> > things, but it might be helpful to differentiate the case between a
> > task that is actually going to sleep (and for which we want to
> > activate the timer) and a task that is dying (and for which we want
> > to release bw immediately).
> I suspect the two cases should be handled in the same way :)
> 
> > So, it actually matters for next patch,
> > not here. But, maybe we want to do things clean from start?
> You mean, because patch 2/6 adds
> +   if (hrtimer_active(&p->dl.inactive_timer)) {
> +   raw_spin_lock_irq(&task_rq(p)->lock);
> +   sub_running_bw(&p->dl, dl_rq_of_se(&p->dl));
> +   raw_spin_unlock_irq(&task_rq(p)->lock);
> +   }
> in task_dead_dl()? I suspect this hunk is actually unneeded (worse, it
> is wrong :). I am trying to remember why it is there, but I cannot find
> any reason... In the next days, I'll run some tests to check if that
> hunk is actually needed. If yes, then I'll modify patch 1/6 as you
> suggest; if it is not needed, I'll remove it from patch 2/6 and I'll
> not do this change to patch 1/6... Is this ok?
> 

I guess yes, if we don't need to differentiate. Maybe just add a comment as I
am saying above?

Thanks,

- Juri


Re: [RFC v3 1/6] Track the active utilisation

2016-11-08 Thread Luca Abeni
Hi again,

On Tue, 8 Nov 2016 18:53:09 +
Juri Lelli  wrote:
[...]
> > > Also, AFAIU, do_exit() works on current and the TASK_DEAD case is
> > > handled in finish_task_switch(), so I don't think we are taking
> > > care of the "task is dying" condition.
> > Ok, so I am missing something... The state is set to TASK_DEAD, and
> > then schedule() is called... So, __schedule() sees the dying task as
> > "prev" and invokes deactivate_task() with the DEQUEUE_SLEEP flag...
> > After that, finish_task_switch() calls task_dead_dl(). Is this
> > wrong? If not, why aren't we taking care of the "task is dying"
> > condition?
> > 
> 
> No, I think you are right. But, semantically this cleanup goes in
> task_dead_dl(), IMHO.
Just to be sure I understand correctly: you suggest to add a check for
"state == TASK_DEAD" (skipping the cleanup if the condition is true) in
dequeue_task_dl(), and to add a sub_running_bw() in task_dead_dl()...
Is this understanding correct?

> It's most probably moot if it complicates
> things, but it might be helpful to differentiate the case between a
> task that is actually going to sleep (and for which we want to
> activate the timer) and a task that is dying (and for which we want
> to release bw immediately).
I suspect the two cases should be handled in the same way :)

> So, it actually matters for next patch,
> not here. But, maybe we want to do things clean from start?
You mean, because patch 2/6 adds
+   if (hrtimer_active(&p->dl.inactive_timer)) {
+   raw_spin_lock_irq(&task_rq(p)->lock);
+   sub_running_bw(&p->dl, dl_rq_of_se(&p->dl));
+   raw_spin_unlock_irq(&task_rq(p)->lock);
+   }
in task_dead_dl()? I suspect this hunk is actually unneeded (worse, it
is wrong :). I am trying to remember why it is there, but I cannot find
any reason... In the next days, I'll run some tests to check if that
hunk is actually needed. If yes, then I'll modify patch 1/6 as you
suggest; if it is not needed, I'll remove it from patch 2/6 and I'll
not do this change to patch 1/6... Is this ok?



Thanks,
Luca


> 
> > 
> > > Peter, does what I'm saying make any sense? :)
> > > 
> > > I still have to set up things here to test these patches (sorry,
> > > I was travelling), but could you try to create some tasks and
> > > that kill them from another shell to see if the accounting
> > > deviates or not? Or did you already do this test?
> > I think this is one of the tests I tried... 
> > I have to check if I changed this code after the test (but I do not
> > think I did). Anyway, tomorrow I'll write a script for automating
> > this test, and I'll leave it running for some hours.
> > 
> 
> OK, thanks. As said I think that you actually handle the case already,
> but I'll try to setup testing as well soon.
> 
> Thanks,
> 
> - Juri



Re: [RFC v3 1/6] Track the active utilisation

2016-11-08 Thread Juri Lelli
On 08/11/16 19:17, Luca Abeni wrote:
> Hi Juri,
> 
> On Tue, 8 Nov 2016 17:56:35 +
> Juri Lelli  wrote:
> [...]
> > > > >  static void switched_to_dl(struct rq *rq, struct task_struct
> > > > > *p) {
> > > > > + add_running_bw(&p->dl, &rq->dl);
> > > > >  
> > > > >   /* If p is not queued we will update its parameters at
> > > > > next wakeup. */ if (!task_on_rq_queued(p))  
> > > > 
> > > > Don't we also need to remove bw in task_dead_dl()?
> > > I think task_dead_dl() is invoked after invoking dequeue_task_dl(),
> > > which takes care of this... Or am I wrong? (I think I explicitly
> > > tested this, and modifications to task_dead_dl() turned out to be
> > > unneeded)
> > > 
> > 
> > Mmm. You explicitly check that TASK_ON_RQ_MIGRATING or DEQUEUE_SLEEP
> > (which btw can be actually put together with an or condition), so I
> > don't think that any of those turn out to be true when the task dies.
> I might be very wrong here, but I think do_exit() just does something
> like
>   tsk->state = TASK_DEAD;
> and then invokes schedule(), and __schedule() does
> if (!preempt && prev->state) {
> if (unlikely(signal_pending_state(prev->state, prev))) {
> prev->state = TASK_RUNNING;
> } else {
> deactivate_task(rq, prev, DEQUEUE_SLEEP);
>   [...]
> so dequeue_task_dl() will see DEQUEUE_SLEEP... Or am I misunderstanding
> what you are saying?
> 
> > Also, AFAIU, do_exit() works on current and the TASK_DEAD case is
> > handled in finish_task_switch(), so I don't think we are taking care
> > of the "task is dying" condition.
> Ok, so I am missing something... The state is set to TASK_DEAD, and
> then schedule() is called... So, __schedule() sees the dying task as
> "prev" and invokes deactivate_task() with the DEQUEUE_SLEEP flag...
> After that, finish_task_switch() calls task_dead_dl(). Is this wrong?
> If not, why aren't we taking care of the "task is dying" condition?
> 

No, I think you are right. But, semantically this cleanup goes in
task_dead_dl(), IMHO. It's most probably moot if it complicates things,
but it might be helpful to differentiate the case between a task that is
actually going to sleep (and for which we want to activate the timer)
and a task that is dying (and for which we want to release bw
immediately). So, it actually matters for next patch, not here. But,
maybe we want to do things clean from start?

> 
> > Peter, does what I'm saying make any sense? :)
> > 
> > I still have to set up things here to test these patches (sorry, I was
> > travelling), but could you try to create some tasks and then kill them
> > from another shell to see if the accounting deviates or not? Or did
> > you already do this test?
> I think this is one of the tests I tried... 
> I have to check if I changed this code after the test (but I do not
> think I did). Anyway, tomorrow I'll write a script for automating this
> test, and I'll leave it running for some hours.
> 

OK, thanks. As said I think that you actually handle the case already,
but I'll try to setup testing as well soon.

Thanks,

- Juri


Re: [RFC v3 1/6] Track the active utilisation

2016-11-08 Thread Luca Abeni
Hi Juri,

On Tue, 8 Nov 2016 17:56:35 +
Juri Lelli  wrote:
[...]
> > > >  static void switched_to_dl(struct rq *rq, struct task_struct
> > > > *p) {
> > > > +   add_running_bw(&p->dl, &rq->dl);
> > > >  
> > > > /* If p is not queued we will update its parameters at
> > > > next wakeup. */ if (!task_on_rq_queued(p))  
> > > 
> > > Don't we also need to remove bw in task_dead_dl()?
> > I think task_dead_dl() is invoked after invoking dequeue_task_dl(),
> > which takes care of this... Or am I wrong? (I think I explicitly
> > tested this, and modifications to task_dead_dl() turned out to be
> > unneeded)
> > 
> 
> Mmm. You explicitly check that TASK_ON_RQ_MIGRATING or DEQUEUE_SLEEP
> (which btw can be actually put together with an or condition), so I
> don't think that any of those turn out to be true when the task dies.
I might be very wrong here, but I think do_exit() just does something
like
tsk->state = TASK_DEAD;
and then invokes schedule(), and __schedule() does
if (!preempt && prev->state) {
if (unlikely(signal_pending_state(prev->state, prev))) {
prev->state = TASK_RUNNING;
} else {
deactivate_task(rq, prev, DEQUEUE_SLEEP);
[...]
so dequeue_task_dl() will see DEQUEUE_SLEEP... Or am I misunderstanding
what you are saying?

> Also, AFAIU, do_exit() works on current and the TASK_DEAD case is
> handled in finish_task_switch(), so I don't think we are taking care
> of the "task is dying" condition.
Ok, so I am missing something... The state is set to TASK_DEAD, and
then schedule() is called... So, __schedule() sees the dying task as
"prev" and invokes deactivate_task() with the DEQUEUE_SLEEP flag...
After that, finish_task_switch() calls task_dead_dl(). Is this wrong?
If not, why aren't we taking care of the "task is dying" condition?


> Peter, does what I'm saying make any sense? :)
> 
> I still have to set up things here to test these patches (sorry, I was
> travelling), but could you try to create some tasks and then kill them
> from another shell to see if the accounting deviates or not? Or did
> you already do this test?
I think this is one of the tests I tried... 
I have to check if I changed this code after the test (but I do not
think I did). Anyway, tomorrow I'll write a script for automating this
test, and I'll leave it running for some hours.



Luca


Re: [RFC v3 1/6] Track the active utilisation

2016-11-08 Thread Juri Lelli
On 01/11/16 22:10, Luca Abeni wrote:
> Hi Juri,
> 
> On Tue, 1 Nov 2016 16:45:43 +
> Juri Lelli  wrote:
> 
> > Hi,
> > 
> > a few nitpicks on subject and changelog and a couple of questions below.
> > 
> > Subject should be changed to something like
> > 
> >  sched/deadline: track the active utilisation
> Ok; that's easy :)
> I guess a similar change should be applied to the subjects of all the
> other patches, right?
> 

Yep. Subjects usually have the form:

 <subsystem>: <description>

> 
> > 
> > On 24/10/16 16:06, Luca Abeni wrote:
> > > The active utilisation here is defined as the total utilisation of the  
> > 
> > s/The active/Active/
> > s/here//
> > s/of the active/of active/
> Ok; I'll do this in the next revision of the patchset.
> 

Thanks.

> 
> > > active (TASK_RUNNING) tasks queued on a runqueue. Hence, it is increased
> > > when a task wakes up and is decreased when a task blocks.
> > > 
> > > When a task is migrated from CPUi to CPUj, immediately subtract the task's
> > > utilisation from CPUi and add it to CPUj. This mechanism is implemented by
> > > modifying the pull and push functions.
> > > Note: this is not fully correct from the theoretical point of view
> > > (the utilisation should be removed from CPUi only at the 0 lag time),  
> > 
> > a more theoretically sound solution will follow.
> Notice that even the next patch (introducing the "inactive timer") ends up
> migrating the utilisation immediately (on tasks' migration), without waiting
> for the 0-lag time.
> This is because of the reason explained in the following paragraph:
> 

OK, but is still _more_ theoretically sound. :)

> > > but doing the right thing would be _MUCH_ more complex (leaving the
> > > timer armed when the task is on a different CPU... Inactive timers should
> > > be moved from per-task timers to per-runqueue lists of timers! Bah...)  
> > 
> > I'd remove this paragraph above.
> Ok. Re-reading the changelog, I suspect this is not the correct place for this
> comment.
> 
> 
> > > The utilisation tracking mechanism implemented in this commit can be
> > > fixed / improved by decreasing the active utilisation at the so-called
> > > "0-lag time" instead of when the task blocks.  
> > 
> > And maybe this as well, or put it as more information about the "more
> > theoretically sound" solution?
> Ok... I can remove the paragraph, or point to the next commit (which
> implements the more theoretically sound solution). Is such a "forward
> reference" in changelogs ok?
> 

I'd just say that a better solution will follow. The details about why
it's better might be then put in the changelog and as comments in the
code of the next patch.

> [...]
> > > @@ -947,14 +965,19 @@ static void enqueue_task_dl(struct rq *rq, struct 
> > > task_struct *p, int flags)
> > >   return;
> > >   }
> > >  
> > > + if (p->on_rq == TASK_ON_RQ_MIGRATING)
> > > + add_running_bw(&p->dl, &rq->dl);
> > > +
> > >   /*
> > >* If p is throttled, we do nothing. In fact, if it exhausted
> > >* its budget it needs a replenishment and, since it now is on
> > >* its rq, the bandwidth timer callback (which clearly has not
> > >* run yet) will take care of this.
> > >*/
> > > - if (p->dl.dl_throttled && !(flags & ENQUEUE_REPLENISH))
> > > + if (p->dl.dl_throttled && !(flags & ENQUEUE_REPLENISH)) {
> > > + add_running_bw(&p->dl, &rq->dl);
> > 
> > Don't remember if we discussed this already, but do we need to add the bw 
> > here
> > even if the task is not actually enqueued until after the replenishment 
> > timer
> > fires?
> I think yes... The active utilization does not depend on the fact that the 
> task
> is on the runqueue or not, but depends on the task's state (in GRUB parlance,
> "inactive" vs "active contending"). In other words, even when a task is 
> throttled
> its utilization must be counted in the active utilization.
> 

OK. Could you add a comment about this point please (so that I don't
forget again :)?

> 
> [...]
> > >   /*
> > >* Since this might be the only -deadline task on the rq,
> > >* this is the right place to try to pull some other one
> > > @@ -1712,6 +1748,7 @@ static void switched_from_dl(struct rq *rq, struct 
> > > task_struct *p)
> > >   */
> > >  static void switched_to_dl(struct rq *rq, struct task_struct *p)
> > >  {
> > > + add_running_bw(&p->dl, &rq->dl);
> > >  
> > >   /* If p is not queued we will update its parameters at next wakeup. */
> > >   if (!task_on_rq_queued(p))  
> > 
> > Don't we also need to remove bw in task_dead_dl()?
> I think task_dead_dl() is invoked after invoking dequeue_task_dl(), which 
> takes care
> of this... Or am I wrong? (I think I explicitly tested this, and 
> modifications to
> task_dead_dl() turned out to be unneeded)
> 

Mmm. You explicitly check that TASK_ON_RQ_MIGRATING or DEQUEUE_SLEEP
(which btw can be actually put together with an or condition), so I
don't think that any of those turn out to be true when the task dies.
Also, AFAIU, do_exit() works on current and 


Re: [RFC v3 1/6] Track the active utilisation

2016-11-01 Thread luca abeni
Hi Juri,

On Tue, 1 Nov 2016 16:45:43 +
Juri Lelli  wrote:

> Hi,
> 
> a few nitpicks on subject and changelog and a couple of questions below.
> 
> Subject should be changed to something like
> 
>  sched/deadline: track the active utilisation
Ok; that's easy :)
I guess a similar change should be applied to the subjects of all the
other patches, right?


> 
> On 24/10/16 16:06, Luca Abeni wrote:
> > The active utilisation here is defined as the total utilisation of the  
> 
> s/The active/Active/
> s/here//
> s/of the active/of active/
Ok; I'll do this in the next revision of the patchset.


> > active (TASK_RUNNING) tasks queued on a runqueue. Hence, it is increased
> > when a task wakes up and is decreased when a task blocks.
> > 
> > When a task is migrated from CPUi to CPUj, immediately subtract the task's
> > utilisation from CPUi and add it to CPUj. This mechanism is implemented by
> > modifying the pull and push functions.
> > Note: this is not fully correct from the theoretical point of view
> > (the utilisation should be removed from CPUi only at the 0 lag time),  
> 
> a more theoretically sound solution will follow.
Notice that even the next patch (introducing the "inactive timer") ends up
migrating the utilisation immediately (on tasks' migration), without waiting
for the 0-lag time.
This is because of the reason explained in the following paragraph:

> > but doing the right thing would be _MUCH_ more complex (leaving the
> > timer armed when the task is on a different CPU... Inactive timers should
> > be moved from per-task timers to per-runqueue lists of timers! Bah...)  
> 
> I'd remove this paragraph above.
Ok. Re-reading the changelog, I suspect this is not the correct place for this
comment.


> > The utilisation tracking mechanism implemented in this commit can be
> > fixed / improved by decreasing the active utilisation at the so-called
> > "0-lag time" instead of when the task blocks.  
> 
> And maybe this as well, or put it as more information about the "more
> theoretically sound" solution?
Ok... I can remove the paragraph, or point to the next commit (which
implements the more theoretically sound solution). Is such a "forward
reference" in changelogs ok?

[...]
> > @@ -947,14 +965,19 @@ static void enqueue_task_dl(struct rq *rq, struct 
> > task_struct *p, int flags)
> > return;
> > }
> >  
> > +   if (p->on_rq == TASK_ON_RQ_MIGRATING)
> > +   add_running_bw(&p->dl, &rq->dl);
> > +
> > /*
> >  * If p is throttled, we do nothing. In fact, if it exhausted
> >  * its budget it needs a replenishment and, since it now is on
> >  * its rq, the bandwidth timer callback (which clearly has not
> >  * run yet) will take care of this.
> >  */
> > -   if (p->dl.dl_throttled && !(flags & ENQUEUE_REPLENISH))
> > +   if (p->dl.dl_throttled && !(flags & ENQUEUE_REPLENISH)) {
> > +   add_running_bw(&p->dl, &rq->dl);  
> 
> Don't remember if we discussed this already, but do we need to add the bw
> here even if the task is not actually enqueued until after the replenishment
> timer fires?
I think yes... The active utilization does not depend on the fact that the task
is on the runqueue or not, but depends on the task's state (in GRUB parlance,
"inactive" vs "active contending"). In other words, even when a task is
throttled its utilization must be counted in the active utilization.
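Luca's point can be sketched as a small state machine. The code below is a toy model with made-up names and types (`toy_task`, `toy_rq` are illustrative, not the kernel's `sched_dl_entity`/`dl_rq`): a task contributes to `running_bw` whenever it is "active contending", and throttling does not change that state; only blocking does.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Toy model (illustrative only): GRUB-style task states. */
enum toy_state { TOY_INACTIVE, TOY_ACTIVE_CONTENDING };

struct toy_task { u64 dl_bw; enum toy_state state; };
struct toy_rq   { u64 running_bw; };

/* Wakeup: inactive -> active contending; bandwidth joins running_bw. */
static void toy_wakeup(struct toy_task *t, struct toy_rq *rq)
{
	if (t->state == TOY_INACTIVE) {
		t->state = TOY_ACTIVE_CONTENDING;
		rq->running_bw += t->dl_bw;
	}
}

/* Block: active contending -> inactive; bandwidth leaves immediately
 * (the behaviour of this patch; the 0-lag refinement comes later). */
static void toy_block(struct toy_task *t, struct toy_rq *rq)
{
	if (t->state == TOY_ACTIVE_CONTENDING) {
		t->state = TOY_INACTIVE;
		rq->running_bw -= t->dl_bw;
	}
}

/* Throttle: the task stays active contending, so running_bw is untouched. */
static void toy_throttle(struct toy_task *t, struct toy_rq *rq)
{
	(void)t;
	(void)rq;	/* nothing to account */
}
```

In this model the throttled task's bandwidth stays counted, which is exactly why `enqueue_task_dl()` calls `add_running_bw()` before the early return in the throttled case.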


[...]
> > /*
> >  * Since this might be the only -deadline task on the rq,
> >  * this is the right place to try to pull some other one
> > @@ -1712,6 +1748,7 @@ static void switched_from_dl(struct rq *rq, struct 
> > task_struct *p)
> >   */
> >  static void switched_to_dl(struct rq *rq, struct task_struct *p)
> >  {
> > +   add_running_bw(&p->dl, &rq->dl);
> >  
> > /* If p is not queued we will update its parameters at next wakeup. */
> > if (!task_on_rq_queued(p))  
> 
> Don't we also need to remove bw in task_dead_dl()?
I think task_dead_dl() is invoked after invoking dequeue_task_dl(), which takes
care of this... Or am I wrong? (I think I explicitly tested this, and
modifications to task_dead_dl() turned out to be unneeded)



Thanks,
Luca


Re: [RFC v3 1/6] Track the active utilisation

2016-11-01 Thread Juri Lelli
Hi,

a few nitpicks on subject and changelog and a couple of questions below.

Subject should be changed to something like

 sched/deadline: track the active utilisation

On 24/10/16 16:06, Luca Abeni wrote:
> The active utilisation here is defined as the total utilisation of the

s/The active/Active/
s/here//
s/of the active/of active/

> active (TASK_RUNNING) tasks queued on a runqueue. Hence, it is increased
> when a task wakes up and is decreased when a task blocks.
> 
> When a task is migrated from CPUi to CPUj, immediately subtract the task's
> utilisation from CPUi and add it to CPUj. This mechanism is implemented by
> modifying the pull and push functions.
> Note: this is not fully correct from the theoretical point of view
> (the utilisation should be removed from CPUi only at the 0 lag time),

a more theoretically sound solution will follow.

> but doing the right thing would be _MUCH_ more complex (leaving the
> timer armed when the task is on a different CPU... Inactive timers should
> be moved from per-task timers to per-runqueue lists of timers! Bah...)

I'd remove this paragraph above.

> 
> The utilisation tracking mechanism implemented in this commit can be
> fixed / improved by decreasing the active utilisation at the so-called
> "0-lag time" instead of when the task blocks.

And maybe this as well, or put it as more information about the "more
theoretically sound" solution?
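For reference, the "0-lag time" at the centre of this discussion has a standard closed form in the CBS/GRUB literature (background, not part of the patch under review): for a task with remaining runtime $q$, current absolute deadline $d$, and reserved bandwidth $U = \mathrm{dl\_runtime}/\mathrm{dl\_period}$, the task's lag reaches zero at

```latex
t_0 \;=\; d - \frac{q}{U} \;=\; d - q \cdot \frac{\mathrm{dl\_period}}{\mathrm{dl\_runtime}}
```

Intuitively, an ideal fluid reservation running at rate $U$ would consume the remaining budget exactly at the deadline; releasing the utilisation earlier than $t_0$, as this patch does on blocking, can transiently understate the demand on the CPU.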

> 
> Signed-off-by: Juri Lelli 
> Signed-off-by: Luca Abeni 
> ---
>  kernel/sched/deadline.c | 39 ++-
>  kernel/sched/sched.h|  6 ++
>  2 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 37e2449..3d95c1d 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -43,6 +43,22 @@ static inline int on_dl_rq(struct sched_dl_entity *dl_se)
>   return !RB_EMPTY_NODE(&dl_se->rb_node);
>  }
>  
> +static void add_running_bw(struct sched_dl_entity *dl_se, struct dl_rq 
> *dl_rq)
> +{
> + u64 se_bw = dl_se->dl_bw;
> +
> + dl_rq->running_bw += se_bw;
> +}
> +
> +static void sub_running_bw(struct sched_dl_entity *dl_se, struct dl_rq 
> *dl_rq)
> +{
> + u64 se_bw = dl_se->dl_bw;
> +
> + dl_rq->running_bw -= se_bw;
> + if (WARN_ON(dl_rq->running_bw < 0))
> + dl_rq->running_bw = 0;
> +}
> +
>  static inline int is_leftmost(struct task_struct *p, struct dl_rq *dl_rq)
>  {
>   struct sched_dl_entity *dl_se = &p->dl;
> @@ -498,6 +514,8 @@ static void update_dl_entity(struct sched_dl_entity 
> *dl_se,
>   struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
>   struct rq *rq = rq_of_dl_rq(dl_rq);
>  
> + add_running_bw(dl_se, dl_rq);
> +
>   if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
>   dl_entity_overflow(dl_se, pi_se, rq_clock(rq))) {
>   dl_se->deadline = rq_clock(rq) + pi_se->dl_deadline;
> @@ -947,14 +965,19 @@ static void enqueue_task_dl(struct rq *rq, struct 
> task_struct *p, int flags)
>   return;
>   }
>  
> + if (p->on_rq == TASK_ON_RQ_MIGRATING)
> + add_running_bw(&p->dl, &rq->dl);
> +
>   /*
>* If p is throttled, we do nothing. In fact, if it exhausted
>* its budget it needs a replenishment and, since it now is on
>* its rq, the bandwidth timer callback (which clearly has not
>* run yet) will take care of this.
>*/
> - if (p->dl.dl_throttled && !(flags & ENQUEUE_REPLENISH))
> + if (p->dl.dl_throttled && !(flags & ENQUEUE_REPLENISH)) {
> + add_running_bw(&p->dl, &rq->dl);

Don't remember if we discussed this already, but do we need to add the bw here
even if the task is not actually enqueued until after the replenishment timer
fires?

>   return;
> + }
>  
>   enqueue_dl_entity(>dl, pi_se, flags);
>  
> @@ -972,6 +995,12 @@ static void dequeue_task_dl(struct rq *rq, struct 
> task_struct *p, int flags)
>  {
>   update_curr_dl(rq);
>   __dequeue_task_dl(rq, p, flags);
> +
> + if (p->on_rq == TASK_ON_RQ_MIGRATING)
> + sub_running_bw(&p->dl, &rq->dl);
> +
> + if (flags & DEQUEUE_SLEEP)
> + sub_running_bw(&p->dl, &rq->dl);
>  }
>  
>  /*
> @@ -1501,7 +1530,9 @@ static int push_dl_task(struct rq *rq)
>   }
>  
>   deactivate_task(rq, next_task, 0);
> + sub_running_bw(&next_task->dl, &rq->dl);
>   set_task_cpu(next_task, later_rq->cpu);
> + add_running_bw(&next_task->dl, &later_rq->dl);
>   activate_task(later_rq, next_task, 0);
>   ret = 1;
>  
> @@ -1589,7 +1620,9 @@ static void pull_dl_task(struct rq *this_rq)
>   resched = true;
>  
>   deactivate_task(src_rq, p, 0);
> + sub_running_bw(&p->dl, &src_rq->dl);
>   set_task_cpu(p, this_cpu);
> + add_running_bw(&p->dl, &this_rq->dl);
>   activate_task(this_rq, p, 0);
>   dmin = p->dl.deadline;
>  
> @@ -1695,6 +1728,9 @@ 
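The push/pull hunks quoted above amount to handing the task's bandwidth from the source runqueue to the destination around `set_task_cpu()`. A minimal sketch, using stand-in types (`toy_entity`, `toy_rq`, `toy_migrate` are hypothetical names, not the kernel API):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Stand-in types: the real code operates on sched_dl_entity and dl_rq
 * inside push_dl_task()/pull_dl_task(). */
struct toy_entity { u64 dl_bw; };
struct toy_rq     { u64 running_bw; };

/* Bandwidth follows the task: subtract on the source runqueue before the
 * CPU change, add on the destination right after, mirroring
 *   sub_running_bw(&p->dl, &src_rq->dl);
 *   set_task_cpu(p, this_cpu);
 *   add_running_bw(&p->dl, &this_rq->dl);
 */
static void toy_migrate(struct toy_entity *se, struct toy_rq *src,
			struct toy_rq *dst)
{
	src->running_bw -= se->dl_bw;
	/* set_task_cpu() would happen between the two updates */
	dst->running_bw += se->dl_bw;
}
```

The total bandwidth across runqueues is conserved by construction; only its placement changes.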


Re: [RFC v3 1/6] Track the active utilisation

2016-10-25 Thread Luca Abeni
On Tue, 25 Oct 2016 09:58:11 -0400
Steven Rostedt  wrote:

> On Tue, 25 Oct 2016 11:29:16 +0200
> luca abeni  wrote:
> 
> > Hi Daniel,
> > 
> > On Tue, 25 Oct 2016 11:09:52 +0200
> > Daniel Bristot de Oliveira  wrote:
> > [...]
> > > > +static void add_running_bw(struct sched_dl_entity *dl_se,
> > > > struct dl_rq *dl_rq) +{
> > > > +   u64 se_bw = dl_se->dl_bw;
> > > > +
> > > > +   dl_rq->running_bw += se_bw;
> > > > +}
> > > 
> > > why not...
> > > 
> > > static *inline*
> > > void add_running_bw(struct sched_dl_entity *dl_se, struct dl_rq
> > > *dl_rq) {
> > >   dl_rq->running_bw += dl_se->dl_bw;
> > > }
> > > 
> > > am I missing something?  
> > 
> > I do not know... Maybe I am the one missing something :)
> > I assumed that the compiler is smart enough to inline the function
> > (and to avoid creating a local variable on the stack), but if there
> > is agreement I can change the function in this way.
> > 
> > 
> 
> I agree with Daniel, especially since I don't usually trust the
> compiler. And the added variable is more of a distraction as it
> doesn't seem to have any real purpose.

Ok, then; I'll fix this in the next round of patches.


Thanks,
Luca


Re: [RFC v3 1/6] Track the active utilisation

2016-10-25 Thread Steven Rostedt
On Tue, 25 Oct 2016 11:29:16 +0200
luca abeni  wrote:

> Hi Daniel,
> 
> On Tue, 25 Oct 2016 11:09:52 +0200
> Daniel Bristot de Oliveira  wrote:
> [...]
> > > +static void add_running_bw(struct sched_dl_entity *dl_se, struct
> > > dl_rq *dl_rq) +{
> > > + u64 se_bw = dl_se->dl_bw;
> > > +
> > > + dl_rq->running_bw += se_bw;
> > > +}
> > 
> > why not...
> > 
> > static *inline*
> > void add_running_bw(struct sched_dl_entity *dl_se, struct dl_rq
> > *dl_rq) {
> > dl_rq->running_bw += dl_se->dl_bw;
> > }
> > 
> > am I missing something?  
> 
> I do not know... Maybe I am the one missing something :)
> I assumed that the compiler is smart enough to inline the function (and
> to avoid creating a local variable on the stack), but if there is
> agreement I can change the function in this way.
> 
> 

I agree with Daniel, especially since I don't usually trust the
compiler. And the added variable is more of a distraction as it doesn't
seem to have any real purpose.

-- Steve


Re: [RFC v3 1/6] Track the active utilisation

2016-10-25 Thread luca abeni
Hi Daniel,

On Tue, 25 Oct 2016 11:09:52 +0200
Daniel Bristot de Oliveira  wrote:
[...]
> > +static void add_running_bw(struct sched_dl_entity *dl_se, struct
> > dl_rq *dl_rq) +{
> > +   u64 se_bw = dl_se->dl_bw;
> > +
> > +   dl_rq->running_bw += se_bw;
> > +}  
> 
> why not...
> 
> static *inline*
> void add_running_bw(struct sched_dl_entity *dl_se, struct dl_rq
> *dl_rq) {
>   dl_rq->running_bw += dl_se->dl_bw;
> }
> 
> am I missing something?

I do not know... Maybe I am the one missing something :)
I assumed that the compiler is smart enough to inline the function (and
to avoid creating a local variable on the stack), but if there is
agreement I can change the function in this way.



Thanks,
Luca


> 
> > +static void sub_running_bw(struct sched_dl_entity *dl_se, struct
> > dl_rq *dl_rq) +{
> > +   u64 se_bw = dl_se->dl_bw;
> > +
> > +   dl_rq->running_bw -= se_bw;
> > +   if (WARN_ON(dl_rq->running_bw < 0))
> > +   dl_rq->running_bw = 0;
> > +}  
> 
> (if I am not missing anything...)
> 
> the same in the above function: use inline and remove the se_bw
> variable.
> 
> -- Daniel



Re: [RFC v3 1/6] Track the active utilisation

2016-10-25 Thread Daniel Bristot de Oliveira
On 24/10/2016 16:06, Luca Abeni wrote:
> The active utilisation here is defined as the total utilisation of the
> active (TASK_RUNNING) tasks queued on a runqueue. Hence, it is increased
> when a task wakes up and is decreased when a task blocks.
> 
> When a task is migrated from CPUi to CPUj, immediately subtract the task's
> utilisation from CPUi and add it to CPUj. This mechanism is implemented by
> modifying the pull and push functions.
> Note: this is not fully correct from the theoretical point of view
> (the utilisation should be removed from CPUi only at the 0 lag time),
> but doing the right thing would be _MUCH_ more complex (leaving the
> timer armed when the task is on a different CPU... Inactive timers should
> be moved from per-task timers to per-runqueue lists of timers! Bah...)
> 
> The utilisation tracking mechanism implemented in this commit can be
> fixed / improved by decreasing the active utilisation at the so-called
> "0-lag time" instead of when the task blocks.
> 
> Signed-off-by: Juri Lelli 
> Signed-off-by: Luca Abeni 
> ---
>  kernel/sched/deadline.c | 39 ++-
>  kernel/sched/sched.h|  6 ++
>  2 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 37e2449..3d95c1d 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -43,6 +43,22 @@ static inline int on_dl_rq(struct sched_dl_entity *dl_se)
>   return !RB_EMPTY_NODE(&dl_se->rb_node);
>  }
>  
> +static void add_running_bw(struct sched_dl_entity *dl_se, struct dl_rq 
> *dl_rq)
> +{
> + u64 se_bw = dl_se->dl_bw;
> +
> + dl_rq->running_bw += se_bw;
> +}

why not...

static *inline*
void add_running_bw(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
{
dl_rq->running_bw += dl_se->dl_bw;
}

am I missing something?

> +static void sub_running_bw(struct sched_dl_entity *dl_se, struct dl_rq 
> *dl_rq)
> +{
> + u64 se_bw = dl_se->dl_bw;
> +
> + dl_rq->running_bw -= se_bw;
> + if (WARN_ON(dl_rq->running_bw < 0))
> + dl_rq->running_bw = 0;
> +}

(if I am not missing anything...)

the same in the above function: use inline and remove the se_bw variable.

-- Daniel


Re: [RFC v3 1/6] Track the active utilisation

2016-10-25 Thread Daniel Bristot de Oliveira
On 24/10/2016 16:06, Luca Abeni wrote:
> The active utilisation here is defined as the total utilisation of the
> active (TASK_RUNNING) tasks queued on a runqueue. Hence, it is increased
> when a task wakes up and is decreased when a task blocks.
> 
> When a task is migrated from CPUi to CPUj, immediately subtract the task's
> utilisation from CPUi and add it to CPUj. This mechanism is implemented by
> modifying the pull and push functions.
> Note: this is not fully correct from the theoretical point of view
> (the utilisation should be removed from CPUi only at the 0 lag time),
> but doing the right thing would be _MUCH_ more complex (leaving the
> timer armed when the task is on a different CPU... Inactive timers should
> be moved from per-task timers to per-runqueue lists of timers! Bah...)
> 
> The utilisation tracking mechanism implemented in this commit can be
> fixed / improved by decreasing the active utilisation at the so-called
> "0-lag time" instead of when the task blocks.
> 
> Signed-off-by: Juri Lelli 
> Signed-off-by: Luca Abeni 
> ---
>  kernel/sched/deadline.c | 39 ++++++++++++++++++++++++++++++++++++++-
>  kernel/sched/sched.h    |  6 ++++++
>  2 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 37e2449..3d95c1d 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -43,6 +43,22 @@ static inline int on_dl_rq(struct sched_dl_entity *dl_se)
>   return !RB_EMPTY_NODE(&dl_se->rb_node);
>  }
>  
> +static void add_running_bw(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
> +{
> + u64 se_bw = dl_se->dl_bw;
> +
> + dl_rq->running_bw += se_bw;
> +}

why not...

static *inline*
void add_running_bw(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
{
dl_rq->running_bw += dl_se->dl_bw;
}

am I missing something?

> +static void sub_running_bw(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
> +{
> + u64 se_bw = dl_se->dl_bw;
> +
> + dl_rq->running_bw -= se_bw;
> + if (WARN_ON(dl_rq->running_bw < 0))
> + dl_rq->running_bw = 0;
> +}

(if I am not missing anything...)

the same in the above function: use inline and remove the se_bw variable.

-- Daniel


[RFC v3 1/6] Track the active utilisation

2016-10-24 Thread Luca Abeni
The active utilisation here is defined as the total utilisation of the
active (TASK_RUNNING) tasks queued on a runqueue. Hence, it is increased
when a task wakes up and is decreased when a task blocks.

When a task is migrated from CPUi to CPUj, immediately subtract the task's
utilisation from CPUi and add it to CPUj. This mechanism is implemented by
modifying the pull and push functions.
Note: this is not fully correct from the theoretical point of view
(the utilisation should be removed from CPUi only at the 0 lag time),
but doing the right thing would be _MUCH_ more complex (leaving the
timer armed when the task is on a different CPU... Inactive timers should
be moved from per-task timers to per-runqueue lists of timers! Bah...)

The utilisation tracking mechanism implemented in this commit can be
fixed / improved by decreasing the active utilisation at the so-called
"0-lag time" instead of when the task blocks.

Signed-off-by: Juri Lelli 
Signed-off-by: Luca Abeni 
---
 kernel/sched/deadline.c | 39 ++++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h    |  6 ++++++
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 37e2449..3d95c1d 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -43,6 +43,22 @@ static inline int on_dl_rq(struct sched_dl_entity *dl_se)
	return !RB_EMPTY_NODE(&dl_se->rb_node);
 }
 
+static void add_running_bw(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
+{
+   u64 se_bw = dl_se->dl_bw;
+
+   dl_rq->running_bw += se_bw;
+}
+
+static void sub_running_bw(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
+{
+   u64 se_bw = dl_se->dl_bw;
+
+   dl_rq->running_bw -= se_bw;
+   if (WARN_ON(dl_rq->running_bw < 0))
+   dl_rq->running_bw = 0;
+}
+
 static inline int is_leftmost(struct task_struct *p, struct dl_rq *dl_rq)
 {
	struct sched_dl_entity *dl_se = &p->dl;
@@ -498,6 +514,8 @@ static void update_dl_entity(struct sched_dl_entity *dl_se,
struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
struct rq *rq = rq_of_dl_rq(dl_rq);
 
+   add_running_bw(dl_se, dl_rq);
+
if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
dl_entity_overflow(dl_se, pi_se, rq_clock(rq))) {
dl_se->deadline = rq_clock(rq) + pi_se->dl_deadline;
@@ -947,14 +965,19 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
return;
}
 
+   if (p->on_rq == TASK_ON_RQ_MIGRATING)
+   add_running_bw(&p->dl, &rq->dl);
+
/*
 * If p is throttled, we do nothing. In fact, if it exhausted
 * its budget it needs a replenishment and, since it now is on
 * its rq, the bandwidth timer callback (which clearly has not
 * run yet) will take care of this.
 */
-   if (p->dl.dl_throttled && !(flags & ENQUEUE_REPLENISH))
+   if (p->dl.dl_throttled && !(flags & ENQUEUE_REPLENISH)) {
+   add_running_bw(&p->dl, &rq->dl);
return;
+   }
 
	enqueue_dl_entity(&p->dl, pi_se, flags);
 
@@ -972,6 +995,12 @@ static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 {
update_curr_dl(rq);
__dequeue_task_dl(rq, p, flags);
+
+   if (p->on_rq == TASK_ON_RQ_MIGRATING)
+   sub_running_bw(&p->dl, &rq->dl);
+
+   if (flags & DEQUEUE_SLEEP)
+   sub_running_bw(&p->dl, &rq->dl);
 }
 
 /*
@@ -1501,7 +1530,9 @@ static int push_dl_task(struct rq *rq)
}
 
deactivate_task(rq, next_task, 0);
+   sub_running_bw(&next_task->dl, &rq->dl);
set_task_cpu(next_task, later_rq->cpu);
+   add_running_bw(&next_task->dl, &later_rq->dl);
activate_task(later_rq, next_task, 0);
ret = 1;
 
@@ -1589,7 +1620,9 @@ static void pull_dl_task(struct rq *this_rq)
resched = true;
 
deactivate_task(src_rq, p, 0);
+   sub_running_bw(&p->dl, &src_rq->dl);
set_task_cpu(p, this_cpu);
+   add_running_bw(&p->dl, &this_rq->dl);
activate_task(this_rq, p, 0);
dmin = p->dl.deadline;
 
@@ -1695,6 +1728,9 @@ static void switched_from_dl(struct rq *rq, struct task_struct *p)
if (!start_dl_timer(p))
__dl_clear_params(p);
 
+   if (task_on_rq_queued(p))
+   sub_running_bw(&p->dl, &rq->dl);
+
/*
 * Since this might be the only -deadline task on the rq,
 * this is the right place to try to pull some other one
@@ -1712,6 +1748,7 @@ static void switched_from_dl(struct rq *rq, struct task_struct *p)
  */
 static void switched_to_dl(struct rq *rq, struct task_struct *p)
 {
+   add_running_bw(&p->dl, &rq->dl);
 
/* If p is not queued we will update its parameters at next wakeup. */
if (!task_on_rq_queued(p))
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 055f935..3a36c74 100644
