Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-07-25 Thread Luca Abeni
On Mon, 24 Jul 2017 11:11:30 +0200
Peter Zijlstra  wrote:

> On Mon, Jul 24, 2017 at 09:54:54AM +0200, Luca Abeni wrote:
> > Hi Peter,
> > 
> > I put this change in a local tree together with other fixes / cleanups
> > I plan to submit in the next weeks. Should I send it together with the
> > other patches, or are you going to apply it separately?  
> 
> Posting them in a series is fine; it is customary to put independent
> things first such that they will not get stuck after the larger changes.
> 
> > In the first case, what is the correct authorship / SOB chain (I ask
> > because I keep getting this wrong every time :)  
> 
> Yes, this is a 'fun' case :-) I'd just merge the change into your patch
> introducing it and forget I 'contributed' the name change.

I think this patch is independent from the other patches I have in my
tree... So, I will go for the solution you describe below.


Thanks,
Luca

> 
> For larger patches you could do something like (in your email body):
> 
> 
> From: Peter Zijlstra 
> 
> Changelog goes here...
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Signed-off-by: Luca...
> ---
> 
> $PATCH
> 
> 
> Which says this patch is from me, carried by you, and then I'll stick
> another SoB on to indicated I took it back. Its a bit weird, but we've
> done it before.



Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-07-25 Thread Luca Abeni
On Mon, 24 Jul 2017 11:11:30 +0200
Peter Zijlstra  wrote:

> On Mon, Jul 24, 2017 at 09:54:54AM +0200, Luca Abeni wrote:
> > Hi Peter,
> > 
> > I put this change in a local tree together with other fixes / cleanups
> > I plan to submit in the next weeks. Should I send it together with the
> > other patches, or are you going to apply it separately?  
> 
> Posting them in a series is fine; it is customary to put independent
> things first such that they will not get stuck after the larger changes.
> 
> > In the first case, what is the correct authorship / SOB chain (I ask
> > because I keep getting this wrong every time :)  
> 
> Yes, this is a 'fun' case :-) I'd just merge the change into your patch
> introducing it and forget I 'contributed' the name change.

I think this patch is independent from the other patches I have in my
tree... So, I will go for the solution you describe below.


Thanks,
Luca

> 
> For larger patches you could do something like (in your email body):
> 
> 
> From: Peter Zijlstra 
> 
> Changelog goes here...
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Signed-off-by: Luca...
> ---
> 
> $PATCH
> 
> 
> Which says this patch is from me, carried by you, and then I'll stick
> another SoB on to indicated I took it back. Its a bit weird, but we've
> done it before.



Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-07-25 Thread Luca Abeni
On Mon, 24 Jul 2017 11:04:52 +0200
Peter Zijlstra  wrote:

> On Mon, Jul 24, 2017 at 10:06:09AM +0200, Luca Abeni wrote:
> > > Yes, grouping all the flags in a single field was my intention too... I
> > > planned to submit a patch to do this after merging the reclaiming
> > > patches... But maybe it is better to do this first :)  
> > 
> > I implemented this change, but before submitting the patch I have a
> > small question.
> > I implemented some helpers to access the various
> > {throttled,boosted,yielded,non_contending} flags. I have some
> > "dl_{throttled,boosted,...}()" inline functions for reading the values
> > of the flags, and some inline functions for setting / clearing the
> > flags. For these, I have two possibilities:  
> 
> > - using two separate "dl_set_{throttled,...}()" and
> >   "dl_clear_{throttled,..}()" functions for each flag  
> 
> > - using one single "dl_set_{throttled,...}(dl, value)" function per
> >   flag, in which the flag's value is specified.
> > 
> > I have no preferences (with the first proposal, I introduce more inline
> > functions, but I think the functions can be made more efficient /
> > optimized). Which one of the two proposals is preferred? (or, there is
> > a third, better, idea that I overlooked?)  
> 
>  - Use bitfields and let the compiler sort it out.
> 
>  - Use macros to generate all the inlines as per the first.
> 
> 
> Personally, because I'm lazy, I'd try the bitfield thing first and see
> what kind code that generates. If that's not too horrendous, keep it :-)

Thanks for the suggestions; I'll test the C bitfields and I'll see how
the assembly generated by gcc compares with the inline functions (I did
not propose this idea originally because I got the impression that
people tend not to trust gcc)


Thanks,
Luca


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-07-25 Thread Luca Abeni
On Mon, 24 Jul 2017 11:04:52 +0200
Peter Zijlstra  wrote:

> On Mon, Jul 24, 2017 at 10:06:09AM +0200, Luca Abeni wrote:
> > > Yes, grouping all the flags in a single field was my intention too... I
> > > planned to submit a patch to do this after merging the reclaiming
> > > patches... But maybe it is better to do this first :)  
> > 
> > I implemented this change, but before submitting the patch I have a
> > small question.
> > I implemented some helpers to access the various
> > {throttled,boosted,yielded,non_contending} flags. I have some
> > "dl_{throttled,boosted,...}()" inline functions for reading the values
> > of the flags, and some inline functions for setting / clearing the
> > flags. For these, I have two possibilities:  
> 
> > - using two separate "dl_set_{throttled,...}()" and
> >   "dl_clear_{throttled,..}()" functions for each flag  
> 
> > - using one single "dl_set_{throttled,...}(dl, value)" function per
> >   flag, in which the flag's value is specified.
> > 
> > I have no preferences (with the first proposal, I introduce more inline
> > functions, but I think the functions can be made more efficient /
> > optimized). Which one of the two proposals is preferred? (or, there is
> > a third, better, idea that I overlooked?)  
> 
>  - Use bitfields and let the compiler sort it out.
> 
>  - Use macros to generate all the inlines as per the first.
> 
> 
> Personally, because I'm lazy, I'd try the bitfield thing first and see
> what kind code that generates. If that's not too horrendous, keep it :-)

Thanks for the suggestions; I'll test the C bitfields and I'll see how
the assembly generated by gcc compares with the inline functions (I did
not propose this idea originally because I got the impression that
people tend not to trust gcc)


Thanks,
Luca


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-07-24 Thread Peter Zijlstra
On Mon, Jul 24, 2017 at 09:54:54AM +0200, Luca Abeni wrote:
> Hi Peter,
> 
> I put this change in a local tree together with other fixes / cleanups
> I plan to submit in the next weeks. Should I send it together with the
> other patches, or are you going to apply it separately?

Posting them in a series is fine; it is customary to put independent
things first such that they will not get stuck after the larger changes.

> In the first case, what is the correct authorship / SOB chain (I ask
> because I keep getting this wrong every time :)

Yes, this is a 'fun' case :-) I'd just merge the change into your patch
introducing it and forget I 'contributed' the name change.

For larger patches you could do something like (in your email body):


From: Peter Zijlstra 

Changelog goes here...

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Luca...
---

$PATCH


Which says this patch is from me, carried by you, and then I'll stick
another SoB on to indicated I took it back. Its a bit weird, but we've
done it before.


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-07-24 Thread Peter Zijlstra
On Mon, Jul 24, 2017 at 09:54:54AM +0200, Luca Abeni wrote:
> Hi Peter,
> 
> I put this change in a local tree together with other fixes / cleanups
> I plan to submit in the next weeks. Should I send it together with the
> other patches, or are you going to apply it separately?

Posting them in a series is fine; it is customary to put independent
things first such that they will not get stuck after the larger changes.

> In the first case, what is the correct authorship / SOB chain (I ask
> because I keep getting this wrong every time :)

Yes, this is a 'fun' case :-) I'd just merge the change into your patch
introducing it and forget I 'contributed' the name change.

For larger patches you could do something like (in your email body):


From: Peter Zijlstra 

Changelog goes here...

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Luca...
---

$PATCH


Which says this patch is from me, carried by you, and then I'll stick
another SoB on to indicated I took it back. Its a bit weird, but we've
done it before.


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-07-24 Thread Peter Zijlstra
On Mon, Jul 24, 2017 at 10:06:09AM +0200, Luca Abeni wrote:
> > Yes, grouping all the flags in a single field was my intention too... I
> > planned to submit a patch to do this after merging the reclaiming
> > patches... But maybe it is better to do this first :)
> 
> I implemented this change, but before submitting the patch I have a
> small question.
> I implemented some helpers to access the various
> {throttled,boosted,yielded,non_contending} flags. I have some
> "dl_{throttled,boosted,...}()" inline functions for reading the values
> of the flags, and some inline functions for setting / clearing the
> flags. For these, I have two possibilities:

> - using two separate "dl_set_{throttled,...}()" and
>   "dl_clear_{throttled,..}()" functions for each flag

> - using one single "dl_set_{throttled,...}(dl, value)" function per
>   flag, in which the flag's value is specified.
> 
> I have no preferences (with the first proposal, I introduce more inline
> functions, but I think the functions can be made more efficient /
> optimized). Which one of the two proposals is preferred? (or, there is
> a third, better, idea that I overlooked?)

 - Use bitfields and let the compiler sort it out.

 - Use macros to generate all the inlines as per the first.


Personally, because I'm lazy, I'd try the bitfield thing first and see
what kind code that generates. If that's not too horrendous, keep it :-)


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-07-24 Thread Peter Zijlstra
On Mon, Jul 24, 2017 at 10:06:09AM +0200, Luca Abeni wrote:
> > Yes, grouping all the flags in a single field was my intention too... I
> > planned to submit a patch to do this after merging the reclaiming
> > patches... But maybe it is better to do this first :)
> 
> I implemented this change, but before submitting the patch I have a
> small question.
> I implemented some helpers to access the various
> {throttled,boosted,yielded,non_contending} flags. I have some
> "dl_{throttled,boosted,...}()" inline functions for reading the values
> of the flags, and some inline functions for setting / clearing the
> flags. For these, I have two possibilities:

> - using two separate "dl_set_{throttled,...}()" and
>   "dl_clear_{throttled,..}()" functions for each flag

> - using one single "dl_set_{throttled,...}(dl, value)" function per
>   flag, in which the flag's value is specified.
> 
> I have no preferences (with the first proposal, I introduce more inline
> functions, but I think the functions can be made more efficient /
> optimized). Which one of the two proposals is preferred? (or, there is
> a third, better, idea that I overlooked?)

 - Use bitfields and let the compiler sort it out.

 - Use macros to generate all the inlines as per the first.


Personally, because I'm lazy, I'd try the bitfield thing first and see
what kind code that generates. If that's not too horrendous, keep it :-)


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-07-24 Thread Luca Abeni
Hi Peter,

On Fri, 24 Mar 2017 22:47:15 +0100
luca abeni  wrote:

> Hi Peter,
> 
> On Fri, 24 Mar 2017 14:20:41 +0100
> Peter Zijlstra  wrote:
> 
> > On Fri, Mar 24, 2017 at 04:52:55AM +0100, luca abeni wrote:
> >   
> > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > index d67eee8..952cac8 100644
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -445,16 +445,33 @@ struct sched_dl_entity {
> > >*
> > >* @dl_yielded tells if task gave up the CPU before
> > > consuming
> > >* all its available runtime during the last job.
> > > +  *
> > > +  * @dl_non_contending tells if task is inactive while still
> > > +  * contributing to the active utilization. In other words,
> > > it
> > > +  * indicates if the inactive timer has been armed and its
> > > handler
> > > +  * has not been executed yet. This flag is useful to avoid
> > > race
> > > +  * conditions between the inactive timer handler and the
> > > wakeup
> > > +  * code.
> > >*/
> > >   int dl_throttled;
> > >   int dl_boosted;
> > >   int dl_yielded;
> > > + int dl_non_contending;
> > 
> > We should maybe look into making those bits :1, not something for this
> > patch though;  
> 
> Yes, grouping all the flags in a single field was my intention too... I
> planned to submit a patch to do this after merging the reclaiming
> patches... But maybe it is better to do this first :)

I implemented this change, but before submitting the patch I have a
small question.
I implemented some helpers to access the various
{throttled,boosted,yielded,non_contending} flags. I have some
"dl_{throttled,boosted,...}()" inline functions for reading the values
of the flags, and some inline functions for setting / clearing the
flags. For these, I have two possibilities:
- using two separate "dl_set_{throttled,...}()" and
  "dl_clear_{throttled,..}()" functions for each flag
- using one single "dl_set_{throttled,...}(dl, value)" function per
  flag, in which the flag's value is specified.

I have no preferences (with the first proposal, I introduce more inline
functions, but I think the functions can be made more efficient /
optimized). Which one of the two proposals is preferred? (or, there is
a third, better, idea that I overlooked?)


Thanks,
Luca

> 
> 
> > > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > > index cef9adb..86aed82 100644
> > > --- a/kernel/sched/deadline.c
> > > +++ b/kernel/sched/deadline.c
> > > @@ -65,6 +65,107 @@ void sub_running_bw(u64 dl_bw, struct dl_rq
> > > *dl_rq) dl_rq->running_bw = 0;
> > >  }
> > >  
> > > +void dl_change_utilization(struct task_struct *p, u64 new_bw)
> > > +{
> > > + if (!task_on_rq_queued(p)) {
> > > + struct rq *rq = task_rq(p);
> > > +
> > > + if (p->dl.dl_non_contending) {
> > > + sub_running_bw(p->dl.dl_bw, >dl);
> > > + p->dl.dl_non_contending = 0;
> > > + /*
> > > +  * If the timer handler is currently
> > > running and the
> > > +  * timer cannot be cancelled,
> > > inactive_task_timer()
> > > +  * will see that dl_not_contending is not
> > > set, and
> > > +  * will not touch the rq's active
> > > utilization,
> > > +  * so we are still safe.
> > > +  */
> > > + if
> > > (hrtimer_try_to_cancel(>dl.inactive_timer) == 1)
> > > + put_task_struct(p);
> > > + }
> > > + }
> > > +}
> > 
> > If we rearrange that slightly we can avoid the double indent:  
> [...]
> Ok; I'll rework the code in this way on Monday.
> 
> [...]
> > > + if (hrtimer_try_to_cancel(_se->inactive_timer)
> > > == 1)
> > > + put_task_struct(dl_task_of(dl_se));
> > > + dl_se->dl_non_contending = 0;
> > 
> > This had me worried for a little while as being a use-after-free, but
> > this put_task_struct() cannot be the last in this case. Might be
> > worth a comment.  
> 
> Or maybe it is better to move "dl_se->dl_non_contending = 0;" before
> hrtimer_try_to_cancel()?
> 
> >   
> > > + } else {
> > > + /*
> > > +  * Since "dl_non_contending" is not set, the
> > > +  * task's utilization has already been removed from
> > > +  * active utilization (either when the task
> > > blocked,
> > > +  * when the "inactive timer" fired).
> > > +  * So, add it back.
> > > +  */
> > > + add_running_bw(dl_se->dl_bw, dl_rq);
> > > + }
> > > +}
> > 
> > In general I feel it would be nice to have a state diagram included
> > somewhere near these two functions. It would be nice to not have to
> > dig out the PDF every time.  
> 
> Ok... Since I am not good at ascii art, 

Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-07-24 Thread Luca Abeni
Hi Peter,

On Fri, 24 Mar 2017 22:47:15 +0100
luca abeni  wrote:

> Hi Peter,
> 
> On Fri, 24 Mar 2017 14:20:41 +0100
> Peter Zijlstra  wrote:
> 
> > On Fri, Mar 24, 2017 at 04:52:55AM +0100, luca abeni wrote:
> >   
> > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > index d67eee8..952cac8 100644
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -445,16 +445,33 @@ struct sched_dl_entity {
> > >*
> > >* @dl_yielded tells if task gave up the CPU before
> > > consuming
> > >* all its available runtime during the last job.
> > > +  *
> > > +  * @dl_non_contending tells if task is inactive while still
> > > +  * contributing to the active utilization. In other words,
> > > it
> > > +  * indicates if the inactive timer has been armed and its
> > > handler
> > > +  * has not been executed yet. This flag is useful to avoid
> > > race
> > > +  * conditions between the inactive timer handler and the
> > > wakeup
> > > +  * code.
> > >*/
> > >   int dl_throttled;
> > >   int dl_boosted;
> > >   int dl_yielded;
> > > + int dl_non_contending;
> > 
> > We should maybe look into making those bits :1, not something for this
> > patch though;  
> 
> Yes, grouping all the flags in a single field was my intention too... I
> planned to submit a patch to do this after merging the reclaiming
> patches... But maybe it is better to do this first :)

I implemented this change, but before submitting the patch I have a
small question.
I implemented some helpers to access the various
{throttled,boosted,yielded,non_contending} flags. I have some
"dl_{throttled,boosted,...}()" inline functions for reading the values
of the flags, and some inline functions for setting / clearing the
flags. For these, I have two possibilities:
- using two separate "dl_set_{throttled,...}()" and
  "dl_clear_{throttled,..}()" functions for each flag
- using one single "dl_set_{throttled,...}(dl, value)" function per
  flag, in which the flag's value is specified.

I have no preferences (with the first proposal, I introduce more inline
functions, but I think the functions can be made more efficient /
optimized). Which one of the two proposals is preferred? (or, there is
a third, better, idea that I overlooked?)


Thanks,
Luca

> 
> 
> > > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > > index cef9adb..86aed82 100644
> > > --- a/kernel/sched/deadline.c
> > > +++ b/kernel/sched/deadline.c
> > > @@ -65,6 +65,107 @@ void sub_running_bw(u64 dl_bw, struct dl_rq
> > > *dl_rq) dl_rq->running_bw = 0;
> > >  }
> > >  
> > > +void dl_change_utilization(struct task_struct *p, u64 new_bw)
> > > +{
> > > + if (!task_on_rq_queued(p)) {
> > > + struct rq *rq = task_rq(p);
> > > +
> > > + if (p->dl.dl_non_contending) {
> > > + sub_running_bw(p->dl.dl_bw, >dl);
> > > + p->dl.dl_non_contending = 0;
> > > + /*
> > > +  * If the timer handler is currently
> > > running and the
> > > +  * timer cannot be cancelled,
> > > inactive_task_timer()
> > > +  * will see that dl_not_contending is not
> > > set, and
> > > +  * will not touch the rq's active
> > > utilization,
> > > +  * so we are still safe.
> > > +  */
> > > + if
> > > (hrtimer_try_to_cancel(>dl.inactive_timer) == 1)
> > > + put_task_struct(p);
> > > + }
> > > + }
> > > +}
> > 
> > If we rearrange that slightly we can avoid the double indent:  
> [...]
> Ok; I'll rework the code in this way on Monday.
> 
> [...]
> > > + if (hrtimer_try_to_cancel(_se->inactive_timer)
> > > == 1)
> > > + put_task_struct(dl_task_of(dl_se));
> > > + dl_se->dl_non_contending = 0;
> > 
> > This had me worried for a little while as being a use-after-free, but
> > this put_task_struct() cannot be the last in this case. Might be
> > worth a comment.  
> 
> Or maybe it is better to move "dl_se->dl_non_contending = 0;" before
> hrtimer_try_to_cancel()?
> 
> >   
> > > + } else {
> > > + /*
> > > +  * Since "dl_non_contending" is not set, the
> > > +  * task's utilization has already been removed from
> > > +  * active utilization (either when the task
> > > blocked,
> > > +  * when the "inactive timer" fired).
> > > +  * So, add it back.
> > > +  */
> > > + add_running_bw(dl_se->dl_bw, dl_rq);
> > > + }
> > > +}
> > 
> > In general I feel it would be nice to have a state diagram included
> > somewhere near these two functions. It would be nice to not have to
> > dig out the PDF every time.  
> 
> Ok... Since I am not good at ascii art, would it be ok to add a textual
> description? If 

Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-07-24 Thread Luca Abeni
Hi Peter,

I put this change in a local tree together with other fixes / cleanups
I plan to submit in the next weeks. Should I send it together with the
other patches, or are you going to apply it separately?
In the first case, what is the correct authorship / SOB chain (I ask
because I keep getting this wrong every time :)


Thanks,
Luca

On Fri, 24 Mar 2017 14:23:51 +0100
Peter Zijlstra  wrote:

> On Fri, Mar 24, 2017 at 04:52:55AM +0100, luca abeni wrote:
> > @@ -2518,6 +2520,7 @@ static int dl_overflow(struct task_struct *p, int 
> > policy,
> >!__dl_overflow(dl_b, cpus, p->dl.dl_bw, new_bw)) {
> > __dl_clear(dl_b, p->dl.dl_bw);
> > __dl_add(dl_b, new_bw);
> > +   dl_change_utilization(p, new_bw);
> > err = 0;  
> 
> Every time I see that I want to do this..
> 
> 
> ---
>  kernel/sched/core.c | 4 ++--
>  kernel/sched/deadline.c | 2 +-
>  kernel/sched/sched.h| 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3b31fc05a0f1..b845ee4b3e55 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2512,11 +2512,11 @@ static int dl_overflow(struct task_struct *p, int 
> policy,
>   err = 0;
>   } else if (dl_policy(policy) && task_has_dl_policy(p) &&
>  !__dl_overflow(dl_b, cpus, p->dl.dl_bw, new_bw)) {
> - __dl_clear(dl_b, p->dl.dl_bw);
> + __dl_sub(dl_b, p->dl.dl_bw);
>   __dl_add(dl_b, new_bw);
>   err = 0;
>   } else if (!dl_policy(policy) && task_has_dl_policy(p)) {
> - __dl_clear(dl_b, p->dl.dl_bw);
> + __dl_sub(dl_b, p->dl.dl_bw);
>   err = 0;
>   }
>   raw_spin_unlock(_b->lock);
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index a2ce59015642..229660088138 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1695,7 +1695,7 @@ static void set_cpus_allowed_dl(struct task_struct *p,
>* until we complete the update.
>*/
>   raw_spin_lock(_dl_b->lock);
> - __dl_clear(src_dl_b, p->dl.dl_bw);
> + __dl_sub(src_dl_b, p->dl.dl_bw);
>   raw_spin_unlock(_dl_b->lock);
>   }
>  
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 5cbf92214ad8..1a521324ecee 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -226,7 +226,7 @@ struct dl_bw {
>  };
>  
>  static inline
> -void __dl_clear(struct dl_bw *dl_b, u64 tsk_bw)
> +void __dl_sub(struct dl_bw *dl_b, u64 tsk_bw)
>  {
>   dl_b->total_bw -= tsk_bw;
>  }



Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-07-24 Thread Luca Abeni
Hi Peter,

I put this change in a local tree together with other fixes / cleanups
I plan to submit in the next weeks. Should I send it together with the
other patches, or are you going to apply it separately?
In the first case, what is the correct authorship / SOB chain (I ask
because I keep getting this wrong every time :)


Thanks,
Luca

On Fri, 24 Mar 2017 14:23:51 +0100
Peter Zijlstra  wrote:

> On Fri, Mar 24, 2017 at 04:52:55AM +0100, luca abeni wrote:
> > @@ -2518,6 +2520,7 @@ static int dl_overflow(struct task_struct *p, int 
> > policy,
> >!__dl_overflow(dl_b, cpus, p->dl.dl_bw, new_bw)) {
> > __dl_clear(dl_b, p->dl.dl_bw);
> > __dl_add(dl_b, new_bw);
> > +   dl_change_utilization(p, new_bw);
> > err = 0;  
> 
> Every time I see that I want to do this..
> 
> 
> ---
>  kernel/sched/core.c | 4 ++--
>  kernel/sched/deadline.c | 2 +-
>  kernel/sched/sched.h| 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3b31fc05a0f1..b845ee4b3e55 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2512,11 +2512,11 @@ static int dl_overflow(struct task_struct *p, int 
> policy,
>   err = 0;
>   } else if (dl_policy(policy) && task_has_dl_policy(p) &&
>  !__dl_overflow(dl_b, cpus, p->dl.dl_bw, new_bw)) {
> - __dl_clear(dl_b, p->dl.dl_bw);
> + __dl_sub(dl_b, p->dl.dl_bw);
>   __dl_add(dl_b, new_bw);
>   err = 0;
>   } else if (!dl_policy(policy) && task_has_dl_policy(p)) {
> - __dl_clear(dl_b, p->dl.dl_bw);
> + __dl_sub(dl_b, p->dl.dl_bw);
>   err = 0;
>   }
>   raw_spin_unlock(_b->lock);
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index a2ce59015642..229660088138 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1695,7 +1695,7 @@ static void set_cpus_allowed_dl(struct task_struct *p,
>* until we complete the update.
>*/
>   raw_spin_lock(_dl_b->lock);
> - __dl_clear(src_dl_b, p->dl.dl_bw);
> + __dl_sub(src_dl_b, p->dl.dl_bw);
>   raw_spin_unlock(_dl_b->lock);
>   }
>  
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 5cbf92214ad8..1a521324ecee 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -226,7 +226,7 @@ struct dl_bw {
>  };
>  
>  static inline
> -void __dl_clear(struct dl_bw *dl_b, u64 tsk_bw)
> +void __dl_sub(struct dl_bw *dl_b, u64 tsk_bw)
>  {
>   dl_b->total_bw -= tsk_bw;
>  }



Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-27 Thread Claudio Scordino
Hi guys,

2017-03-27 10:20 GMT+02:00 Luca Abeni :
> On Fri, 24 Mar 2017 22:31:46 -0400
> Steven Rostedt  wrote:
>
>> On Fri, 24 Mar 2017 22:47:15 +0100
>> luca abeni  wrote:
>>
>> > Ok... Since I am not good at ascii art, would it be ok to add a
>> > textual description? If yes, I'll add a comment like:
>> > "
>> > The utilization of a task is added to the runqueue's active
>> > utilization when the task becomes active (is enqueued in the
>> > runqueue), and is removed when the task becomes inactive. A task
>> > does not become immediately inactive when it blocks, but becomes
>> > inactive at the so called "0 lag time"; so, we setup the "inactive
>> > timer" to fire at the "0 lag time". When the "inactive timer"
>> > fires, the task utilization is removed from the runqueue's active
>> > utilization. If the task wakes up again on the same runqueue before
>> > the "0 lag time", the active utilization must not be changed and
>> > the "inactive timer" must be cancelled. If the task wakes up again
>> > on a different runqueue before the "0 lag time", then the task's
>> > utilization must be removed from the previous runqueue's active
>> > utilization and must be added to the new runqueue's active
>> > utilization. In order to avoid races between a task waking up on a
>> > runqueue while the "inactive timer" is running on a different CPU,
>> > the "dl_non_contending" flag is used to indicate that a task is not
>> > on a runqueue but is active (so, the flag is set when the task
>> > blocks and is cleared when the "inactive timer" fires or when the
>> > task  wakes up).
>>
>> Sure, the above is great if you never want anyone to read it ;)
>>
>> Can you please break it up a little. My head starts to spin by the
>> third line down.
>
> Ok... Maybe finding a clean and understandable way to explain the
> above sentence is something that can be done at the OSPM summit?

What about adding the following text to Documentation/ ?
Does it make things clearer ?

Cheers,

 Claudio




The following figure illustrates the state names of the GRUB algorithm:

  
  (d)|   Active   |
   ->||
   | | Contending |
   |  
   |A  |
   --   |  |
  |  |  |  |
  | Inactive |  |(b)   | (a)
  |  |  |  |
   --   |  |
   A|  V
   |  
   | |   Active   |
   --| Non|
  (c)| Contending |
  

A task can be in one of the following states:

 - ActiveContending: if it is ready for execution (or executing);

 - ActiveNonContending: if it just blocked and has not yet surpassed the 0-lag
   time;

 - Inactive: if it is blocked and has surpassed the 0-lag time.


For each runqueue, the algorithm GRUB keeps track of two different bandwidths:

 - Active bandwidth (running_bw): this is the sum of the bandwidths of all
   tasks in active state (i.e., ActiveContending or ActiveNonContending);

 - Total bandwidth (this_bw): this is the sum of all tasks "belonging" to the
   runqueue, including the tasks in Inactive state.


State transitions:

 (a) When a task blocks, it does not become immediately inactive since its
 bandwidth cannot be immediately reclaimed without breaking the
 real-time guarantees. It therefore enters a transitional state called
 ActiveNonContending. The scheduler arms the "inactive timer" to fire at
 the 0-lag time, when the task's bandwidth can be reclaimed without
 breaking the real-time guarantees.

 (b) If the task wakes up before the inactive timer fires, the task re-enters
 the ActiveContending state and the "inactive timer" is canceled.
 In addition, if the task wakes up on a different runqueue, then
 the task's utilization must be removed from the previous runqueue's active
 utilization and must be added to the new runqueue's active utilization.
 In order to avoid races between a task waking up on a runqueue while the
  "inactive timer" is running on a different CPU, the "dl_non_contending"
 flag is used to indicate that a task is not on a runqueue but is active
 (so, the flag is set when the task blocks and is cleared when the
 "inactive timer" fires or when the task  wakes up).

 (c) When the "inactive timer" fires, the task enters the Inactive state and its
 utilization is removed from the runqueue's active utilization.

 (d) When an inactive task wakes up, it enters the ActiveContending state and
 its utilization is added to the active utilization of the runqueue where
 it has been enqueued.


The algorithm reclaims the bandwidth of the tasks in Inactive state.
It 

Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-27 Thread Claudio Scordino
Hi guys,

2017-03-27 10:20 GMT+02:00 Luca Abeni :
> On Fri, 24 Mar 2017 22:31:46 -0400
> Steven Rostedt  wrote:
>
>> On Fri, 24 Mar 2017 22:47:15 +0100
>> luca abeni  wrote:
>>
>> > Ok... Since I am not good at ascii art, would it be ok to add a
>> > textual description? If yes, I'll add a comment like:
>> > "
>> > The utilization of a task is added to the runqueue's active
>> > utilization when the task becomes active (is enqueued in the
>> > runqueue), and is removed when the task becomes inactive. A task
>> > does not become immediately inactive when it blocks, but becomes
>> > inactive at the so called "0 lag time"; so, we setup the "inactive
>> > timer" to fire at the "0 lag time". When the "inactive timer"
>> > fires, the task utilization is removed from the runqueue's active
>> > utilization. If the task wakes up again on the same runqueue before
>> > the "0 lag time", the active utilization must not be changed and
>> > the "inactive timer" must be cancelled. If the task wakes up again
>> > on a different runqueue before the "0 lag time", then the task's
>> > utilization must be removed from the previous runqueue's active
>> > utilization and must be added to the new runqueue's active
>> > utilization. In order to avoid races between a task waking up on a
>> > runqueue while the "inactive timer" is running on a different CPU,
>> > the "dl_non_contending" flag is used to indicate that a task is not
>> > on a runqueue but is active (so, the flag is set when the task
>> > blocks and is cleared when the "inactive timer" fires or when the
>> > task  wakes up).
>>
>> Sure, the above is great if you never want anyone to read it ;)
>>
>> Can you please break it up a little. My head starts to spin by the
>> third line down.
>
> Ok... Maybe finding a clean and understandable way to explain the
> above sentence is something that can be done at the OSPM summit?

What about adding the following text to Documentation/ ?
Does it make things clearer ?

Cheers,

 Claudio




The following figure illustrates the state names of the GRUB algorithm:

  
  (d)|   Active   |
   ->||
   | | Contending |
   |  
   |A  |
   --   |  |
  |  |  |  |
  | Inactive |  |(b)   | (a)
  |  |  |  |
   --   |  |
   A|  V
   |  
   | |   Active   |
   --| Non|
  (c)| Contending |
  

A task can be in one of the following states:

 - ActiveContending: if it is ready for execution (or executing);

 - ActiveNonContending: if it just blocked and has not yet surpassed the 0-lag
   time;

 - Inactive: if it is blocked and has surpassed the 0-lag time.


For each runqueue, the algorithm GRUB keeps track of two different bandwidths:

 - Active bandwidth (running_bw): this is the sum of the bandwidths of all
   tasks in active state (i.e., ActiveContending or ActiveNonContending);

 - Total bandwidth (this_bw): this is the sum of all tasks "belonging" to the
   runqueue, including the tasks in Inactive state.


State transitions:

 (a) When a task blocks, it does not become immediately inactive since its
 bandwidth cannot be immediately reclaimed without breaking the
 real-time guarantees. It therefore enters a transitional state called
 ActiveNonContending. The scheduler arms the "inactive timer" to fire at
 the 0-lag time, when the task's bandwidth can be reclaimed without
 breaking the real-time guarantees.

 (b) If the task wakes up before the inactive timer fires, the task re-enters
 the ActiveContending state and the "inactive timer" is canceled.
 In addition, if the task wakes up on a different runqueue, then
 the task's utilization must be removed from the previous runqueue's active
 utilization and must be added to the new runqueue's active utilization.
 In order to avoid races between a task waking up on a runqueue while the
  "inactive timer" is running on a different CPU, the "dl_non_contending"
 flag is used to indicate that a task is not on a runqueue but is active
 (so, the flag is set when the task blocks and is cleared when the
 "inactive timer" fires or when the task  wakes up).

 (c) When the "inactive timer" fires, the task enters the Inactive state and its
 utilization is removed from the runqueue's active utilization.

 (d) When an inactive task wakes up, it enters the ActiveContending state and
 its utilization is added to the active utilization of the runqueue where
 it has been enqueued.


The algorithm reclaims the bandwidth of the tasks in Inactive state.
It does so by decrementing the runtime of the executing task Ti at a pace equal
to

Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-27 Thread Juri Lelli
On 27/03/17 09:43, Luca Abeni wrote:
> Hi Juri,
> 
> On Mon, 27 Mar 2017 08:17:45 +0100
> Juri Lelli  wrote:
> [...]
> > > > In general I feel it would be nice to have a state diagram
> > > > included somewhere near these two functions. It would be nice to
> > > > not have to dig out the PDF every time.  
> > > 
> > > Ok... Since I am not good at ascii art, would it be ok to add a
> > > textual description? If yes, I'll add a comment like:
> > > "
> > > The utilization of a task is added to the runqueue's active
> > > utilization when the task becomes active (is enqueued in the
> > > runqueue), and is  
> > 
> > Is enqueued for the first time on a new period, maybe? It seems to be
> > contradictory w.r.t. what below (if wakeup before 0 lag time)
> > otherwise.
> I think it should be "is enqueued in the runqueue and was previously
> not active" (I did not write the "and was previously not active" to

Right.

> avoid complicanting the sentence even more... But this
> "simplification" was not a good idea :). The fact that this happens in a
> new period or not is (in my understanding) irrelevant...
> 
> 
> > > removed when the task becomes inactive. A task does not become
> > > immediately inactive when it blocks, but becomes inactive at the so
> > > called "0 lag time"; so, we setup the "inactive timer" to fire at
> > > the "0 lag time". When the "inactive timer" fires, the task
> > > utilization is removed from the runqueue's active utilization. If
> > > the task wakes up again on the same runqueue before the "0 lag
> > > time", the active utilization must not be changed and the "inactive
> > > timer" must be cancelled. If the task wakes up again on a different
> > > runqueue before the "0 lag time", then the task's utilization must
> > > be removed from the previous runqueue's active utilization and must
> > > be added to the new runqueue's active utilization.
> > > In order to avoid races between a task waking up on a runqueue
> > > while the "inactive timer" is running on a different CPU, the
> > > "dl_non_contending" flag is used to indicate that a task is not on
> > > a runqueue but is active (so, the flag is set when the task blocks
> > > and is cleared when the "inactive timer" fires or when the task
> > > wakes up). "
> > > (if this is ok, where can I add this comment?)
> > >   
> > 
> > Thanks for this Luca. Not sure it adds much to your text above, but we
> > might want to consider adding something like below?
> > 
> > --->8---  
> >1st enqueue   +--+
> >  |  |
> >+>+ ACTIVEcontending |
> >| |  |
> >| ++--+--+
> >|  |  ^
> >|  |  |
> >   ++---+  |  |
> >   || dequeue  |  |  wakeup before
> >   |INACTIVE|  |  |  0 lag time
> >   ||  |  |
> >   ++---+  |  |
> >^  |  |
> >|  V  |
> >| ++--+--+
> >| |  |
> >+-+ ACTIVEnonCONTEND |
> >  |  |
> > 0 lag time   +--+
> > elapsed
> > --->8---  
> 
> I am not sure if introducing the "active non contending" name is a good
> idea or not (see my previous email), but I am not the best person to
> decide this... If people like this figure, I am more than happy to add
> it :)
> (but then maybe we can change "0 lag time elapsed" with "inactive timer
> fires" and we can display in the figure the state of the
> "dl_non_contending"/"inactive_timer_armed" flag)
> 

Sure. Let's see what people think about what you say in the other email
and I'll update the figure accordingly.


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-27 Thread Juri Lelli
On 27/03/17 09:43, Luca Abeni wrote:
> Hi Juri,
> 
> On Mon, 27 Mar 2017 08:17:45 +0100
> Juri Lelli  wrote:
> [...]
> > > > In general I feel it would be nice to have a state diagram
> > > > included somewhere near these two functions. It would be nice to
> > > > not have to dig out the PDF every time.  
> > > 
> > > Ok... Since I am not good at ascii art, would it be ok to add a
> > > textual description? If yes, I'll add a comment like:
> > > "
> > > The utilization of a task is added to the runqueue's active
> > > utilization when the task becomes active (is enqueued in the
> > > runqueue), and is  
> > 
> > Is enqueued for the first time on a new period, maybe? It seems to be
> > contradictory w.r.t. what below (if wakeup before 0 lag time)
> > otherwise.
> I think it should be "is enqueued in the runqueue and was previously
> not active" (I did not write the "and was previously not active" to

Right.

> avoid complicanting the sentence even more... But this
> "simplification" was not a good idea :). The fact that this happens in a
> new period or not is (in my understanding) irrelevant...
> 
> 
> > > removed when the task becomes inactive. A task does not become
> > > immediately inactive when it blocks, but becomes inactive at the so
> > > called "0 lag time"; so, we setup the "inactive timer" to fire at
> > > the "0 lag time". When the "inactive timer" fires, the task
> > > utilization is removed from the runqueue's active utilization. If
> > > the task wakes up again on the same runqueue before the "0 lag
> > > time", the active utilization must not be changed and the "inactive
> > > timer" must be cancelled. If the task wakes up again on a different
> > > runqueue before the "0 lag time", then the task's utilization must
> > > be removed from the previous runqueue's active utilization and must
> > > be added to the new runqueue's active utilization.
> > > In order to avoid races between a task waking up on a runqueue
> > > while the "inactive timer" is running on a different CPU, the
> > > "dl_non_contending" flag is used to indicate that a task is not on
> > > a runqueue but is active (so, the flag is set when the task blocks
> > > and is cleared when the "inactive timer" fires or when the task
> > > wakes up). "
> > > (if this is ok, where can I add this comment?)
> > >   
> > 
> > Thanks for this Luca. Not sure it adds much to your text above, but we
> > might want to consider adding something like below?
> > 
> > --->8---  
> >1st enqueue   +--+
> >  |  |
> >+>+ ACTIVEcontending |
> >| |  |
> >| ++--+--+
> >|  |  ^
> >|  |  |
> >   ++---+  |  |
> >   || dequeue  |  |  wakeup before
> >   |INACTIVE|  |  |  0 lag time
> >   ||  |  |
> >   ++---+  |  |
> >^  |  |
> >|  V  |
> >| ++--+--+
> >| |  |
> >+-+ ACTIVEnonCONTEND |
> >  |  |
> > 0 lag time   +--+
> > elapsed
> > --->8---  
> 
> I am not sure if introducing the "active non contending" name is a good
> idea or not (see my previous email), but I am not the best person to
> decide this... If people like this figure, I am more than happy to add
> it :)
> (but then maybe we can change "0 lag time elapsed" with "inactive timer
> fires" and we can display in the figure the state of the
> "dl_non_contending"/"inactive_timer_armed" flag)
> 

Sure. Let's see what people think about what you say in the other email
and I'll update the figure accordingly.


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-27 Thread Luca Abeni
On Fri, 24 Mar 2017 22:31:46 -0400
Steven Rostedt  wrote:

> On Fri, 24 Mar 2017 22:47:15 +0100
> luca abeni  wrote:
> 
> > Ok... Since I am not good at ascii art, would it be ok to add a
> > textual description? If yes, I'll add a comment like:
> > "
> > The utilization of a task is added to the runqueue's active
> > utilization when the task becomes active (is enqueued in the
> > runqueue), and is removed when the task becomes inactive. A task
> > does not become immediately inactive when it blocks, but becomes
> > inactive at the so called "0 lag time"; so, we setup the "inactive
> > timer" to fire at the "0 lag time". When the "inactive timer"
> > fires, the task utilization is removed from the runqueue's active
> > utilization. If the task wakes up again on the same runqueue before
> > the "0 lag time", the active utilization must not be changed and
> > the "inactive timer" must be cancelled. If the task wakes up again
> > on a different runqueue before the "0 lag time", then the task's
> > utilization must be removed from the previous runqueue's active
> > utilization and must be added to the new runqueue's active
> > utilization. In order to avoid races between a task waking up on a
> > runqueue while the "inactive timer" is running on a different CPU,
> > the "dl_non_contending" flag is used to indicate that a task is not
> > on a runqueue but is active (so, the flag is set when the task
> > blocks and is cleared when the "inactive timer" fires or when the
> > task  wakes up).  
> 
> Sure, the above is great if you never want anyone to read it ;)
> 
> Can you please break it up a little. My head starts to spin by the
> third line down.

Ok... Maybe finding a clean and understandable way to explain the
above sentence is something that can be done at the OSPM summit?



Thanks,
Luca


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-27 Thread Luca Abeni
On Fri, 24 Mar 2017 22:31:46 -0400
Steven Rostedt  wrote:

> On Fri, 24 Mar 2017 22:47:15 +0100
> luca abeni  wrote:
> 
> > Ok... Since I am not good at ascii art, would it be ok to add a
> > textual description? If yes, I'll add a comment like:
> > "
> > The utilization of a task is added to the runqueue's active
> > utilization when the task becomes active (is enqueued in the
> > runqueue), and is removed when the task becomes inactive. A task
> > does not become immediately inactive when it blocks, but becomes
> > inactive at the so called "0 lag time"; so, we setup the "inactive
> > timer" to fire at the "0 lag time". When the "inactive timer"
> > fires, the task utilization is removed from the runqueue's active
> > utilization. If the task wakes up again on the same runqueue before
> > the "0 lag time", the active utilization must not be changed and
> > the "inactive timer" must be cancelled. If the task wakes up again
> > on a different runqueue before the "0 lag time", then the task's
> > utilization must be removed from the previous runqueue's active
> > utilization and must be added to the new runqueue's active
> > utilization. In order to avoid races between a task waking up on a
> > runqueue while the "inactive timer" is running on a different CPU,
> > the "dl_non_contending" flag is used to indicate that a task is not
> > on a runqueue but is active (so, the flag is set when the task
> > blocks and is cleared when the "inactive timer" fires or when the
> > task  wakes up).  
> 
> Sure, the above is great if you never want anyone to read it ;)
> 
> Can you please break it up a little. My head starts to spin by the
> third line down.

Ok... Maybe finding a clean and understandable way to explain the
above sentence is something that can be done at the OSPM summit?



Thanks,
Luca


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-27 Thread Luca Abeni
Hi Juri,

On Mon, 27 Mar 2017 08:17:45 +0100
Juri Lelli  wrote:
[...]
> > > In general I feel it would be nice to have a state diagram
> > > included somewhere near these two functions. It would be nice to
> > > not have to dig out the PDF every time.  
> > 
> > Ok... Since I am not good at ascii art, would it be ok to add a
> > textual description? If yes, I'll add a comment like:
> > "
> > The utilization of a task is added to the runqueue's active
> > utilization when the task becomes active (is enqueued in the
> > runqueue), and is  
> 
> Is enqueued for the first time on a new period, maybe? It seems to be
> contradictory w.r.t. what below (if wakeup before 0 lag time)
> otherwise.
I think it should be "is enqueued in the runqueue and was previously
not active" (I did not write the "and was previously not active" to
avoid complicanting the sentence even more... But this
"simplification" was not a good idea :). The fact that this happens in a
new period or not is (in my understanding) irrelevant...


> > removed when the task becomes inactive. A task does not become
> > immediately inactive when it blocks, but becomes inactive at the so
> > called "0 lag time"; so, we setup the "inactive timer" to fire at
> > the "0 lag time". When the "inactive timer" fires, the task
> > utilization is removed from the runqueue's active utilization. If
> > the task wakes up again on the same runqueue before the "0 lag
> > time", the active utilization must not be changed and the "inactive
> > timer" must be cancelled. If the task wakes up again on a different
> > runqueue before the "0 lag time", then the task's utilization must
> > be removed from the previous runqueue's active utilization and must
> > be added to the new runqueue's active utilization.
> > In order to avoid races between a task waking up on a runqueue
> > while the "inactive timer" is running on a different CPU, the
> > "dl_non_contending" flag is used to indicate that a task is not on
> > a runqueue but is active (so, the flag is set when the task blocks
> > and is cleared when the "inactive timer" fires or when the task
> > wakes up). "
> > (if this is ok, where can I add this comment?)
> >   
> 
> Thanks for this Luca. Not sure it adds much to your text above, but we
> might want to consider adding something like below?
> 
> --->8---  
>1st enqueue   +--+
>  |  |
>+>+ ACTIVEcontending |
>| |  |
>| ++--+--+
>|  |  ^
>|  |  |
>   ++---+  |  |
>   || dequeue  |  |  wakeup before
>   |INACTIVE|  |  |  0 lag time
>   ||  |  |
>   ++---+  |  |
>^  |  |
>|  V  |
>| ++--+--+
>| |  |
>+-+ ACTIVEnonCONTEND |
>  |  |
> 0 lag time   +--+
> elapsed
> --->8---  

I am not sure if introducing the "active non contending" name is a good
idea or not (see my previous email), but I am not the best person to
decide this... If people like this figure, I am more than happy to add
it :)
(but then maybe we can change "0 lag time elapsed" with "inactive timer
fires" and we can display in the figure the state of the
"dl_non_contending"/"inactive_timer_armed" flag)


Thanks,
Luca


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-27 Thread Luca Abeni
Hi Juri,

On Mon, 27 Mar 2017 08:17:45 +0100
Juri Lelli  wrote:
[...]
> > > In general I feel it would be nice to have a state diagram
> > > included somewhere near these two functions. It would be nice to
> > > not have to dig out the PDF every time.  
> > 
> > Ok... Since I am not good at ascii art, would it be ok to add a
> > textual description? If yes, I'll add a comment like:
> > "
> > The utilization of a task is added to the runqueue's active
> > utilization when the task becomes active (is enqueued in the
> > runqueue), and is  
> 
> Is enqueued for the first time on a new period, maybe? It seems to be
> contradictory w.r.t. what below (if wakeup before 0 lag time)
> otherwise.
I think it should be "is enqueued in the runqueue and was previously
not active" (I did not write the "and was previously not active" to
avoid complicanting the sentence even more... But this
"simplification" was not a good idea :). The fact that this happens in a
new period or not is (in my understanding) irrelevant...


> > removed when the task becomes inactive. A task does not become
> > immediately inactive when it blocks, but becomes inactive at the so
> > called "0 lag time"; so, we setup the "inactive timer" to fire at
> > the "0 lag time". When the "inactive timer" fires, the task
> > utilization is removed from the runqueue's active utilization. If
> > the task wakes up again on the same runqueue before the "0 lag
> > time", the active utilization must not be changed and the "inactive
> > timer" must be cancelled. If the task wakes up again on a different
> > runqueue before the "0 lag time", then the task's utilization must
> > be removed from the previous runqueue's active utilization and must
> > be added to the new runqueue's active utilization.
> > In order to avoid races between a task waking up on a runqueue
> > while the "inactive timer" is running on a different CPU, the
> > "dl_non_contending" flag is used to indicate that a task is not on
> > a runqueue but is active (so, the flag is set when the task blocks
> > and is cleared when the "inactive timer" fires or when the task
> > wakes up). "
> > (if this is ok, where can I add this comment?)
> >   
> 
> Thanks for this Luca. Not sure it adds much to your text above, but we
> might want to consider adding something like below?
> 
> --->8---  
>1st enqueue   +--+
>  |  |
>+>+ ACTIVEcontending |
>| |  |
>| ++--+--+
>|  |  ^
>|  |  |
>   ++---+  |  |
>   || dequeue  |  |  wakeup before
>   |INACTIVE|  |  |  0 lag time
>   ||  |  |
>   ++---+  |  |
>^  |  |
>|  V  |
>| ++--+--+
>| |  |
>+-+ ACTIVEnonCONTEND |
>  |  |
> 0 lag time   +--+
> elapsed
> --->8---  

I am not sure if introducing the "active non contending" name is a good
idea or not (see my previous email), but I am not the best person to
decide this... If people like this figure, I am more than happy to add
it :)
(but then maybe we can change "0 lag time elapsed" with "inactive timer
fires" and we can display in the figure the state of the
"dl_non_contending"/"inactive_timer_armed" flag)


Thanks,
Luca


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-27 Thread Luca Abeni
On Fri, 24 Mar 2017 22:47:15 +0100
luca abeni  wrote:
[...]
> > > + } else {
> > > + /*
> > > +  * Since "dl_non_contending" is not set, the
> > > +  * task's utilization has already been removed
> > > from
> > > +  * active utilization (either when the task
> > > blocked,
> > > +  * when the "inactive timer" fired).
> > > +  * So, add it back.
> > > +  */
> > > + add_running_bw(dl_se->dl_bw, dl_rq);
> > > + }
> > > +}
> > 
> > In general I feel it would be nice to have a state diagram included
> > somewhere near these two functions. It would be nice to not have to
> > dig out the PDF every time.  
> 
> Ok... Since I am not good at ascii art, would it be ok to add a
> textual description? If yes, I'll add a comment like:
> "
> The utilization of a task is added to the runqueue's active
> utilization when the task becomes active (is enqueued in the
> runqueue), and is removed when the task becomes inactive. A task does
> not become immediately inactive when it blocks, but becomes inactive
> at the so called "0 lag time"; so, we setup the "inactive timer" to
> fire at the "0 lag time". When the "inactive timer" fires, the task
> utilization is removed from the runqueue's active utilization. If the
> task wakes up again on the same runqueue before the "0 lag time", the
> active utilization must not be changed and the "inactive timer" must
> be cancelled. If the task wakes up again on a different runqueue
> before the "0 lag time", then the task's utilization must be removed
> from the previous runqueue's active utilization and must be added to
> the new runqueue's active utilization.
> In order to avoid races between a task waking up on a runqueue while
> the "inactive timer" is running on a different CPU, the
> "dl_non_contending" flag is used to indicate that a task is not on a
> runqueue but is active (so, the flag is set when the task blocks and
> is cleared when the "inactive timer" fires or when the task  wakes
> up). "

After re-reading all of this, I realized two things:
1) The comment I added before the definition of the dl_non_contending
   flag in sched.h is confusing. I'll try to rephrase it
2) The "dl_non_contending" name might be part of the issue, here.
   It tries to map concepts from the GRUB paper to the implementation,
   so it is understandable and makes things clear (I hope) for people
   who know the paper... But it is not the best name for people not
   knowing the GRUB paper (and pretending that someone studies the
   paper to understand the code is not good :).
   So, what about "inactive_timer_armed" (or similar)? This would
   immediately clarify what the flag is about... What do you think?
   Would this renaming be useful?



Thanks,
Luca


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-27 Thread Luca Abeni
On Fri, 24 Mar 2017 22:47:15 +0100
luca abeni  wrote:
[...]
> > > + } else {
> > > + /*
> > > +  * Since "dl_non_contending" is not set, the
> > > +  * task's utilization has already been removed
> > > from
> > > +  * active utilization (either when the task
> > > blocked,
> > > +  * when the "inactive timer" fired).
> > > +  * So, add it back.
> > > +  */
> > > + add_running_bw(dl_se->dl_bw, dl_rq);
> > > + }
> > > +}
> > 
> > In general I feel it would be nice to have a state diagram included
> > somewhere near these two functions. It would be nice to not have to
> > dig out the PDF every time.  
> 
> Ok... Since I am not good at ascii art, would it be ok to add a
> textual description? If yes, I'll add a comment like:
> "
> The utilization of a task is added to the runqueue's active
> utilization when the task becomes active (is enqueued in the
> runqueue), and is removed when the task becomes inactive. A task does
> not become immediately inactive when it blocks, but becomes inactive
> at the so called "0 lag time"; so, we setup the "inactive timer" to
> fire at the "0 lag time". When the "inactive timer" fires, the task
> utilization is removed from the runqueue's active utilization. If the
> task wakes up again on the same runqueue before the "0 lag time", the
> active utilization must not be changed and the "inactive timer" must
> be cancelled. If the task wakes up again on a different runqueue
> before the "0 lag time", then the task's utilization must be removed
> from the previous runqueue's active utilization and must be added to
> the new runqueue's active utilization.
> In order to avoid races between a task waking up on a runqueue while
> the "inactive timer" is running on a different CPU, the
> "dl_non_contending" flag is used to indicate that a task is not on a
> runqueue but is active (so, the flag is set when the task blocks and
> is cleared when the "inactive timer" fires or when the task  wakes
> up). "

After re-reading all of this, I realized two things:
1) The comment I added before the definition of the dl_non_contending
   flag in sched.h is confusing. I'll try to rephrase it
2) The "dl_non_contending" name might be part of the issue, here.
   It tries to map concepts from the GRUB paper to the implementation,
   so it is understandable and makes things clear (I hope) for people
   who know the paper... But it is not the best name for people not
   knowing the GRUB paper (and pretending that someone studies the
   paper to understand the code is not good :).
   So, what about "inactive_timer_armed" (or similar)? This would
   immediately clarify what the flag is about... What do you think?
   Would this renaming be useful?



Thanks,
Luca


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-27 Thread Juri Lelli
On 24/03/17 22:47, Luca Abeni wrote:
> Hi Peter,
> 
> On Fri, 24 Mar 2017 14:20:41 +0100
> Peter Zijlstra  wrote:
> 
> > On Fri, Mar 24, 2017 at 04:52:55AM +0100, luca abeni wrote:
> > 

[...]

> > 
> > In general I feel it would be nice to have a state diagram included
> > somewhere near these two functions. It would be nice to not have to
> > dig out the PDF every time.
> 
> Ok... Since I am not good at ascii art, would it be ok to add a textual
> description? If yes, I'll add a comment like:
> "
> The utilization of a task is added to the runqueue's active utilization
> when the task becomes active (is enqueued in the runqueue), and is

Is enqueued for the first time on a new period, maybe? It seems to be
contradictory w.r.t. what below (if wakeup before 0 lag time) otherwise.

> removed when the task becomes inactive. A task does not become
> immediately inactive when it blocks, but becomes inactive at the so
> called "0 lag time"; so, we setup the "inactive timer" to fire at the
> "0 lag time". When the "inactive timer" fires, the task utilization is
> removed from the runqueue's active utilization. If the task wakes up
> again on the same runqueue before the "0 lag time", the active
> utilization must not be changed and the "inactive timer" must be
> cancelled. If the task wakes up again on a different runqueue before
> the "0 lag time", then the task's utilization must be removed from the
> previous runqueue's active utilization and must be added to the new
> runqueue's active utilization.
> In order to avoid races between a task waking up on a runqueue while the
> "inactive timer" is running on a different CPU, the "dl_non_contending"
> flag is used to indicate that a task is not on a runqueue but is active
> (so, the flag is set when the task blocks and is cleared when the
> "inactive timer" fires or when the task  wakes up).
> "
> (if this is ok, where can I add this comment?)
> 

Thanks for this Luca. Not sure it adds much to your text above, but we
might want to consider adding something like below?

--->8---
   1st enqueue   +--+
 |  |
   +>+ ACTIVEcontending |
   | |  |
   | ++--+--+
   |  |  ^
   |  |  |
  ++---+  |  |
  || dequeue  |  |  wakeup before
  |INACTIVE|  |  |  0 lag time
  ||  |  |
  ++---+  |  |
   ^  |  |
   |  V  |
   | ++--+--+
   | |  |
   +-+ ACTIVEnonCONTEND |
 |  |
0 lag time   +--+
elapsed
--->8---

Thanks,

- Juri


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-27 Thread Juri Lelli
On 24/03/17 22:47, Luca Abeni wrote:
> Hi Peter,
> 
> On Fri, 24 Mar 2017 14:20:41 +0100
> Peter Zijlstra  wrote:
> 
> > On Fri, Mar 24, 2017 at 04:52:55AM +0100, luca abeni wrote:
> > 

[...]

> > 
> > In general I feel it would be nice to have a state diagram included
> > somewhere near these two functions. It would be nice to not have to
> > dig out the PDF every time.
> 
> Ok... Since I am not good at ascii art, would it be ok to add a textual
> description? If yes, I'll add a comment like:
> "
> The utilization of a task is added to the runqueue's active utilization
> when the task becomes active (is enqueued in the runqueue), and is

Is enqueued for the first time on a new period, maybe? It seems to be
contradictory w.r.t. what below (if wakeup before 0 lag time) otherwise.

> removed when the task becomes inactive. A task does not become
> immediately inactive when it blocks, but becomes inactive at the so
> called "0 lag time"; so, we setup the "inactive timer" to fire at the
> "0 lag time". When the "inactive timer" fires, the task utilization is
> removed from the runqueue's active utilization. If the task wakes up
> again on the same runqueue before the "0 lag time", the active
> utilization must not be changed and the "inactive timer" must be
> cancelled. If the task wakes up again on a different runqueue before
> the "0 lag time", then the task's utilization must be removed from the
> previous runqueue's active utilization and must be added to the new
> runqueue's active utilization.
> In order to avoid races between a task waking up on a runqueue while the
> "inactive timer" is running on a different CPU, the "dl_non_contending"
> flag is used to indicate that a task is not on a runqueue but is active
> (so, the flag is set when the task blocks and is cleared when the
> "inactive timer" fires or when the task  wakes up).
> "
> (if this is ok, where can I add this comment?)
> 

Thanks for this Luca. Not sure it adds much to your text above, but we
might want to consider adding something like below?

--->8---
   1st enqueue   +--+
 |  |
   +>+ ACTIVEcontending |
   | |  |
   | ++--+--+
   |  |  ^
   |  |  |
  ++---+  |  |
  || dequeue  |  |  wakeup before
  |INACTIVE|  |  |  0 lag time
  ||  |  |
  ++---+  |  |
   ^  |  |
   |  V  |
   | ++--+--+
   | |  |
   +-+ ACTIVEnonCONTEND |
 |  |
0 lag time   +--+
elapsed
--->8---

Thanks,

- Juri


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-26 Thread luca abeni
Hi Mathieu,

On Sun, 26 Mar 2017 11:32:59 -0600
Mathieu Poirier  wrote:
[...]
> > +   task_rq_unlock(rq, p, );
> > +   put_task_struct(p);
> > +
> > +   return HRTIMER_NORESTART;
> > +}
> > +
> > +void init_inactive_task_timer(struct sched_dl_entity *dl_se)  
> 
> To be consistent with the other DL related functions:
> 
> s/init_inactive_task_timer(...)/init_dl_inactive_task_timer(...)

Thanks; I'll change the name of this function.


Thanks,
Luca


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-26 Thread luca abeni
Hi Mathieu,

On Sun, 26 Mar 2017 11:32:59 -0600
Mathieu Poirier  wrote:
[...]
> > +   task_rq_unlock(rq, p, );
> > +   put_task_struct(p);
> > +
> > +   return HRTIMER_NORESTART;
> > +}
> > +
> > +void init_inactive_task_timer(struct sched_dl_entity *dl_se)  
> 
> To be consistent with the other DL related functions:
> 
> s/init_inactive_task_timer(...)/init_dl_inactive_task_timer(...)

Thanks; I'll change the name of this function.


Thanks,
Luca


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-26 Thread Mathieu Poirier
On 23 March 2017 at 21:52, luca abeni  wrote:
> From: Luca Abeni 
>
> This patch implements a more theoretically sound algorithm for
> tracking active utilization: instead of decreasing it when a
> task blocks, use a timer (the "inactive timer", named after the
> "Inactive" task state of the GRUB algorithm) to decrease the
> active utilization at the so called "0-lag time".
>
> Signed-off-by: Luca Abeni 
> Tested-by: Claudio Scordino 
> Tested-by: Daniel Bristot de Oliveira 
> ---
>  include/linux/sched.h   |  17 
>  kernel/sched/core.c |   3 +
>  kernel/sched/deadline.c | 208 
> 
>  kernel/sched/sched.h|   2 +
>  4 files changed, 215 insertions(+), 15 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d67eee8..952cac8 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -445,16 +445,33 @@ struct sched_dl_entity {
>  *
>  * @dl_yielded tells if task gave up the CPU before consuming
>  * all its available runtime during the last job.
> +*
> +* @dl_non_contending tells if task is inactive while still
> +* contributing to the active utilization. In other words, it
> +* indicates if the inactive timer has been armed and its handler
> +* has not been executed yet. This flag is useful to avoid race
> +* conditions between the inactive timer handler and the wakeup
> +* code.
>  */
> int dl_throttled;
> int dl_boosted;
> int dl_yielded;
> +   int dl_non_contending;
>
> /*
>  * Bandwidth enforcement timer. Each -deadline task has its
>  * own bandwidth to be enforced, thus we need one timer per task.
>  */
> struct hrtimer  dl_timer;
> +
> +   /*
> +* Inactive timer, responsible for decreasing the active utilization
> +* at the "0-lag time". When a -deadline task blocks, it contributes
> +* to GRUB's active utilization until the "0-lag time", hence a
> +* timer is needed to decrease the active utilization at the correct
> +* time.
> +*/
> +   struct hrtimer inactive_timer;
>  };
>
>  union rcu_special {
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 6d6cad9..bf0b0b9 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2165,6 +2165,7 @@ void __dl_clear_params(struct task_struct *p)
>
> dl_se->dl_throttled = 0;
> dl_se->dl_yielded = 0;
> +   dl_se->dl_non_contending = 0;
>  }
>
>  /*
> @@ -2196,6 +2197,7 @@ static void __sched_fork(unsigned long clone_flags, 
> struct task_struct *p)
>
> RB_CLEAR_NODE(>dl.rb_node);
> init_dl_task_timer(>dl);
> +   init_inactive_task_timer(>dl);
> __dl_clear_params(p);
>
> INIT_LIST_HEAD(>rt.run_list);
> @@ -2518,6 +2520,7 @@ static int dl_overflow(struct task_struct *p, int 
> policy,
>!__dl_overflow(dl_b, cpus, p->dl.dl_bw, new_bw)) {
> __dl_clear(dl_b, p->dl.dl_bw);
> __dl_add(dl_b, new_bw);
> +   dl_change_utilization(p, new_bw);
> err = 0;
> } else if (!dl_policy(policy) && task_has_dl_policy(p)) {
> __dl_clear(dl_b, p->dl.dl_bw);
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index cef9adb..86aed82 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -65,6 +65,107 @@ void sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
> dl_rq->running_bw = 0;
>  }
>
> +void dl_change_utilization(struct task_struct *p, u64 new_bw)
> +{
> +   if (!task_on_rq_queued(p)) {
> +   struct rq *rq = task_rq(p);
> +
> +   if (p->dl.dl_non_contending) {
> +   sub_running_bw(p->dl.dl_bw, >dl);
> +   p->dl.dl_non_contending = 0;
> +   /*
> +* If the timer handler is currently running and the
> +* timer cannot be cancelled, inactive_task_timer()
> +* will see that dl_not_contending is not set, and
> +* will not touch the rq's active utilization,
> +* so we are still safe.
> +*/
> +   if (hrtimer_try_to_cancel(>dl.inactive_timer) == 1)
> +   put_task_struct(p);
> +   }
> +   }
> +}
> +
> +static void task_non_contending(struct task_struct *p)
> +{
> +   struct sched_dl_entity *dl_se = >dl;
> +   struct hrtimer *timer = _se->inactive_timer;
> +   struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> 

Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-26 Thread Mathieu Poirier
On 23 March 2017 at 21:52, luca abeni  wrote:
> From: Luca Abeni 
>
> This patch implements a more theoretically sound algorithm for
> tracking active utilization: instead of decreasing it when a
> task blocks, use a timer (the "inactive timer", named after the
> "Inactive" task state of the GRUB algorithm) to decrease the
> active utilization at the so called "0-lag time".
>
> Signed-off-by: Luca Abeni 
> Tested-by: Claudio Scordino 
> Tested-by: Daniel Bristot de Oliveira 
> ---
>  include/linux/sched.h   |  17 
>  kernel/sched/core.c |   3 +
>  kernel/sched/deadline.c | 208 
> 
>  kernel/sched/sched.h|   2 +
>  4 files changed, 215 insertions(+), 15 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d67eee8..952cac8 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -445,16 +445,33 @@ struct sched_dl_entity {
>  *
>  * @dl_yielded tells if task gave up the CPU before consuming
>  * all its available runtime during the last job.
> +*
> +* @dl_non_contending tells if task is inactive while still
> +* contributing to the active utilization. In other words, it
> +* indicates if the inactive timer has been armed and its handler
> +* has not been executed yet. This flag is useful to avoid race
> +* conditions between the inactive timer handler and the wakeup
> +* code.
>  */
> int dl_throttled;
> int dl_boosted;
> int dl_yielded;
> +   int dl_non_contending;
>
> /*
>  * Bandwidth enforcement timer. Each -deadline task has its
>  * own bandwidth to be enforced, thus we need one timer per task.
>  */
> struct hrtimer  dl_timer;
> +
> +   /*
> +* Inactive timer, responsible for decreasing the active utilization
> +* at the "0-lag time". When a -deadline task blocks, it contributes
> +* to GRUB's active utilization until the "0-lag time", hence a
> +* timer is needed to decrease the active utilization at the correct
> +* time.
> +*/
> +   struct hrtimer inactive_timer;
>  };
>
>  union rcu_special {
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 6d6cad9..bf0b0b9 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2165,6 +2165,7 @@ void __dl_clear_params(struct task_struct *p)
>
> dl_se->dl_throttled = 0;
> dl_se->dl_yielded = 0;
> +   dl_se->dl_non_contending = 0;
>  }
>
>  /*
> @@ -2196,6 +2197,7 @@ static void __sched_fork(unsigned long clone_flags, 
> struct task_struct *p)
>
> RB_CLEAR_NODE(>dl.rb_node);
> init_dl_task_timer(>dl);
> +   init_inactive_task_timer(>dl);
> __dl_clear_params(p);
>
> INIT_LIST_HEAD(>rt.run_list);
> @@ -2518,6 +2520,7 @@ static int dl_overflow(struct task_struct *p, int 
> policy,
>!__dl_overflow(dl_b, cpus, p->dl.dl_bw, new_bw)) {
> __dl_clear(dl_b, p->dl.dl_bw);
> __dl_add(dl_b, new_bw);
> +   dl_change_utilization(p, new_bw);
> err = 0;
> } else if (!dl_policy(policy) && task_has_dl_policy(p)) {
> __dl_clear(dl_b, p->dl.dl_bw);
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index cef9adb..86aed82 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -65,6 +65,107 @@ void sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
> dl_rq->running_bw = 0;
>  }
>
> +void dl_change_utilization(struct task_struct *p, u64 new_bw)
> +{
> +   if (!task_on_rq_queued(p)) {
> +   struct rq *rq = task_rq(p);
> +
> +   if (p->dl.dl_non_contending) {
> +   sub_running_bw(p->dl.dl_bw, >dl);
> +   p->dl.dl_non_contending = 0;
> +   /*
> +* If the timer handler is currently running and the
> +* timer cannot be cancelled, inactive_task_timer()
> +* will see that dl_not_contending is not set, and
> +* will not touch the rq's active utilization,
> +* so we are still safe.
> +*/
> +   if (hrtimer_try_to_cancel(>dl.inactive_timer) == 1)
> +   put_task_struct(p);
> +   }
> +   }
> +}
> +
> +static void task_non_contending(struct task_struct *p)
> +{
> +   struct sched_dl_entity *dl_se = >dl;
> +   struct hrtimer *timer = _se->inactive_timer;
> +   struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> +   struct rq *rq = rq_of_dl_rq(dl_rq);
> +   s64 zerolag_time;
> +
> +   /*
> +* If this is a non-deadline 

Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-24 Thread Steven Rostedt
On Fri, 24 Mar 2017 22:47:15 +0100
luca abeni  wrote:

> Ok... Since I am not good at ascii art, would it be ok to add a textual
> description? If yes, I'll add a comment like:
> "
> The utilization of a task is added to the runqueue's active utilization
> when the task becomes active (is enqueued in the runqueue), and is
> removed when the task becomes inactive. A task does not become
> immediately inactive when it blocks, but becomes inactive at the so
> called "0 lag time"; so, we setup the "inactive timer" to fire at the
> "0 lag time". When the "inactive timer" fires, the task utilization is
> removed from the runqueue's active utilization. If the task wakes up
> again on the same runqueue before the "0 lag time", the active
> utilization must not be changed and the "inactive timer" must be
> cancelled. If the task wakes up again on a different runqueue before
> the "0 lag time", then the task's utilization must be removed from the
> previous runqueue's active utilization and must be added to the new
> runqueue's active utilization.
> In order to avoid races between a task waking up on a runqueue while the
> "inactive timer" is running on a different CPU, the "dl_non_contending"
> flag is used to indicate that a task is not on a runqueue but is active
> (so, the flag is set when the task blocks and is cleared when the
> "inactive timer" fires or when the task  wakes up).

Sure, the above is great if you never want anyone to read it ;)

Can you please break it up a little. My head starts to spin by the
third line down.

-- Steve

> "
> (if this is ok, where can I add this comment?)


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-24 Thread Steven Rostedt
On Fri, 24 Mar 2017 22:47:15 +0100
luca abeni  wrote:

> Ok... Since I am not good at ascii art, would it be ok to add a textual
> description? If yes, I'll add a comment like:
> "
> The utilization of a task is added to the runqueue's active utilization
> when the task becomes active (is enqueued in the runqueue), and is
> removed when the task becomes inactive. A task does not become
> immediately inactive when it blocks, but becomes inactive at the so
> called "0 lag time"; so, we setup the "inactive timer" to fire at the
> "0 lag time". When the "inactive timer" fires, the task utilization is
> removed from the runqueue's active utilization. If the task wakes up
> again on the same runqueue before the "0 lag time", the active
> utilization must not be changed and the "inactive timer" must be
> cancelled. If the task wakes up again on a different runqueue before
> the "0 lag time", then the task's utilization must be removed from the
> previous runqueue's active utilization and must be added to the new
> runqueue's active utilization.
> In order to avoid races between a task waking up on a runqueue while the
> "inactive timer" is running on a different CPU, the "dl_non_contending"
> flag is used to indicate that a task is not on a runqueue but is active
> (so, the flag is set when the task blocks and is cleared when the
> "inactive timer" fires or when the task  wakes up).

Sure, the above is great if you never want anyone to read it ;)

Can you please break it up a little. My head starts to spin by the
third line down.

-- Steve

> "
> (if this is ok, where can I add this comment?)


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-24 Thread luca abeni
Hi Peter,

On Fri, 24 Mar 2017 14:20:41 +0100
Peter Zijlstra  wrote:

> On Fri, Mar 24, 2017 at 04:52:55AM +0100, luca abeni wrote:
> 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index d67eee8..952cac8 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -445,16 +445,33 @@ struct sched_dl_entity {
> >  *
> >  * @dl_yielded tells if task gave up the CPU before
> > consuming
> >  * all its available runtime during the last job.
> > +*
> > +* @dl_non_contending tells if task is inactive while still
> > +* contributing to the active utilization. In other words,
> > it
> > +* indicates if the inactive timer has been armed and its
> > handler
> > +* has not been executed yet. This flag is useful to avoid
> > race
> > +* conditions between the inactive timer handler and the
> > wakeup
> > +* code.
> >  */
> > int dl_throttled;
> > int dl_boosted;
> > int dl_yielded;
> > +   int dl_non_contending;  
> 
> We should maybe look into making those bits :1, not something for this
> patch though;

Yes, grouping all the flags in a single field was my intention too... I
planned to submit a patch to do this after merging the reclaiming
patches... But maybe it is better to do this first :) 


> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index cef9adb..86aed82 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -65,6 +65,107 @@ void sub_running_bw(u64 dl_bw, struct dl_rq
> > *dl_rq) dl_rq->running_bw = 0;
> >  }
> >  
> > +void dl_change_utilization(struct task_struct *p, u64 new_bw)
> > +{
> > +   if (!task_on_rq_queued(p)) {
> > +   struct rq *rq = task_rq(p);
> > +
> > +   if (p->dl.dl_non_contending) {
> > +   sub_running_bw(p->dl.dl_bw, >dl);
> > +   p->dl.dl_non_contending = 0;
> > +   /*
> > +* If the timer handler is currently
> > running and the
> > +* timer cannot be cancelled,
> > inactive_task_timer()
> > +* will see that dl_not_contending is not
> > set, and
> > +* will not touch the rq's active
> > utilization,
> > +* so we are still safe.
> > +*/
> > +   if
> > (hrtimer_try_to_cancel(>dl.inactive_timer) == 1)
> > +   put_task_struct(p);
> > +   }
> > +   }
> > +}  
> 
> If we rearrange that slightly we can avoid the double indent:
[...]
Ok; I'll rework the code in this way on Monday.

[...]
> > +   if (hrtimer_try_to_cancel(_se->inactive_timer)
> > == 1)
> > +   put_task_struct(dl_task_of(dl_se));
> > +   dl_se->dl_non_contending = 0;  
> 
> This had me worried for a little while as being a use-after-free, but
> this put_task_struct() cannot be the last in this case. Might be
> worth a comment.

Or maybe it is better to move "dl_se->dl_non_contending = 0;" before
hrtimer_try_to_cancel()?

> 
> > +   } else {
> > +   /*
> > +* Since "dl_non_contending" is not set, the
> > +* task's utilization has already been removed from
> > +* active utilization (either when the task
> > blocked,
> > +* when the "inactive timer" fired).
> > +* So, add it back.
> > +*/
> > +   add_running_bw(dl_se->dl_bw, dl_rq);
> > +   }
> > +}  
> 
> In general I feel it would be nice to have a state diagram included
> somewhere near these two functions. It would be nice to not have to
> dig out the PDF every time.

Ok... Since I am not good at ascii art, would it be ok to add a textual
description? If yes, I'll add a comment like:
"
The utilization of a task is added to the runqueue's active utilization
when the task becomes active (is enqueued in the runqueue), and is
removed when the task becomes inactive. A task does not become
immediately inactive when it blocks, but becomes inactive at the so
called "0 lag time"; so, we setup the "inactive timer" to fire at the
"0 lag time". When the "inactive timer" fires, the task utilization is
removed from the runqueue's active utilization. If the task wakes up
again on the same runqueue before the "0 lag time", the active
utilization must not be changed and the "inactive timer" must be
cancelled. If the task wakes up again on a different runqueue before
the "0 lag time", then the task's utilization must be removed from the
previous runqueue's active utilization and must be added to the new
runqueue's active utilization.
In order to avoid races between a task waking up on a runqueue while the
"inactive timer" is running on a different CPU, the "dl_non_contending"
flag is used to indicate that a task is not on a runqueue but is active
(so, the flag is set when the 

Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-24 Thread luca abeni
Hi Peter,

On Fri, 24 Mar 2017 14:20:41 +0100
Peter Zijlstra  wrote:

> On Fri, Mar 24, 2017 at 04:52:55AM +0100, luca abeni wrote:
> 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index d67eee8..952cac8 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -445,16 +445,33 @@ struct sched_dl_entity {
> >  *
> >  * @dl_yielded tells if task gave up the CPU before
> > consuming
> >  * all its available runtime during the last job.
> > +*
> > +* @dl_non_contending tells if task is inactive while still
> > +* contributing to the active utilization. In other words,
> > it
> > +* indicates if the inactive timer has been armed and its
> > handler
> > +* has not been executed yet. This flag is useful to avoid
> > race
> > +* conditions between the inactive timer handler and the
> > wakeup
> > +* code.
> >  */
> > int dl_throttled;
> > int dl_boosted;
> > int dl_yielded;
> > +   int dl_non_contending;  
> 
> We should maybe look into making those bits :1, not something for this
> patch though;

Yes, grouping all the flags in a single field was my intention too... I
planned to submit a patch to do this after merging the reclaiming
patches... But maybe it is better to do this first :) 


> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index cef9adb..86aed82 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -65,6 +65,107 @@ void sub_running_bw(u64 dl_bw, struct dl_rq
> > *dl_rq) dl_rq->running_bw = 0;
> >  }
> >  
> > +void dl_change_utilization(struct task_struct *p, u64 new_bw)
> > +{
> > +   if (!task_on_rq_queued(p)) {
> > +   struct rq *rq = task_rq(p);
> > +
> > +   if (p->dl.dl_non_contending) {
> > +   sub_running_bw(p->dl.dl_bw, >dl);
> > +   p->dl.dl_non_contending = 0;
> > +   /*
> > +* If the timer handler is currently
> > running and the
> > +* timer cannot be cancelled,
> > inactive_task_timer()
> > +* will see that dl_not_contending is not
> > set, and
> > +* will not touch the rq's active
> > utilization,
> > +* so we are still safe.
> > +*/
> > +   if
> > (hrtimer_try_to_cancel(>dl.inactive_timer) == 1)
> > +   put_task_struct(p);
> > +   }
> > +   }
> > +}  
> 
> If we rearrange that slightly we can avoid the double indent:
[...]
Ok; I'll rework the code in this way on Monday.

[...]
> > +   if (hrtimer_try_to_cancel(_se->inactive_timer)
> > == 1)
> > +   put_task_struct(dl_task_of(dl_se));
> > +   dl_se->dl_non_contending = 0;  
> 
> This had me worried for a little while as being a use-after-free, but
> this put_task_struct() cannot be the last in this case. Might be
> worth a comment.

Or maybe it is better to move "dl_se->dl_non_contending = 0;" before
hrtimer_try_to_cancel()?

> 
> > +   } else {
> > +   /*
> > +* Since "dl_non_contending" is not set, the
> > +* task's utilization has already been removed from
> > +* active utilization (either when the task
> > blocked,
> > +* when the "inactive timer" fired).
> > +* So, add it back.
> > +*/
> > +   add_running_bw(dl_se->dl_bw, dl_rq);
> > +   }
> > +}  
> 
> In general I feel it would be nice to have a state diagram included
> somewhere near these two functions. It would be nice to not have to
> dig out the PDF every time.

Ok... Since I am not good at ascii art, would it be ok to add a textual
description? If yes, I'll add a comment like:
"
The utilization of a task is added to the runqueue's active utilization
when the task becomes active (is enqueued in the runqueue), and is
removed when the task becomes inactive. A task does not become
immediately inactive when it blocks, but becomes inactive at the so
called "0 lag time"; so, we setup the "inactive timer" to fire at the
"0 lag time". When the "inactive timer" fires, the task utilization is
removed from the runqueue's active utilization. If the task wakes up
again on the same runqueue before the "0 lag time", the active
utilization must not be changed and the "inactive timer" must be
cancelled. If the task wakes up again on a different runqueue before
the "0 lag time", then the task's utilization must be removed from the
previous runqueue's active utilization and must be added to the new
runqueue's active utilization.
In order to avoid races between a task waking up on a runqueue while the
"inactive timer" is running on a different CPU, the "dl_non_contending"
flag is used to indicate that a task is not on a runqueue but is active
(so, the flag is set when the task blocks and is 

Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-24 Thread Peter Zijlstra
On Fri, Mar 24, 2017 at 04:52:55AM +0100, luca abeni wrote:
> @@ -2518,6 +2520,7 @@ static int dl_overflow(struct task_struct *p, int 
> policy,
>  !__dl_overflow(dl_b, cpus, p->dl.dl_bw, new_bw)) {
>   __dl_clear(dl_b, p->dl.dl_bw);
>   __dl_add(dl_b, new_bw);
> + dl_change_utilization(p, new_bw);
>   err = 0;

Every time I see that I want to do this..


---
 kernel/sched/core.c | 4 ++--
 kernel/sched/deadline.c | 2 +-
 kernel/sched/sched.h| 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3b31fc05a0f1..b845ee4b3e55 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2512,11 +2512,11 @@ static int dl_overflow(struct task_struct *p, int 
policy,
err = 0;
} else if (dl_policy(policy) && task_has_dl_policy(p) &&
   !__dl_overflow(dl_b, cpus, p->dl.dl_bw, new_bw)) {
-   __dl_clear(dl_b, p->dl.dl_bw);
+   __dl_sub(dl_b, p->dl.dl_bw);
__dl_add(dl_b, new_bw);
err = 0;
} else if (!dl_policy(policy) && task_has_dl_policy(p)) {
-   __dl_clear(dl_b, p->dl.dl_bw);
+   __dl_sub(dl_b, p->dl.dl_bw);
err = 0;
}
raw_spin_unlock(_b->lock);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index a2ce59015642..229660088138 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1695,7 +1695,7 @@ static void set_cpus_allowed_dl(struct task_struct *p,
 * until we complete the update.
 */
raw_spin_lock(_dl_b->lock);
-   __dl_clear(src_dl_b, p->dl.dl_bw);
+   __dl_sub(src_dl_b, p->dl.dl_bw);
raw_spin_unlock(_dl_b->lock);
}
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 5cbf92214ad8..1a521324ecee 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -226,7 +226,7 @@ struct dl_bw {
 };
 
 static inline
-void __dl_clear(struct dl_bw *dl_b, u64 tsk_bw)
+void __dl_sub(struct dl_bw *dl_b, u64 tsk_bw)
 {
dl_b->total_bw -= tsk_bw;
 }


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-24 Thread Peter Zijlstra
On Fri, Mar 24, 2017 at 04:52:55AM +0100, luca abeni wrote:
> @@ -2518,6 +2520,7 @@ static int dl_overflow(struct task_struct *p, int 
> policy,
>  !__dl_overflow(dl_b, cpus, p->dl.dl_bw, new_bw)) {
>   __dl_clear(dl_b, p->dl.dl_bw);
>   __dl_add(dl_b, new_bw);
> + dl_change_utilization(p, new_bw);
>   err = 0;

Every time I see that I want to do this..


---
 kernel/sched/core.c | 4 ++--
 kernel/sched/deadline.c | 2 +-
 kernel/sched/sched.h| 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3b31fc05a0f1..b845ee4b3e55 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2512,11 +2512,11 @@ static int dl_overflow(struct task_struct *p, int 
policy,
err = 0;
} else if (dl_policy(policy) && task_has_dl_policy(p) &&
   !__dl_overflow(dl_b, cpus, p->dl.dl_bw, new_bw)) {
-   __dl_clear(dl_b, p->dl.dl_bw);
+   __dl_sub(dl_b, p->dl.dl_bw);
__dl_add(dl_b, new_bw);
err = 0;
} else if (!dl_policy(policy) && task_has_dl_policy(p)) {
-   __dl_clear(dl_b, p->dl.dl_bw);
+   __dl_sub(dl_b, p->dl.dl_bw);
err = 0;
}
raw_spin_unlock(_b->lock);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index a2ce59015642..229660088138 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1695,7 +1695,7 @@ static void set_cpus_allowed_dl(struct task_struct *p,
 * until we complete the update.
 */
raw_spin_lock(_dl_b->lock);
-   __dl_clear(src_dl_b, p->dl.dl_bw);
+   __dl_sub(src_dl_b, p->dl.dl_bw);
raw_spin_unlock(_dl_b->lock);
}
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 5cbf92214ad8..1a521324ecee 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -226,7 +226,7 @@ struct dl_bw {
 };
 
 static inline
-void __dl_clear(struct dl_bw *dl_b, u64 tsk_bw)
+void __dl_sub(struct dl_bw *dl_b, u64 tsk_bw)
 {
dl_b->total_bw -= tsk_bw;
 }


Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-24 Thread Peter Zijlstra
On Fri, Mar 24, 2017 at 04:52:55AM +0100, luca abeni wrote:

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d67eee8..952cac8 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -445,16 +445,33 @@ struct sched_dl_entity {
>*
>* @dl_yielded tells if task gave up the CPU before consuming
>* all its available runtime during the last job.
> +  *
> +  * @dl_non_contending tells if task is inactive while still
> +  * contributing to the active utilization. In other words, it
> +  * indicates if the inactive timer has been armed and its handler
> +  * has not been executed yet. This flag is useful to avoid race
> +  * conditions between the inactive timer handler and the wakeup
> +  * code.
>*/
>   int dl_throttled;
>   int dl_boosted;
>   int dl_yielded;
> + int dl_non_contending;

We should maybe look into making those bits :1, not something for this
patch though;


> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index cef9adb..86aed82 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -65,6 +65,107 @@ void sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
>   dl_rq->running_bw = 0;
>  }
>  
> +void dl_change_utilization(struct task_struct *p, u64 new_bw)
> +{
> + if (!task_on_rq_queued(p)) {
> + struct rq *rq = task_rq(p);
> +
> + if (p->dl.dl_non_contending) {
> + sub_running_bw(p->dl.dl_bw, >dl);
> + p->dl.dl_non_contending = 0;
> + /*
> +  * If the timer handler is currently running and the
> +  * timer cannot be cancelled, inactive_task_timer()
> +  * will see that dl_not_contending is not set, and
> +  * will not touch the rq's active utilization,
> +  * so we are still safe.
> +  */
> + if (hrtimer_try_to_cancel(>dl.inactive_timer) == 1)
> + put_task_struct(p);
> + }
> + }
> +}

If we rearrange that slightly we can avoid the double indent:


--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -67,23 +67,23 @@ void sub_running_bw(u64 dl_bw, struct dl
 
 void dl_change_utilization(struct task_struct *p, u64 new_bw)
 {
-   if (!task_on_rq_queued(p)) {
-   struct rq *rq = task_rq(p);
+   if (task_on_rq_queued(p))
+   return;
 
-   if (p->dl.dl_non_contending) {
-   sub_running_bw(p->dl.dl_bw, >dl);
-   p->dl.dl_non_contending = 0;
-   /*
-* If the timer handler is currently running and the
-* timer cannot be cancelled, inactive_task_timer()
-* will see that dl_not_contending is not set, and
-* will not touch the rq's active utilization,
-* so we are still safe.
-*/
-   if (hrtimer_try_to_cancel(>dl.inactive_timer) == 1)
-   put_task_struct(p);
-   }
-   }
+   if (!p->dl.dl_non_contending)
+   return;
+
+   sub_running_bw(p->dl.dl_bw, _rq(p)->dl);
+   p->dl.dl_non_contending = 0;
+   /*
+* If the timer handler is currently running and the
+* timer cannot be cancelled, inactive_task_timer()
+* will see that dl_not_contending is not set, and
+* will not touch the rq's active utilization,
+* so we are still safe.
+*/
+   if (hrtimer_try_to_cancel(>dl.inactive_timer) == 1)
+   put_task_struct(p);
 }
 
 static void task_non_contending(struct task_struct *p)

> +
> +static void task_non_contending(struct task_struct *p)
> +{

> +}
> +
> +static void task_contending(struct sched_dl_entity *dl_se)
> +{
> + struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> +
> + /*
> +  * If this is a non-deadline task that has been boosted,
> +  * do nothing
> +  */
> + if (dl_se->dl_runtime == 0)
> + return;
> +
> + if (dl_se->dl_non_contending) {
> + /*
> +  * If the timer handler is currently running and the
> +  * timer cannot be cancelled, inactive_task_timer()
> +  * will see that dl_not_contending is not set, and
> +  * will not touch the rq's active utilization,
> +  * so we are still safe.
> +  */
> + if (hrtimer_try_to_cancel(_se->inactive_timer) == 1)
> + put_task_struct(dl_task_of(dl_se));
> + dl_se->dl_non_contending = 0;

This had me worried for a little while as being a use-after-free, but
this 

Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-24 Thread Peter Zijlstra
On Fri, Mar 24, 2017 at 04:52:55AM +0100, luca abeni wrote:

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d67eee8..952cac8 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -445,16 +445,33 @@ struct sched_dl_entity {
>*
>* @dl_yielded tells if task gave up the CPU before consuming
>* all its available runtime during the last job.
> +  *
> +  * @dl_non_contending tells if task is inactive while still
> +  * contributing to the active utilization. In other words, it
> +  * indicates if the inactive timer has been armed and its handler
> +  * has not been executed yet. This flag is useful to avoid race
> +  * conditions between the inactive timer handler and the wakeup
> +  * code.
>*/
>   int dl_throttled;
>   int dl_boosted;
>   int dl_yielded;
> + int dl_non_contending;

We should maybe look into making those bits :1, not something for this
patch though;


> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index cef9adb..86aed82 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -65,6 +65,107 @@ void sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
>   dl_rq->running_bw = 0;
>  }
>  
> +void dl_change_utilization(struct task_struct *p, u64 new_bw)
> +{
> + if (!task_on_rq_queued(p)) {
> + struct rq *rq = task_rq(p);
> +
> + if (p->dl.dl_non_contending) {
> + sub_running_bw(p->dl.dl_bw, >dl);
> + p->dl.dl_non_contending = 0;
> + /*
> +  * If the timer handler is currently running and the
> +  * timer cannot be cancelled, inactive_task_timer()
> +  * will see that dl_not_contending is not set, and
> +  * will not touch the rq's active utilization,
> +  * so we are still safe.
> +  */
> + if (hrtimer_try_to_cancel(>dl.inactive_timer) == 1)
> + put_task_struct(p);
> + }
> + }
> +}

If we rearrange that slightly we can avoid the double indent:


--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -67,23 +67,23 @@ void sub_running_bw(u64 dl_bw, struct dl
 
 void dl_change_utilization(struct task_struct *p, u64 new_bw)
 {
-   if (!task_on_rq_queued(p)) {
-   struct rq *rq = task_rq(p);
+   if (task_on_rq_queued(p))
+   return;
 
-   if (p->dl.dl_non_contending) {
-   sub_running_bw(p->dl.dl_bw, >dl);
-   p->dl.dl_non_contending = 0;
-   /*
-* If the timer handler is currently running and the
-* timer cannot be cancelled, inactive_task_timer()
-* will see that dl_not_contending is not set, and
-* will not touch the rq's active utilization,
-* so we are still safe.
-*/
-   if (hrtimer_try_to_cancel(>dl.inactive_timer) == 1)
-   put_task_struct(p);
-   }
-   }
+   if (!p->dl.dl_non_contending)
+   return;
+
+   sub_running_bw(p->dl.dl_bw, _rq(p)->dl);
+   p->dl.dl_non_contending = 0;
+   /*
+* If the timer handler is currently running and the
+* timer cannot be cancelled, inactive_task_timer()
+* will see that dl_not_contending is not set, and
+* will not touch the rq's active utilization,
+* so we are still safe.
+*/
+   if (hrtimer_try_to_cancel(>dl.inactive_timer) == 1)
+   put_task_struct(p);
 }
 
 static void task_non_contending(struct task_struct *p)

> +
> +static void task_non_contending(struct task_struct *p)
> +{

> +}
> +
> +static void task_contending(struct sched_dl_entity *dl_se)
> +{
> + struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> +
> + /*
> +  * If this is a non-deadline task that has been boosted,
> +  * do nothing
> +  */
> + if (dl_se->dl_runtime == 0)
> + return;
> +
> + if (dl_se->dl_non_contending) {
> + /*
> +  * If the timer handler is currently running and the
> +  * timer cannot be cancelled, inactive_task_timer()
> +  * will see that dl_not_contending is not set, and
> +  * will not touch the rq's active utilization,
> +  * so we are still safe.
> +  */
> + if (hrtimer_try_to_cancel(_se->inactive_timer) == 1)
> + put_task_struct(dl_task_of(dl_se));
> + dl_se->dl_non_contending = 0;

This had me worried for a little while as being a use-after-free, but
this