Re: [RESEND PATCH v2 1/2] sched/rt: add utilization tracking

2017-08-08 Thread Peter Zijlstra
On Tue, Aug 08, 2017 at 03:56:26PM +0200, Vincent Guittot wrote:
> 
> I don't think the IRQ tracking patch works.
> update_irq_load_avg(rq->clock, cpu_of(rq), rq, 1); is called in
> update_rq_clock_task(), which is never called in irq context. In order
> to use PELT for tracking irq and paravirt time, we would have to call
> update_irq_load_avg() for every context switch between irq/paravirt
> and task, which will probably be too heavy. Nevertheless, we can
> 

Right, realized the same much later yesterday...


Re: [RESEND PATCH v2 1/2] sched/rt: add utilization tracking

2017-08-08 Thread Vincent Guittot
On 7 August 2017 at 18:44, Peter Zijlstra  wrote:
> On Fri, Aug 04, 2017 at 03:40:21PM +0200, Vincent Guittot wrote:
>
>> There were several comments on v1:
>> - As raised by Peter for v1, if IRQ time is taken into account in
>>   rt_avg, it will not be accounted in rq->clock_task. This means that cfs
>>   utilization is not affected by some extra contributions or decays
>>   because of IRQ.
>
> Right.
>
>> - Regarding the sync of rt and cfs utilization, both cfs and rt use the same
>>   rq->clock_task. Then, we have the same issue as cfs regarding blocked value.
>>   The utilization of idle cfs/rt rqs is not updated regularly but only when a
>>   load_balance is triggered (more precisely, on a call to update_blocked_averages()).
>>   I'd like to fix this issue for both cfs and rt with a separate patch that
>>   will ensure that utilization (and load) are updated regularly even for
>>   idle CPUs.
>
> Yeah, that needs help.
>
>> - One last open question is the location of the rt utilization function in the
>>   fair.c file. PELT-related functions should probably move to a dedicated
>>   pelt.c file. This would also help to address one comment about having a
>>   place to update metrics of NOHZ idle CPUs. Thoughts?
>
> Probably, but I have a bunch of patches lined up changing that code, so
> lets not do that now.

OK. I can rebase and move the code once your patches are there.

>
> In any case, would something like the attached patches make sense? It
> completely replaces rt_avg with separate IRQ,RT and DL tracking.

It would be nice if we could replace rt_avg with something that has the
same dynamics as PELT.
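
For reference, the PELT dynamic referred to here is, roughly, a geometric
decay applied per ~1024us segment, with y chosen so that y^32 = 0.5. A
minimal standalone userspace sketch of that behaviour (the kernel uses
precomputed fixed-point tables, not pow(); build with -lm):

#include <stdio.h>
#include <math.h>

int main(void)
{
	const double y = pow(0.5, 1.0 / 32.0);	/* y^32 == 0.5 */
	double sum = 0.0;
	int ms;

	/* ~100ms fully busy, e.g. an RT task hogging the CPU */
	for (ms = 0; ms < 100; ms++)
		sum = sum * y + 1024.0;		/* one full ~1024us segment */
	printf("signal after 100ms busy: %.0f (max is 1024/(1-y) ~= %.0f)\n",
	       sum, 1024.0 / (1.0 - y));

	/* 32ms idle: the accumulated contribution is halved */
	for (ms = 0; ms < 32; ms++)
		sum *= y;
	printf("signal after 32ms idle : %.0f\n", sum);

	return 0;
}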

The DL patch looks fine, but can't we rely on the deadline running
bandwidth to get the figures instead?
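
A sketch of the arithmetic that would involve: the deadline class already
aggregates the bandwidth of its contributing tasks in dl_rq->running_bw as
a Q(BW_SHIFT) fixed-point value, which could be scaled straight into the
0..1024 capacity range. The conversion below is only an illustration in
standalone userspace C (the constants mirror the kernel's BW_SHIFT and
SCHED_CAPACITY_SCALE); it is not code from the posted patches.

#include <stdio.h>
#include <stdint.h>

#define BW_SHIFT		20	/* dl bandwidth is kept in Q20 fixed point */
#define BW_UNIT			(1ULL << BW_SHIFT)
#define SCHED_CAPACITY_SCALE	1024

/* same idea as the kernel's to_ratio(): runtime/period in Q20 */
static uint64_t to_ratio(uint64_t period_ns, uint64_t runtime_ns)
{
	return (runtime_ns << BW_SHIFT) / period_ns;
}

int main(void)
{
	/* e.g. one 3ms/10ms reservation currently contributing */
	uint64_t running_bw = to_ratio(10000000ULL, 3000000ULL);

	/* scale the Q20 bandwidth into capacity units (0..1024) */
	uint64_t util = (running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;

	printf("running_bw = %llu/%llu, util = %llu/1024\n",
	       (unsigned long long)running_bw,
	       (unsigned long long)BW_UNIT,
	       (unsigned long long)util);
	return 0;
}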

I don't think the IRQ tracking patch works.
update_irq_load_avg(rq->clock, cpu_of(rq), rq, 1); is called in
update_rq_clock_task(), which is never called in irq context. In order
to use PELT for tracking irq and paravirt time, we would have to call
update_irq_load_avg() for every context switch between irq/paravirt
and task, which will probably be too heavy. Nevertheless, we can
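
For context, update_rq_clock_task() looks roughly like this (a simplified
paraphrase, not the exact kernel code): it runs in task context whenever
the rq clock is updated and only sees the irq time accumulated since the
previous update, so a "running" flag passed from here cannot describe
whether the CPU is currently in irq.

static void update_rq_clock_task(struct rq *rq, s64 delta)
{
	s64 irq_delta = 0;

#ifdef CONFIG_IRQ_TIME_ACCOUNTING
	/* irq time accrued on this CPU since the last clock update */
	irq_delta = irq_time_read(cpu_of(rq)) - rq->prev_irq_time;
	if (irq_delta > delta)
		irq_delta = delta;
	rq->prev_irq_time += irq_delta;

	/* clock_task only advances by the non-irq part of the delta */
	delta -= irq_delta;
#endif
	rq->clock_task += delta;
}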

Because PELT is cpu invariant, the used value must now be subtracted
from cpu_capacity_orig of the local cpu in scale_rt_capacity().
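
A rough sketch of what that could look like in scale_rt_capacity(),
assuming the sched_avg fields added by the RT and DL patches; the irq
signal's field name is hypothetical and the whole function is an
illustration, not a posted patch:

static unsigned long scale_rt_capacity(int cpu)
{
	struct rq *rq = cpu_rq(cpu);
	unsigned long max = rq->cpu_capacity_orig;
	unsigned long used;

	/*
	 * The PELT signals are already capacity invariant, so subtract
	 * them directly from the original capacity instead of scaling
	 * rt_avg by the elapsed period as the current code does.
	 */
	used  = READ_ONCE(rq->rt.avg.util_avg);		/* RT patch */
	used += READ_ONCE(rq->dl.avg.util_avg);		/* DL patch */
	used += READ_ONCE(rq->irq_avg.util_avg);	/* hypothetical irq field */

	if (used >= max)
		return 1;

	return max - used;
}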


Re: [RESEND PATCH v2 1/2] sched/rt: add utilization tracking

2017-08-07 Thread Peter Zijlstra
On Fri, Aug 04, 2017 at 03:40:21PM +0200, Vincent Guittot wrote:

> There were several comments on v1:
> - As raised by Peter for v1, if IRQ time is taken into account in
>   rt_avg, it will not be accounted in rq->clock_task. This means that cfs
>   utilization is not affected by some extra contributions or decays
>   because of IRQ.  

Right.

> - Regarding the sync of rt and cfs utilization, both cfs and rt use the same
>   rq->clock_task. Then, we have the same issue as cfs regarding blocked value.
>   The utilization of idle cfs/rt rqs is not updated regularly but only when a
>   load_balance is triggered (more precisely, on a call to update_blocked_averages()).
>   I'd like to fix this issue for both cfs and rt with a separate patch that
>   will ensure that utilization (and load) are updated regularly even for
>   idle CPUs.

Yeah, that needs help.

> - One last open question is the location of the rt utilization function in the
>   fair.c file. PELT-related functions should probably move to a dedicated pelt.c
>   file. This would also help to address one comment about having a place to
>   update metrics of NOHZ idle CPUs. Thoughts?

Probably, but I have a bunch of patches lined up changing that code, so
let's not do that now.

In any case, would something like the attached patches make sense? They
completely replace rt_avg with separate IRQ, RT and DL tracking.
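
For comparison, the mechanism being replaced works roughly like this
(simplified paraphrase, not part of the attached patches): rt_avg is bumped
by RT/DL/IRQ runtime and then halved once per sched_avg_period(), a much
coarser dynamic than a per-class PELT signal.

static void sched_avg_update(struct rq *rq)
{
	s64 period = sched_avg_period();

	while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
		rq->age_stamp += period;
		rq->rt_avg /= 2;	/* coarse, step-wise decay */
	}
}

static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
{
	/* accrue RT/DL/IRQ time, scaled by the current frequency */
	rq->rt_avg += rt_delta * arch_scale_freq_capacity(NULL, cpu_of(rq));
	sched_avg_update(rq);
}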


Subject: sched: Add DL utilization tracking
From: Peter Zijlstra 
Date: Mon Aug 7 16:49:48 CEST 2017

Track how much time we spend running DL tasks on average.

Signed-off-by: Peter Zijlstra (Intel) 
---
 kernel/sched/deadline.c |   11 ++-
 kernel/sched/fair.c |   12 
 kernel/sched/sched.h|2 ++
 3 files changed, 24 insertions(+), 1 deletion(-)

--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1655,6 +1655,8 @@ static struct sched_dl_entity *pick_next
 	return rb_entry(left, struct sched_dl_entity, rb_node);
 }
 
+extern int update_dl_rq_load_avg(u64 now, int cpu, struct dl_rq *dl_rq, int running);
+
 struct task_struct *
 pick_next_task_dl(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
@@ -1702,19 +1704,25 @@ pick_next_task_dl(struct rq *rq, struct
 	p->se.exec_start = rq_clock_task(rq);
 
 	/* Running task will never be pushed. */
-   dequeue_pushable_dl_task(rq, p);
+	dequeue_pushable_dl_task(rq, p);
 
 	if (hrtick_enabled(rq))
 		start_hrtick_dl(rq, p);
 
 	queue_push_tasks(rq);
 
+	if (p) {
+		update_dl_rq_load_avg(rq_clock_task(rq), cpu_of(rq), dl_rq,
+				      rq->curr->sched_class == &dl_sched_class);
+	}
+
 	return p;
 }
 
 static void put_prev_task_dl(struct rq *rq, struct task_struct *p)
 {
 	update_curr_dl(rq);
+	update_dl_rq_load_avg(rq_clock_task(rq), cpu_of(rq), &rq->dl, 1);
 
 	if (on_dl_rq(&p->dl) && p->nr_cpus_allowed > 1)
 		enqueue_pushable_dl_task(rq, p);
@@ -1723,6 +1731,7 @@ static void put_prev_task_dl(struct rq *
 static void task_tick_dl(struct rq *rq, struct task_struct *p, int queued)
 {
 	update_curr_dl(rq);
+	update_dl_rq_load_avg(rq_clock_task(rq), cpu_of(rq), &rq->dl, 1);
 
 	/*
 	 * Even when we have runtime, update_curr_dl() might have resulted in us
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3017,6 +3017,11 @@ int update_rt_rq_load_avg(u64 now, int c
 	return ___update_load_avg(now, cpu, &rt_rq->avg, 0, running, NULL);
 }
 
+int update_dl_rq_load_avg(u64 now, int cpu, struct dl_rq *dl_rq, int running)
+{
+	return ___update_load_avg(now, cpu, &dl_rq->avg, 0, running, NULL);
+}
+
 
 /*
  * Signed add and clamp on underflow.
@@ -3550,6 +3555,11 @@ int update_rt_rq_load_avg(u64 now, int c
 	return 0;
 }
 
+int update_dl_rq_load_avg(u64 now, int cpu, struct dl_rq *dl_rq, int running)
+{
+	return 0;
+}
+
 #define UPDATE_TG	0x0
 #define SKIP_AGE_LOAD	0x0
 
@@ -6945,6 +6955,7 @@ static void update_blocked_averages(int
 	}
 
 	update_rt_rq_load_avg(rq_clock_task(rq), cpu, &rq->rt, 0);
+	update_dl_rq_load_avg(rq_clock_task(rq), cpu, &rq->dl, 0);
 	rq_unlock_irqrestore(rq, &rf);
 }
 
@@ -7005,6 +7016,7 @@ static inline void update_blocked_averag
 	update_rq_clock(rq);
 	update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true);
 	update_rt_rq_load_avg(rq_clock_task(rq), cpu, &rq->rt, 0);
+	update_dl_rq_load_avg(rq_clock_task(rq), cpu, &rq->dl, 0);
 	rq_unlock_irqrestore(rq, &rf);
 }
 
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -580,6 +580,8 @@ struct dl_rq {
 	 */
 	struct rb_root pushable_dl_tasks_root;
 	struct rb_node *pushable_dl_tasks_leftmost;
+
+	struct sched_avg avg;
 #else
 	struct dl_bw dl_bw;
 #endif
Subject: sched: Add IRQ utilization tracking
From: Peter Zijlstra 
Date: Mon Aug 7 17:22:46 CEST 2017

Track how much time we spend on IRQ...

Signed-off-by: Peter Zijlstra (Intel) 
---
 kernel/sched/core.c  |    6 +-
 kernel/sched/fair.c  |   12 
 kernel/sched/sched.h |    2 ++
 3 files changed, 19 insertions(+), 1 deletion(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -151,6 +151,8 @@ struct rq *task_rq_lock(struct task_stru
 	}
 }