Re: [RFD/RFC PATCH 4/8] sched: Split scheduler execution context

2019-05-06 Thread Claudio Scordino
Dear Juri,

just a small comment for the next round of patches (I guess after
OSPM)...


On 091018, 11:24, Juri Lelli wrote:
> From: Peter Zijlstra 
> 
> Let's define the scheduling context as all the scheduler state in
> task_struct and the execution context as all state required to run the
> task.
> 
> Currently both are intertwined in task_struct. We want to logically
> split these such that we can run the execution context of one task
> with the scheduling context of another.
> 
> To this purpose introduce rq::proxy to point to the task_struct used
> for scheduler state and preserve rq::curr to denote the execution
> context.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> [added lot of comments/questions - identifiable by XXX]
> Signed-off-by: Juri Lelli 
> ---
>  kernel/sched/core.c  | 62 ++--
>  kernel/sched/fair.c  |  4 +++
>  kernel/sched/sched.h | 30 -
>  3 files changed, 82 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index fe0223121883..d3c481b734dd 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -224,12 +224,13 @@ static enum hrtimer_restart hrtick(struct hrtimer *timer)
>  {
>   struct rq *rq = container_of(timer, struct rq, hrtick_timer);
>   struct rq_flags rf;
> + struct task_struct *curr = rq->proxy;

You may want to use a different name for these local variables (e.g.
"proxy") to help the reader avoid confusing curr (i.e. the scheduling ctx)
with rq::curr (i.e. the execution ctx).
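
For illustration, the hunk above with that naming would read something like
this (hypothetical rewrite, not part of the posted patch):

	struct rq *rq = container_of(timer, struct rq, hrtick_timer);
	struct rq_flags rf;
	struct task_struct *proxy = rq->proxy;	/* scheduling ctx; rq->curr stays the execution ctx */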

Best regards,

   Claudio


Re: [PATCH] kernel/time/posix-cpu-timers: Remove useless call to check_dl_overrun

2018-11-08 Thread Claudio Scordino
On 071118, 12:10, Juri Lelli wrote:
> check_dl_overrun is used to send a SIGXCPU to users that asked to be
> informed when SCHED_DEADLINE runtime overruns occur.
> 
> The function is called by check_thread_timers already, so the call in
> check_process_timers is redundant/wrong (even though harmless).
> 
> Remove it.
> 
> Fixes: 34be39305a77 ("sched/deadline: Implement "runtime overrun signal" support")
> Signed-off-by: Juri Lelli 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Cc: Luca Abeni 
> Cc: Claudio Scordino 
> Cc: Mathieu Poirier 
> ---
>  kernel/time/posix-cpu-timers.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
> index ce32cf741b25..8f0644af40be 100644
> --- a/kernel/time/posix-cpu-timers.c
> +++ b/kernel/time/posix-cpu-timers.c
> @@ -917,9 +917,6 @@ static void check_process_timers(struct task_struct *tsk,
>   struct task_cputime cputime;
>   unsigned long soft;
>  
> - if (dl_task(tsk))
> - check_dl_overrun(tsk);
> -
>   /*
>* If cputimer is not running, then there are no active
>* process wide timers (POSIX 1.b, itimers, RLIMIT_CPU).
> -- 

[ Superfluous as (fortunately) already queued...]
Tested-by: Claudio Scordino 

Thank you for taking care of this pending issue.

  Claudio


Re: [PATCH v5 00/10] track CPU utilization

2018-06-06 Thread Claudio Scordino

Hi Quentin,

On 05/06/2018 16:13, Juri Lelli wrote:

On 05/06/18 15:01, Quentin Perret wrote:

On Tuesday 05 Jun 2018 at 15:15:18 (+0200), Juri Lelli wrote:

On 05/06/18 14:05, Quentin Perret wrote:

On Tuesday 05 Jun 2018 at 14:11:53 (+0200), Juri Lelli wrote:

Hi Quentin,

On 05/06/18 11:57, Quentin Perret wrote:

[...]


What about the diff below (just a quick hack to show the idea) applied
on tip/sched/core ?

---8<---
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index a8ba6d1f262a..23a4fb1c2c25 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -180,9 +180,12 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
sg_cpu->util_dl  = cpu_util_dl(rq);
  }
  
+unsigned long scale_rt_capacity(int cpu);

  static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
  {
struct rq *rq = cpu_rq(sg_cpu->cpu);
+   int cpu = sg_cpu->cpu;
+   unsigned long util, dl_bw;
  
  	if (rq->rt.rt_nr_running)

return sg_cpu->max;
@@ -197,7 +200,14 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
 * util_cfs + util_dl as requested freq. However, cpufreq is not yet
 * ready for such an interface. So, we only do the latter for now.
 */
-   return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs));
+   util = arch_scale_cpu_capacity(NULL, cpu) * scale_rt_capacity(cpu);


Sorry to be pedantic, but this (ATM) includes the DL avg contribution, so,
since we use max below, we will probably have the same problem that we
discussed on Vincent's approach (overestimation of the DL contribution while
we could use running_bw).


Ah no, you're right, this isn't great for long running deadline tasks.
We should definitely account for the running_bw here, not the dl avg...

I was trying to address the issue of RT stealing time from CFS here, but
the DL integration isn't quite right with this patch as-is, I agree ...




+   util >>= SCHED_CAPACITY_SHIFT;
+   util = arch_scale_cpu_capacity(NULL, cpu) - util;
+   util += sg_cpu->util_cfs;
+   dl_bw = (rq->dl.this_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;


Why this_bw instead of running_bw?


So IIUC, this_bw should basically give you the absolute reservation (== the
sum of runtime/deadline ratios of all DL tasks on that rq).


Yep.
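
For concreteness, a small runnable sketch of that ratio in the kernel's
fixed-point convention, assuming BW_SHIFT = 20 as in kernel/sched/sched.h
(the kernel's to_ratio() performs the same division):

	#include <stdint.h>
	#include <stdio.h>

	#define BW_SHIFT 20
	#define BW_UNIT  (1ULL << BW_SHIFT)

	/* runtime/deadline as a fixed-point fraction, like the kernel's to_ratio() */
	static uint64_t to_ratio(uint64_t period, uint64_t runtime)
	{
		return (runtime << BW_SHIFT) / period;
	}

	int main(void)
	{
		/* runtime 10 ms against a 100 ms (relative) deadline -> ~0.1 CPUs */
		uint64_t bw = to_ratio(100000000ULL, 10000000ULL);

		printf("dl_bw = %llu (~%.3f CPUs)\n",
		       (unsigned long long)bw, (double)bw / BW_UNIT);
		return 0;
	}

this_bw is then the sum of these per-task values over the DL tasks on the rq.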


The reason I added this max is because I'm still not sure I understand
how we can safely drop the freq below that point. If we don't guarantee
to always stay at least at the freq required by DL, aren't we risking to
start a deadline task stuck at a low freq because of rate limiting? In
this case, if that task uses all of its runtime then you might start
missing deadlines ...


We decided to avoid (software) rate limiting for DL with e97a90f7069b
("sched/cpufreq: Rate limits for SCHED_DEADLINE").


Right, I spotted that one, but yeah you could also be limited by HW ...




My feeling is that the only safe thing to do is to guarantee to never go
below the freq required by DL, and to optimistically add CFS tasks
without raising the OPP if we have good reasons to think that DL is
using less than it required (which is what we should get by using
running_bw above I suppose). Does that make any sense ?
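
To make that concrete, a sketch of the aggregation being proposed (a sketch
of ours, not a posted patch; it assumes both bandwidths were already
converted to capacity units, e.g. via (bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT
as in the diff above):

	/* Floor the request at the DL reservation, size it from running_bw + CFS. */
	static unsigned long aggregate_util(unsigned long util_cfs,
					    unsigned long running_bw_util,
					    unsigned long this_bw_util,
					    unsigned long max)
	{
		unsigned long util = util_cfs + running_bw_util;

		if (util < this_bw_util)	/* never drop below the reservation */
			util = this_bw_util;

		return util < max ? util : max;
	}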


Still, we can't avoid the hardware limits, so using running_bw is a
trade-off between safety (especially considering soft real-time
scenarios) and energy consumption (and it seems to work in
practice).


Ok, I see ... Have you guys already tried something like my patch above
(keeping the freq >= this_bw) in real world use cases ? Is this costing
that much energy in practice ? If we fill the gaps left by DL (when it


IIRC, Claudio (now Cc-ed) did experiment a bit with both approaches, so
he might add some numbers to my words above. I didn't (yet). But, please
consider that I might be reserving (for example) 50% of bandwidth for my
heavy and time sensitive task and then have that task wake up only once
in a while (but I'll be keeping clock speed up for the whole time). :/


As far as I can remember, we never tested the energy consumption of running_bw
vs this_bw, as at OSPM'17 we had already decided to use running_bw when
implementing GRUB-PA.
The rationale is that, as Juri pointed out, the amount of spare (i.e.
reclaimable) bandwidth in this_bw is very user-dependent.
For example, the user can let this_bw be much higher than the measured
bandwidth, just to be sure that the deadlines are met even in corner cases.
In practice, this means that the task executes for quite a short time and then
blocks (with its bandwidth reclaimed, hence the CPU frequency reduced, at the
0-lag time).
Using this_bw rather than running_bw, the CPU frequency would remain at the
same fixed value even when the task is blocked.
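A worked example (numbers ours): a task reserved 50% (runtime 50 ms every
100 ms) that actually executes for only 5 ms per period contributes 0.5 to
this_bw at all times, while its running_bw contribution is removed at the
0-lag time shortly after it blocks, so schedutil can lower the frequency for
roughly the remaining 90% of each period.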
I understand that in some cases it could even be better (i.e. no energy
wasted in frequency switches).
However, IMHO, these are corner cases and in the average case it is
[tip:sched/core] sched/deadline/Documentation: Add overrun signal and GRUB-PA documentation

2018-05-14 Thread tip-bot for Claudio Scordino
Commit-ID:  bb4e30a48045c9cc16c4efe447489542750397cc
Gitweb: https://git.kernel.org/tip/bb4e30a48045c9cc16c4efe447489542750397cc
Author: Claudio Scordino 
AuthorDate: Tue, 3 Apr 2018 09:42:42 +0200
Committer:  Ingo Molnar 
CommitDate: Mon, 14 May 2018 09:12:27 +0200

sched/deadline/Documentation: Add overrun signal and GRUB-PA documentation

Signed-off-by: Claudio Scordino 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Jonathan Corbet 
Cc: Juri Lelli 
Cc: Linus Torvalds 
Cc: Luca Abeni 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1522741362-4542-1-git-send-email-clau...@evidence.eu.com
Signed-off-by: Ingo Molnar 
---
 Documentation/scheduler/sched-deadline.txt | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt
index 8ce78f82ae23..b14e03ff3528 100644
--- a/Documentation/scheduler/sched-deadline.txt
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -49,7 +49,7 @@ CONTENTS
 2.1 Main algorithm
 ------------------
 
- SCHED_DEADLINE uses three parameters, named "runtime", "period", and
+ SCHED_DEADLINE [18] uses three parameters, named "runtime", "period", and
  "deadline", to schedule tasks. A SCHED_DEADLINE task should receive
  "runtime" microseconds of execution time every "period" microseconds, and
  these "runtime" microseconds are available within "deadline" microseconds
@@ -117,6 +117,10 @@ CONTENTS
  scheduling deadline = scheduling deadline + period
  remaining runtime = remaining runtime + runtime
 
+ The SCHED_FLAG_DL_OVERRUN flag in sched_attr's sched_flags field allows a task
+ to get informed about runtime overruns through the delivery of SIGXCPU
+ signals.
+
 
 2.2 Bandwidth reclaiming
 
@@ -279,6 +283,19 @@ CONTENTS
 running_bw is incremented.
 
 
+2.3 Energy-aware scheduling
+---------------------------
+
+ When cpufreq's schedutil governor is selected, SCHED_DEADLINE implements the
+ GRUB-PA [19] algorithm, reducing the CPU operating frequency to the minimum
+ value that still allows to meet the deadlines. This behavior is currently
+ implemented only for ARM architectures.
+
+ A particular care must be taken in case the time needed for changing frequency
+ is of the same order of magnitude of the reservation period. In such cases,
+ setting a fixed CPU frequency results in a lower amount of deadline misses.
+
+
 3. Scheduling Real-Time Tasks
 =
 
@@ -505,6 +522,12 @@ CONTENTS
  17 - L. Abeni, G. Lipari, A. Parri, Y. Sun, Multicore CPU reclaiming: parallel
       or sequential?. In Proceedings of the 31st Annual ACM Symposium on Applied
       Computing, 2016.
+  18 - J. Lelli, C. Scordino, L. Abeni, D. Faggioli, Deadline scheduling in the
+   Linux kernel, Software: Practice and Experience, 46(6): 821-839, June
+   2016.
+  19 - C. Scordino, L. Abeni, J. Lelli, Energy-Aware Real-Time Scheduling in
+   the Linux Kernel, 33rd ACM/SIGAPP Symposium On Applied Computing (SAC
+   2018), Pau, France, April 2018.
 
 
 4. Bandwidth management
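
A worked example for the new section (numbers ours, using schedutil's
frequency sizing rule of the era, next_freq ~= 1.25 * max_freq * util / max):
a lone DL task whose running_bw corresponds to 40% utilization produces a
request of about half the maximum frequency, which is the GRUB-PA behavior
described above.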


Re: [RFC PATCH] sched/cpufreq/schedutil: handling urgent frequency requests

2018-05-08 Thread Claudio Scordino



On 08/05/2018 08:54, Viresh Kumar wrote:

On 07-05-18, 16:43, Claudio Scordino wrote:

At OSPM, the issue of urgent CPU frequency requests arriving while a
frequency switch is already in progress was mentioned.

Besides the various issues (physical time for switching frequency,
on-going kthread activity, etc.) one (minor) issue is the kernel
"forgetting" such a request, thus waiting for the next switch time to
recompute the needed frequency and behave accordingly.

This patch makes the kthread serve any urgent request that occurred
during the previous frequency switch. It introduces a specific flag, only
set when the SCHED_DEADLINE scheduling class increases the CPU
utilization, aiming at decreasing the likelihood of a deadline miss.

Indeed, some preliminary tests in critical conditions (i.e.
SCHED_DEADLINE tasks with short periods) have shown reductions of more
than 10% in the average number of deadline misses. On the other hand,
the increase in energy consumption when running SCHED_DEADLINE
tasks (not yet measured) is likely not negligible (especially in
critical scenarios like "ramp up" utilizations).

The patch is meant as a follow-up to the discussion at OSPM.

Signed-off-by: Claudio Scordino 
CC: Viresh Kumar 
CC: Rafael J. Wysocki 
CC: Peter Zijlstra 
CC: Ingo Molnar 
CC: Patrick Bellasi 
CC: Juri Lelli 
Cc: Luca Abeni 
CC: Joel Fernandes 
CC: linux...@vger.kernel.org
---
  kernel/sched/cpufreq_schedutil.c | 20 ++--
  1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index d2c6083..4de06b0 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -41,6 +41,7 @@ struct sugov_policy {
boolwork_in_progress;
  
  	bool			need_freq_update;

+   boolurgent_freq_update;
  };
  
  struct sugov_cpu {

@@ -92,6 +93,14 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
!cpufreq_can_do_remote_dvfs(sg_policy->policy))
return false;
  
+	/*

+* Continue computing the new frequency. In case of work_in_progress,
+* the kthread will resched a change once the current transition is
+* finished.
+*/
+   if (sg_policy->urgent_freq_update)
+   return true;
+
if (sg_policy->work_in_progress)
return false;
  
@@ -121,6 +130,9 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,

sg_policy->next_freq = next_freq;
sg_policy->last_freq_update_time = time;
  
+	if (sg_policy->work_in_progress)

+   return;
+
if (policy->fast_switch_enabled) {
next_freq = cpufreq_driver_fast_switch(policy, next_freq);
if (!next_freq)
@@ -274,7 +286,7 @@ static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
  static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu, struct sugov_policy *sg_policy)
  {
if (cpu_util_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->util_dl)
-   sg_policy->need_freq_update = true;
+   sg_policy->urgent_freq_update = true;
  }
  
  static void sugov_update_single(struct update_util_data *hook, u64 time,

@@ -383,8 +395,11 @@ static void sugov_work(struct kthread_work *work)
	struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
  
  	mutex_lock(&sg_policy->work_lock);

-   __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
+   do {
+   sg_policy->urgent_freq_update = false;
+   __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
CPUFREQ_RELATION_L);


If we are going to solve this problem, then maybe instead of the added
complexity and a new flag we can look for need_freq_update flag at this location
and re-calculate the next frequency if required.


I agree.
Indeed, I've been in doubt whether to add a new flag or rely on the existing
need_freq_update flag (whose name, however, didn't seem to convey any sense of
urgency).
Maybe we could use need_freq_update but change its name to something more
meaningful?
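
Something along these lines, to make the alternative concrete (a rough
sketch on top of this patch, not tested):

	static void sugov_work(struct kthread_work *work)
	{
		struct sugov_policy *sg_policy =
			container_of(work, struct sugov_policy, work);

		mutex_lock(&sg_policy->work_lock);
		do {
			sg_policy->need_freq_update = false;
			/* next_freq may have been recomputed meanwhile */
			__cpufreq_driver_target(sg_policy->policy,
						sg_policy->next_freq,
						CPUFREQ_RELATION_L);
		} while (sg_policy->need_freq_update);
		mutex_unlock(&sg_policy->work_lock);

		sg_policy->work_in_progress = false;
	}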

Thanks,

   Claudio


[RFC PATCH] sched/cpufreq/schedutil: handling urgent frequency requests

2018-05-07 Thread Claudio Scordino
At OSPM, the issue of urgent CPU frequency requests arriving while a
frequency switch is already in progress was mentioned.

Besides the various issues (physical time for switching frequency,
on-going kthread activity, etc.) one (minor) issue is the kernel
"forgetting" such a request, thus waiting for the next switch time to
recompute the needed frequency and behave accordingly.

This patch makes the kthread serve any urgent request that occurred
during the previous frequency switch. It introduces a specific flag, only
set when the SCHED_DEADLINE scheduling class increases the CPU
utilization, aiming at decreasing the likelihood of a deadline miss.

Indeed, some preliminary tests in critical conditions (i.e.
SCHED_DEADLINE tasks with short periods) have shown reductions of more
than 10% in the average number of deadline misses. On the other hand,
the increase in energy consumption when running SCHED_DEADLINE
tasks (not yet measured) is likely not negligible (especially in
critical scenarios like "ramp up" utilizations).

The patch is meant as a follow-up to the discussion at OSPM.

Signed-off-by: Claudio Scordino 
CC: Viresh Kumar 
CC: Rafael J. Wysocki 
CC: Peter Zijlstra 
CC: Ingo Molnar 
CC: Patrick Bellasi 
CC: Juri Lelli 
Cc: Luca Abeni 
CC: Joel Fernandes 
CC: linux...@vger.kernel.org
---
 kernel/sched/cpufreq_schedutil.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index d2c6083..4de06b0 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -41,6 +41,7 @@ struct sugov_policy {
boolwork_in_progress;
 
boolneed_freq_update;
+   boolurgent_freq_update;
 };
 
 struct sugov_cpu {
@@ -92,6 +93,14 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
!cpufreq_can_do_remote_dvfs(sg_policy->policy))
return false;
 
+   /*
+* Continue computing the new frequency. In case of work_in_progress,
+* the kthread will resched a change once the current transition is
+* finished.
+*/
+   if (sg_policy->urgent_freq_update)
+   return true;
+
if (sg_policy->work_in_progress)
return false;
 
@@ -121,6 +130,9 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
sg_policy->next_freq = next_freq;
sg_policy->last_freq_update_time = time;
 
+   if (sg_policy->work_in_progress)
+   return;
+
if (policy->fast_switch_enabled) {
next_freq = cpufreq_driver_fast_switch(policy, next_freq);
if (!next_freq)
@@ -274,7 +286,7 @@ static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
 static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu, struct sugov_policy *sg_policy)
 {
if (cpu_util_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->util_dl)
-   sg_policy->need_freq_update = true;
+   sg_policy->urgent_freq_update = true;
 }
 
 static void sugov_update_single(struct update_util_data *hook, u64 time,
@@ -383,8 +395,11 @@ static void sugov_work(struct kthread_work *work)
	struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
 
mutex_lock(&sg_policy->work_lock);
-   __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
+   do {
+   sg_policy->urgent_freq_update = false;
+   __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
CPUFREQ_RELATION_L);
+   } while (sg_policy->urgent_freq_update);
mutex_unlock(&sg_policy->work_lock);
 
sg_policy->work_in_progress = false;
@@ -673,6 +688,7 @@ static int sugov_start(struct cpufreq_policy *policy)
sg_policy->next_freq= UINT_MAX;
sg_policy->work_in_progress = false;
sg_policy->need_freq_update = false;
+   sg_policy->urgent_freq_update   = false;
sg_policy->cached_raw_freq  = 0;
 
for_each_cpu(cpu, policy->cpus) {
-- 
2.7.4



[PATCH] sched/deadline: add overrun signal and GRUB-PA in the documentation

2018-04-03 Thread Claudio Scordino
Signed-off-by: Claudio Scordino 
CC: Juri Lelli 
CC: Luca Abeni 
CC: linux-kernel@vger.kernel.org
CC: linux-...@vger.kernel.org
---
 Documentation/scheduler/sched-deadline.txt | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt
index 8ce78f8..b14e03f 100644
--- a/Documentation/scheduler/sched-deadline.txt
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -49,7 +49,7 @@ CONTENTS
 2.1 Main algorithm
 ------------------
 
- SCHED_DEADLINE uses three parameters, named "runtime", "period", and
+ SCHED_DEADLINE [18] uses three parameters, named "runtime", "period", and
  "deadline", to schedule tasks. A SCHED_DEADLINE task should receive
  "runtime" microseconds of execution time every "period" microseconds, and
  these "runtime" microseconds are available within "deadline" microseconds
@@ -117,6 +117,10 @@ CONTENTS
  scheduling deadline = scheduling deadline + period
  remaining runtime = remaining runtime + runtime
 
+ The SCHED_FLAG_DL_OVERRUN flag in sched_attr's sched_flags field allows a task
+ to get informed about runtime overruns through the delivery of SIGXCPU
+ signals.
+
 
 2.2 Bandwidth reclaiming
 
@@ -279,6 +283,19 @@ CONTENTS
 running_bw is incremented.
 
 
+2.3 Energy-aware scheduling
+---------------------------
+
+ When cpufreq's schedutil governor is selected, SCHED_DEADLINE implements the
+ GRUB-PA [19] algorithm, reducing the CPU operating frequency to the minimum
+ value that still allows to meet the deadlines. This behavior is currently
+ implemented only for ARM architectures.
+
+ A particular care must be taken in case the time needed for changing frequency
+ is of the same order of magnitude of the reservation period. In such cases,
+ setting a fixed CPU frequency results in a lower amount of deadline misses.
+
+
 3. Scheduling Real-Time Tasks
 =
 
@@ -505,6 +522,12 @@ CONTENTS
  17 - L. Abeni, G. Lipari, A. Parri, Y. Sun, Multicore CPU reclaiming: parallel
       or sequential?. In Proceedings of the 31st Annual ACM Symposium on Applied
       Computing, 2016.
+  18 - J. Lelli, C. Scordino, L. Abeni, D. Faggioli, Deadline scheduling in the
+   Linux kernel, Software: Practice and Experience, 46(6): 821-839, June
+   2016.
+  19 - C. Scordino, L. Abeni, J. Lelli, Energy-Aware Real-Time Scheduling in
+   the Linux Kernel, 33rd ACM/SIGAPP Symposium On Applied Computing (SAC
+   2018), Pau, France, April 2018.
 
 
 4. Bandwidth management
-- 
2.7.4



[tip:sched/core] sched/cpufreq: Rate limits for SCHED_DEADLINE

2018-03-23 Thread tip-bot for Claudio Scordino
Commit-ID:  e97a90f7069b740575bcb1dae86596e0484b8957
Gitweb: https://git.kernel.org/tip/e97a90f7069b740575bcb1dae86596e0484b8957
Author: Claudio Scordino 
AuthorDate: Tue, 13 Mar 2018 11:35:40 +0100
Committer:  Thomas Gleixner 
CommitDate: Fri, 23 Mar 2018 22:48:22 +0100

sched/cpufreq: Rate limits for SCHED_DEADLINE

When the SCHED_DEADLINE scheduling class increases the CPU utilization, it
should not wait for the rate limit, otherwise it may miss some deadline.

Tests using rt-app on Exynos5422 with up to 10 SCHED_DEADLINE tasks have
shown reductions of even 10% of deadline misses with a negligible
increase of energy consumption (measured through Baylibre Cape).

Signed-off-by: Claudio Scordino 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Rafael J. Wysocki 
Acked-by: Viresh Kumar 
Cc: Juri Lelli 
Cc: Joel Fernandes 
Cc: Vincent Guittot 
Cc: linux...@vger.kernel.org
Cc: Peter Zijlstra 
Cc: Morten Rasmussen 
Cc: Patrick Bellasi 
Cc: Todd Kjos 
Cc: Dietmar Eggemann 
Link: https://lkml.kernel.org/r/1520937340-2755-1-git-send-email-clau...@evidence.eu.com
---
 kernel/sched/cpufreq_schedutil.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 89fe78ecb88c..2b124811947d 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -267,6 +267,16 @@ static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
 static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
 #endif /* CONFIG_NO_HZ_COMMON */
 
+/*
+ * Make sugov_should_update_freq() ignore the rate limit when DL
+ * has increased the utilization.
+ */
+static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu, struct sugov_policy *sg_policy)
+{
+   if (cpu_util_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->util_dl)
+   sg_policy->need_freq_update = true;
+}
+
 static void sugov_update_single(struct update_util_data *hook, u64 time,
unsigned int flags)
 {
@@ -279,6 +289,8 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
sugov_set_iowait_boost(sg_cpu, time, flags);
sg_cpu->last_update = time;
 
+   ignore_dl_rate_limit(sg_cpu, sg_policy);
+
if (!sugov_should_update_freq(sg_policy, time))
return;
 
@@ -356,6 +368,8 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
sugov_set_iowait_boost(sg_cpu, time, flags);
sg_cpu->last_update = time;
 
+   ignore_dl_rate_limit(sg_cpu, sg_policy);
+
if (sugov_should_update_freq(sg_policy, time)) {
next_f = sugov_next_freq_shared(sg_cpu, time);
sugov_update_commit(sg_policy, time, next_f);
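
[For context: in the schedutil code of this era, sugov_should_update_freq()
checks need_freq_update first and returns true immediately when it is set,
before comparing delta_ns against freq_update_delay_ns; that is why setting
the flag here bypasses the rate limit.]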


[PATCH v4] cpufreq: schedutil: rate limits for SCHED_DEADLINE

2018-03-13 Thread Claudio Scordino
When the SCHED_DEADLINE scheduling class increases the CPU utilization,
we should not wait for the rate limit, otherwise we may miss some
deadline.

Tests using rt-app on Exynos5422 with up to 10 SCHED_DEADLINE tasks have
shown reductions of even 10% of deadline misses with a negligible
increase of energy consumption (measured through Baylibre Cape).

Signed-off-by: Claudio Scordino 
Acked-by: Viresh Kumar 
Reviewed-by: Rafael J. Wysocki 
CC: Rafael J. Wysocki 
CC: Viresh Kumar 
CC: Patrick Bellasi 
CC: Dietmar Eggemann 
CC: Morten Rasmussen 
CC: Juri Lelli 
CC: Vincent Guittot 
CC: Todd Kjos 
CC: Joel Fernandes 
CC: linux...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
---
Changes from v3:
 - Specific routine renamed as ignore_dl_rate_limit()
---
Changes from v2:
 - Rate limit ignored also in case of "fast switch"
 - Specific routine added
---
Changes from v1:
 - Logic moved from sugov_should_update_freq() to
   sugov_update_single()/_shared() to not duplicate data structures
 - Rate limit not ignored in case of "fast switch"
---
 kernel/sched/cpufreq_schedutil.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index feb5f89..2aeb1ca 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -257,6 +257,16 @@ static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
 static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
 #endif /* CONFIG_NO_HZ_COMMON */
 
+/*
+ * Make sugov_should_update_freq() ignore the rate limit when DL
+ * has increased the utilization.
+ */
+static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu, struct sugov_policy *sg_policy)
+{
+   if (cpu_util_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->util_dl)
+   sg_policy->need_freq_update = true;
+}
+
 static void sugov_update_single(struct update_util_data *hook, u64 time,
unsigned int flags)
 {
@@ -270,6 +280,8 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
sugov_set_iowait_boost(sg_cpu, time);
sg_cpu->last_update = time;
 
+   ignore_dl_rate_limit(sg_cpu, sg_policy);
+
if (!sugov_should_update_freq(sg_policy, time))
return;
 
@@ -351,6 +363,8 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
 
raw_spin_lock(&sg_policy->update_lock);
 
+   ignore_dl_rate_limit(sg_cpu, sg_policy);
+
sugov_get_util(sg_cpu);
sg_cpu->flags = flags;
 
-- 
2.7.4



[PATCH v3] cpufreq: schedutil: rate limits for SCHED_DEADLINE

2018-03-08 Thread Claudio Scordino
When the SCHED_DEADLINE scheduling class increases the CPU utilization,
we should not wait for the rate limit, otherwise we may miss some
deadline.

Tests using rt-app on Exynos5422 with up to 10 SCHED_DEADLINE tasks have
shown reductions of even 10% of deadline misses with a negligible
increase of energy consumption (measured through Baylibre Cape).

Signed-off-by: Claudio Scordino 
CC: Ingo Molnar 
CC: Patrick Bellasi 
CC: Dietmar Eggemann 
CC: Morten Rasmussen 
CC: Juri Lelli 
CC: Viresh Kumar 
CC: Vincent Guittot 
CC: Todd Kjos 
CC: Joel Fernandes 
CC: linux...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
---
Changes from v2:
 - Rate limit ignored also in case of "fast switch"
 - Specific routine added
---
Changes from v1:
 - Logic moved from sugov_should_update_freq() to
   sugov_update_single()/_shared() to not duplicate data structures
 - Rate limit not ignored in case of "fast switch"
---
 kernel/sched/cpufreq_schedutil.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 7936f54..13f9cce 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -260,6 +260,17 @@ static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
 static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
 #endif /* CONFIG_NO_HZ_COMMON */
 
+/*
+ * Make sugov_should_update_freq() ignore the rate limit when DL
+ * has increased the utilization.
+ */
+static inline
+void set_dl_rate_limit(struct sugov_cpu *sg_cpu, struct sugov_policy *sg_policy)
+{
+   if (cpu_util_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->util_dl)
+   sg_policy->need_freq_update = true;
+}
+
 static void sugov_update_single(struct update_util_data *hook, u64 time,
unsigned int flags)
 {
@@ -273,6 +284,8 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
sugov_set_iowait_boost(sg_cpu, time);
sg_cpu->last_update = time;
 
+   set_dl_rate_limit(sg_cpu, sg_policy);
+
if (!sugov_should_update_freq(sg_policy, time))
return;
 
@@ -354,6 +367,8 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 
raw_spin_lock(&sg_policy->update_lock);
 
+   set_dl_rate_limit(sg_cpu, sg_policy);
+
sugov_get_util(sg_cpu);
sg_cpu->flags = flags;
 
-- 
2.7.4



Re: [PATCH v2] cpufreq: schedutil: rate limits for SCHED_DEADLINE

2018-02-28 Thread Claudio Scordino

Dear Rafael, dear Viresh,

On 28/02/2018 12:06, Claudio Scordino wrote:

When the SCHED_DEADLINE scheduling class increases the CPU utilization,
we should not wait for the rate limit, otherwise we may miss some
deadline.

Tests using rt-app on Exynos5422 with up to 10 SCHED_DEADLINE tasks have
shown reductions of even 10% of deadline misses with a negligible
increase of energy consumption (measured through Baylibre Cape).


As a follow-up to the previous thread, I've put some figures here:
https://gist.github.com/claudioscordino/d4a10e8b3ceac419fb0c8b552db19806

In some cases, I've noticed that the patch even reduces the energy consumption
(due to a mix of factors, plus DL tasks entering the inactive state sooner).

I've also tried to create the "ramp-up" scenario by allocating 10 DL tasks on 
the same core, but it didn't produce any significant increase of consumption.

IMHO, the overall behavior looks better.

Best regards,

 Claudio


[PATCH v2] cpufreq: schedutil: rate limits for SCHED_DEADLINE

2018-02-28 Thread Claudio Scordino
When the SCHED_DEADLINE scheduling class increases the CPU utilization,
we should not wait for the rate limit, otherwise we may miss some
deadline.

Tests using rt-app on Exynos5422 with up to 10 SCHED_DEADLINE tasks have
shown reductions of even 10% of deadline misses with a negligible
increase of energy consumption (measured through Baylibre Cape).

Signed-off-by: Claudio Scordino 
CC: Ingo Molnar 
CC: Patrick Bellasi 
CC: Dietmar Eggemann 
CC: Morten Rasmussen 
CC: Juri Lelli 
CC: Viresh Kumar 
CC: Vincent Guittot 
CC: Todd Kjos 
CC: Joel Fernandes 
CC: linux...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
---
Changes from v1:
 - Logic moved from sugov_should_update_freq() to
   sugov_update_single()/_shared() to not duplicate data structures
 - Rate limit not ignored in case of "fast switch"
---
 kernel/sched/cpufreq_schedutil.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 7936f54..ca6ce72 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -273,6 +273,14 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
sugov_set_iowait_boost(sg_cpu, time);
sg_cpu->last_update = time;
 
+   /*
+* Make sugov_should_update_freq() ignore the rate limit when DL
+* has increased the utilization.
+*/
+   if ((cpu_util_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->util_dl) &&
+   !(sg_policy->policy->fast_switch_enabled))
+   sg_policy->need_freq_update = true;
+
if (!sugov_should_update_freq(sg_policy, time))
return;
 
@@ -354,6 +362,14 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 
raw_spin_lock(&sg_policy->update_lock);
 
+   /*
+* Make sugov_should_update_freq() ignore the rate limit when DL
+* has increased the utilization.
+*/
+   if ((cpu_util_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->util_dl) &&
+   !(sg_policy->policy->fast_switch_enabled))
+   sg_policy->need_freq_update = true;
+
sugov_get_util(sg_cpu);
sg_cpu->flags = flags;
 
-- 
2.7.4



Re: [PATCH] cpufreq: schedutil: rate limits for SCHED_DEADLINE

2018-02-09 Thread Claudio Scordino



On 09/02/2018 13:56, Rafael J. Wysocki wrote:

On Fri, Feb 9, 2018 at 1:52 PM, Juri Lelli  wrote:

On 09/02/18 13:08, Rafael J. Wysocki wrote:

On Fri, Feb 9, 2018 at 12:51 PM, Juri Lelli  wrote:

On 09/02/18 12:37, Rafael J. Wysocki wrote:

On Fri, Feb 9, 2018 at 12:26 PM, Juri Lelli  wrote:

On 09/02/18 12:04, Rafael J. Wysocki wrote:

On Fri, Feb 9, 2018 at 11:53 AM, Juri Lelli  wrote:

Hi,

On 09/02/18 11:36, Rafael J. Wysocki wrote:

On Friday, February 9, 2018 9:02:34 AM CET Claudio Scordino wrote:

Hi Viresh,

On 09/02/2018 04:51, Viresh Kumar wrote:

On 08-02-18, 18:01, Claudio Scordino wrote:

When the SCHED_DEADLINE scheduling class increases the CPU utilization,
we should not wait for the rate limit, otherwise we may miss some deadline.

Tests using rt-app on Exynos5422 have shown reductions of about 10% of deadline
misses for tasks with low RT periods.

The patch applies on top of the one recently proposed by Peter to drop the
SCHED_CPUFREQ_* flags.



[cut]





Is it possible to (somehow) check here if the DL tasks will miss
deadline if we continue to run at current frequency? And only ignore
rate-limit if that is the case ?


Isn't it always the case? The utilization associated with DL tasks is given by
what the user said is needed to meet the tasks' deadlines (admission
control). If such a task wakes up and we realize that adding its
utilization contribution is going to require a frequency change, we
should _theoretically_ always do it, or it will be too late. Now, the user
might have asked for a bit more than what is strictly required (this is
usually the case, to compensate for discrepancies between theory and the real
world, e.g. hw transition limits), but I don't think there is a way to
know "how much". :/


You are right.

I'm somewhat concerned about "fast switch" cases when the rate limit
is used to reduce overhead.


Mmm, right. I'm thinking that in those cases we could leave rate limit
as is. The user should then be aware of it and consider it as proper
overhead when designing her/his system.

But then, isn't it the same for "non fast switch" platforms? I mean,
even in the latter case we can't go faster than hw limits.. mmm, maybe
the difference is that in the former case we could go as fast as theory
would expect.. but we shouldn't. :)


Well, in practical terms that means "no difference" IMO. :-)

I can imagine that in some cases this approach may lead to better
results than reducing the rate limit overall, but the general case I'm
not sure about.

I mean, if overriding the rate limit doesn't take place very often,
then it really should make no difference overhead-wise.  Now, of
course, how to define "not very often" is a good question as that
leads to rate-limiting the overriding of the original rate limit and
that scheme may continue indefinitely ...


:)

My impression is that rate limit helps a lot for CFS, where the "true"
utilization is not known in advance, and being too responsive might
actually be counterproductive.

For DEADLINE (and RT, with differences) we should always respond as
quickly as we can (and probably remember that a frequency transition was
requested if the hw was already performing one, but that's another patch)
because, if we don't, a task belonging to a lower priority class might
induce deadline misses in the highest priority activities. E.g., a CFS task
happens to trigger a freq switch right before a DEADLINE task wakes
up and needs a higher frequency to meet its deadline: if we wait for
the rate limit of the CFS-originated transition.. deadline miss!
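
A concrete timeline (numbers ours): with rate_limit_us = 10 ms, a
CFS-originated switch at t = 0 blocks further updates until t = 10 ms; a DL
task with runtime 5 ms and relative deadline 8 ms waking at t = 1 ms cannot
raise the OPP before its deadline, so if the current frequency is too low
for it the miss is already unavoidable.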


Fair enough, but if there's too much overhead as a result of this, you
can't guarantee the deadlines to be met anyway.


Indeed. I guess this only works if corner cases as the one above don't
happen too often.


Well, that's the point.

So there is a tradeoff: do we want to allow deadlines to be missed
because of excessive overhead or do we want to allow deadlines to be
missed because of the rate limit.


For a very small number of tasks, the tests have indeed shown that the approach
pays off: we get a significant reduction of misses with a negligible increase
of energy consumption.
I still need to check what happens for a high number of tasks, trying to
reproduce the "ramp up" pattern (in which DL keeps increasing the utilization,
ignoring the rate limit and adding overhead).

Thanks,

  Claudio


Re: [PATCH] cpufreq: schedutil: rate limits for SCHED_DEADLINE

2018-02-09 Thread Claudio Scordino

Hi Viresh,

On 09/02/2018 04:51, Viresh Kumar wrote:

On 08-02-18, 18:01, Claudio Scordino wrote:

When the SCHED_DEADLINE scheduling class increases the CPU utilization,
we should not wait for the rate limit, otherwise we may miss some deadline.

Tests using rt-app on Exynos5422 have shown reductions of about 10% of deadline
misses for tasks with low RT periods.

The patch applies on top of the one recently proposed by Peter to drop the
SCHED_CPUFREQ_* flags.

Signed-off-by: Claudio Scordino 
CC: Rafael J . Wysocki 
CC: Patrick Bellasi 
CC: Dietmar Eggemann 
CC: Morten Rasmussen 
CC: Juri Lelli 
CC: Viresh Kumar 
CC: Vincent Guittot 
CC: Todd Kjos 
CC: Joel Fernandes 
CC: linux...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
---
  kernel/sched/cpufreq_schedutil.c | 15 ---
  1 file changed, 12 insertions(+), 3 deletions(-)


So the previous commit was surely incorrect as it relied on comparing
frequencies instead of dl-util, and freq requirements could have even
changed due to CFS.


You're right.
The very first version of the patch (not posted) added a specific SCHED_CPUFREQ
flag to let the scheduling class ask for the rate limit to be ignored.
However, polluting the API with further flags is not such a good approach.
The next patches didn't introduce such a flag, but were incorrect.




diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index b0bd77d..d8dcba2 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -74,7 +74,10 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
  
  / Governor internals ***/
  
-static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)

+static bool sugov_should_update_freq(struct sugov_policy *sg_policy,
+u64 time,
+struct sugov_cpu *sg_cpu_old,
+struct sugov_cpu *sg_cpu_new)
  {
s64 delta_ns;
  
@@ -111,6 +114,10 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)

return true;
}
  
+	/* Ignore rate limit when DL increased utilization. */

+   if (sg_cpu_new->util_dl > sg_cpu_old->util_dl)
+   return true;
+


Changing the frequency has a penalty, especially in the ARM world (and
that's where you are testing your stuff). I am worried that we will
have (corner) cases where we will waste a lot of time changing the
frequencies. For example (I may be wrong here), what if 10 small DL
tasks are queued one after the other? The util will keep on changing
and so will the frequency ? There may be more similar cases ?


I forgot to say that I've not observed any relevant increase of the energy 
consumption (measured through a Baylibre Cape).
However, the tests had a very small number of RT tasks.

If I'm not wrong, at the hardware level we do have a physical rate limit (as we
cannot trigger a frequency update when one is already on-going).
I don't know if this could somehow mitigate such an effect.

Anyway, I'll repeat the tests with a considerable number of RT tasks to check
if I can reproduce such a "ramp up" situation.
Depending on the energy results, we may have to choose between meeting more RT 
deadlines and consuming less energy.



Is it possible to (somehow) check here if the DL tasks will miss
deadline if we continue to run at current frequency? And only ignore
rate-limit if that is the case ?


I need to think further about it.




delta_ns = time - sg_policy->last_freq_update_time;
return delta_ns >= sg_policy->freq_update_delay_ns;
  }
@@ -271,6 +278,7 @@ static void sugov_update_single(struct update_util_data 
*hook, u64 time,
unsigned int flags)
  {
struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, 
update_util);
+   struct sugov_cpu sg_cpu_old = *sg_cpu;


Not really a big deal, but this structure is 80 bytes on ARM64, why
copy everything when what we need is just 8 bytes ?


I didn't want to add deadline-specific code to the sugov_should_update_freq()
signature, as it should remain independent of the scheduling classes.
In my opinion, the best approach would be to group util_cfs and util_dl in a
struct within sugov_cpu and pass that struct to sugov_should_update_freq().
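
Something like the following sketch (hypothetical, just to make the idea
concrete):

	/* Per-class utilization snapshots, grouped so they can be passed around. */
	struct sugov_util {
		unsigned long util_cfs;
		unsigned long util_dl;
	};

	/* struct sugov_cpu would embed one of these instead of two fields ... */

	static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time,
					     const struct sugov_util *old,
					     const struct sugov_util *new);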

Thanks for your comments.

   Claudio


[PATCH] cpufreq: schedutil: rate limits for SCHED_DEADLINE

2018-02-08 Thread Claudio Scordino
When the SCHED_DEADLINE scheduling class increases the CPU utilization,
we should not wait for the rate limit, otherwise we may miss some deadline.

Tests using rt-app on Exynos5422 have shown reductions of about 10% of deadline
misses for tasks with low RT periods.

The patch applies on top of the one recently proposed by Peter to drop the
SCHED_CPUFREQ_* flags.

Signed-off-by: Claudio Scordino 
CC: Rafael J . Wysocki 
CC: Patrick Bellasi 
CC: Dietmar Eggemann 
CC: Morten Rasmussen 
CC: Juri Lelli 
CC: Viresh Kumar 
CC: Vincent Guittot 
CC: Todd Kjos 
CC: Joel Fernandes 
CC: linux...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
---
 kernel/sched/cpufreq_schedutil.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index b0bd77d..d8dcba2 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -74,7 +74,10 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
 
 / Governor internals ***/
 
-static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
+static bool sugov_should_update_freq(struct sugov_policy *sg_policy,
+u64 time,
+struct sugov_cpu *sg_cpu_old,
+struct sugov_cpu *sg_cpu_new)
 {
s64 delta_ns;
 
@@ -111,6 +114,10 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
return true;
}
 
+   /* Ignore rate limit when DL increased utilization. */
+   if (sg_cpu_new->util_dl > sg_cpu_old->util_dl)
+   return true;
+
delta_ns = time - sg_policy->last_freq_update_time;
return delta_ns >= sg_policy->freq_update_delay_ns;
 }
@@ -271,6 +278,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
				unsigned int flags)
 {
	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
+   struct sugov_cpu sg_cpu_old = *sg_cpu;
struct sugov_policy *sg_policy = sg_cpu->sg_policy;
unsigned long util, max;
unsigned int next_f;
@@ -279,7 +287,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
sugov_set_iowait_boost(sg_cpu, time, flags);
sg_cpu->last_update = time;
 
-   if (!sugov_should_update_freq(sg_policy, time))
+   if (!sugov_should_update_freq(sg_policy, time, &sg_cpu_old, sg_cpu))
return;
 
busy = sugov_cpu_is_busy(sg_cpu);
@@ -350,6 +358,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
				unsigned int flags)
 {
	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
+   struct sugov_cpu sg_cpu_old = *sg_cpu;
struct sugov_policy *sg_policy = sg_cpu->sg_policy;
unsigned int next_f;
 
@@ -359,7 +368,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
sugov_set_iowait_boost(sg_cpu, time, flags);
sg_cpu->last_update = time;
 
-   if (sugov_should_update_freq(sg_policy, time)) {
+   if (sugov_should_update_freq(sg_policy, time, &sg_cpu_old, sg_cpu)) {
next_f = sugov_next_freq_shared(sg_cpu, time);
sugov_update_commit(sg_policy, time, next_f);
}
-- 
2.7.4



Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates

2018-02-08 Thread Claudio Scordino

Hi Patrick,

On 06/02/2018 19:36, Patrick Bellasi wrote:

On 06-Feb 19:14, Claudio Scordino wrote:

Hi Patrick,



At first glance, your proposal below seems to make sense.

However, I'm wondering if we cannot get it working using
rq->dl's provided information instead of flags?


Yes, we can use the value of rq->dl to check if there has been an increase of
the deadline utilization.
Even if schedutil might have been triggered by a different scheduling class,
the effect should be almost the same.

Below is a potential patch. I've kept all frequency update decisions in a
single point (i.e. sugov_should_update_freq).
Not yet tested (waiting for further comments).


I have a todo list entry to backport/test Peter's series on Android...
will add this patch too. Thanks.


Please discard my latest patch, as the tests on Odroid have shown performance
regressions (likely due to sugov_next_freq_shared being called more often).

Never mind. I have already tested another (even simpler) patch.
Sending soon as a new thread on LKML.

Many thanks and best regards,

Claudio


Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates

2018-02-06 Thread Claudio Scordino

Hi Patrick,

On 06/02/2018 16:43, Patrick Bellasi wrote:

Hi Claudio,

On 06-Feb 11:55, Claudio Scordino wrote:

Hi Peter,

On 20/12/2017 16:30, Peter Zijlstra wrote:


So I ended up with the below (on top of Juri's cpufreq-dl patches).

It compiles, but that's about all the testing it had.

--- a/include/linux/sched/cpufreq.h
+++ b/include/linux/sched/cpufreq.h


[..]


@@ -188,17 +187,23 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
  static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
  {
+   unsigned long util = sg_cpu->util_cfs + sg_cpu->util_dl;
+   struct rq *rq = cpu_rq(sg_cpu->cpu);
+
+   if (rq->rt.rt_nr_running)
+   util = sg_cpu->max;
+
/*
 * Ideally we would like to set util_dl as min/guaranteed freq and
 * util_cfs + util_dl as requested freq. However, cpufreq is not yet
 * ready for such an interface. So, we only do the latter for now.
 */
-   return min(sg_cpu->util_cfs + sg_cpu->util_dl, sg_cpu->max);
+   return min(util, sg_cpu->max);
  }


[...]



What is the status of this patch? I couldn't find it in the
tip/queue repositories.

BTW, I wonder if we actually want to also remove the information
about the scheduling class that triggered the frequency change.


Removing flags was the main goal of the patch, since they represent
mainly duplicated information which the scheduling classes already know.

This was making flag updates error prone and difficult to keep
aligned with existing scheduling class info.


This prevents us from adopting class-specific behaviors.


In Peter's proposal he replaces flags with checks like:

if (rq->rt.rt_nr_running)


For example, we might want to skip the rate limits when deadline
asks for an increase of frequency, as shown in the patch below.
In this case, we could just remove the flags from sugov_cpu, but
leave the defines and the argument for sugov_update_*()


At first glance, your proposal below seems to make sense.

However, I'm wondering if we cannot get it working using
rq->dl's provided information instead of flags?


Yes, we can use the value of rq->dl to check if there has been an increase of
the deadline utilization.
Even if schedutil might have been triggered by a different scheduling class,
the effect should be almost the same.

Below is a potential patch. I've kept all frequency update decisions in a
single point (i.e. sugov_should_update_freq).
Not yet tested (waiting for further comments).

Thanks,

   Claudio




From 49a6eec60574ae93297406d40155e6ce4113e442 Mon Sep 17 00:00:00 2001
From: Claudio Scordino 
Date: Tue, 6 Feb 2018 18:42:23 +0100
Subject: [PATCH] cpufreq: schedutil: rate limits for SCHED_DEADLINE

When the SCHED_DEADLINE scheduling class asks to increase the CPU
frequency, we should not wait the rate limit, otherwise we may miss some
deadline.

This patch moves all frequency update decisions to a single point:
sugov_should_update_freq(). In addition, it ignores the rate limit
whenever there is an increase of the CPU frequency given by an increase
of the deadline utilization.

Signed-off-by: Claudio Scordino 
---
 kernel/sched/cpufreq_schedutil.c | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index b0bd77d..e8504f5 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -74,7 +74,11 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
 
 / Governor internals ***/
 
-static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)

+static bool sugov_should_update_freq(struct sugov_policy *sg_policy,
+u64 time,
+struct sugov_cpu *sg_cpu_old,
+struct sugov_cpu *sg_cpu_new,
+unsigned int next_freq)
 {
s64 delta_ns;
 
@@ -111,6 +115,14 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)

return true;
}
 
+	/*

+* Ignore rate limit when DL asked to increase the CPU frequency,
+* otherwise we may miss some deadline.
+*/
+   if ((next_freq > sg_policy->next_freq) &&
+   (sg_cpu_new->util_dl > sg_cpu_old->util_dl))
+   return true;
+
delta_ns = time - sg_policy->last_freq_update_time;
return delta_ns >= sg_policy->freq_update_delay_ns;
 }
@@ -271,6 +283,7 @@ static void sugov_update_single(struct update_util_data 
*hook, u64 time,
unsigned int flags)
 {
struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, 
update_util);
+   struct sugov_cpu sg_cpu_old = *sg_cpu;
struct su

Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates

2018-02-06 Thread Claudio Scordino
-		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT)
-			return policy->cpuinfo.max_freq;
  
  		j_max = j_sg_cpu->max;

j_util = sugov_aggregate_util(j_sg_cpu);
@@ -357,17 +356,11 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
raw_spin_lock(&sg_policy->update_lock);
  
  	sugov_get_util(sg_cpu);

-   sg_cpu->flags = flags;
-
-   sugov_set_iowait_boost(sg_cpu, time);
+   sugov_set_iowait_boost(sg_cpu, time, flags);
sg_cpu->last_update = time;
  
  	if (sugov_should_update_freq(sg_policy, time)) {

-   if (flags & SCHED_CPUFREQ_RT)
-   next_f = sg_policy->policy->cpuinfo.max_freq;
-   else
-   next_f = sugov_next_freq_shared(sg_cpu, time);
-
+   next_f = sugov_next_freq_shared(sg_cpu, time);
sugov_update_commit(sg_policy, time, next_f);
}
  
@@ -678,7 +671,6 @@ static int sugov_start(struct cpufreq_policy *policy)

memset(sg_cpu, 0, sizeof(*sg_cpu));
sg_cpu->cpu = cpu;
sg_cpu->sg_policy = sg_policy;
-   sg_cpu->flags = 0;
sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
}
  
--- a/kernel/sched/deadline.c

+++ b/kernel/sched/deadline.c
@@ -87,7 +87,7 @@ void __add_running_bw(u64 dl_bw, struct
SCHED_WARN_ON(dl_rq->running_bw < old); /* overflow */
SCHED_WARN_ON(dl_rq->running_bw > dl_rq->this_bw);
/* kick cpufreq (see the comment in kernel/sched/sched.h). */
-   cpufreq_update_util(rq_of_dl_rq(dl_rq), SCHED_CPUFREQ_DL);
+   cpufreq_update_util(rq_of_dl_rq(dl_rq), 0);
  }
  
  static inline

@@ -101,7 +101,7 @@ void __sub_running_bw(u64 dl_bw, struct
if (dl_rq->running_bw > old)
dl_rq->running_bw = 0;
/* kick cpufreq (see the comment in kernel/sched/sched.h). */
-   cpufreq_update_util(rq_of_dl_rq(dl_rq), SCHED_CPUFREQ_DL);
+   cpufreq_update_util(rq_of_dl_rq(dl_rq), 0);
  }
  
  static inline

--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -959,9 +959,6 @@ static void update_curr_rt(struct rq *rq
if (unlikely((s64)delta_exec <= 0))
return;
  
-	/* Kick cpufreq (see the comment in kernel/sched/sched.h). */

-   cpufreq_update_util(rq, SCHED_CPUFREQ_RT);
-
schedstat_set(curr->se.statistics.exec_max,
  max(curr->se.statistics.exec_max, delta_exec));
  
@@ -1003,6 +1000,9 @@ dequeue_top_rt_rq(struct rt_rq *rt_rq)
  
  	sub_nr_running(rq, rt_rq->rt_nr_running);

rt_rq->rt_queued = 0;
+
+   /* Kick cpufreq (see the comment in kernel/sched/sched.h). */
+   cpufreq_update_util(rq, 0);
  }
  
  static void

@@ -1019,6 +1019,9 @@ enqueue_top_rt_rq(struct rt_rq *rt_rq)
  
  	add_nr_running(rq, rt_rq->rt_nr_running);

rt_rq->rt_queued = 1;
+
+   /* Kick cpufreq (see the comment in kernel/sched/sched.h). */
+   cpufreq_update_util(rq, 0);
  }
  
  #if defined CONFIG_SMP




What is the status of this patch? I couldn't find it in the tip/queue
repositories.

BTW, I wonder if we actually want to also remove the information about the
scheduling class that triggered the frequency change.
This prevents us from adopting class-specific behaviors.
For example, we might want to skip the rate limits when deadline asks for an
increase of frequency, as shown in the patch below.
In this case, we could just remove the flags from sugov_cpu, but leave the
defines and the argument for sugov_update_*()

Best regards,

Claudio



From ed13fa5a8f93a43f8ff8f7d354b18c0031df482c Mon Sep 17 00:00:00 2001
From: Claudio Scordino 
Date: Wed, 27 Sep 2017 17:16:36 +0200
Subject: [PATCH RFC] cpufreq: schedutil: rate limits for SCHED_DEADLINE

When the SCHED_DEADLINE scheduling class asks to increase the CPU frequency,
we should not wait for the rate limit, otherwise we may miss some deadlines.
The patch simply ignores the limit whenever SCHED_DEADLINE asks for a
higher CPU frequency.

Signed-off-by: Claudio Scordino 
---
 kernel/sched/cpufreq_schedutil.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index dd062a1..5027ab1 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -75,7 +75,8 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
 
 / Governor internals ***/
 
-static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)

+static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time,
+unsigned int next_freq, unsigned int flags)
 {
s64 delta_ns;
 
@@ -112,6 +113,10 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)

return true;
}
 
+
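
The hunk above is cut short in the archive. For reference, a minimal sketch
of the check described in the changelog (skip the rate limit when
SCHED_DEADLINE asks for a frequency increase) could look like the following;
everything beyond the names visible in the diff (SCHED_CPUFREQ_DL, the
sg_policy fields) is an assumption, not the verbatim patch:

static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time,
				     unsigned int next_freq, unsigned int flags)
{
	s64 delta_ns;

	/* ... existing sanity checks (CPU offline, work pending, ...) ... */

	delta_ns = time - sg_policy->last_freq_update_time;

	/*
	 * Never defer a frequency increase requested by SCHED_DEADLINE:
	 * waiting for the rate limit to expire could cause a deadline miss.
	 * (Assumed check, reconstructed from the changelog.)
	 */
	if ((flags & SCHED_CPUFREQ_DL) && next_freq > sg_policy->next_freq)
		return true;

	return delta_ns >= sg_policy->freq_update_delay_ns;
}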

Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates

2018-01-02 Thread Claudio Scordino

Hi Peter,

On 31/12/2017 10:43, Claudio Scordino wrote:

Hi all,

On 20/12/2017 16:56, Peter Zijlstra wrote:

On Wed, Dec 20, 2017 at 04:30:29PM +0100, Peter Zijlstra wrote:


So I ended up with the below (on top of Juri's cpufreq-dl patches).

It compiles, but that's about all the testing it had.


Should all be available at:

   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/core

compile tested only, I suppose the 0day bot will let me know soon enough
:-)


I'm going to do some testing from Tuesday.


The rebase looks good to me.

Tested on Odroid XU4 (ARM big.LITTLE), no regressions found.

Many thanks and best regards,

  Claudio


Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates

2017-12-31 Thread Claudio Scordino

Hi all,

On 20/12/2017 16:56, Peter Zijlstra wrote:

On Wed, Dec 20, 2017 at 04:30:29PM +0100, Peter Zijlstra wrote:


So I ended up with the below (on top of Juri's cpufreq-dl patches).

It compiles, but that's about all the testing it had.


Should all be available at:

   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/core

compile tested only, I suppose the 0day bot will let me know soon enough
:-)


I'm going to do some testing from Tuesday.

Happy new year,

   Claudio


[PATCH v3] sched/deadline: runtime overrun signal

2017-12-12 Thread Claudio Scordino
From: Juri Lelli 

This patch adds the possibility of getting the delivery of a SIGXCPU
signal whenever there is a runtime overrun. The request is done through
the sched_flags field within the sched_attr structure.

Forward port of https://lkml.org/lkml/2009/10/16/170

Signed-off-by: Juri Lelli 
Signed-off-by: Claudio Scordino 
Signed-off-by: Luca Abeni 
Tested-by: Mathieu Poirier 
Cc: Tommaso Cucinotta 
CC: Peter Zijlstra 
CC: Ingo Molnar 
CC: Thomas Gleixner 
CC: LKML 
CC: linux-rt-users 
---
Changes from v2:

 - do not use a dubious signed bitfield for the dl_overrun flag

Changes from v1:

 - fix: do not send the signal in case of yield (reported by Mathieu Poirier)
 - removed the deadline miss signal because not needed in most of the cases
---
 include/linux/sched.h  |  4 
 include/uapi/linux/sched.h |  1 +
 kernel/sched/core.c|  3 ++-
 kernel/sched/deadline.c|  7 +++
 kernel/time/posix-cpu-timers.c | 20 
 5 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 21991d6..a62a2f1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -472,11 +472,15 @@ struct sched_dl_entity {
 * has not been executed yet. This flag is useful to avoid race
 * conditions between the inactive timer handler and the wakeup
 * code.
+*
+* @dl_overrun tells if the task asked to be informed about runtime
+* overruns.
 */
unsigned intdl_throttled  : 1;
unsigned intdl_boosted: 1;
unsigned intdl_yielded: 1;
unsigned intdl_non_contending : 1;
+   unsigned intdl_overrun: 1;
 
/*
 * Bandwidth enforcement timer. Each -deadline task has its
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 30a9e51..beb8ecd 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -49,5 +49,6 @@
  */
 #define SCHED_FLAG_RESET_ON_FORK   0x01
 #define SCHED_FLAG_RECLAIM 0x02
+#define SCHED_FLAG_DL_OVERRUN  0x04
 
 #endif /* _UAPI_LINUX_SCHED_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 75554f3..5bb67af 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4041,7 +4041,8 @@ static int __sched_setscheduler(struct task_struct *p,
}
 
if (attr->sched_flags &
-   ~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM))
+   ~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM |
+   SCHED_FLAG_DL_OVERRUN))
return -EINVAL;
 
/*
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 2473736..72554d1 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1155,6 +1155,12 @@ static void update_curr_dl(struct rq *rq)
 throttle:
if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
dl_se->dl_throttled = 1;
+
+   /* If requested, inform the user about runtime overruns. */
+   if (dl_runtime_exceeded(dl_se) &&
+   (dl_se->flags & SCHED_FLAG_DL_OVERRUN))
+   dl_se->dl_overrun = 1;
+
__dequeue_task_dl(rq, curr, 0);
if (unlikely(dl_se->dl_boosted || !start_dl_timer(curr)))
enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);
@@ -2566,6 +2572,7 @@ void __dl_clear_params(struct task_struct *p)
dl_se->dl_throttled = 0;
dl_se->dl_yielded = 0;
dl_se->dl_non_contending = 0;
+   dl_se->dl_overrun = 0;
 }
 
 bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr)
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 1f27887a..48281d9 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "posix-timers.h"
 
@@ -791,6 +792,16 @@ check_timers_list(struct list_head *timers,
return 0;
 }
 
+static inline void check_dl_overrun(struct task_struct *tsk)
+{
+   if (tsk->dl.dl_overrun) {
+   tsk->dl.dl_overrun = 0;
+   pr_info("runtime overrun: %s[%d]\n",
+   tsk->comm, task_pid_nr(tsk));
+   __group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
+   }
+}
+
 /*
  * Check for any per-thread CPU timers that have fired and move them off
  * the tsk->cpu_timers[N] list onto the firing list.  Here we update the
@@ -804,6 +815,9 @@ static void check_thread_timers(struct task_struct *tsk,
u64 expires;
unsigned long soft;
 
+   if (dl_task(tsk))
+   check_dl_overrun(tsk);
+
/*
	 * If cputime_expires is zero, then there are no active
	 * per thread CPU timers.
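
The message is truncated here in the archive. For completeness, a minimal
userspace sketch of how the new flag is meant to be used: the struct layout
and the SCHED_FLAG_DL_OVERRUN value come from the patch itself, everything
else is illustrative. glibc does not wrap sched_setattr(), so the syscall is
invoked directly; run as root:

#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
};

#define SCHED_DEADLINE		6
#define SCHED_FLAG_DL_OVERRUN	0x04	/* from the patch above */

static void on_sigxcpu(int sig)
{
	static const char msg[] = "SIGXCPU: runtime overrun\n";

	(void)sig;
	write(STDERR_FILENO, msg, sizeof(msg) - 1);	/* async-signal-safe */
}

int main(void)
{
	struct sched_attr attr;

	signal(SIGXCPU, on_sigxcpu);

	memset(&attr, 0, sizeof(attr));
	attr.size           = sizeof(attr);
	attr.sched_policy   = SCHED_DEADLINE;
	attr.sched_flags    = SCHED_FLAG_DL_OVERRUN;
	attr.sched_runtime  =  2 * 1000 * 1000;	/*  2 ms */
	attr.sched_deadline = 10 * 1000 * 1000;	/* 10 ms */
	attr.sched_period   = 10 * 1000 * 1000;	/* 10 ms */

	if (syscall(SYS_sched_setattr, 0, &attr, 0)) {
		perror("sched_setattr");
		return 1;
	}

	for (;;)
		;	/* spin: the 2 ms budget will be overrun */

	return 0;
}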

[tip:sched/urgent] sched/deadline: Fix the description of runtime accounting in the documentation

2017-11-16 Thread tip-bot for Claudio Scordino
Commit-ID:  5c0342ca7ef17220d8dd2da68d0d349c26ab19df
Gitweb: https://git.kernel.org/tip/5c0342ca7ef17220d8dd2da68d0d349c26ab19df
Author: Claudio Scordino 
AuthorDate: Tue, 14 Nov 2017 12:19:26 +0100
Committer:  Ingo Molnar 
CommitDate: Thu, 16 Nov 2017 09:00:35 +0100

sched/deadline: Fix the description of runtime accounting in the documentation

Signed-off-by: Claudio Scordino 
Signed-off-by: Luca Abeni 
Acked-by: Daniel Bristot de Oliveira 
Acked-by: Peter Zijlstra 
Cc: Jonathan Corbet 
Cc: Linus Torvalds 
Cc: Mathieu Poirier 
Cc: Thomas Gleixner 
Cc: Tommaso Cucinotta 
Cc: linux-...@vger.kernel.org
Link: 
http://lkml.kernel.org/r/1510658366-28995-1-git-send-email-clau...@evidence.eu.com
Signed-off-by: Ingo Molnar 
---
 Documentation/scheduler/sched-deadline.txt | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/Documentation/scheduler/sched-deadline.txt 
b/Documentation/scheduler/sched-deadline.txt
index e89e36e..8ce78f8 100644
--- a/Documentation/scheduler/sched-deadline.txt
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -204,10 +204,17 @@ CONTENTS
  It does so by decrementing the runtime of the executing task Ti at a pace 
equal
  to
 
-   dq = -max{ Ui, (1 - Uinact) } dt
+   dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt
 
- where Uinact is the inactive utilization, computed as (this_bq - running_bw),
- and Ui is the bandwidth of task Ti.
+ where:
+
+  - Ui is the bandwidth of task Ti;
+  - Umax is the maximum reclaimable utilization (subjected to RT throttling
+limits);
+  - Uinact is the (per runqueue) inactive utilization, computed as
+(this_bq - running_bw);
+  - Uextra is the (per runqueue) extra reclaimable utilization
+(subjected to RT throttling limits).
 
 
  Let's now see a trivial example of two deadline tasks with runtime equal
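
A quick numeric sanity check of the formula above, with made-up utilization
values (this is just the arithmetic, not kernel code):

#include <stdio.h>

/*
 * GRUB runtime accounting: the running task's runtime is decremented at
 *	dq/dt = -max{ Ui / Umax, 1 - Uinact - Uextra }
 */
int main(void)
{
	double Ui     = 0.25;	/* bandwidth of the running task Ti */
	double Umax   = 0.95;	/* max reclaimable utilization (RT throttling) */
	double Uinact = 0.10;	/* inactive utilization: this_bw - running_bw */
	double Uextra = 0.05;	/* extra reclaimable utilization */

	double a = Ui / Umax;			/* ~0.263 */
	double b = 1.0 - Uinact - Uextra;	/* 0.850 */

	printf("runtime decremented at %.3f x wall time\n", a > b ? a : b);
	return 0;
}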


[PATCH v2] sched/deadline: runtime overrun signal

2017-11-16 Thread Claudio Scordino
From: Juri Lelli 

This patch adds the possibility of getting the delivery of a SIGXCPU
signal whenever there is a runtime overrun. The request is done through
the sched_flags field within the sched_attr structure.

Forward port of https://lkml.org/lkml/2009/10/16/170

Signed-off-by: Juri Lelli 
Signed-off-by: Claudio Scordino 
Signed-off-by: Luca Abeni 
Tested-by: Mathieu Poirier 
Cc: Tommaso Cucinotta 
CC: Peter Zijlstra 
CC: Ingo Molnar 
CC: Thomas Gleixner 
---
Changes from v1:

 - fix: do not send the signal in case of yield (reported by Mathieu)
 - removed the deadline miss signal because not needed in most of the cases
---
 include/linux/sched.h  |  4 
 include/uapi/linux/sched.h |  1 +
 kernel/sched/core.c|  3 ++-
 kernel/sched/deadline.c|  7 +++
 kernel/time/posix-cpu-timers.c | 20 
 5 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a5dc7c9..2ec6014 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -472,11 +472,15 @@ struct sched_dl_entity {
 * has not been executed yet. This flag is useful to avoid race
 * conditions between the inactive timer handler and the wakeup
 * code.
+*
+* @dl_overrun tells if the task asked to be informed about runtime
+* overruns.
 */
int dl_throttled  : 1;
int dl_boosted: 1;
int dl_yielded: 1;
int dl_non_contending : 1;
+   int dl_overrun: 1;
 
/*
 * Bandwidth enforcement timer. Each -deadline task has its
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 30a9e51..beb8ecd 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -49,5 +49,6 @@
  */
 #define SCHED_FLAG_RESET_ON_FORK   0x01
 #define SCHED_FLAG_RECLAIM 0x02
+#define SCHED_FLAG_DL_OVERRUN  0x04
 
 #endif /* _UAPI_LINUX_SCHED_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5b82a00..5bd2498f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4040,7 +4040,8 @@ static int __sched_setscheduler(struct task_struct *p,
}
 
if (attr->sched_flags &
-   ~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM))
+   ~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM |
+   SCHED_FLAG_DL_OVERRUN))
return -EINVAL;
 
/*
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index f349f7e..304087c 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1155,6 +1155,12 @@ static void update_curr_dl(struct rq *rq)
 throttle:
if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
dl_se->dl_throttled = 1;
+
+   /* If requested, inform the user about runtime overruns. */
+   if (dl_runtime_exceeded(dl_se) &&
+   (dl_se->flags & SCHED_FLAG_DL_OVERRUN))
+   dl_se->dl_overrun = 1;
+
__dequeue_task_dl(rq, curr, 0);
if (unlikely(dl_se->dl_boosted || !start_dl_timer(curr)))
enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);
@@ -2566,6 +2572,7 @@ void __dl_clear_params(struct task_struct *p)
dl_se->dl_throttled = 0;
dl_se->dl_yielded = 0;
dl_se->dl_non_contending = 0;
+   dl_se->dl_overrun = 0;
 }
 
 bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr)
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 1f27887..48281d9 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "posix-timers.h"
 
@@ -791,6 +792,16 @@ check_timers_list(struct list_head *timers,
return 0;
 }
 
+static inline void check_dl_overrun(struct task_struct *tsk)
+{
+   if (tsk->dl.dl_overrun) {
+   tsk->dl.dl_overrun = 0;
+   pr_info("runtime overrun: %s[%d]\n",
+   tsk->comm, task_pid_nr(tsk));
+   __group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
+   }
+}
+
 /*
  * Check for any per-thread CPU timers that have fired and move them off
  * the tsk->cpu_timers[N] list onto the firing list.  Here we update the
@@ -804,6 +815,9 @@ static void check_thread_timers(struct task_struct *tsk,
u64 expires;
unsigned long soft;
 
+   if (dl_task(tsk))
+   check_dl_overrun(tsk);
+
/*
 * If cputime_expires is zero, then there are no active
 * per thread CPU timers.
@@ -906,6 +920,9 @@ static void check_process_timers(struct task_struct *tsk,

[PATCH v2] sched/deadline: fix runtime accounting in documentation

2017-11-14 Thread Claudio Scordino
Signed-off-by: Claudio Scordino 
Signed-off-by: Luca Abeni 
Acked-by: Daniel Bristot de Oliveira 
CC: Jonathan Corbet 
CC: "Peter Zijlstra (Intel)" 
CC: Ingo Molnar 
CC: linux-...@vger.kernel.org
Cc: Tommaso Cucinotta 
Cc: Mathieu Poirier 
---
 Documentation/scheduler/sched-deadline.txt | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/Documentation/scheduler/sched-deadline.txt 
b/Documentation/scheduler/sched-deadline.txt
index e89e36e..8ce78f8 100644
--- a/Documentation/scheduler/sched-deadline.txt
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -204,10 +204,17 @@ CONTENTS
  It does so by decrementing the runtime of the executing task Ti at a pace 
equal
  to
 
-   dq = -max{ Ui, (1 - Uinact) } dt
+   dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt
 
- where Uinact is the inactive utilization, computed as (this_bq - running_bw),
- and Ui is the bandwidth of task Ti.
+ where:
+
+  - Ui is the bandwidth of task Ti;
+  - Umax is the maximum reclaimable utilization (subjected to RT throttling
+limits);
+  - Uinact is the (per runqueue) inactive utilization, computed as
+(this_bq - running_bw);
+  - Uextra is the (per runqueue) extra reclaimable utilization
+(subjected to RT throttling limits).
 
 
  Let's now see a trivial example of two deadline tasks with runtime equal
-- 
2.7.4



[PATCH] sched/deadline: runtime overrun and deadline miss signals

2017-10-31 Thread Claudio Scordino
From: Juri Lelli 

This patch adds the possibility to get the delivery of two signals
whenever there is a runtime overrun or a deadline miss, respectively.
The request is done through the sched_flags field within the sched_attr
structure.

Forward port of https://lkml.org/lkml/2009/10/16/170

Signed-off-by: Juri Lelli 
Signed-off-by: Claudio Scordino 
Signed-off-by: Luca Abeni 
Cc: Tommaso Cucinotta 
CC: Peter Zijlstra 
CC: Ingo Molnar 
CC: Thomas Gleixner 
Cc: Mathieu Poirier 
---
 include/linux/sched.h  |  8 
 include/uapi/linux/sched.h |  2 ++
 kernel/sched/core.c|  3 ++-
 kernel/sched/deadline.c| 13 +
 kernel/time/posix-cpu-timers.c | 26 ++
 5 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0f897df..285d1b4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -473,11 +473,19 @@ struct sched_dl_entity {
 * has not been executed yet. This flag is useful to avoid race
 * conditions between the inactive timer handler and the wakeup
 * code.
+*
+* @dl_overrun tells if the task asked to be informed about budget
+* overruns.
+*
+* @dl_dmiss tells if the task asked to be informed about deadline
+* misses.
 */
int dl_throttled  : 1;
int dl_boosted: 1;
int dl_yielded: 1;
int dl_non_contending : 1;
+   int dl_overrun: 1;
+   int dl_dmiss  : 1;
 
/*
 * Bandwidth enforcement timer. Each -deadline task has its
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index e2a6c7b..544be0c 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -48,5 +48,7 @@
  */
 #define SCHED_FLAG_RESET_ON_FORK   0x01
 #define SCHED_FLAG_RECLAIM 0x02
+#define SCHED_FLAG_DL_OVERRUN  0x04
+#define SCHED_FLAG_DL_DMISS0x08
 
 #endif /* _UAPI_LINUX_SCHED_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 97227df..d79df7a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4041,7 +4041,8 @@ static int __sched_setscheduler(struct task_struct *p,
}
 
if (attr->sched_flags &
-   ~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM))
+   ~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM |
+   SCHED_FLAG_DL_OVERRUN | SCHED_FLAG_DL_DMISS))
return -EINVAL;
 
/*
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 8d1b946..8c1aa61 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1154,6 +1154,17 @@ static void update_curr_dl(struct rq *rq)
 throttle:
if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
dl_se->dl_throttled = 1;
+
+   /*
+* If requested, inform the user about deadline misses and/or
+* runtime overruns.
+*/
+   if (unlikely(dl_se->flags & SCHED_FLAG_DL_DMISS &&
+dl_time_before(dl_se->deadline, rq_clock(rq
+   dl_se->dl_dmiss = 1;
+   if (dl_se->flags & SCHED_FLAG_DL_OVERRUN)
+   dl_se->dl_overrun = 1;
+
__dequeue_task_dl(rq, curr, 0);
if (unlikely(dl_se->dl_boosted || !start_dl_timer(curr)))
enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);
@@ -2565,6 +2576,8 @@ void __dl_clear_params(struct task_struct *p)
dl_se->dl_throttled = 0;
dl_se->dl_yielded = 0;
dl_se->dl_non_contending = 0;
+   dl_se->dl_overrun = 0;
+   dl_se->dl_dmiss = 0;
 }
 
 bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr)
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 8585ad6..f3616c5 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "posix-timers.h"
 
@@ -790,6 +791,22 @@ check_timers_list(struct list_head *timers,
return 0;
 }
 
+static inline void check_dl_overrun(struct task_struct *tsk)
+{
+   if (tsk->dl.dl_overrun) {
+   tsk->dl.dl_overrun = 0;
+   pr_info("runtime overrun: %s[%d]\n",
+   tsk->comm, task_pid_nr(tsk));
+   __group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
+   }
+   if (tsk->dl.dl_dmiss) {
+   tsk->dl.dl_dmiss = 0;
+   pr_info("scheduling deadline miss: %s[%d]\n",
tsk->comm, task_pid_nr(tsk));

[PATCH] sched/deadline: fix runtime accounting in documentation

2017-10-13 Thread Claudio Scordino
Signed-off-by: Claudio Scordino 
Signed-off-by: Luca Abeni 
---
 Documentation/scheduler/sched-deadline.txt | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/Documentation/scheduler/sched-deadline.txt 
b/Documentation/scheduler/sched-deadline.txt
index e89e36e..79f40c6 100644
--- a/Documentation/scheduler/sched-deadline.txt
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -204,10 +204,16 @@ CONTENTS
  It does so by decrementing the runtime of the executing task Ti at a pace 
equal
  to
 
-   dq = -max{ Ui, (1 - Uinact) } dt
+   dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt
 
- where Uinact is the inactive utilization, computed as (this_bq - running_bw),
- and Ui is the bandwidth of task Ti.
+ where:
+
+  - Ui is the bandwidth of task Ti;
+  - Umax is the maximum reclaimable utilization (subject to RT limits);
+  - Uinact is the (per runqueue) inactive utilization, computed as
+(this_bq - running_bw);
+  - Uextra is the (per runqueue) extra reclaimable utilization
+(subject to RT limits).
 
 
  Let's now see a trivial example of two deadline tasks with runtime equal
-- 
2.7.4



Re: [PATCH] sched/deadline: add GRUB bw change tracepoints

2017-07-04 Thread Claudio Scordino
Ouch, you're right.

Now that I recall, you provided the first draft for internal debugging
but it was Luca who agreed to add it to the patchset. Sorry for the
mistake guys.

Best,

  Claudio

On 04/07/2017 14:49, "Juri Lelli" wrote:
>
> Hi Claudio,
>
> On 04/07/17 14:29, Claudio Scordino wrote:
>> This patch adds a tracepoint for tracing the total and running
>> bandwidths of GRUB's runqueue.
>>
>> Signed-off-by: Claudio Scordino 
>> Signed-off-by: Juri Lelli 
>
> Don't remember signing this off.
> Or if I did it was maybe for internal debugging?
>
> Best,
>
> - Juri
>
>



-- 
Claudio Scordino, Ph.D.
Project Manager - Funded research projects

Evidence Srl
Via Carducci 56
56010 S.Giuliano Terme - Pisa - Italy
Phone:  +39 050 99 11 122
Mobile: + 39 393 811 7491
Fax:   +39 050 99 10 812
http://www.evidence.eu.com


[PATCH] sched/deadline: add GRUB bw change tracepoints

2017-07-04 Thread Claudio Scordino
This patch adds a tracepoint for tracing the total and running
bandwidths of GRUB's runqueue.

Signed-off-by: Claudio Scordino 
Signed-off-by: Juri Lelli 
Signed-off-by: Luca Abeni 
---
 include/trace/events/sched.h | 32 
 kernel/sched/deadline.c  | 11 +++
 2 files changed, 43 insertions(+)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index ae1409f..050fcb2 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -545,6 +545,38 @@ TRACE_EVENT(sched_swap_numa,
__entry->dst_cpu, __entry->dst_nid)
 );
 
+DECLARE_EVENT_CLASS(sched_dl_grub_template,
+
+   TP_PROTO(u64 this_bw, u64 running_bw, unsigned int cpu_id),
+
+   TP_ARGS(this_bw, running_bw, cpu_id),
+
+   TP_STRUCT__entry(
+   __field(u64,this_bw)
+   __field(u64,running_bw)
+   __field(u32,cpu_id)
+   ),
+
+   TP_fast_assign(
+   __entry->this_bw = this_bw;
+   __entry->running_bw = running_bw;
+   __entry->cpu_id = cpu_id;
+   ),
+
+   TP_printk("total_bw=%llu running_bw=%llu cpu_id=%lu",
+   (unsigned long long)__entry->this_bw,
+   (unsigned long long)__entry->running_bw,
+   (unsigned long)__entry->cpu_id)
+);
+
+
+DEFINE_EVENT(sched_dl_grub_template, sched_dl_grub,
+
+   TP_PROTO(u64 this_bw, u64 running_bw, unsigned int cpu_id),
+
+   TP_ARGS(this_bw, running_bw, cpu_id)
+);
+
 /*
  * Tracepoint for waking a polling cpu without an IPI.
  */
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index a84299f..ae5c7ef 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -17,6 +17,7 @@
 #include "sched.h"
 
 #include 
+#include 
 #include 
 
 struct dl_bandwidth def_dl_bandwidth;
@@ -85,6 +86,8 @@ void add_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
dl_rq->running_bw += dl_bw;
SCHED_WARN_ON(dl_rq->running_bw < old); /* overflow */
SCHED_WARN_ON(dl_rq->running_bw > dl_rq->this_bw);
+   trace_sched_dl_grub(dl_rq->this_bw, dl_rq->running_bw,
+   rq_of_dl_rq(dl_rq)->cpu);
 }
 
 static inline
@@ -97,6 +100,8 @@ void sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
SCHED_WARN_ON(dl_rq->running_bw > old); /* underflow */
if (dl_rq->running_bw > old)
dl_rq->running_bw = 0;
+   trace_sched_dl_grub(dl_rq->this_bw, dl_rq->running_bw,
+   rq_of_dl_rq(dl_rq)->cpu);
 }
 
 static inline
@@ -107,6 +112,9 @@ void add_rq_bw(u64 dl_bw, struct dl_rq *dl_rq)
lockdep_assert_held(&(rq_of_dl_rq(dl_rq))->lock);
dl_rq->this_bw += dl_bw;
SCHED_WARN_ON(dl_rq->this_bw < old); /* overflow */
+   trace_sched_dl_grub(dl_rq->this_bw, dl_rq->running_bw,
+   rq_of_dl_rq(dl_rq)->cpu);
+
 }
 
 static inline
@@ -120,6 +128,9 @@ void sub_rq_bw(u64 dl_bw, struct dl_rq *dl_rq)
if (dl_rq->this_bw > old)
dl_rq->this_bw = 0;
SCHED_WARN_ON(dl_rq->running_bw > dl_rq->this_bw);
+   trace_sched_dl_grub(dl_rq->this_bw, dl_rq->running_bw,
+   rq_of_dl_rq(dl_rq)->cpu);
+
 }
 
 void dl_change_utilization(struct task_struct *p, u64 new_bw)
-- 
2.7.4



[tip:sched/core] sched/deadline: Add documentation about GRUB reclaiming

2017-06-08 Thread tip-bot for Claudio Scordino
Commit-ID:  ccc9d651a7d26fec88d2375c4dd4e097cc0f8de7
Gitweb: http://git.kernel.org/tip/ccc9d651a7d26fec88d2375c4dd4e097cc0f8de7
Author: Claudio Scordino 
AuthorDate: Thu, 18 May 2017 22:13:37 +0200
Committer:  Ingo Molnar 
CommitDate: Thu, 8 Jun 2017 10:31:56 +0200

sched/deadline: Add documentation about GRUB reclaiming

This patch adds the documentation about the GRUB reclaiming algorithm,
adding a few details discussed in list.

Signed-off-by: Claudio Scordino 
Signed-off-by: Luca Abeni 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Daniel Bristot de Oliveira 
Cc: Joel Fernandes 
Cc: Juri Lelli 
Cc: Linus Torvalds 
Cc: Mathieu Poirier 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
Cc: Tommaso Cucinotta 
Link: 
http://lkml.kernel.org/r/1495138417-6203-11-git-send-email-luca.ab...@santannapisa.it
Signed-off-by: Ingo Molnar 
---
 Documentation/scheduler/sched-deadline.txt | 168 +
 1 file changed, 168 insertions(+)

diff --git a/Documentation/scheduler/sched-deadline.txt 
b/Documentation/scheduler/sched-deadline.txt
index cbc1b46..e89e36e 100644
--- a/Documentation/scheduler/sched-deadline.txt
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -7,6 +7,8 @@ CONTENTS
  0. WARNING
  1. Overview
  2. Scheduling algorithm
+   2.1 Main algorithm
+   2.2 Bandwidth reclaiming
  3. Scheduling Real-Time Tasks
3.1 Definitions
3.2 Schedulability Analysis for Uniprocessor Systems
@@ -44,6 +46,9 @@ CONTENTS
 2. Scheduling algorithm
 ==
 
+2.1 Main algorithm
+--
+
  SCHED_DEADLINE uses three parameters, named "runtime", "period", and
  "deadline", to schedule tasks. A SCHED_DEADLINE task should receive
  "runtime" microseconds of execution time every "period" microseconds, and
@@ -113,6 +118,160 @@ CONTENTS
  remaining runtime = remaining runtime + runtime
 
 
+2.2 Bandwidth reclaiming
+
+
+ Bandwidth reclaiming for deadline tasks is based on the GRUB (Greedy
+ Reclamation of Unused Bandwidth) algorithm [15, 16, 17] and it is enabled
+ when flag SCHED_FLAG_RECLAIM is set.
+
+ The following diagram illustrates the state names for tasks handled by GRUB:
+
+                              ------------
+                  (d)        |   Active   |
+               ------------->|            |
+               |             | Contending |
+               |              ------------
+               |               A        |
+          ----------           |        |
+         |          |          |        |
+         | Inactive |          |(b)     | (a)
+         |          |          |        |
+          ----------           |        |
+               A               |        V
+               |              ------------
+               |             |   Active   |
+               --------------|    Non     |
+                  (c)        | Contending |
+                              ------------
+
+ A task can be in one of the following states:
+
+  - ActiveContending: if it is ready for execution (or executing);
+
+  - ActiveNonContending: if it just blocked and has not yet surpassed the 0-lag
+time;
+
+  - Inactive: if it is blocked and has surpassed the 0-lag time.
+
+ State transitions:
+
+  (a) When a task blocks, it does not become immediately inactive since its
+  bandwidth cannot be immediately reclaimed without breaking the
+  real-time guarantees. It therefore enters a transitional state called
+  ActiveNonContending. The scheduler arms the "inactive timer" to fire at
+  the 0-lag time, when the task's bandwidth can be reclaimed without
+  breaking the real-time guarantees.
+
+  The 0-lag time for a task entering the ActiveNonContending state is
+  computed as
+
+(runtime * dl_period)
+ deadline - -
+ dl_runtime
+
+  where runtime is the remaining runtime, while dl_runtime and dl_period
+  are the reservation parameters.
+
+  (b) If the task wakes up before the inactive timer fires, the task re-enters
+  the ActiveContending state and the "inactive timer" is canceled.
+  In addition, if the task wakes up on a different runqueue, then
+  the task's utilization must be removed from the previous runqueue's 
active
+  utilization and must be added to the new runqueue's active utilization.
+  In order to avoid races between a task waking up on a runqueue while the
+   "inactive timer" is running on a different CPU, the "dl_non_contending"
+  flag is used to indicate that a task is not on a runqueue but is active
+  (so, the flag is set when the task blocks and is cleared when the
+  "inactive timer" fires or when the task  wakes up).
+
+  (c) When the "inactive timer" fires, the task enters the Inactive state
+      and its utilization is removed from the runqueue's active utilization.
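
The message is truncated here in the archive. The 0-lag computation above can
be checked with a small sketch; all names and values are made up for
illustration:

#include <stdint.h>
#include <stdio.h>

/* 0-lag time = deadline - runtime * dl_period / dl_runtime (all in ns). */
static uint64_t zero_lag(uint64_t deadline, uint64_t runtime,
			 uint64_t dl_runtime, uint64_t dl_period)
{
	return deadline - (runtime * dl_period) / dl_runtime;
}

int main(void)
{
	/* 2 ms of a 10 ms runtime left, 100 ms period, deadline at t = 500 ms:
	 * the remaining budget "covers" 20 ms, so the 0-lag time at which the
	 * bandwidth can be reclaimed is t = 480 ms.
	 */
	uint64_t t = zero_lag(500000000ULL, 2000000ULL,
			      10000000ULL, 100000000ULL);

	printf("0-lag time: %llu ns\n", (unsigned long long)t); /* 480000000 */
	return 0;
}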

Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

2017-03-27 Thread Claudio Scordino
Hi guys,

2017-03-27 10:20 GMT+02:00 Luca Abeni :
> On Fri, 24 Mar 2017 22:31:46 -0400
> Steven Rostedt  wrote:
>
>> On Fri, 24 Mar 2017 22:47:15 +0100
>> luca abeni  wrote:
>>
>> > Ok... Since I am not good at ascii art, would it be ok to add a
>> > textual description? If yes, I'll add a comment like:
>> > "
>> > The utilization of a task is added to the runqueue's active
>> > utilization when the task becomes active (is enqueued in the
>> > runqueue), and is removed when the task becomes inactive. A task
>> > does not become immediately inactive when it blocks, but becomes
>> > inactive at the so called "0 lag time"; so, we setup the "inactive
>> > timer" to fire at the "0 lag time". When the "inactive timer"
>> > fires, the task utilization is removed from the runqueue's active
>> > utilization. If the task wakes up again on the same runqueue before
>> > the "0 lag time", the active utilization must not be changed and
>> > the "inactive timer" must be cancelled. If the task wakes up again
>> > on a different runqueue before the "0 lag time", then the task's
>> > utilization must be removed from the previous runqueue's active
>> > utilization and must be added to the new runqueue's active
>> > utilization. In order to avoid races between a task waking up on a
>> > runqueue while the "inactive timer" is running on a different CPU,
>> > the "dl_non_contending" flag is used to indicate that a task is not
>> > on a runqueue but is active (so, the flag is set when the task
>> > blocks and is cleared when the "inactive timer" fires or when the
>> > task  wakes up).
>>
>> Sure, the above is great if you never want anyone to read it ;)
>>
>> Can you please break it up a little. My head starts to spin by the
>> third line down.
>
> Ok... Maybe finding a clean and understandable way to explain the
> above sentence is something that can be done at the OSPM summit?

What about adding the following text to Documentation/?
Does it make things clearer?

Cheers,

 Claudio




The following figure illustrates the state names of the GRUB algorithm:

                              ------------
                  (d)        |   Active   |
               ------------->|            |
               |             | Contending |
               |              ------------
               |               A        |
          ----------           |        |
         |          |          |        |
         | Inactive |          |(b)     | (a)
         |          |          |        |
          ----------           |        |
               A               |        V
               |              ------------
               |             |   Active   |
               --------------|    Non     |
                  (c)        | Contending |
                              ------------

A task can be in one of the following states:

 - ActiveContending: if it is ready for execution (or executing);

 - ActiveNonContending: if it just blocked and has not yet surpassed the 0-lag
   time;

 - Inactive: if it is blocked and has surpassed the 0-lag time.


For each runqueue, the algorithm GRUB keeps track of two different bandwidths:

 - Active bandwidth (running_bw): this is the sum of the bandwidths of all
   tasks in active state (i.e., ActiveContending or ActiveNonContending);

 - Total bandwidth (this_bw): this is the sum of all tasks "belonging" to the
   runqueue, including the tasks in Inactive state.


State transitions:

 (a) When a task blocks, it does not become immediately inactive since its
 bandwidth cannot be immediately reclaimed without breaking the
 real-time guarantees. It therefore enters a transitional state called
 ActiveNonContending. The scheduler arms the "inactive timer" to fire at
 the 0-lag time, when the task's bandwidth can be reclaimed without
 breaking the real-time guarantees.

 (b) If the task wakes up before the inactive timer fires, the task re-enters
 the ActiveContending state and the "inactive timer" is canceled.
 In addition, if the task wakes up on a different runqueue, then
 the task's utilization must be removed from the previous runqueue's active
 utilization and must be added to the new runqueue's active utilization.
 In order to avoid races between a task waking up on a runqueue while the
  "inactive timer" is running on a different CPU, the "dl_non_contending"
 flag is used to indicate that a task is not on a runqueue but is active
 (so, the flag is set when the task blocks and is cleared when the
 "inactive timer" fires or when the task  wakes up).

 (c) When the "inactive timer" fires, the task enters the Inactive state and its
 utilization is removed from the runqueue's active utilization.

 (d) When an inactive task wakes up, it enters the ActiveContending state and
 its utilization is added to the active utilization of the runqueue where
 it has been enqueued.


The algorithm reclaims the bandwidth of the tasks in Inactive state.
It does so by decrementing the runtime of the executing task Ti at a pace equal
to

   dq = -max{ Ui, (1 - Uinact) } dt

where Uinact is the inactive utilization, computed as (this_bq - running_bw),
and Ui is the bandwidth of task Ti.

Re: sched_{set,get}attr() manpage

2014-04-10 Thread Claudio Scordino

On 10/04/2014 09:47, Juri Lelli wrote:

Hi all,

On Wed, 9 Apr 2014 17:42:04 +0200
Peter Zijlstra  wrote:


On Wed, Apr 09, 2014 at 05:19:11PM +0200, Henrik Austad wrote:

The following "real-time" policies are also supported, for

why the "'s?

I borrowed those from SCHED_SETSCHEDULER(2).


sched_attr::sched_flags additional flags that can influence
scheduling behaviour. Currently as per Linux kernel 3.14:

SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
on fork().

is the only supported flag.

...


The flags argument should be 0.

What about SCHED_FLAG_RESET_ON_FORK?

Different flags. The one is sched_attr::flags the other is
sched_setattr(.flags).


The other sched_attr fields are filled out as described in
sched_setattr().

Scheduling Policies
The  scheduler  is  the  kernel  component  that decides which runnable
process will be executed by the CPU next.  Each process has an  associ‐
ated  scheduling  policy and a static scheduling priority, sched_prior‐
ity; these are the settings that are modified by  sched_setscheduler().
The  scheduler  makes its decisions based on knowledge of the scheduling
policy and static priority of all processes on the system.

Isn't this last sentence redundant/sliglhtly repetitive?

I borrowed that from SCHED_SETSCHEDULER(2) again.


 SCHED_DEADLINE: Sporadic task model deadline scheduling
SCHED_DEADLINE is an implementation of GEDF (Global Earliest
Deadline First) with additional CBS (Constant Bandwidth Server).
The CBS guarantees that tasks that over-run their specified
budget are throttled and do not affect the correct performance
of other SCHED_DEADLINE tasks.

SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN

Setting SCHED_DEADLINE can fail with -EINVAL when admission
control tests fail.

Perhaps add a note about the deadline-class having higher priority than the
other classes; i.e. if a deadline-task is runnable, it will preempt any
other SCHED_(RR|FIFO) regardless of priority?

Yes, good point, will do.


SCHED_FIFO: First In-First Out scheduling
SCHED_FIFO can only be used with static priorities higher than 0, which
means that when a SCHED_FIFO process becomes runnable, it will always
immediately preempt any currently running SCHED_OTHER, SCHED_BATCH,  or
SCHED_IDLE  process.  SCHED_FIFO is a simple scheduling algorithm with‐
out time slicing.  For processes scheduled under the SCHED_FIFO policy,
the following rules apply:

*  A  SCHED_FIFO  process that has been preempted by another process of
   higher priority will stay at the head of the list for  its  priority
   and  will resume execution as soon as all processes of higher prior‐
   ity are blocked again.

*  When a SCHED_FIFO process becomes runnable, it will be  inserted  at
   the end of the list for its priority.

*  A  call  to  sched_setscheduler()  or sched_setparam(2) will put the
   SCHED_FIFO (or SCHED_RR) process identified by pid at the  start  of
   the  list  if it was runnable.  As a consequence, it may preempt the
   currently  running  process   if   it   has   the   same   priority.
   (POSIX.1-2001 specifies that the process should go to the end of the
   list.)

*  A process calling sched_yield(2) will be put at the end of the list.

How about the recent discussion regarding sched_yield()? Is this correct?

lkml.kernel.org/r/alpine.deb.2.02.1403312333100.14...@ionos.tec.linutronix.de

Is this the correct place to add a note explaining te potentional pitfalls
using sched_yield?

I'm not sure; there's a SCHED_YIELD(2) manpage to fill with that
nonsense.

Also; I realized I have not described the DEADLINE sched_yield()
behaviour.


So, for SCHED_DEADLINE we currently have this behaviour:

/*
  * Yield task semantic for -deadline tasks is:
  *
  *   get off from the CPU until our next instance, with
  *   a new runtime. This is of little use now, since we
  *   don't have a bandwidth reclaiming mechanism. Anyway,
  *   bandwidth reclaiming is planned for the future, and
  *   yield_task_dl will indicate that some spare budget
  *   is available for other task instances to use it.
  */

But, considering also the discussion above, I'm less sure now that's
what we want. Still, I think we will want some way in the future to be
able to say "I'm finished with my current job, give this remaining
runtime to someone else", like another syscall or something.


Hi Juri, hi Peter,

my two cents:

A syscall to block the task until its next instance is definitely useful.
This way, a periodic task doesn't have to sleep anymore: the kernel
takes care of unblocking it.
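
Something like the following usage pattern, i.e. a sketch of the semantics
described in the yield_task_dl comment quoted above (it assumes the calling
task has already been switched to SCHED_DEADLINE):

#include <sched.h>

/*
 * Periodic SCHED_DEADLINE job skeleton: run one instance of the job, then
 * give up the remaining runtime until the next period via sched_yield().
 */
static void periodic_loop(void (*do_job)(void))
{
	for (;;) {
		do_job();	/* one instance of the job */
		sched_yield();	/* block until the next period, new runtime */
	}
}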

[PATCH RESEND] umc-bus.c: fix usage of device_trylock

2012-10-09 Thread Claudio Scordino

Hi all.

I've not received any feedback about this patch, so I'm resending just
to check if someone is going to take care of it.

The patch fixes the usage of device_trylock inside the umc-bus.c driver:
device_trylock has the same semantics as mutex_trylock, so it returns 1
if the lock has been acquired successfully.

Best regards,

Claudio


Subject: umc-bus.c: fix usage of device_trylock
From: Claudio Scordino 

Fix usage of device_trylock. It has the same semantics as mutex_trylock, so it
returns 1 if the lock has been acquired successfully.

Signed-off-by: Claudio Scordino 
Signed-off-by: Bruno Morelli 
---
 drivers/uwb/umc-bus.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/uwb/umc-bus.c b/drivers/uwb/umc-bus.c
index 82a84d5..5c5b3fc 100644
--- a/drivers/uwb/umc-bus.c
+++ b/drivers/uwb/umc-bus.c
@@ -63,7 +63,7 @@ int umc_controller_reset(struct umc_dev *umc)
struct device *parent = umc->dev.parent;
int ret = 0;
 
-	if (device_trylock(parent))

+   if (!device_trylock(parent))
return -EAGAIN;
ret = device_for_each_child(parent, parent, umc_bus_pre_reset_helper);
if (ret >= 0)
--
1.7.1
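
For reference, the resulting function with the fix applied would look roughly
as follows (a sketch reconstructed from the hunk above; the elided parts are
assumptions, not the verbatim driver code):

int umc_controller_reset(struct umc_dev *umc)
{
	struct device *parent = umc->dev.parent;
	int ret = 0;

	/*
	 * device_trylock(), like mutex_trylock(), returns 1 when the lock
	 * has been acquired; 0 means contention, so back off with -EAGAIN.
	 */
	if (!device_trylock(parent))
		return -EAGAIN;

	ret = device_for_each_child(parent, parent, umc_bus_pre_reset_helper);
	/* ... reset and post-reset walk elided ... */
	device_unlock(parent);

	return ret;
}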




[PATCH] umc-bus.c: fix usage of device_trylock

2012-10-03 Thread Claudio Scordino

Hi all.

I've probably found an incorrect usage of device_trylock inside the
umc-bus.c driver: device_trylock has the same semantics as mutex_trylock,
so it returns 1 if the lock has been acquired successfully.

Please, find below a patch.

Best regards,

Claudio


Subject: umc-bus.c: fix usage of device_trylock
From: Claudio Scordino 

device_trylock has the same semantics as mutex_trylock, so it returns 1
if the lock has been acquired successfully.

Signed-off-by: Claudio Scordino 
---
 drivers/uwb/umc-bus.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/uwb/umc-bus.c b/drivers/uwb/umc-bus.c
index 82a84d5..5c5b3fc 100644
--- a/drivers/uwb/umc-bus.c
+++ b/drivers/uwb/umc-bus.c
@@ -63,7 +63,7 @@ int umc_controller_reset(struct umc_dev *umc)
struct device *parent = umc->dev.parent;
int ret = 0;
 
-	if (device_trylock(parent))

+   if (!device_trylock(parent))
return -EAGAIN;
ret = device_for_each_child(parent, parent, umc_bus_pre_reset_helper);
if (ret >= 0)
--
1.7.1



Re: [PATCH] isp1362-hcd.c: usb message always saved in case of underrun

2012-07-30 Thread Claudio Scordino

On 20/07/2012 17:15, Greg KH wrote:

On Fri, Jul 20, 2012 at 12:26:10PM +0200, Claudio Scordino wrote:

On 20/07/2012 00:58, Greg KH wrote:

On Wed, Jul 18, 2012 at 10:53:09AM +0200, Claudio Scordino wrote:

On 06/07/2012 19:41, Greg KH wrote:

On Wed, Jun 27, 2012 at 06:01:39PM +0200, Claudio Scordino wrote:

Hi Olav,

please find below a patch for the isp1362-hcd.c driver to always
save the message in case of underrun. More information is provided
inside the patch comment. Let us know if you need any further
information.

Best regards,

Claudio


Subject: isp1362-hcd.c: usb message always saved in case of underrun
From: Bruno Morelli

The usb message must be saved also in case the USB endpoint is not a
control endpoint (i.e., "endpoint 0"), otherwise in some circumstances
we don't have a payload in case of error.

The patch has been created by tracing with usbmon the different error
messages generated by this driver with respect to the ehci-hcd driver.

Signed-off-by: Bruno Morelli
Signed-off-by: Claudio Scordino
Tested-by: Bruno Morelli
---
  drivers/usb/host/isp1362-hcd.c |   11 ++-
  1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/host/isp1362-hcd.c b/drivers/usb/host/isp1362-hcd.c
index 2ed112d..61bf1b2 100644
--- a/drivers/usb/host/isp1362-hcd.c
+++ b/drivers/usb/host/isp1362-hcd.c
@@ -543,13 +543,14 @@ static void postproc_ep(struct isp1362_hcd *isp1362_hcd, 
struct isp1362_ep *ep)
usb_pipein(urb->pipe) ? "IN" : "OUT", ep->nextpid,
short_ok ? "" : "not_",
PTD_GET_COUNT(ptd), ep->maxpacket, len);
+   /* save the data underrun error code for later and
+* proceed with the status stage
+*/
+   urb->actual_length += PTD_GET_COUNT(ptd);
+   BUG_ON(urb->actual_length>
+   urb->transfer_buffer_length);


Please NEVER crash the machine in a driver like this, it's bad and can
cause problems.  Yes, I know you are just moving it from the lines
below:


if (usb_pipecontrol(urb->pipe)) {
ep->nextpid = USB_PID_ACK;
-   /* save the data underrun error code for later 
and
-* proceed with the status stage
-*/
-   urb->actual_length += PTD_GET_COUNT(ptd);
-   BUG_ON(urb->actual_length>
urb->transfer_buffer_length);


But really, it should not be in the driver at all.  Please remove it, at
the most, do a WARN_ON() so that someone can see the problem and at
least report it.

Actually, what is this checking?  How can someone recover from it?  Who
is this check for?  The developer of this driver?  Another driver?
Hardware developer?  End user?  Who would be able to fix the problem if
this happens?

As it is, I can't take it, sorry.



Hi Greg.

I understand. As you have already said, this driver is full of BUG_ON
statements.

So, we can shift just the assignment as in the patch below, to have a
correct behavior, leaving the BUG_ON where it already is. Then, we may
propose a different patch to change BUG_ONs to WARN_ONs.


Your updated patch is much better, thanks.

But it doesn't apply to my tree right now.  Can you please refresh it
against the usb-next tree and resend it?


Actually, I did.

So, this means that I'm using the wrong tree...

I'm using the "usb-next" branch available on your tree at

git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git

Is this the wrong one ?


That is the correct one.  It didn't work for me, so try refreshing your
patch and resending it.


Hi Greg.

I've just refreshed the patch by cloning the repo from scratch and
creating the patch against the "usb-next" branch.

You can find the patch below, but the content is still the same.

Can you please check it again? Is it possible that you have local
changes not yet committed?

Many thanks,

Claudio


Subject: isp1362-hcd.c: usb message always saved in case of underrun
From: Bruno Morelli 

The USB message must also be saved when the USB endpoint is not a
control endpoint (i.e., "endpoint 0"); otherwise, in some circumstances,
we don't have a payload in case of error.

The patch has been created by using usbmon to trace the different error
messages generated by this driver compared to the ehci-hcd driver.

Signed-off-by: Bruno Morelli 
Signed-off-by: Claudio Scordino 
Tested-by: Bruno Morelli 
---
 drivers/usb/host/isp1362-hcd.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/host/isp1362-hcd.c b/drivers/usb/host/isp136

Re: [PATCH] isp1362-hcd.c: usb message always saved in case of underrun

2012-07-20 Thread Claudio Scordino

On 20/07/2012 00:58, Greg KH wrote:

On Wed, Jul 18, 2012 at 10:53:09AM +0200, Claudio Scordino wrote:

On 06/07/2012 19:41, Greg KH wrote:

On Wed, Jun 27, 2012 at 06:01:39PM +0200, Claudio Scordino wrote:

Hi Olav,

please find below a patch for the isp1362-hcd.c driver to always
save the message in case of underrun. More information is provided
inside the patch comment. Let us know if you need any further
information.

Best regards,

Claudio


Subject: isp1362-hcd.c: usb message always saved in case of underrun
From: Bruno Morelli

The usb message must be saved also in case the USB endpoint is not a
control endpoint (i.e., "endpoint 0"), otherwise in some circumstances
we don't have a payload in case of error.

The patch has been created by tracing with usbmon the different error
messages generated by this driver with respect to the ehci-hcd driver.

Signed-off-by: Bruno Morelli
Signed-off-by: Claudio Scordino
Tested-by: Bruno Morelli
---
  drivers/usb/host/isp1362-hcd.c |   11 ++-
  1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/host/isp1362-hcd.c b/drivers/usb/host/isp1362-hcd.c
index 2ed112d..61bf1b2 100644
--- a/drivers/usb/host/isp1362-hcd.c
+++ b/drivers/usb/host/isp1362-hcd.c
@@ -543,13 +543,14 @@ static void postproc_ep(struct isp1362_hcd *isp1362_hcd, 
struct isp1362_ep *ep)
usb_pipein(urb->pipe) ? "IN" : "OUT", ep->nextpid,
short_ok ? "" : "not_",
PTD_GET_COUNT(ptd), ep->maxpacket, len);
+   /* save the data underrun error code for later and
+* proceed with the status stage
+*/
+   urb->actual_length += PTD_GET_COUNT(ptd);
+   BUG_ON(urb->actual_length>
+   urb->transfer_buffer_length);


Please NEVER crash the machine in a driver like this, it's bad and can
cause problems.  Yes, I know you are just moving it from the lines
below:


if (usb_pipecontrol(urb->pipe)) {
ep->nextpid = USB_PID_ACK;
-   /* save the data underrun error code for later 
and
-* proceed with the status stage
-*/
-   urb->actual_length += PTD_GET_COUNT(ptd);
-   BUG_ON(urb->actual_length>   
urb->transfer_buffer_length);


But really, it should not be in the driver at all.  Please remove it, at
the most, do a WARN_ON() so that someone can see the problem and at
least report it.

Actually, what is this checking?  How can someone recover from it?  Who
is this check for?  The developer of this driver?  Another driver?
Hardware developer?  End user?  Who would be able to fix the problem if
this happens?

As it is, I can't take it, sorry.



Hi Greg.

I understand. As you have already said, this driver is full of BUG_ON
statements.

So, we can shift just the assignment as in the patch below, to have a
correct behavior, leaving the BUG_ON where it already is. Then, we may
propose a different patch to change BUG_ONs to WARN_ONs.


Your updated patch is much better, thanks.

But it doesn't apply to my tree right now.  Can you please refresh it
against the usb-next tree and resend it?


Actually, I did.

So, this means that I'm using the wrong tree...

I'm using the "usb-next" branch available on your tree at

git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git

Is this the wrong one ?

Many thanks,

Claudio


Re: [PATCH] isp1362-hcd.c: usb message always saved in case of underrun

2012-07-18 Thread Claudio Scordino

On 06/07/2012 19:41, Greg KH wrote:

On Wed, Jun 27, 2012 at 06:01:39PM +0200, Claudio Scordino wrote:

Hi Olav,

please find below a patch for the isp1362-hcd.c driver to always
save the message in case of underrun. More information is provided
inside the patch comment. Let us know if you need any further
information.

Best regards,

Claudio


Subject: isp1362-hcd.c: usb message always saved in case of underrun
From: Bruno Morelli

The usb message must be saved also in case the USB endpoint is not a
control endpoint (i.e., "endpoint 0"), otherwise in some circumstances
we don't have a payload in case of error.

The patch has been created by tracing with usbmon the different error
messages generated by this driver with respect to the ehci-hcd driver.

Signed-off-by: Bruno Morelli
Signed-off-by: Claudio Scordino
Tested-by: Bruno Morelli
---
  drivers/usb/host/isp1362-hcd.c |   11 ++-
  1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/host/isp1362-hcd.c b/drivers/usb/host/isp1362-hcd.c
index 2ed112d..61bf1b2 100644
--- a/drivers/usb/host/isp1362-hcd.c
+++ b/drivers/usb/host/isp1362-hcd.c
@@ -543,13 +543,14 @@ static void postproc_ep(struct isp1362_hcd *isp1362_hcd, 
struct isp1362_ep *ep)
usb_pipein(urb->pipe) ? "IN" : "OUT", ep->nextpid,
short_ok ? "" : "not_",
PTD_GET_COUNT(ptd), ep->maxpacket, len);
+   /* save the data underrun error code for later and
+* proceed with the status stage
+*/
+   urb->actual_length += PTD_GET_COUNT(ptd);
+   BUG_ON(urb->actual_length>
+   urb->transfer_buffer_length);


Please NEVER crash the machine in a driver like this, it's bad and can
cause problems.  Yes, I know you are just moving it from the lines
below:


if (usb_pipecontrol(urb->pipe)) {
ep->nextpid = USB_PID_ACK;
-   /* save the data underrun error code for later 
and
-* proceed with the status stage
-*/
-   urb->actual_length += PTD_GET_COUNT(ptd);
-   BUG_ON(urb->actual_length>  
urb->transfer_buffer_length);


But really, it should not be in the driver at all.  Please remove it, at
the most, do a WARN_ON() so that someone can see the problem and at
least report it.

Actually, what is this checking?  How can someone recover from it?  Who
is this check for?  The developer of this driver?  Another driver?
Hardware developer?  End user?  Who would be able to fix the problem if
this happens?

As it is, I can't take it, sorry.



Hi Greg.

I understand. As you have already said, this driver is full of BUG_ON
statements.

So, we can just move the assignment, as in the patch below, to get
correct behavior, leaving the BUG_ON where it already is. Then, we may
propose a different patch to change the BUG_ONs to WARN_ONs.

Best regards,

   Claudio


Subject: isp1362-hcd.c: usb message always saved in case of underrun
From: Bruno Morelli 

The USB message must also be saved when the USB endpoint is not a control
endpoint (i.e., "endpoint 0"); otherwise, in some circumstances, we don't have
a payload in case of error.

The patch has been created by using usbmon to trace the different error
messages generated by this driver compared to the ehci-hcd driver.

Signed-off-by: Bruno Morelli 
Signed-off-by: Claudio Scordino 
Tested-by: Bruno Morelli 
---
 drivers/usb/host/isp1362-hcd.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/host/isp1362-hcd.c b/drivers/usb/host/isp1362-hcd.c
index 2ed112d..2563263 100644
--- a/drivers/usb/host/isp1362-hcd.c
+++ b/drivers/usb/host/isp1362-hcd.c
@@ -543,12 +543,12 @@ static void postproc_ep(struct isp1362_hcd *isp1362_hcd, 
struct isp1362_ep *ep)
usb_pipein(urb->pipe) ? "IN" : "OUT", ep->nextpid,
short_ok ? "" : "not_",
PTD_GET_COUNT(ptd), ep->maxpacket, len);
+   /* save the data underrun error code for later and
+* proceed with the status stage
+*/
+   urb->actual_length += PTD_GET_COUNT(ptd);
if (usb_pipecontrol(urb->pipe)) {
ep->nextpid = USB_PID_ACK;
-   /* save the data underrun error code for later 
and
-* proceed with the status stage
-*/
-
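
The diff is truncated here in the archive. The resulting logic, reconstructed
from the hunks above (a sketch, not the verbatim driver code), is to
accumulate the length unconditionally and keep the BUG_ON in the
control-endpoint branch only:

		/* save the data underrun error code for later and
		 * proceed with the status stage
		 */
		urb->actual_length += PTD_GET_COUNT(ptd);
		if (usb_pipecontrol(urb->pipe)) {
			ep->nextpid = USB_PID_ACK;
			BUG_ON(urb->actual_length >
			       urb->transfer_buffer_length);
		}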