Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-28 Thread Preeti U Murthy
On 04/28/2014 11:34 PM, Jason Low wrote:
> On Sun, 2014-04-27 at 14:01 +0530, Preeti Murthy wrote:
>> Hi Jason, Peter,
>>
>> The below patch looks good to me except for one point.
>>
>> In idle_balance() the below code snippet does not look right:
>>
>> - if (pulled_task || time_after(jiffies, this_rq->next_balance)) {
>> - /*
>> - * We are going idle. next_balance may be set based on
>> - * a busy processor. So reset next_balance.
>> - */
>> +out:
>> + /* Move the next balance forward */
>> + if (time_after(this_rq->next_balance, next_balance))
>>   this_rq->next_balance = next_balance;
>> - }
>>
>> By not checking this_rq->next_balance against jiffies,
>> we might end up not updating this parameter when it
>> has expired.
>>
>> So shouldn't it be:
>>
>> if (time_after(jiffies, this_rq->next_balance) ||
>>time_after(this_rq->next_balance, next_balance))
>> this_rq->next_balance = next_balance;
> 
> Hi Preeti,
> 
> If jiffies is after this_rq->next_balance, doesn't that mean that it's
> actually due for a periodic balance and we wouldn't need to modify it?
> In rebalance_domains(), we do load_balance if time_after_eq(jiffies,
> sd->last_balance + interval).

Right. So I missed the point that we don't really have a problem with
the rq->next_balance being expired. It will anyway ensure that in the
next call to rebalance_domains() load balancing will be done and that is
all we want. Thanks for pointing it out.

Regards
Preeti U Murthy
> 
>>
>> Besides this:
>> Reviewed-by: Preeti U Murthy 
> 
> Thanks for the review.
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-28 Thread Preeti U Murthy
On 04/28/2014 02:54 PM, Peter Zijlstra wrote:
> On Sun, Apr 27, 2014 at 02:01:45PM +0530, Preeti Murthy wrote:
>> Hi Jason, Peter,
>>
>> The below patch looks good to me except for one point.
>>
>> In idle_balance() the below code snippet does not look right:
>>
>> - if (pulled_task || time_after(jiffies, this_rq->next_balance)) {
>> - /*
>> - * We are going idle. next_balance may be set based on
>> - * a busy processor. So reset next_balance.
>> - */
>> +out:
>> + /* Move the next balance forward */
>> + if (time_after(this_rq->next_balance, next_balance))
>>   this_rq->next_balance = next_balance;
>> - }
>>
>> By not checking this_rq->next_balance against jiffies,
>> we might end up not updating this parameter when it
>> has expired.
>>
>> So shouldn't it be:
>>
>> if (time_after(jiffies, this_rq->next_balance) ||
>>time_after(this_rq->next_balance, next_balance))
>> this_rq->next_balance = next_balance;
> 
> So the reason I didn't do that is that nothing else does that either.
> Also, note that the value we set rq->next_balance to might itself
> already be expired. There is no guarantee that last_balance + interval
> is in the future.
> 
Hmm this makes sense. Thanks!

Regards
Preeti U Murthy



Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-28 Thread Jason Low
On Sun, 2014-04-27 at 14:01 +0530, Preeti Murthy wrote:
> Hi Jason, Peter,
> 
> The below patch looks good to me except for one point.
> 
> In idle_balance() the below code snippet does not look right:
> 
> - if (pulled_task || time_after(jiffies, this_rq->next_balance)) {
> - /*
> - * We are going idle. next_balance may be set based on
> - * a busy processor. So reset next_balance.
> - */
> +out:
> + /* Move the next balance forward */
> + if (time_after(this_rq->next_balance, next_balance))
>   this_rq->next_balance = next_balance;
> - }
> 
> By not checking this_rq->next_balance against jiffies,
> we might end up not updating this parameter when it
> has expired.
> 
> So shouldn't it be:
> 
> if (time_after(jiffies, this_rq->next_balance) ||
>time_after(this_rq->next_balance, next_balance))
> this_rq->next_balance = next_balance;

Hi Preeti,

If jiffies is after this_rq->next_balance, doesn't that mean that it's
actually due for a periodic balance and we wouldn't need to modify it?
In rebalance_domains(), we do load_balance if time_after_eq(jiffies,
sd->last_balance + interval).

> 
> Besides this:
> Reviewed-by: Preeti U Murthy 

Thanks for the review.



Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-28 Thread Jason Low
On Sat, 2014-04-26 at 16:50 +0200, Peter Zijlstra wrote:
> On Fri, Apr 25, 2014 at 12:54:14PM -0700, Jason Low wrote:
> > Preeti mentioned that sd->balance_interval is changed during load_balance().
> > Should we also consider updating the interval in rebalance_domains() after
> > calling load_balance(),
> 
> Yeah, that might make sense.
> 
> > and also taking max_load_balance_interval into account
> > in the updates for next_balance in idle_balance()?
> 
> I was thinking that max_load_balance_interval thing was mostly about the
> *busy_factor thing, but sure, can't hurt to be consistent and always do
> it.
> 
> > If so, how about something like the below change which also introduces
> > get_sd_balance_interval() to obtain the sd's balance interval, and have both
> > update_next_balance() and rebalance_domains() use that function.
> 
> Yes, that looks good.
> 
> Can you send it with a proper changelog?

Sure, I'll send a v2 patchset so that this applies with the other
patches. I also think it would be beneficial to split this change into 2
patches (the 1st patch fixes commit e5fc6611, and the 2nd patch changes
how next_balance gets updated).



Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-28 Thread Peter Zijlstra
On Sun, Apr 27, 2014 at 02:01:45PM +0530, Preeti Murthy wrote:
> Hi Jason, Peter,
> 
> The below patch looks good to me except for one point.
> 
> In idle_balance() the below code snippet does not look right:
> 
> - if (pulled_task || time_after(jiffies, this_rq->next_balance)) {
> - /*
> - * We are going idle. next_balance may be set based on
> - * a busy processor. So reset next_balance.
> - */
> +out:
> + /* Move the next balance forward */
> + if (time_after(this_rq->next_balance, next_balance))
>   this_rq->next_balance = next_balance;
> - }
> 
> By not checking this_rq->next_balance against jiffies,
> we might end up not updating this parameter when it
> has expired.
> 
> So shouldn't it be:
> 
> if (time_after(jiffies, this_rq->next_balance) ||
>time_after(this_rq->next_balance, next_balance))
> this_rq->next_balance = next_balance;

So the reason I didn't do that is that nothing else does that either.
Also, note that the value we set rq->next_balance to might itself
already be expired. There is no guarantee that last_balance + interval
is in the future.



Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-27 Thread Preeti Murthy
Hi Jason, Peter,

The below patch looks good to me except for one point.

In idle_balance() the below code snippet does not look right:

- if (pulled_task || time_after(jiffies, this_rq->next_balance)) {
- /*
- * We are going idle. next_balance may be set based on
- * a busy processor. So reset next_balance.
- */
+out:
+ /* Move the next balance forward */
+ if (time_after(this_rq->next_balance, next_balance))
  this_rq->next_balance = next_balance;
- }

By not checking this_rq->next_balance against jiffies,
we might end up not updating this parameter when it
has expired.

So shouldn't it be:

if (time_after(jiffies, this_rq->next_balance) ||
   time_after(this_rq->next_balance, next_balance))
this_rq->next_balance = next_balance;


Besides this:
Reviewed-by: Preeti U Murthy 

Regards
Preeti U Murthy


On Sat, Apr 26, 2014 at 1:24 AM, Jason Low  wrote:
> Signed-off-by: Jason Low 
> ---
>  kernel/sched/fair.c |   81 ---
>  1 files changed, 51 insertions(+), 30 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 43232b8..09c546c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6645,27 +6645,59 @@ out:
> return ld_moved;
>  }
>
> +static inline unsigned long get_sd_balance_interval(struct sched_domain *sd, int busy)
> +{
> +   unsigned long interval = sd->balance_interval;
> +
> +   if (busy)
> +   interval *= sd->busy_factor;
> +
> +   /* scale ms to jiffies */
> +   interval = msecs_to_jiffies(interval);
> +   interval = clamp(interval, 1UL, max_load_balance_interval);
> +
> +   return interval;
> +}
> +
> +static inline void
> +update_next_balance(struct sched_domain *sd, int busy, unsigned long *next_balance)
> +{
> +   unsigned long interval, next;
> +
> +   interval = get_sd_balance_interval(sd, busy);
> +   next = sd->last_balance + interval;
> +
> +   if (time_after(*next_balance, next))
> +   *next_balance = next;
> +}
> +
>  /*
>   * idle_balance is called by schedule() if this_cpu is about to become
>   * idle. Attempts to pull tasks from other CPUs.
>   */
>  static int idle_balance(struct rq *this_rq)
>  {
> +   unsigned long next_balance = jiffies + HZ;
> +   int this_cpu = this_rq->cpu;
> struct sched_domain *sd;
> int pulled_task = 0;
> -   unsigned long next_balance = jiffies + HZ;
> u64 curr_cost = 0;
> -   int this_cpu = this_rq->cpu;
>
> idle_enter_fair(this_rq);
> +
> /*
>  * We must set idle_stamp _before_ calling idle_balance(), such that 
> we
>  * measure the duration of idle_balance() as idle time.
>  */
> this_rq->idle_stamp = rq_clock(this_rq);
>
> -   if (this_rq->avg_idle < sysctl_sched_migration_cost)
> +   if (this_rq->avg_idle < sysctl_sched_migration_cost) {
> +   rcu_read_lock();
> +   sd = rcu_dereference_check_sched_domain(this_rq->sd);
> +   update_next_balance(sd, 0, &next_balance);
> +   rcu_read_unlock();
> goto out;
> +   }
>
> /*
>  * Drop the rq->lock, but keep IRQ/preempt disabled.
> @@ -6675,15 +6707,16 @@ static int idle_balance(struct rq *this_rq)
> update_blocked_averages(this_cpu);
> rcu_read_lock();
> for_each_domain(this_cpu, sd) {
> -   unsigned long interval;
> int continue_balancing = 1;
> u64 t0, domain_cost;
>
> if (!(sd->flags & SD_LOAD_BALANCE))
> continue;
>
> -   if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost)
> +   if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
> +   update_next_balance(sd, 0, &next_balance);
> break;
> +   }
>
> if (sd->flags & SD_BALANCE_NEWIDLE) {
> t0 = sched_clock_cpu(this_cpu);
> @@ -6700,9 +6733,7 @@ static int idle_balance(struct rq *this_rq)
> curr_cost += domain_cost;
> }
>
> -   interval = msecs_to_jiffies(sd->balance_interval);
> -   if (time_after(next_balance, sd->last_balance + interval))
> -   next_balance = sd->last_balance + interval;
> +   update_next_balance(sd, 0, &next_balance);
> if (pulled_task)
> break;
> }
> @@ -6710,27 +6741,22 @@ static int idle_balance(struct rq *this_rq)
>
> raw_spin_lock(&this_rq->lock);
>
> +   if (curr_cost > this_rq->max_idle_balance_cost)
> +   this_rq->max_idle_balance_cost = curr_cost;
> +
> /*
> -* While browsing the domains, we released the rq lock.
> -* A task could have be enqueued in the meantime
> +* While browsing the domains, we released the rq lock, a task cou

Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-26 Thread Peter Zijlstra
On Fri, Apr 25, 2014 at 12:54:14PM -0700, Jason Low wrote:
> Preeti mentioned that sd->balance_interval is changed during load_balance().
> Should we also consider updating the interval in rebalance_domains() after
> calling load_balance(),

Yeah, that might make sense.

> and also taking max_load_balance_interval into account
> in the updates for next_balance in idle_balance()?

I was thinking that max_load_balance_interval thing was mostly about the
*busy_factor thing, but sure, can't hurt to be consistent and always do
it.

> If so, how about something like the below change which also introduces
> get_sd_balance_interval() to obtain the sd's balance interval, and have both
> update_next_balance() and rebalance_domains() use that function.

Yes, that looks good.

Can you send it with a proper changelog?


Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-25 Thread Jason Low
On Fri, 2014-04-25 at 11:43 +0200, Peter Zijlstra wrote:
> So how about something like this? It tracks the minimal next_balance for
> whatever domains we do visit, or the very bottom domain in the
> insta-bail case (but yeah, Mike's got a point.. we could think of
> removing that).
> 
> The thought is that since the higher domains have larger intervals
> anyway, its less likely they will move the timer back often, so skipping
> them is not that big a deal.
> 
> I also considered tracking next_busy_balance and using that when
> pulled_task, but I decided against it (after I'd actually written the
> code to do so). We were on the brink of going idle, that's really not
> busy.

Preeti mentioned that sd->balance_interval is changed during load_balance().
Should we also consider updating the interval in rebalance_domains() after
calling load_balance(), and also taking max_load_balance_interval into account
in the updates for next_balance in idle_balance()?

If so, how about something like the below change which also introduces
get_sd_balance_interval() to obtain the sd's balance interval, and have both
update_next_balance() and rebalance_domains() use that function.


Signed-off-by: Jason Low 
---
 kernel/sched/fair.c |   81 ---
 1 files changed, 51 insertions(+), 30 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 43232b8..09c546c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6645,27 +6645,59 @@ out:
return ld_moved;
 }
 
+static inline unsigned long get_sd_balance_interval(struct sched_domain *sd, int busy)
+{
+   unsigned long interval = sd->balance_interval;
+
+   if (busy)
+   interval *= sd->busy_factor;
+
+   /* scale ms to jiffies */
+   interval = msecs_to_jiffies(interval);
+   interval = clamp(interval, 1UL, max_load_balance_interval);
+
+   return interval;
+}
+
+static inline void
+update_next_balance(struct sched_domain *sd, int busy, unsigned long *next_balance)
+{
+   unsigned long interval, next;
+
+   interval = get_sd_balance_interval(sd, busy);
+   next = sd->last_balance + interval;
+
+   if (time_after(*next_balance, next))
+   *next_balance = next;
+}
+
 /*
  * idle_balance is called by schedule() if this_cpu is about to become
  * idle. Attempts to pull tasks from other CPUs.
  */
 static int idle_balance(struct rq *this_rq)
 {
+   unsigned long next_balance = jiffies + HZ;
+   int this_cpu = this_rq->cpu;
struct sched_domain *sd;
int pulled_task = 0;
-   unsigned long next_balance = jiffies + HZ;
u64 curr_cost = 0;
-   int this_cpu = this_rq->cpu;
 
idle_enter_fair(this_rq);
+
/*
 * We must set idle_stamp _before_ calling idle_balance(), such that we
 * measure the duration of idle_balance() as idle time.
 */
this_rq->idle_stamp = rq_clock(this_rq);
 
-   if (this_rq->avg_idle < sysctl_sched_migration_cost)
+   if (this_rq->avg_idle < sysctl_sched_migration_cost) {
+   rcu_read_lock();
+   sd = rcu_dereference_check_sched_domain(this_rq->sd);
+   update_next_balance(sd, 0, &next_balance);
+   rcu_read_unlock();
goto out;
+   }
 
/*
 * Drop the rq->lock, but keep IRQ/preempt disabled.
@@ -6675,15 +6707,16 @@ static int idle_balance(struct rq *this_rq)
update_blocked_averages(this_cpu);
rcu_read_lock();
for_each_domain(this_cpu, sd) {
-   unsigned long interval;
int continue_balancing = 1;
u64 t0, domain_cost;
 
if (!(sd->flags & SD_LOAD_BALANCE))
continue;
 
-   if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost)
+   if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
+   update_next_balance(sd, 0, &next_balance);
break;
+   }
 
if (sd->flags & SD_BALANCE_NEWIDLE) {
t0 = sched_clock_cpu(this_cpu);
@@ -6700,9 +6733,7 @@ static int idle_balance(struct rq *this_rq)
curr_cost += domain_cost;
}
 
-   interval = msecs_to_jiffies(sd->balance_interval);
-   if (time_after(next_balance, sd->last_balance + interval))
-   next_balance = sd->last_balance + interval;
+   update_next_balance(sd, 0, &next_balance);
if (pulled_task)
break;
}
@@ -6710,27 +6741,22 @@ static int idle_balance(struct rq *this_rq)
 
raw_spin_lock(&this_rq->lock);
 
+   if (curr_cost > this_rq->max_idle_balance_cost)
+   this_rq->max_idle_balance_cost = curr_cost;
+
/*
-* While browsing the domains, we released the rq lock.
-* A task could have be enque

Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-25 Thread Jason Low
On Fri, 2014-04-25 at 09:58 +0200, Mike Galbraith wrote:
> On Fri, 2014-04-25 at 00:13 -0700, Jason Low wrote: 
> > On Fri, 2014-04-25 at 10:42 +0530, Preeti U Murthy wrote:
> > > I agree with this. However I am concerned with an additional point that
> > > I have mentioned in my reply to Peter's mail on this thread.
> > > 
> > > Should we verify if rq->next_balance update is independent of
> > > pulled_tasks? sd->balance_interval is changed during load_balance() and
> > > rq->next_balance should perhaps consider that?
> > 
> > Hi Preeti,
> > 
> > I agree that we may want to consider having rq->next balance update be
> > independent of pulled_task. As you mentioned, load_balance() can modify
> > the balance_interval.
> > 
> > There are a few things I'm wondering if we would need to also add then:
> > 
> > 1. In the case that this_rq->avg_idle < sysctl_sched_migration_cost, we
> >would need to also traverse the domains to properly compute
> >next_balance (without the sd->busy_factor) as we would be going idle.
> >Otherwise, next_balance could get set to jiffies + HZ while the
> >CPU goes idle.
> 
> Avoiding high frequency cache misses and cycle wastage on micro-idle was
> what avg-idle was about.  If you're going to traverse anyway, or have a
> better way to not do that too frequently, you can just nuke it.

Yeah, we already compare avg-idle with the per-domain costs in that
function. I'll run some performance tests with the first check removed,
as such a change can potentially have a (+/-) impact on performance.

Thanks,
Jason




Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-25 Thread Peter Zijlstra
On Fri, Apr 25, 2014 at 10:38:43AM +0530, Preeti U Murthy wrote:
> > But if it fails to pull anything, we'll (potentially) iterate the entire
> > tree up to the largest domain; and supposedly set next_balanced to the
> > largest possible interval.
> 
> *to the smallest possible interval.
> > 
> > So when we go from busy to idle (!pulled_task), we actually set
> > ->next_balance to the longest interval. Whereas the commit you
> > referenced says it sets it to a shorter while.
> 
> We will set next_balance to the earliest balance time among the sched
> domains iterated.

Duh. I read that time_after the wrong way around.. silly me :-)

> I am unable to understand how updating of rq->next_balance should depend
> solely on the pulled_task parameter( I am not considering the expiry of
> rq->next_balance here).
> 
> True that we will need to override the busy_factor in rq->next_balance
> if we do not pull any tasks and go to idle. Besides that however we will
> probably need to override rq->next_balance irrespective of whether we
> pull any tasks.
> 
> Lets look at what happens to the sd->balance_interval in load_balance().
> If we pull tasks then it is set to min_interval. If active balance
> occurs or if tasks are pinned then we push the interval farther away. In
> the former case where it is set to min_interval, pulled_tasks > 0, in
> the latter case, especially the pinned case, pulled_task=0 (not sure
> about the active balance case).
> 
> If after this modification on sd->balance_interval,
> rq->next_balance > sd->last_balance + sd->balance_interval then
> shouldn't we be resetting rq->next_balance? And if we should, then the
> dependence on pulled_tasks is not justified is it? All this assuming
> that rq->next_balance should always reflect the minimum value of
> sd->next_balance among the sched domains of which the rq is a part.

So how about something like this? It tracks the minimal next_balance for
whatever domains we do visit, or the very bottom domain in the
insta-bail case (but yeah, Mike's got a point.. we could think of
removing that).

The thought is that since the higher domains have larger intervals
anyway, its less likely they will move the timer back often, so skipping
them is not that big a deal.

I also considered tracking next_busy_balance and using that when
pulled_task, but I decided against it (after I'd actually written the
code to do so). We were on the brink of going idle, that's really not
busy.

---
 kernel/sched/fair.c | 58 -
 1 file changed, 35 insertions(+), 23 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 43232b8bacde..2ac1ad3de6c6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6645,17 +6645,29 @@ static int load_balance(int this_cpu, struct rq *this_rq,
return ld_moved;
 }
 
+static inline void
+update_next_balance(struct sched_domain *sd, unsigned long *next_balance)
+{
+   unsigned long interval, next;
+
+   interval = msecs_to_jiffies(sd->balance_interval);
+   next = sd->last_balance + interval;
+
+   if (time_after(*next_balance, next))
+   *next_balance = next;
+}
+
 /*
  * idle_balance is called by schedule() if this_cpu is about to become
  * idle. Attempts to pull tasks from other CPUs.
  */
 static int idle_balance(struct rq *this_rq)
 {
+   unsigned long next_balance = jiffies + HZ;
+   int this_cpu = this_rq->cpu;
struct sched_domain *sd;
int pulled_task = 0;
-   unsigned long next_balance = jiffies + HZ;
u64 curr_cost = 0;
-   int this_cpu = this_rq->cpu;
 
idle_enter_fair(this_rq);
/*
@@ -6664,8 +6676,14 @@ static int idle_balance(struct rq *this_rq)
 */
this_rq->idle_stamp = rq_clock(this_rq);
 
-   if (this_rq->avg_idle < sysctl_sched_migration_cost)
+   if (this_rq->avg_idle < sysctl_sched_migration_cost) {
+   rcu_read_lock();
+   update_next_balance(
+   rcu_dereference_check_sched_domain(this_rq->sd),
+   &next_balance);
+   rcu_read_unlock();
goto out;
+   }
 
/*
 * Drop the rq->lock, but keep IRQ/preempt disabled.
@@ -6675,15 +6693,16 @@ static int idle_balance(struct rq *this_rq)
update_blocked_averages(this_cpu);
rcu_read_lock();
for_each_domain(this_cpu, sd) {
-   unsigned long interval;
int continue_balancing = 1;
u64 t0, domain_cost;
 
if (!(sd->flags & SD_LOAD_BALANCE))
continue;
 
-   if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost)
+   if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
+   update_next_balance(sd, &next_balance);
break;
+   }
 
if (sd->flags & SD_BALANCE_NEWIDLE) {
t0

Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-25 Thread Mike Galbraith
On Fri, 2014-04-25 at 00:13 -0700, Jason Low wrote: 
> On Fri, 2014-04-25 at 10:42 +0530, Preeti U Murthy wrote:
> > I agree with this. However I am concerned with an additional point that
> > I have mentioned in my reply to Peter's mail on this thread.
> > 
> > Should we verify if rq->next_balance update is independent of
> > pulled_tasks? sd->balance_interval is changed during load_balance() and
> > rq->next_balance should perhaps consider that?
> 
> Hi Preeti,
> 
> I agree that we may want to consider having rq->next balance update be
> independent of pulled_task. As you mentioned, load_balance() can modify
> the balance_interval.
> 
> There are a few things I'm wondering if we would need to also add then:
> 
> 1. In the case that this_rq->avg_idle < sysctl_sched_migration_cost, we
>would need to also traverse the domains to properly compute
>next_balance (without the sd->busy_factor) as we would be going idle.
>Otherwise, next_balance could get set to jiffies + HZ while the
>CPU goes idle.

Avoiding high frequency cache misses and cycle wastage on micro-idle was
what avg-idle was about.  If you're going to traverse anyway, or have a
better way to not do that too frequently, you can just nuke it.

-Mike



Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-25 Thread Jason Low
On Fri, 2014-04-25 at 10:42 +0530, Preeti U Murthy wrote:
> I agree with this. However I am concerned with an additional point that
> I have mentioned in my reply to Peter's mail on this thread.
> 
> Should we verify if rq->next_balance update is independent of
> pulled_tasks? sd->balance_interval is changed during load_balance() and
> rq->next_balance should perhaps consider that?

Hi Preeti,

I agree that we may want to consider having rq->next balance update be
independent of pulled_task. As you mentioned, load_balance() can modify
the balance_interval.

There are a few things I'm wondering if we would need to also add then:

1. In the case that this_rq->avg_idle < sysctl_sched_migration_cost, we
   would need to also traverse the domains to properly compute
   next_balance (without the sd->busy_factor) as we would be going idle.
   Otherwise, next_balance could get set to jiffies + HZ while the
   CPU goes idle.

2. In the domain traversal, when we pulled_task, we might want to
   multiply interval by sd->busy_factor because the rq will remain busy.

3. If this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost, then we
   may still want to compute next_balance, rather than simply break out
   of the sched domain traversal loop. This is also to avoid having
   the next_balance = jiffies + HZ when a domain should rebalance
   less than 1 second later.




Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-24 Thread Preeti U Murthy
Hi Jason,

On 04/25/2014 03:48 AM, Jason Low wrote:
> On Thu, 2014-04-24 at 19:14 +0200, Peter Zijlstra wrote:
>> On Thu, Apr 24, 2014 at 09:53:37AM -0700, Jason Low wrote:
>>>
>>> So I thought that the original rationale (commit 1bd77f2d) behind
>>> updating rq->next_balance in idle_balance() is that, if we are going
>>> idle (!pulled_task), we want to ensure that the next_balance gets
>>> calculated without the busy_factor.
>>>
>>> If the rq is busy, then rq->next_balance gets updated based on
>>> sd->interval * busy_factor. However, when the rq goes from "busy"
>>> to idle, rq->next_balance might still have been calculated under
>>> the assumption that the rq is busy. Thus, if we are going idle, we
>>> would then properly update next_balance without the busy factor
>>> if we update when !pulled_task.
>>>
>>
>> Its late here and I'm confused!
>>
>> So the for_each_domain() loop calculates a new next_balance based on
>> ->balance_interval (which has that busy_factor on, right).
>>
>> But if it fails to pull anything, we'll (potentially) iterate the entire
>> tree up to the largest domain; and supposedly set next_balanced to the
>> largest possible interval.
>>
>> So when we go from busy to idle (!pulled_task), we actually set
>> ->next_balance to the longest interval. Whereas the commit you
>> referenced says it sets it to a shorter while.
>>
>> Not seeing it.
> 
> So this is the way I understand that code:
> 
> In rebalance_domain, next_balance is supposed to be set to the
> minimum of all sd->last_balance + interval so that we properly call
> into rebalance_domains() if one of the domains is due for a balance.
> 
> In the domain traversals:
> 
>   if (time_after(next_balance, sd->last_balance + interval))
>   next_balance = sd->last_balance + interval;
> 
> we update next_balance to a new value if the current next_balance
> is after, and we only update next_balance to a smaller value.
> 
> In rebalance_domains, we have code:
> 
>   interval = sd->balance_interval;
>   if (idle != CPU_IDLE)
>   interval *= sd->busy_factor;
> 
>   ...
> 
>   if (time_after(next_balance, sd->last_balance + interval)) {
>   next_balance = sd->last_balance + interval;
> 
>   ...
> 
>   rq->next_balance = next_balance;
> 
> In the CPU_IDLE case, interval would not include the busy factor,
> whereas in the !CPU_IDLE case, we multiply the interval by the
> sd->busy_factor.
> 
> So as an example, if a CPU is not idle and we run this:
> 
> rebalance_domain()
>   interval = 1 ms;
>   if (idle != CPU_IDLE)
>   interval *= 64;
> 
>   next_balance = sd->last_balance + 64 ms
> 
>   rq->next_balance = next_balance
> 
> The rq->next_balance is set to a large value since the CPU is not idle.
> 
> Then, let's say the CPU then goes idle 1 ms later. The
> rq->next_balance can be up to 63 ms later, because we computed
> it when the CPU is not idle. Now that we are going idle,
> we would have to wait a long time for the next balance.
> 
> So I believe that the initial reason why rq->next_balance was
> updated in idle_balance is that if the CPU is in the process 
> of going idle (!pulled_task in idle_balance()), we can reset the
> rq->next_balance based on the interval = 1 ms, as opposed to
> having it remain up to 64 ms later (in idle_balance(), interval
> doesn't get multiplied by sd->busy_factor).

I agree with this. However I am concerned with an additional point that
I have mentioned in my reply to Peter's mail on this thread.

Should we verify if rq->next_balance update is independent of
pulled_tasks? sd->balance_interval is changed during load_balance() and
rq->next_balance should perhaps consider that?

Regards
Preeti U Murthy
> 
> 
> 



Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-24 Thread Preeti U Murthy
On 04/24/2014 10:44 PM, Peter Zijlstra wrote:
> On Thu, Apr 24, 2014 at 09:53:37AM -0700, Jason Low wrote:
>>
>> So I thought that the original rationale (commit 1bd77f2d) behind
>> updating rq->next_balance in idle_balance() is that, if we are going
>> idle (!pulled_task), we want to ensure that the next_balance gets
>> calculated without the busy_factor.
>>
>> If the rq is busy, then rq->next_balance gets updated based on
>> sd->interval * busy_factor. However, when the rq goes from "busy"
>> to idle, rq->next_balance might still have been calculated under
>> the assumption that the rq is busy. Thus, if we are going idle, we
>> would then properly update next_balance without the busy factor
>> if we update when !pulled_task.
>>
> 
> It's late here and I'm confused!
> 
> So the for_each_domain() loop calculates a new next_balance based on
> ->balance_interval (which has that busy_factor on, right).
> 
> But if it fails to pull anything, we'll (potentially) iterate the entire
> tree up to the largest domain; and supposedly set next_balance to the
> largest possible interval.

*to the smallest possible interval.
> 
> So when we go from busy to idle (!pulled_task), we actually set
> ->next_balance to the longest interval. Whereas the commit you
> referenced says it sets it to a shorter while.

We will set next_balance to the earliest balance time among the sched
domains iterated.
> 
> Not seeing it.
> 
> So the code as modified by Ingo in one of the initial CFS commits, will
> move the ->next_balance time ahead if the balance succeeded
> (pulled_task), thereby reflecting that we are busy and we just did a
> balance so we need not do one again soon. (we might want to re-think
> this if we really make the idle balance only pull 1 task max).
> 
> Of course, I've now gone over this code 3 times today, so I'm terminally
> confused.

I am unable to understand how the updating of rq->next_balance should
depend solely on the pulled_task parameter (I am not considering the
expiry of rq->next_balance here).

True, we will need to override the busy_factor in rq->next_balance if
we do not pull any tasks and go idle. Besides that, however, we will
probably need to override rq->next_balance irrespective of whether we
pull any tasks.

Let's look at what happens to sd->balance_interval in load_balance().
If we pull tasks, then it is set to min_interval. If active balance
occurs, or if tasks are pinned, then we push the interval farther away.
In the former case, where it is set to min_interval, pulled_task > 0;
in the latter case, especially the pinned case, pulled_task = 0 (not
sure about the active balance case).

If, after this modification of sd->balance_interval,
rq->next_balance > sd->last_balance + sd->balance_interval, then
shouldn't we be resetting rq->next_balance? And if we should, then the
dependence on pulled_task is not justified, is it? All this assumes
that rq->next_balance should always reflect the minimum value of
sd->last_balance + sd->balance_interval among the sched domains of
which the rq is a part.

Regards
Preeti U Murthy
> 



Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-24 Thread Jason Low
On Thu, 2014-04-24 at 19:14 +0200, Peter Zijlstra wrote:
> On Thu, Apr 24, 2014 at 09:53:37AM -0700, Jason Low wrote:
> > 
> > So I thought that the original rationale (commit 1bd77f2d) behind
> > updating rq->next_balance in idle_balance() is that, if we are going
> > idle (!pulled_task), we want to ensure that the next_balance gets
> > calculated without the busy_factor.
> > 
> > If the rq is busy, then rq->next_balance gets updated based on
> > sd->interval * busy_factor. However, when the rq goes from "busy"
> > to idle, rq->next_balance might still have been calculated under
> > the assumption that the rq is busy. Thus, if we are going idle, we
> > would then properly update next_balance without the busy factor
> > if we update when !pulled_task.
> > 
> 
> It's late here and I'm confused!
> 
> So the for_each_domain() loop calculates a new next_balance based on
> ->balance_interval (which has that busy_factor on, right).
> 
> But if it fails to pull anything, we'll (potentially) iterate the entire
> tree up to the largest domain; and supposedly set next_balance to the
> largest possible interval.
> 
> So when we go from busy to idle (!pulled_task), we actually set
> ->next_balance to the longest interval. Whereas the commit you
> referenced says it sets it to a shorter while.
> 
> Not seeing it.

So this is the way I understand that code:

In rebalance_domains(), next_balance is supposed to be set to the
minimum of all sd->last_balance + interval so that we properly call
into rebalance_domains() if one of the domains is due for a balance.

In the domain traversals:

if (time_after(next_balance, sd->last_balance + interval))
next_balance = sd->last_balance + interval;

we update next_balance to a new value if the current next_balance
is after, and we only update next_balance to a smaller value.

In rebalance_domains, we have code:

interval = sd->balance_interval;
if (idle != CPU_IDLE)
interval *= sd->busy_factor;

...

if (time_after(next_balance, sd->last_balance + interval)) {
next_balance = sd->last_balance + interval;

...

rq->next_balance = next_balance;

In the CPU_IDLE case, interval would not include the busy factor,
whereas in the !CPU_IDLE case, we multiply the interval by the
sd->busy_factor.

So as an example, if a CPU is not idle and we run this:

rebalance_domains()
interval = 1 ms;
if (idle != CPU_IDLE)
interval *= 64;

next_balance = sd->last_balance + 64 ms

rq->next_balance = next_balance

The rq->next_balance is set to a large value since the CPU is not idle.

Then, let's say the CPU then goes idle 1 ms later. The
rq->next_balance can be up to 63 ms later, because we computed
it when the CPU is not idle. Now that we are going idle,
we would have to wait a long time for the next balance.

So I believe that the initial reason why rq->next_balance was
updated in idle_balance is that if the CPU is in the process 
of going idle (!pulled_task in idle_balance()), we can reset the
rq->next_balance based on the interval = 1 ms, as opposed to
having it remain up to 64 ms later (in idle_balance(), interval
doesn't get multiplied by sd->busy_factor).





Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-24 Thread Peter Zijlstra
On Thu, Apr 24, 2014 at 07:14:53PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 24, 2014 at 09:53:37AM -0700, Jason Low wrote:
> > 
> > So I thought that the original rationale (commit 1bd77f2d) behind
> > updating rq->next_balance in idle_balance() is that, if we are going
> > idle (!pulled_task), we want to ensure that the next_balance gets
> > calculated without the busy_factor.
> > 
> > If the rq is busy, then rq->next_balance gets updated based on
> > sd->interval * busy_factor. However, when the rq goes from "busy"
> > to idle, rq->next_balance might still have been calculated under
> > the assumption that the rq is busy. Thus, if we are going idle, we
> > would then properly update next_balance without the busy factor
> > if we update when !pulled_task.
> > 
> 
> It's late here and I'm confused!
> 
> So the for_each_domain() loop calculates a new next_balance based on
> ->balance_interval (which has that busy_factor on, right).

Not right, ->balance_interval is the base interval. rebalance_domains()
doesn't update it.



Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-24 Thread Peter Zijlstra
On Thu, Apr 24, 2014 at 09:53:37AM -0700, Jason Low wrote:
> 
> So I thought that the original rationale (commit 1bd77f2d) behind
> updating rq->next_balance in idle_balance() is that, if we are going
> idle (!pulled_task), we want to ensure that the next_balance gets
> calculated without the busy_factor.
> 
> If the rq is busy, then rq->next_balance gets updated based on
> sd->interval * busy_factor. However, when the rq goes from "busy"
> to idle, rq->next_balance might still have been calculated under
> the assumption that the rq is busy. Thus, if we are going idle, we
> would then properly update next_balance without the busy factor
> if we update when !pulled_task.
> 

It's late here and I'm confused!

So the for_each_domain() loop calculates a new next_balance based on
->balance_interval (which has that busy_factor on, right).

But if it fails to pull anything, we'll (potentially) iterate the entire
tree up to the largest domain; and supposedly set next_balance to the
largest possible interval.

So when we go from busy to idle (!pulled_task), we actually set
->next_balance to the longest interval. Whereas the commit you
referenced says it sets it to a shorter while.

Not seeing it.

So the code as modified by Ingo in one of the initial CFS commits, will
move the ->next_balance time ahead if the balance succeeded
(pulled_task), thereby reflecting that we are busy and we just did a
balance so we need not do one again soon. (we might want to re-think
this if we really make the idle balance only pull 1 task max).

Of course, I've now gone over this code 3 times today, so I'm terminally
confused.


Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-24 Thread Jason Low
On Thu, 2014-04-24 at 14:44 +0200, Peter Zijlstra wrote:
> On Thu, Apr 24, 2014 at 02:04:15PM +0200, Peter Zijlstra wrote:
> > On Thu, Apr 24, 2014 at 03:44:47PM +0530, Preeti U Murthy wrote:
> > > What about the update of next_balance field? See the code snippet below.
> > > This will also be skipped as a consequence of the commit e5fc6611 right?
> > > 
> > >  if (pulled_task || time_after(jiffies, this_rq->next_balance)) {
> > >  /*
> > >   * We are going idle. next_balance may be set based on
> > >   * a busy processor. So reset next_balance.
> > >   */
> > >  this_rq->next_balance = next_balance;
> > >  }
> > > 
> > > Also the comment in the above snippet does not look right to me.
> > > It says "we are going idle" but the condition checks for pulled_task.
> > 
> > Yeah, that's odd indeed. Ingo did that back in dd41f596cda0d, I suspect
> > it's an error, but..
> > 
> > So I think that should become !pulled_task || time_after().
> 
> Hmm, no, I missed that the for_each_domain() loop pushes next_balance
> ahead if it did a balance on the domain.
> 
> So it actually makes sense and the comment is wrong, but then you're
> also right that we want to not skip that.

Hi Preeti, Peter,

So I thought that the original rationale (commit 1bd77f2d) behind
updating rq->next_balance in idle_balance() is that, if we are going
idle (!pulled_task), we want to ensure that the next_balance gets
calculated without the busy_factor.

If the rq is busy, then rq->next_balance gets updated based on
sd->interval * busy_factor. However, when the rq goes from "busy"
to idle, rq->next_balance might still have been calculated under
the assumption that the rq is busy. Thus, if we are going idle, we
would then properly update next_balance without the busy factor
if we update when !pulled_task.



Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-24 Thread Peter Zijlstra
On Thu, Apr 24, 2014 at 02:04:15PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 24, 2014 at 03:44:47PM +0530, Preeti U Murthy wrote:
> > What about the update of next_balance field? See the code snippet below.
> > This will also be skipped as a consequence of the commit e5fc6611 right?
> > 
> > 	if (pulled_task || time_after(jiffies, this_rq->next_balance)) {
> > 		/*
> > 		 * We are going idle. next_balance may be set based on
> > 		 * a busy processor. So reset next_balance.
> > 		 */
> > 		this_rq->next_balance = next_balance;
> > 	}
> > 
> > Also the comment in the above snippet does not look right to me.
> > It says "we are going idle" but the condition checks for pulled_task.
> 
> Yeah, that's odd indeed. Ingo did that back in dd41f596cda0d, I suspect
> it's an error, but..
> 
> So I think that should become !pulled_task || time_after().

Hmm, no, I missed that the for_each_domain() loop pushes next_balance
ahead if it did a balance on the domain.

So it actually makes sense and the comment is wrong, but then you're
also right that we want to not skip that.

So how about something like so?

---
Subject: sched,fair: Fix idle_balance()'s pulled_task logic
From: Peter Zijlstra 
Date: Thu Apr 24 14:24:20 CEST 2014

Jason reported that we can fail to update max_idle_balance_cost, even
if we actually did try to find one.

Preeti then noticed that the next_balance update logic was equally
flawed.

So fix both and update the comments.

Fixes: e5fc66119ec9 ("sched: Fix race in idle_balance()")
Cc: Daniel Lezcano 
Reported-by: Jason Low 
Reported-by: Preeti U Murthy 
Signed-off-by: Peter Zijlstra 
---
 kernel/sched/fair.c |   23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6711,21 +6711,18 @@ static int idle_balance(struct rq *this_
raw_spin_lock(&this_rq->lock);
 
/*
-* While browsing the domains, we released the rq lock.
-* A task could have be enqueued in the meantime
+* If we pulled a task (or if the interval expired), we did a balance
+* pass, so update next_balance.
 */
-   if (this_rq->cfs.h_nr_running && !pulled_task) {
-   pulled_task = 1;
-   goto out;
-   }
-
-   if (pulled_task || time_after(jiffies, this_rq->next_balance)) {
-   /*
-* We are going idle. next_balance may be set based on
-* a busy processor. So reset next_balance.
-*/
+   if (pulled_task || time_after(jiffies, this_rq->next_balance))
this_rq->next_balance = next_balance;
-   }
+
+   /*
+* While browsing the domains, we released the rq lock, a task could
+* have been enqueued in the meantime.
+*/
+   if (this_rq->cfs.h_nr_running && !pulled_task)
+   pulled_task = 1;
 
if (curr_cost > this_rq->max_idle_balance_cost)
this_rq->max_idle_balance_cost = curr_cost;


Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-24 Thread Peter Zijlstra
On Thu, Apr 24, 2014 at 03:44:47PM +0530, Preeti U Murthy wrote:
> What about the update of next_balance field? See the code snippet below.
> This will also be skipped as a consequence of the commit e5fc6611 right?
> 
> 	if (pulled_task || time_after(jiffies, this_rq->next_balance)) {
> 		/*
> 		 * We are going idle. next_balance may be set based on
> 		 * a busy processor. So reset next_balance.
> 		 */
> 		this_rq->next_balance = next_balance;
> 	}
> 
> Also the comment in the above snippet does not look right to me.
> It says "we are going idle" but the condition checks for pulled_task.

Yeah, that's odd indeed. Ingo did that back in dd41f596cda0d, I suspect
it's an error, but..

So I think that should become !pulled_task || time_after().


Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-24 Thread Preeti U Murthy
Hi Jason,

On 04/24/2014 07:00 AM, Jason Low wrote:
> Commit e5fc6611 can potentially cause rq->max_idle_balance_cost to not be
> updated, even when load_balance(NEWLY_IDLE) is attempted and the per-sd
> max cost value is updated.
> 
> In this patch, we update the rq->max_idle_balance_cost regardless of
> whether or not a task has been enqueued while browsing the domains.
> 
> Signed-off-by: Jason Low 
> ---
>  kernel/sched/fair.c |9 +
>  1 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 43232b8..3e3ffb8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6658,6 +6658,7 @@ static int idle_balance(struct rq *this_rq)
>   int this_cpu = this_rq->cpu;
> 
>   idle_enter_fair(this_rq);
> +
>   /*
>* We must set idle_stamp _before_ calling idle_balance(), such that we
>* measure the duration of idle_balance() as idle time.
> @@ -6710,9 +6711,12 @@ static int idle_balance(struct rq *this_rq)
> 
>   raw_spin_lock(&this_rq->lock);
> 
> + if (curr_cost > this_rq->max_idle_balance_cost)
> + this_rq->max_idle_balance_cost = curr_cost;
> +
>   /*

What about the update of next_balance field? See the code snippet below.
This will also be skipped as a consequence of the commit e5fc6611 right?

if (pulled_task || time_after(jiffies, this_rq->next_balance)) {
	/*
	 * We are going idle. next_balance may be set based on
	 * a busy processor. So reset next_balance.
	 */
	this_rq->next_balance = next_balance;
}

Also the comment in the above snippet does not look right to me.
It says "we are going idle" but the condition checks for pulled_task.


Regards
Preeti U Murthy

>* While browsing the domains, we released the rq lock.
> -  * A task could have be enqueued in the meantime
> +  * A task could have been enqueued in the meantime.
>*/
>   if (this_rq->cfs.h_nr_running && !pulled_task) {
>   pulled_task = 1;
> @@ -6727,9 +6731,6 @@ static int idle_balance(struct rq *this_rq)
>   this_rq->next_balance = next_balance;
>   }
> 
> - if (curr_cost > this_rq->max_idle_balance_cost)
> - this_rq->max_idle_balance_cost = curr_cost;
> -
>  out:
>   /* Is there a task of a high priority class? */
>   if (this_rq->nr_running != this_rq->cfs.h_nr_running)
> 



[PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted

2014-04-23 Thread Jason Low
Commit e5fc6611 can potentially cause rq->max_idle_balance_cost to not be
updated, even when load_balance(NEWLY_IDLE) is attempted and the per-sd
max cost value is updated.

In this patch, we update the rq->max_idle_balance_cost regardless of
whether or not a task has been enqueued while browsing the domains.

Signed-off-by: Jason Low 
---
 kernel/sched/fair.c |9 +
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 43232b8..3e3ffb8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6658,6 +6658,7 @@ static int idle_balance(struct rq *this_rq)
int this_cpu = this_rq->cpu;
 
idle_enter_fair(this_rq);
+
/*
 * We must set idle_stamp _before_ calling idle_balance(), such that we
 * measure the duration of idle_balance() as idle time.
@@ -6710,9 +6711,12 @@ static int idle_balance(struct rq *this_rq)
 
raw_spin_lock(&this_rq->lock);
 
+   if (curr_cost > this_rq->max_idle_balance_cost)
+   this_rq->max_idle_balance_cost = curr_cost;
+
/*
 * While browsing the domains, we released the rq lock.
-* A task could have be enqueued in the meantime
+* A task could have been enqueued in the meantime.
 */
if (this_rq->cfs.h_nr_running && !pulled_task) {
pulled_task = 1;
@@ -6727,9 +6731,6 @@ static int idle_balance(struct rq *this_rq)
this_rq->next_balance = next_balance;
}
 
-   if (curr_cost > this_rq->max_idle_balance_cost)
-   this_rq->max_idle_balance_cost = curr_cost;
-
 out:
/* Is there a task of a high priority class? */
if (this_rq->nr_running != this_rq->cfs.h_nr_running)
-- 
1.7.1
