Re: [PATCH v2] sched: Reduce contention in update_cfs_rq_blocked_load

2014-09-03 Thread Peter Zijlstra
On Tue, Sep 02, 2014 at 12:41:24AM -0700, Jason Low wrote:
> On Mon, 2014-09-01 at 14:55 +0200, Peter Zijlstra wrote:
> > But yes, unbounded errors here are a problem; sure, relaxing the updates
> > makes things go fast, but they also make things go skew.
> 
> Okay. In that case, would you like to take our original patch which
> avoids unnecessary updates?

This seems a safe patch; yes I'll queue it while we think of smarter
things :-)


Re: [PATCH v2] sched: Reduce contention in update_cfs_rq_blocked_load

2014-09-02 Thread Jason Low
On Mon, 2014-09-01 at 14:55 +0200, Peter Zijlstra wrote:
> But yes, unbounded errors here are a problem; sure, relaxing the updates
> makes things go fast, but they also make things go skew.

Okay. In that case, would you like to take our original patch which
avoids unnecessary updates?

-
Subject: [PATCH] sched: Reduce contention in update_cfs_rq_blocked_load

When running workloads on 2+ socket systems, perf profiles show that the
update_cfs_rq_blocked_load function often takes up a noticeable % of run
time.

Much of the contention is in __update_cfs_rq_tg_load_contrib when we
update the tg load contribution stats.  However, it turns out that in many
cases, they don't need to be updated and "tg_contrib" is 0.

This patch adds a check in __update_cfs_rq_tg_load_contrib to skip updating
the tg load contribution stats when nothing needs to be updated. This avoids
unnecessary cacheline contention.

Cc: Yuyang Du 
Cc: Aswin Chandramouleeswaran 
Cc: Chegu Vinod 
Cc: Scott J Norton 
Reviewed-by: Ben Segall 
Reviewed-by: Waiman Long 
Signed-off-by: Jason Low 
---
 kernel/sched/fair.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d3427a8..45e346c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2382,6 +2382,9 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
 	tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
 	tg_contrib -= cfs_rq->tg_load_contrib;
 
+	if (!tg_contrib)
+		return;
+
 	if (force_update || abs(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
 		atomic_long_add(tg_contrib, &tg->load_avg);
 		cfs_rq->tg_load_contrib += tg_contrib;
-- 
1.7.1
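
The win above comes from not touching the shared tg->load_avg cacheline at
all when the delta is zero. For anyone who wants to see that effect outside
the kernel, here is a stand-alone user-space sketch (illustrative only; the
thread count, iteration count and delta pattern are made-up assumptions, and
a plain C11 atomic stands in for the kernel's atomic_long_t). Timing the two
modes with time(1) shows the cost of the useless zero-valued adds:

/*
 * zero_delta.c - model of "skip the update when the delta is 0".
 *
 * Build: gcc -O2 -pthread zero_delta.c -o zero_delta
 * Run:   ./zero_delta 0   (always do the atomic add, even when delta is 0)
 *        ./zero_delta 1   (skip the shared add when the delta is 0)
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 8
#define ITERS    10000000L

static atomic_long shared_load;         /* stands in for tg->load_avg */
static int skip_zero;                   /* 1: apply the early return  */

static void *worker(void *arg)
{
        long i, delta;

        (void)arg;
        for (i = 0; i < ITERS; i++) {
                /* Most iterations produce no change at all. */
                delta = (i % 64 == 0) ? 1 : 0;

                if (skip_zero && !delta)
                        continue;       /* models "if (!tg_contrib) return;" */

                /* An unconditional RMW dirties the shared cacheline. */
                atomic_fetch_add(&shared_load, delta);
        }
        return NULL;
}

int main(int argc, char **argv)
{
        pthread_t tid[NTHREADS];
        int i;

        if (argc > 1)
                skip_zero = atoi(argv[1]);

        for (i = 0; i < NTHREADS; i++)
                pthread_create(&tid[i], NULL, worker, NULL);
        for (i = 0; i < NTHREADS; i++)
                pthread_join(tid[i], NULL);

        printf("skip_zero=%d shared_load=%ld\n",
               skip_zero, (long)atomic_load(&shared_load));
        return 0;
}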



Re: [PATCH v2] sched: Reduce contention in update_cfs_rq_blocked_load

2014-09-01 Thread Peter Zijlstra
On Thu, Aug 28, 2014 at 12:46:36PM -0700, Jason Low wrote:
> On Wed, 2014-08-27 at 16:32 -0700, Tim Chen wrote:

> > If there are multiple non-forced updates, option 1's error seems to
> > accumulate and be unbounded as we do not actually update?
> > Is this a concern?
> 
> It should be fine. Once the delta is large enough, we will end up doing
> the update anyway.

Well, the thing is that you can have nr_cpus * 12.5% of outstanding delta;
that might be a lot, especially on large machines.

Now there are two problems with all this; the first is the relative
threshold: typically such per-cpu things have a fixed update threshold,
which makes it much easier to qualify the actual error.

Secondly, there is indeed the nr_cpus factor in the error bound. Some
things, like the proportion code, scale the threshold by log2(nr_cpus) in
an attempt to do something sensible there.

But yes, unbounded errors here are a problem; sure, relaxing the updates
makes things go fast, but they also make things go skew.
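
To put rough numbers on that skew, here is a small stand-alone C program
(illustrative arithmetic only; the log2(nr_cpus) divisor is just one way to
read the proportion-code remark above, not a quote of that code). It
tabulates the worst-case total of un-applied deltas, as a percentage of
tg_load_contrib, when every cpu sits just under its threshold:

/*
 * skew_bound.c - worst-case outstanding delta for a per-cpu
 * relative update threshold.
 *
 * Build: gcc -O2 skew_bound.c -lm -o skew_bound
 */
#include <math.h>
#include <stdio.h>

int main(void)
{
        int cpus;

        printf("%8s %16s %24s\n",
               "nr_cpus", "thresh 1/8", "thresh 1/(8*log2(n))");

        for (cpus = 2; cpus <= 1024; cpus *= 2) {
                /* Each cpu may hold just under one threshold of delta. */
                double fixed  = cpus * (1.0 / 8.0);
                double scaled = cpus * (1.0 / (8.0 * log2(cpus)));

                printf("%8d %15.1f%% %23.1f%%\n",
                       cpus, 100.0 * fixed, 100.0 * scaled);
        }
        return 0;
}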


Re: [PATCH v2] sched: Reduce contention in update_cfs_rq_blocked_load

2014-08-28 Thread Jason Low
On Wed, 2014-08-27 at 16:32 -0700, Tim Chen wrote:
> On Wed, 2014-08-27 at 10:34 -0700, Jason Low wrote:
> > On Tue, 2014-08-26 at 16:24 -0700, Paul Turner wrote:
> > > On Tue, Aug 26, 2014 at 4:11 PM, Jason Low  wrote:
> > > > Based on perf profiles, the update_cfs_rq_blocked_load function 
> > > > constantly
> > > > shows up as taking up a noticeable % of system run time. This is 
> > > > especially
> > > > apparent on larger numa systems.
> > > >
> > > > Much of the contention is in __update_cfs_rq_tg_load_contrib when we're
> > > > updating the tg load contribution stats. However, it was noticed that 
> > > > the
> > > > values often don't get modified by much. In fact, much of the time, they
> > > > don't get modified at all. However, the update can always get attempted 
> > > > due
> > > > to force_update.
> > > >
> > > > In this patch, we remove the force_update in only the
> > > > __update_cfs_rq_tg_load_contrib. Thus the tg load contrib stats now get
> > > > modified only if the delta is large enough (in the current code, they 
> > > > get
> > > > updated when the delta is larger than 12.5%). This is a way to 
> > > > rate-limit
> > > > the updates while largely keeping the values accurate.
> > > >
> > > > When testing this change with AIM7 workloads, we found that it was able 
> > > > to
> > > > reduce the overhead of the function by up to a factor of 20x.
> > > 
> > > Looks reasonable.
> > > 
> > > >
> > > > Cc: Yuyang Du 
> > > > Cc: Waiman Long 
> > > > Cc: Mel Gorman 
> > > > Cc: Mike Galbraith 
> > > > Cc: Rik van Riel 
> > > > Cc: Aswin Chandramouleeswaran 
> > > > Cc: Chegu Vinod 
> > > > Cc: Scott J Norton 
> > > > Signed-off-by: Jason Low 
> > > > ---
> > > >  kernel/sched/fair.c |   10 --
> > > >  1 files changed, 4 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index fea7d33..7a6e18b 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -2352,8 +2352,7 @@ static inline u64 
> > > > __synchronize_entity_decay(struct sched_entity *se)
> > > >  }
> > > >
> > > >  #ifdef CONFIG_FAIR_GROUP_SCHED
> > > > -static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq 
> > > > *cfs_rq,
> > > > -int force_update)
> > > > +static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq 
> > > > *cfs_rq)
> > > >  {
> > > > struct task_group *tg = cfs_rq->tg;
> > > > long tg_contrib;
> > > > @@ -2361,7 +2360,7 @@ static inline void 
> > > > __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
> > > > tg_contrib = cfs_rq->runnable_load_avg + 
> > > > cfs_rq->blocked_load_avg;
> > > > tg_contrib -= cfs_rq->tg_load_contrib;
> > > >
> > > > -   if (force_update || abs(tg_contrib) > cfs_rq->tg_load_contrib / 
> > > > 8) {
> > > 
> > > Another option with slightly higher accuracy would be to increase the
> > > sensitivity here when force_update == 1.
> > > 
> > > E.g.:
> > > abs(tg_contrib) > cfs_rq->tg_load_contrib / (8 * (1 + force_update))) 
> > > { ...
> > > 
> > > Alternatively we could bound total inaccuracy:
> > >int divisor = force_update ? NR_CPUS : 8;
> > >if (abs(tg_contrib) > cfs_rq->tg_load_contrib / divisor) { ...
> > > 
> > > 
> > > [ And probably rename force_update to want_update ]
> > 
> > Out of the 2 additional options, I think the first one is better. The
> > other option of using NR_CPUS looks like we're increasing the update
> > rate as the system gets larger, and it's the larger systems that are
> > typically more affected by the contention.
> 
> Probably num_present_cpus() will be better than NR_CPUS, which can
> be much larger than the actual cpus present.

Yeah, num_present_cpus(), though the same issue would still be there.

> > 
> > Do you prefer either of the 2 other options over this v2 patch? If so, I
> > can test and send out a new patch, otherwise, I'll keep this current v2
> > patch.
> 
> If there are multiple non-forced updates, option 1's error seems to
> accumulate and be unbounded as we do not actually update?
> Is this a concern?

It should be fine. Once the delta is large enough, we will end up doing
the update anyway.

Thanks.



Re: [PATCH v2] sched: Reduce contention in update_cfs_rq_blocked_load

2014-08-27 Thread Tim Chen
On Wed, 2014-08-27 at 10:34 -0700, Jason Low wrote:
> On Tue, 2014-08-26 at 16:24 -0700, Paul Turner wrote:
> > On Tue, Aug 26, 2014 at 4:11 PM, Jason Low  wrote:
> > > Based on perf profiles, the update_cfs_rq_blocked_load function constantly
> > > shows up as taking up a noticeable % of system run time. This is 
> > > especially
> > > apparent on larger numa systems.
> > >
> > > Much of the contention is in __update_cfs_rq_tg_load_contrib when we're
> > > updating the tg load contribution stats. However, it was noticed that the
> > > values often don't get modified by much. In fact, much of the time, they
> > > don't get modified at all. However, the update can always get attempted 
> > > due
> > > to force_update.
> > >
> > > In this patch, we remove the force_update in only the
> > > __update_cfs_rq_tg_load_contrib. Thus the tg load contrib stats now get
> > > modified only if the delta is large enough (in the current code, they get
> > > updated when the delta is larger than 12.5%). This is a way to rate-limit
> > > the updates while largely keeping the values accurate.
> > >
> > > When testing this change with AIM7 workloads, we found that it was able to
> > > reduce the overhead of the function by up to a factor of 20x.
> > 
> > Looks reasonable.
> > 
> > >
> > > Cc: Yuyang Du 
> > > Cc: Waiman Long 
> > > Cc: Mel Gorman 
> > > Cc: Mike Galbraith 
> > > Cc: Rik van Riel 
> > > Cc: Aswin Chandramouleeswaran 
> > > Cc: Chegu Vinod 
> > > Cc: Scott J Norton 
> > > Signed-off-by: Jason Low 
> > > ---
> > >  kernel/sched/fair.c |   10 --
> > >  1 files changed, 4 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index fea7d33..7a6e18b 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -2352,8 +2352,7 @@ static inline u64 __synchronize_entity_decay(struct 
> > > sched_entity *se)
> > >  }
> > >
> > >  #ifdef CONFIG_FAIR_GROUP_SCHED
> > > -static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
> > > -int force_update)
> > > +static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq)
> > >  {
> > > struct task_group *tg = cfs_rq->tg;
> > > long tg_contrib;
> > > @@ -2361,7 +2360,7 @@ static inline void 
> > > __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
> > > tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
> > > tg_contrib -= cfs_rq->tg_load_contrib;
> > >
> > > -   if (force_update || abs(tg_contrib) > cfs_rq->tg_load_contrib / 
> > > 8) {
> > 
> > Another option with slightly higher accuracy would be to increase the
> > sensitivity here when force_update == 1.
> > 
> > E.g.:
> > abs(tg_contrib) > cfs_rq->tg_load_contrib / (8 * (1 + force_update))) { 
> > ...
> > 
> > Alternatively we could bound total inaccuracy:
> >int divisor = force_update ? NR_CPUS : 8;
> >if (abs(tg_contrib) > cfs_rq->tg_load_contrib / divisor) { ...
> > 
> > 
> > [ And probably rename force_update to want_update ]
> 
> Out of the 2 additional options, I think the first one is better. The
> other option of using NR_CPUS looks like we're increasing the update
> rate as the system gets larger, and it's the larger systems that are
> typically more affected by the contention.

Probably num_present_cpus() will be better than NR_CPUS, which can
be much larger than the actual cpus present.

> 
> Do you prefer either of the 2 other options over this v2 patch? If so, I
> can test and send out a new patch, otherwise, I'll keep this current v2
> patch.

If there are multiple non-forced updates, option 1's error seems to
accumulate and be unbounded as we do not actually update?
Is this a concern?

Thanks.

Tim



Re: [PATCH v2] sched: Reduce contention in update_cfs_rq_blocked_load

2014-08-27 Thread Jason Low
On Tue, 2014-08-26 at 16:24 -0700, Paul Turner wrote:
> On Tue, Aug 26, 2014 at 4:11 PM, Jason Low  wrote:
> > Based on perf profiles, the update_cfs_rq_blocked_load function constantly
> > shows up as taking up a noticeable % of system run time. This is especially
> > apparent on larger numa systems.
> >
> > Much of the contention is in __update_cfs_rq_tg_load_contrib when we're
> > updating the tg load contribution stats. However, it was noticed that the
> > values often don't get modified by much. In fact, much of the time, they
> > don't get modified at all. However, the update can always get attempted due
> > to force_update.
> >
> > In this patch, we remove the force_update in only the
> > __update_cfs_rq_tg_load_contrib. Thus the tg load contrib stats now get
> > modified only if the delta is large enough (in the current code, they get
> > updated when the delta is larger than 12.5%). This is a way to rate-limit
> > the updates while largely keeping the values accurate.
> >
> > When testing this change with AIM7 workloads, we found that it was able to
> > reduce the overhead of the function by up to a factor of 20x.
> 
> Looks reasonable.
> 
> >
> > Cc: Yuyang Du 
> > Cc: Waiman Long 
> > Cc: Mel Gorman 
> > Cc: Mike Galbraith 
> > Cc: Rik van Riel 
> > Cc: Aswin Chandramouleeswaran 
> > Cc: Chegu Vinod 
> > Cc: Scott J Norton 
> > Signed-off-by: Jason Low 
> > ---
> >  kernel/sched/fair.c |   10 --
> >  1 files changed, 4 insertions(+), 6 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index fea7d33..7a6e18b 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -2352,8 +2352,7 @@ static inline u64 __synchronize_entity_decay(struct 
> > sched_entity *se)
> >  }
> >
> >  #ifdef CONFIG_FAIR_GROUP_SCHED
> > -static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
> > -int force_update)
> > +static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq)
> >  {
> > struct task_group *tg = cfs_rq->tg;
> > long tg_contrib;
> > @@ -2361,7 +2360,7 @@ static inline void 
> > __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
> > tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
> > tg_contrib -= cfs_rq->tg_load_contrib;
> >
> > -   if (force_update || abs(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
> 
> Another option with slightly higher accuracy would be to increase the
> sensitivity here when force_update == 1.
> 
> E.g.:
> abs(tg_contrib) > cfs_rq->tg_load_contrib / (8 * (1 + force_update))) { 
> ...
> 
> Alternatively we could bound total inaccuracy:
>int divisor = force_update ? NR_CPUS : 8;
>if (abs(tg_contrib) > cfs_rq->tg_load_contrib / divisor) { ...
> 
> 
> [ And probably rename force_update to want_update ]

Out of the 2 additional options, I think the first one is better. The
other option of using NR_CPUS looks like we're increasing the update
rate as the system gets larger, and it's the larger systems that are
typically more affected by the contention.

Do you prefer either of the 2 other options over this v2 patch? If so, I
can test and send out a new patch, otherwise, I'll keep this current v2
patch.

Thanks.



Re: [PATCH v2] sched: Reduce contention in update_cfs_rq_blocked_load

2014-08-26 Thread Paul Turner
On Tue, Aug 26, 2014 at 4:11 PM, Jason Low  wrote:
> Based on perf profiles, the update_cfs_rq_blocked_load function constantly
> shows up as taking up a noticeable % of system run time. This is especially
> apparent on larger numa systems.
>
> Much of the contention is in __update_cfs_rq_tg_load_contrib when we're
> updating the tg load contribution stats. However, it was noticed that the
> values often don't get modified by much. In fact, much of the time, they
> don't get modified at all. However, the update can always get attempted due
> to force_update.
>
> In this patch, we remove the force_update in only the
> __update_cfs_rq_tg_load_contrib. Thus the tg load contrib stats now get
> modified only if the delta is large enough (in the current code, they get
> updated when the delta is larger than 12.5%). This is a way to rate-limit
> the updates while largely keeping the values accurate.
>
> When testing this change with AIM7 workloads, we found that it was able to
> reduce the overhead of the function by up to a factor of 20x.

Looks reasonable.

>
> Cc: Yuyang Du 
> Cc: Waiman Long 
> Cc: Mel Gorman 
> Cc: Mike Galbraith 
> Cc: Rik van Riel 
> Cc: Aswin Chandramouleeswaran 
> Cc: Chegu Vinod 
> Cc: Scott J Norton 
> Signed-off-by: Jason Low 
> ---
>  kernel/sched/fair.c |   10 --
>  1 files changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fea7d33..7a6e18b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2352,8 +2352,7 @@ static inline u64 __synchronize_entity_decay(struct 
> sched_entity *se)
>  }
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> -static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
> -int force_update)
> +static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq)
>  {
> struct task_group *tg = cfs_rq->tg;
> long tg_contrib;
> @@ -2361,7 +2360,7 @@ static inline void 
> __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
> tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
> tg_contrib -= cfs_rq->tg_load_contrib;
>
> -   if (force_update || abs(tg_contrib) > cfs_rq->tg_load_contrib / 8) {

Another option with slightly higher accuracy would be to increase the
sensitivity here when force_update == 1.

E.g.:
abs(tg_contrib) > cfs_rq->tg_load_contrib / (8 * (1 + force_update))) { ...

Alternatively we could bound total inaccuracy:
   int divisor = force_update ? NR_CPUS : 8;
   if (abs(tg_contrib) > cfs_rq->tg_load_contrib / divisor) { ...


[ And probably rename force_update to want_update ]
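
Spelled out as stand-alone code, the two variants would look roughly like
the helpers below (a user-space paraphrase for clarity, with made-up loads
and a FAKE_NR_CPUS placeholder where NR_CPUS / num_present_cpus() would go;
not a tested kernel patch):

/*
 * threshold_variants.c - the two alternative update conditions.
 */
#include <stdio.h>
#include <stdlib.h>

#define FAKE_NR_CPUS 16         /* placeholder, made-up value */

/* Variant 1: tighten the relative threshold when an update is wanted. */
static int should_update_v1(long tg_contrib, long tg_load_contrib,
                            int force_update)
{
        return labs(tg_contrib) > tg_load_contrib / (8 * (1 + force_update));
}

/* Variant 2: bound the total inaccuracy by the number of cpus. */
static int should_update_v2(long tg_contrib, long tg_load_contrib,
                            int force_update)
{
        int divisor = force_update ? FAKE_NR_CPUS : 8;

        return labs(tg_contrib) > tg_load_contrib / divisor;
}

int main(void)
{
        long tg_load_contrib = 1024;    /* made-up current contribution */
        long tg_contrib = 100;          /* made-up pending delta        */

        printf("v1: non-forced=%d forced=%d\n",
               should_update_v1(tg_contrib, tg_load_contrib, 0),
               should_update_v1(tg_contrib, tg_load_contrib, 1));
        printf("v2: non-forced=%d forced=%d\n",
               should_update_v2(tg_contrib, tg_load_contrib, 0),
               should_update_v2(tg_contrib, tg_load_contrib, 1));
        return 0;
}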


> +   if (abs(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
> atomic_long_add(tg_contrib, &tg->load_avg);
> cfs_rq->tg_load_contrib += tg_contrib;
> }
> @@ -2436,8 +2435,7 @@ static inline void update_rq_runnable_avg(struct rq 
> *rq, int runnable)
> __update_tg_runnable_avg(&rq->avg, &rq->cfs);
>  }
>  #else /* CONFIG_FAIR_GROUP_SCHED */
> -static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
> -int force_update) {}
> +static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq) {}
>  static inline void __update_tg_runnable_avg(struct sched_avg *sa,
>   struct cfs_rq *cfs_rq) {}
>  static inline void __update_group_entity_contrib(struct sched_entity *se) {}
> @@ -2537,7 +2535,7 @@ static void update_cfs_rq_blocked_load(struct cfs_rq 
> *cfs_rq, int force_update)
> cfs_rq->last_decay = now;
> }
>
> -   __update_cfs_rq_tg_load_contrib(cfs_rq, force_update);
> +   __update_cfs_rq_tg_load_contrib(cfs_rq);
>  }
>
>  /* Add the load generated by se into cfs_rq's child load-average */
> --
> 1.7.1
>
>
>


[PATCH v2] sched: Reduce contention in update_cfs_rq_blocked_load

2014-08-26 Thread Jason Low
Based on perf profiles, the update_cfs_rq_blocked_load function constantly
shows up as taking up a noticeable % of system run time. This is especially
apparent on larger numa systems.

Much of the contention is in __update_cfs_rq_tg_load_contrib when we're
updating the tg load contribution stats. However, it was noticed that the
values often don't get modified by much. In fact, much of the time, they
don't get modified at all. However, the update can always get attempted due
to force_update.

In this patch, we remove the force_update in only the
__update_cfs_rq_tg_load_contrib. Thus the tg load contrib stats now get
modified only if the delta is large enough (in the current code, they get
updated when the delta is larger than 12.5%). This is a way to rate-limit
the updates while largely keeping the values accurate.

When testing this change with AIM7 workloads, we found that it was able to
reduce the overhead of the function by up to a factor of 20x.

Cc: Yuyang Du 
Cc: Waiman Long 
Cc: Mel Gorman 
Cc: Mike Galbraith 
Cc: Rik van Riel 
Cc: Aswin Chandramouleeswaran 
Cc: Chegu Vinod 
Cc: Scott J Norton 
Signed-off-by: Jason Low 
---
 kernel/sched/fair.c |   10 --
 1 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fea7d33..7a6e18b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2352,8 +2352,7 @@ static inline u64 __synchronize_entity_decay(struct sched_entity *se)
 }
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
-						 int force_update)
+static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq)
 {
 	struct task_group *tg = cfs_rq->tg;
 	long tg_contrib;
@@ -2361,7 +2360,7 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
 	tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
 	tg_contrib -= cfs_rq->tg_load_contrib;
 
-	if (force_update || abs(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
+	if (abs(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
 		atomic_long_add(tg_contrib, &tg->load_avg);
 		cfs_rq->tg_load_contrib += tg_contrib;
 	}
@@ -2436,8 +2435,7 @@ static inline void update_rq_runnable_avg(struct rq *rq, int runnable)
 	__update_tg_runnable_avg(&rq->avg, &rq->cfs);
 }
 #else /* CONFIG_FAIR_GROUP_SCHED */
-static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
-						 int force_update) {}
+static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq) {}
 static inline void __update_tg_runnable_avg(struct sched_avg *sa,
 					  struct cfs_rq *cfs_rq) {}
 static inline void __update_group_entity_contrib(struct sched_entity *se) {}
@@ -2537,7 +2535,7 @@ static void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq, int force_update)
 		cfs_rq->last_decay = now;
 	}
 
-	__update_cfs_rq_tg_load_contrib(cfs_rq, force_update);
+	__update_cfs_rq_tg_load_contrib(cfs_rq);
 }
 
 /* Add the load generated by se into cfs_rq's child load-average */
-- 
1.7.1
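
To see the rate limiting in isolation, here is a minimal single-threaded
user-space model (the random-walk load pattern and all constants are
made-up assumptions; this is not kernel code). It pushes a stream of small
load changes through the same "only publish once the delta exceeds
tg_load_contrib / 8" rule and reports how many publishes actually happen
and how far the published value lags the real one:

/*
 * rate_limit.c - model of the relative-threshold rate limiting.
 *
 * Build: gcc -O2 rate_limit.c -o rate_limit
 */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        long tg_load_avg = 0;           /* stands in for tg->load_avg      */
        long tg_load_contrib = 0;       /* what this cfs_rq last published */
        long cfs_rq_load = 1000;        /* made-up starting load           */
        long updates = 0, samples = 100000, i;

        srand(1);

        for (i = 0; i < samples; i++) {
                long tg_contrib;

                /* Random walk: the load wobbles a little every step. */
                cfs_rq_load += (rand() % 21) - 10;
                if (cfs_rq_load < 0)
                        cfs_rq_load = 0;

                /* Delta between the current load and the published value. */
                tg_contrib = cfs_rq_load - tg_load_contrib;

                /* Publish only when the delta exceeds 1/8 of the old value. */
                if (labs(tg_contrib) > tg_load_contrib / 8) {
                        tg_load_avg += tg_contrib;   /* the "expensive" write */
                        tg_load_contrib += tg_contrib;
                        updates++;
                }
        }

        printf("samples=%ld updates=%ld (%.2f%%) published=%ld actual=%ld\n",
               samples, updates, 100.0 * updates / samples,
               tg_load_avg, cfs_rq_load);
        return 0;
}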


