Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-08-05 Thread Rik van Riel

On 07/31/2014 01:04 AM, Aaron Lu wrote:

 +++ b/kernel/sched/fair.c
 @@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct numa_group *group, int nid)
 
  /*
   * These return the fraction of accesses done by a particular task, or
 - * task group, on a particular numa node.  The group weight is given a
 - * larger multiplier, in order to group tasks together that are almost
 - * evenly spread out between numa nodes.
 + * task group, on a particular numa node.  The NUMA move threshold
 + * prevents task moves with marginal improvement, and is set to 5%.
   */
 +#define NUMA_SCALE 1024
 +#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100)
>> 
>> It would be good to see if changing NUMA_MOVE_THRESH to 
>> (NUMA_SCALE / 8) does the trick.
> 
> With your 2nd patch and the above change, the result is:

Peter,

the threshold does not seem to make a difference for the
performance tests on my system, I guess you can drop this
patch :)

-- 
All rights reversed
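
The hunk quoted above boils down to scaling a task's per-node fault counts to a fixed base and refusing moves whose gain is marginal. A minimal standalone sketch of that idea, assuming a hypothetical worth_moving() helper rather than the actual kernel/sched/fair.c code:

#include <stdbool.h>

/* Fault fractions are expressed on a fixed power-of-two scale. */
#define NUMA_SCALE       1024
#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100)  /* the 5% variant; NUMA_SCALE / 8 gives 12.5% */

/*
 * Sketch only: would moving a task toward the destination node be worthwhile?
 * faults_dst/faults_src are the task's NUMA hinting faults on each node,
 * faults_total its total faults.
 */
static bool worth_moving(unsigned long faults_dst, unsigned long faults_src,
                         unsigned long faults_total)
{
        unsigned long w_dst, w_src;

        if (!faults_total)
                return false;

        /* Fraction of accesses on each node, scaled to NUMA_SCALE. */
        w_dst = faults_dst * NUMA_SCALE / faults_total;
        w_src = faults_src * NUMA_SCALE / faults_total;

        /* Only move when the destination wins by more than the threshold. */
        return w_dst > w_src + NUMA_MOVE_THRESH;
}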


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-08-01 Thread Jirka Hladky

On 08/02/2014 06:17 AM, Rik van Riel wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 08/01/2014 05:30 PM, Jirka Hladky wrote:


I see the regression only on this box. It has 4 "Ivy Bridge-EX"
Xeon E7-4890 v2 CPUs.

http://ark.intel.com/products/75251
http://en.wikipedia.org/wiki/List_of_Intel_Xeon_microprocessors#.22Ivy_Bridge-EX.22_.2822_nm.29_Expandable_2



Please rerun the test on box with Ivy Bridge CPUs. It seems that
older CPU generations are not affected.

That would have been good info to know :)

I've been spending about a month trying to reproduce your issue on a
Westmere E7-4860.

Good thing I found all kinds of other scheduler issues along the way...


Hi Rik,

till recently I have seen the regression on all systems.

With the latest kernel, only Ivy Bridge system seems to be affected.

Jirka



Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-08-01 Thread Peter Zijlstra
On Fri, Aug 01, 2014 at 11:30:34PM +0200, Jirka Hladky wrote:
> I see the regression only on this box. It has 4 "Ivy Bridge-EX" Xeon E7-4890
> v2 CPUs.

That's the exact CPU I've got in the 4 node machine I did the tests on.





Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-08-01 Thread Rik van Riel

On 08/01/2014 05:30 PM, Jirka Hladky wrote:

> I see the regression only on this box. It has 4 "Ivy Bridge-EX"
> Xeon E7-4890 v2 CPUs.
> 
> http://ark.intel.com/products/75251 
> http://en.wikipedia.org/wiki/List_of_Intel_Xeon_microprocessors#.22Ivy_Bridge-EX.22_.2822_nm.29_Expandable_2
>
> 
> 
> Please rerun the test on box with Ivy Bridge CPUs. It seems that
> older CPU generations are not affected.

That would have been good info to know :)

I've been spending about a month trying to reproduce your issue on a
Westmere E7-4860.

Good thing I found all kinds of other scheduler issues along the way...

-- 
All rights reversed


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-08-01 Thread Jirka Hladky

On 08/01/2014 10:46 PM, Davidlohr Bueso wrote:

On Thu, 2014-07-31 at 18:16 +0200, Jirka Hladky wrote:

Peter, I'm seeing regressions for

SINGLE SPECjbb instance for number of warehouses being the same as total
number of cores in the box.

Example: 4 NUMA node box, each CPU has 6 cores => biggest regression is
for 24 warehouses.

By looking at your graph, that's around a 10% difference.

So I'm not seeing anywhere near as bad a regression on an 80-core box.
Testing single with 80 warehouses, I get:

tip/master baseline:
677476.36 bops
705826.70 bops
704870.87 bops
681741.20 bops
707014.59 bops

Avg: 695385.94 bops

tip/master + patch (NUMA_SCALE/8 variant):
698242.66 bops
693873.18 bops
707852.28 bops
691785.96 bops
747206.03 bops

Avg: 707792.022 bops

So both these are pretty similar, however, when reverting, on avg we
increase the amount of bops a mere ~4%:

tip/master + reverted:
778416.02 bops
702602.62 bops
712557.32 bops
713982.90 bops
783300.36 bops

Avg: 738171.84 bops

Are there perhaps any special specjbb options you are using?



I see the regression only on this box. It has 4 "Ivy Bridge-EX" Xeon 
E7-4890 v2 CPUs.


http://ark.intel.com/products/75251
http://en.wikipedia.org/wiki/List_of_Intel_Xeon_microprocessors#.22Ivy_Bridge-EX.22_.2822_nm.29_Expandable_2

Please rerun the test on box with Ivy Bridge CPUs. It seems that older 
CPU generations are not affected.


Thanks
Jirka




Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-08-01 Thread Davidlohr Bueso
On Fri, 2014-08-01 at 13:46 -0700, Davidlohr Bueso wrote:
> So both these are pretty similar, however, when reverting, on avg we
> increase the amount of bops a mere ~4%:
> 
> tip/master + reverted:

Just to be clear, this is reverting a43455a1d57.



Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-08-01 Thread Davidlohr Bueso
On Thu, 2014-07-31 at 18:16 +0200, Jirka Hladky wrote:
> Peter, I'm seeing regressions for
> 
> SINGLE SPECjbb instance for number of warehouses being the same as total 
> number of cores in the box.
> 
> Example: 4 NUMA node box, each CPU has 6 cores => biggest regression is 
> for 24 warehouses.

By looking at your graph, that's around a 10% difference.

So I'm not seeing anywhere near as bad a regression on an 80-core box.
Testing single with 80 warehouses, I get:

tip/master baseline:
677476.36 bops
705826.70 bops
704870.87 bops
681741.20 bops 
707014.59 bops

Avg: 695385.94 bops

tip/master + patch (NUMA_SCALE/8 variant):
698242.66 bops
693873.18 bops 
707852.28 bops
691785.96 bops 
747206.03 bops

Avg: 707792.022 bops

So both these are pretty similar, however, when reverting, on avg we
increase the amount of bops a mere ~4%:

tip/master + reverted:
778416.02 bops 
702602.62 bops 
712557.32 bops 
713982.90 bops
783300.36 bops

Avg: 738171.84 bops

Are there perhaps any special specjbb options you are using?



Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-08-01 Thread Peter Zijlstra
On Thu, Jul 31, 2014 at 07:37:05PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 31, 2014 at 06:39:05PM +0200, Jirka Hladky wrote:
> > I'm doing 3 iterations (3 runs) to get some statistics. To speed up the test
> > significantly please do the run with 20 warehouses only
> > (or in general with #warehouses ==  number of nodes * number of PHYSICAL
> > cores)
> 
> Yeah, went and did that for my 4 node machine, it's got a ton more cores, but I
> matched the warehouses to it:
> 
> -a43455a1d57          tip/master
> 
>  979996.47            1144715.44
>  876146               1098499.07
> 1058974.18            1019499.38
> 1055951.59            1139405.22
>  970504.01            1099659.09
> 
>  988314.45            1100355.64          (avg)
>   75059.546179565       50085.7473975167  (stdev)
> 
> So for 5 runs, tip/master (which includes the offending patch) wins hands 
> down.
> 
> Each run is 2 minutes.

Because Rik asked for a43455a1d57^1 numbers:

546423.08
546558.63
545990.01
546015.98

some a43455a1d57 numbers:

538652.93
544333.57
542684.77

same setup and everything. So clearly the patches after that made 'some'
difference indeed, seeing how tip/master is almost twice that.

So the reason I didn't do a43455a1d57^1 vs a43455a1d57 is that we had already
fingered a commit; after that, what you test is the revert of that commit,
because a revert is what you typically end up doing if a commit fails.

But on the state of tip/master, taking that commit out is a net negative for
everything I've tested.




Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-08-01 Thread Fengguang Wu
On Fri, Aug 01, 2014 at 09:29:11AM +0200, Peter Zijlstra wrote:
> On Fri, Aug 01, 2014 at 10:03:30AM +0800, Aaron Lu wrote:
> > > > > ebe06187bf2aec1  a43455a1d572daf7b730fe12e
> > > > > ---------------  -------------------------
> > > > >  94500 ~ 3%     +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
> > > > >  67745 ~ 4%      +64.1%     111174 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
> > > > > 162245 ~ 3%      +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local
> 
> > It means, for commit ebe06187bf2aec1, the number for
> > num_hint_local_faults is 94500 for ivb42 machine and 67745 for lkp-snb01
> > machine. The 3%, 4% following that number means the deviation of the
> > different runs to their average(we usually run it multiple times to
> > phase out possible sharp values). We should probably remove that
> > percentage, as they cause confusion if no detailed explanation and may
> > not mean much to the commit author and others(if the deviation is big
> > enough, we should simply drop that result).
> 
> Nah, variance is good, but the typical symbol would be +- or the fancy
> ±.
> 
> ~ when used as a unary op means 'approx' or 'about' or 'same order'
> ~ when used as a binary op means equivalence, a weaker equal, often in
> the vein of the unary op meaning.
> 
> Also see: http://en.wikipedia.org/wiki/Tilde#Mathematics
> 
> So while I think having a measure of variance is good, I think you
> picked entirely the wrong symbol.

Good point! We'll first try ± for the stddev percent and fall back to
+- if it turns out not to work well in some cases.

Thanks,
Fengguang


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-08-01 Thread Yuyang Du
On Fri, Aug 01, 2014 at 09:29:11AM +0200, Peter Zijlstra wrote:
> On Fri, Aug 01, 2014 at 10:03:30AM +0800, Aaron Lu wrote:
> > > > > ebe06187bf2aec1  a43455a1d572daf7b730fe12e
> > > > > ---------------  -------------------------
> > > > >  94500 ~ 3%     +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
> > > > >  67745 ~ 4%      +64.1%     111174 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
> > > > > 162245 ~ 3%      +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local
> 
> > It means, for commit ebe06187bf2aec1, the number for
> > num_hint_local_faults is 94500 for ivb42 machine and 67745 for lkp-snb01
> > machine. The 3%, 4% following that number means the deviation of the
> > different runs to their average(we usually run it multiple times to
> > phase out possible sharp values). We should probably remove that
> > percentage, as they cause confusion if no detailed explanation and may
> > not mean much to the commit author and others(if the deviation is big
> > enough, we should simply drop that result).
> 
> Nah, variance is good, but the typical symbol would be +- or the fancy
> ±.
> 
> ~ when used as a unary op means 'approx' or 'about' or 'same order'
> ~ when used as a binary op means equivalence, a weaker equal, often in
> the vein of the unary op meaning.
> 
> Also see: http://en.wikipedia.org/wiki/Tilde#Mathematics
> 
> So while I think having a measure of variance is good, I think you
> picked entirely the wrong symbol.

Or, maybe you can use σ (lower case sigma) to indicate stddev, :)


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-08-01 Thread Peter Zijlstra
On Thu, Jul 31, 2014 at 09:03:22PM -0700, Davidlohr Bueso wrote:
> 
> Instead of removing info, why not document what each piece of data
> represents. Or add headers to the table. etc.

Yes headers are good, knowing exactly what a number is often removes a
lot of confusion ;-)




Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-08-01 Thread Peter Zijlstra
On Fri, Aug 01, 2014 at 10:03:30AM +0800, Aaron Lu wrote:
> > > > ebe06187bf2aec1  a43455a1d572daf7b730fe12e
> > > > ---------------  -------------------------
> > > >  94500 ~ 3%     +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
> > > >  67745 ~ 4%      +64.1%     111174 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
> > > > 162245 ~ 3%      +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local

> It means, for commit ebe06187bf2aec1, the number for
> num_hint_local_faults is 94500 for ivb42 machine and 67745 for lkp-snb01
> machine. The 3%, 4% following that number means the deviation of the
> different runs to their average(we usually run it multiple times to
> phase out possible sharp values). We should probably remove that
> percentage, as they cause confusion if no detailed explanation and may
> not mean much to the commit author and others(if the deviation is big
> enough, we should simply drop that result).

Nah, variance is good, but the typical symbol would be +- or the fancy
±.

~ when used as a unary op means 'approx' or 'about' or 'same order'
~ when used as a binary op means equivalence, a weaker equal, often in
the vein of the unary op meaning.

Also see: http://en.wikipedia.org/wiki/Tilde#Mathematics

So while I think having a measure of variance is good, I think you
picked entirely the wrong symbol.
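
To make the convention concrete, the figure in question is just the run-to-run standard deviation expressed as a percentage of the mean. A small standalone sketch (not the LKP reporting scripts themselves), fed with the five ivb42 local-fault counts Aaron reports elsewhere in the thread:

#include <math.h>
#include <stdio.h>

/* Print "<mean> ±N%" for a set of benchmark samples. */
static void print_with_stddev_percent(const double *samples, int n)
{
        double mean = 0.0, var = 0.0;
        int i;

        for (i = 0; i < n; i++)
                mean += samples[i];
        mean /= n;

        for (i = 0; i < n; i++)
                var += (samples[i] - mean) * (samples[i] - mean);
        var /= n;

        printf("%.0f ±%.0f%%\n", mean, 100.0 * sqrt(var) / mean);
}

int main(void)
{
        /* The five numa_hint_faults_local counts Aaron reported for ivb42. */
        double runs[] = { 173565, 201262, 192317, 198342, 198595 };

        print_with_stddev_percent(runs, 5);   /* prints roughly "192816 ±5%" */
        return 0;
}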


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Davidlohr Bueso
On Fri, 2014-08-01 at 10:03 +0800, Aaron Lu wrote:
> On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote:
> > On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
> > > On Tue, 29 Jul 2014 13:24:05 +0800
> > > Aaron Lu  wrote:
> > > 
> > > > FYI, we noticed the below changes on
> > > > 
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > > > commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure 
> > > > task_numa_migrate() checks the preferred node")
> > > > 
> > > > > ebe06187bf2aec1  a43455a1d572daf7b730fe12e
> > > > > ---------------  -------------------------
> > > > >  94500 ~ 3%     +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
> > > > >  67745 ~ 4%      +64.1%     111174 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
> > > > > 162245 ~ 3%      +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local
> > > 
> > > Hi Aaron,
> > > 
> > > Jirka Hladky has reported a regression with that changeset as
> > > well, and I have already spent some time debugging the issue.
> > 
> > So assuming those numbers above are the difference in
> 
> Yes, they are.
> 
> It means, for commit ebe06187bf2aec1, the number for
> num_hint_local_faults is 94500 for ivb42 machine and 67745 for lkp-snb01
> machine. The 3%, 4% following that number means the deviation of the
> different runs to their average(we usually run it multiple times to
> phase out possible sharp values). We should probably remove that
> percentage, as they cause confusion if no detailed explanation and may
> not mean much to the commit author and others(if the deviation is big
> enough, we should simply drop that result).
> 
> The percentage in the middle is the change between the two commits.
> 
> Another thing is the meaning of the numbers, it doesn't seem that
> evident they are for proc-vmstat.numa_hint_faults_local. Maybe something
> like this is better?

Instead of removing info, why not document what each piece of data
represents. Or add headers to the table. etc.

Thanks,
Davidlohr



Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Aaron Lu
On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
> > On Tue, 29 Jul 2014 13:24:05 +0800
> > Aaron Lu  wrote:
> > 
> > > FYI, we noticed the below changes on
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > > commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure 
> > > task_numa_migrate() checks the preferred node")
> > > 
> > > ebe06187bf2aec1  a43455a1d572daf7b730fe12e
> > > ---------------  -------------------------
> > >  94500 ~ 3%     +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
> > >  67745 ~ 4%      +64.1%     111174 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
> > > 162245 ~ 3%      +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local
> > 
> > Hi Aaron,
> > 
> > Jirka Hladky has reported a regression with that changeset as
> > well, and I have already spent some time debugging the issue.
> 
> So assuming those numbers above are the difference in

Yes, they are.

It means, for commit ebe06187bf2aec1, the number for
num_hint_local_faults is 94500 for ivb42 machine and 67745 for lkp-snb01
machine. The 3%, 4% following that number means the deviation of the
different runs to their average(we usually run it multiple times to
phase out possible sharp values). We should probably remove that
percentage, as they cause confusion if no detailed explanation and may
not mean much to the commit author and others(if the deviation is big
enough, we should simply drop that result).

The percentage in the middle is the change between the two commits.

Another thing is the meaning of the numbers, it doesn't seem that
evident they are for proc-vmstat.numa_hint_faults_local. Maybe something
like this is better?

ebe06187bf2aec1  a43455a1d572daf7b730fe12e  proc-vmstat.numa_hint_faults_local
---------------  -------------------------  ----------------------------------
 94500           +115.6%  203711            ivb42/hackbench/50%-threads-pipe
 67745            +64.1%  111174            lkp-snb01/hackbench/50%-threads-socket
162245            +94.1%  314885            TOTAL
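
The middle column is simply the relative change between the two commits. A quick standalone check of the three rows above (the 111174 value for lkp-snb01 is reconstructed from the TOTAL row, since the archive truncated it):

#include <stdio.h>

/* Relative change between two counter values, in percent. */
static double pct_change(double base, double current)
{
        return 100.0 * (current - base) / base;
}

int main(void)
{
        printf("ivb42:     %+.1f%%\n", pct_change(94500, 203711));   /* +115.6% */
        printf("lkp-snb01: %+.1f%%\n", pct_change(67745, 111174));   /* +64.1%  */
        printf("TOTAL:     %+.1f%%\n", pct_change(162245, 314885));  /* +94.1%  */
        return 0;
}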

Regards,
Aaron

> numa_hint_local_faults, the report is actually a significant
> _improvement_, not a regression.
> 
> On my IVB-EP I get similar numbers; using:
> 
>   PRE=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
>   perf bench sched messaging -g 24 -t -p -l 6
>   POST=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
>   echo $((POST-PRE))
> 
> 
> tip/master+origin/master        tip/master+origin/master-a43455a1d57
> 
> local      total                local      total
> faults     time                 faults     time
> 
> 19971      51.384               10104      50.838
> 17193      50.564                9116      50.208
> 13435      49.057                8332      51.344
> 23794      50.795                9954      51.364
> 20255      49.463                9598      51.258
> 
> 18929.6    50.2526              9420.8     51.0024
>  3863.61    0.96                 717.78     0.49
> 
> So that patch improves both local faults and runtime. Its good (even
> though for the runtime we're still inside stdev overlap, so ideally I'd
> do more runs).
> 
> 
> Now I also did a run with the proposed patch, NUMA_SCALE/8 variant, and
> that slightly reduces both again:
> 
> tip/master+origin/master+patch
> 
> local total
> faults  time
> 
> 21296 50.541
> 12771 50.54
> 13872 52.224
> 23352 50.85
> 16516 50.705
> 
> 17561.4   50.972
> 4613.32   0.71
> 
> So for hackbench a43455a1d57 is good and the proposed patch is making
> things worse.
> 
> Let me see if I can still find my SPECjbb2005 copy to see what that
> does.




Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Davidlohr Bueso
On Thu, 2014-07-31 at 12:42 +0200, Peter Zijlstra wrote:
> On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
> > On Tue, 29 Jul 2014 13:24:05 +0800
> > Aaron Lu  wrote:
> > 
> > > FYI, we noticed the below changes on
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > > commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure 
> > > task_numa_migrate() checks the preferred node")
> > > 
> > > ebe06187bf2aec1  a43455a1d572daf7b730fe12e
> > > ---------------  -------------------------
> > >  94500 ~ 3%     +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
> > >  67745 ~ 4%      +64.1%     111174 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
> > > 162245 ~ 3%      +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local
> > 
> > Hi Aaron,
> > 
> > Jirka Hladky has reported a regression with that changeset as
> > well, and I have already spent some time debugging the issue.
> 
> So assuming those numbers above are the difference in
> numa_hint_local_faults, the report is actually a significant
> _improvement_, not a regression.
> 
> On my IVB-EP I get similar numbers; using:
> 
>   PRE=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
>   perf bench sched messaging -g 24 -t -p -l 6
>   POST=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
>   echo $((POST-PRE))
> 
> 
> tip/master+origin/master        tip/master+origin/master-a43455a1d57
> 
> local      total                local      total
> faults     time                 faults     time
> 
> 19971      51.384               10104      50.838
> 17193      50.564                9116      50.208
> 13435      49.057                8332      51.344
> 23794      50.795                9954      51.364
> 20255      49.463                9598      51.258
> 
> 18929.6    50.2526              9420.8     51.0024
>  3863.61    0.96                 717.78     0.49
> 
> So that patch improves both local faults and runtime. Its good (even
> though for the runtime we're still inside stdev overlap, so ideally I'd
> do more runs).
> 
> 
> Now I also did a run with the proposed patch, NUMA_SCALE/8 variant, and
> that slightly reduces both again:
> 
> tip/master+origin/master+patch
> 
> local total
> faults  time
> 
> 21296 50.541
> 12771 50.54
> 13872 52.224
> 23352 50.85
> 16516 50.705
> 
> 17561.4   50.972
> 4613.32   0.71
> 
> So for hackbench a43455a1d57 is good and the proposed patch is making
> things worse.

It also seems to be the case on a 8-socket 80 core DL980:

tip/master baseline:
67276 169.590 [sec]
82400 188.406 [sec]
87827 201.122 [sec]
96659 228.243 [sec]
83180 192.422 [sec]

tip/master + a43455a1d57 reverted
36686 170.373 [sec]
52670 187.904 [sec]
55723 203.597 [sec]
41780 174.354 [sec]
36070 173.179 [sec]

Runtimes are pretty much all over the place, cannot really say if it's
gotten slower or faster. However, on avg, we nearly double the amount of
hint local faults with the commit in question.

After adding the proposed fix (NUMA_SCALE/8 variant), it goes down
again, closer to without a43455a1d57:

tip/master + patch
50591 175.272 [sec]
57858 191.969 [sec]
77564 215.429 [sec]
50613 179.384 [sec]
61673 201.694 [sec]

> Let me see if I can still find my SPECjbb2005 copy to see what that
> does.

I'll try to dig it up as well.



Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Peter Zijlstra
On Thu, Jul 31, 2014 at 06:39:05PM +0200, Jirka Hladky wrote:
> I'm doing 3 iterations (3 runs) to get some statistics. To speed up the test
> significantly please do the run with 20 warehouses only
> (or in general with #warehouses ==  number of nodes * number of PHYSICAL
> cores)

Yeah, went and did that for my 4 node machine, it's got a ton more cores, but I
matched the warehouses to it:

-a43455a1d57      tip/master

979996.47   1144715.44
876146  1098499.07
1058974.18  1019499.38
1055951.59  1139405.22
970504.01   1099659.09

988314.45   1100355.64  (avg)
75059.546179565 50085.7473975167(stdev)

So for 5 runs, tip/master (which includes the offending patch) wins hands down.

Each run is 2 minutes.




Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Jirka Hladky

On 07/31/2014 06:27 PM, Peter Zijlstra wrote:

On Thu, Jul 31, 2014 at 06:16:26PM +0200, Jirka Hladky wrote:

On 07/31/2014 05:57 PM, Peter Zijlstra wrote:

On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote:

On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:

On Tue, 29 Jul 2014 13:24:05 +0800
Aaron Lu  wrote:


FYI, we noticed the below changes on

git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure 
task_numa_migrate() checks the preferred node")

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
 94500 ~ 3%     +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
 67745 ~ 4%      +64.1%     111174 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
162245 ~ 3%      +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local

Hi Aaron,

Jirka Hladky has reported a regression with that changeset as
well, and I have already spent some time debugging the issue.

Let me see if I can still find my SPECjbb2005 copy to see what that
does.

Jirka, what kind of setup were you seeing SPECjbb regressions?

I'm not seeing any on 2 sockets with a single SPECjbb instance, I'll go
check one instance per socket now.



Peter, I'm seeing regressions for

SINGLE SPECjbb instance for number of warehouses being the same as total
number of cores in the box.

Example: 4 NUMA node box, each CPU has 6 cores => biggest regression is for
24 warehouses.

IVB-EP: 2 node, 10 cores, 2 thread per core:

tip/master+origin/master:

  Warehouses   Thrput
   4   196781
   8   358064
  12   511318
  16   589251
  20   656123
  24   710789
  28   765426
  32   787059
  36   777899
* 40   748568
 
Throughput  18258


  Warehouses   Thrput
   4   201598
   8   363470
  12   512968
  16   584289
  20   605299
  24   720142
  28   776066
  32   791263
  36   776965
* 40   760572
 
Throughput  18551



tip/master+origin/master-a43455a1d57

SPEC scores
  Warehouses   Thrput
   4   198667
   8   362481
  12   503344
  16   582602
  20   647688
  24   731639
  28   786135
  32   794124
  36   774567
* 40   757559
 
Throughput  18477



Given that there's fairly large variance between the two runs with the
commit in, I'm not sure I can say there's a problem here.

The one run without the patch is more or less between the two runs with
the patch.

And doing this many runs takes ages, so I'm not tempted to either make
the runs longer or do more of them.

Lemme try on a 4 node box though, who knows.


IVB-EP: 2 node, 10 cores, 2 thread per core
=> on such a system, I run only 20 warehouses as maximum (number of
nodes * number of PHYSICAL cores)


The kernels you have tested show the following results:
656123/605299/647688


I'm doing 3 iterations (3 runs) to get some statistics. To speed up the 
test significantly please do the run with 20 warehouses only
(or in general with #warehouses ==  number of nodes * number of PHYSICAL 
cores)


Jirka


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Peter Zijlstra
On Thu, Jul 31, 2014 at 06:16:26PM +0200, Jirka Hladky wrote:
> On 07/31/2014 05:57 PM, Peter Zijlstra wrote:
> >On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote:
> >>On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
> >>>On Tue, 29 Jul 2014 13:24:05 +0800
> >>>Aaron Lu  wrote:
> >>>
> FYI, we noticed the below changes on
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure 
> task_numa_migrate() checks the preferred node")
> 
> ebe06187bf2aec1  a43455a1d572daf7b730fe12e
> ---------------  -------------------------
>  94500 ~ 3%     +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
>  67745 ~ 4%      +64.1%     111174 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
> 162245 ~ 3%      +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local
> >>>Hi Aaron,
> >>>
> >>>Jirka Hladky has reported a regression with that changeset as
> >>>well, and I have already spent some time debugging the issue.
> >>Let me see if I can still find my SPECjbb2005 copy to see what that
> >>does.
> >Jirka, what kind of setup were you seeing SPECjbb regressions?
> >
> >I'm not seeing any on 2 sockets with a single SPECjbb instance, I'll go
> >check one instance per socket now.
> >
> >
> Peter, I'm seeing regressions for
> 
> SINGLE SPECjbb instance for number of warehouses being the same as total
> number of cores in the box.
> 
> Example: 4 NUMA node box, each CPU has 6 cores => biggest regression is for
> 24 warehouses.

IVB-EP: 2 node, 10 cores, 2 thread per core:

tip/master+origin/master:

 Warehouses   Thrput
  4   196781
  8   358064
 12   511318
 16   589251
 20   656123
 24   710789
 28   765426
 32   787059
 36   777899
   * 40   748568

Throughput  18258   

 Warehouses   Thrput
  4   201598
  8   363470
 12   512968
 16   584289
 20   605299
 24   720142
 28   776066
 32   791263
 36   776965
   * 40   760572

Throughput  18551   


tip/master+origin/master-a43455a1d57

   SPEC scores  
  
 Warehouses   Thrput
  4   198667
  8   362481
 12   503344
 16   582602
 20   647688
 24   731639
 28   786135
 32   794124
 36   774567
   * 40   757559

Throughput  18477  


Given that there's fairly large variance between the two runs with the
commit in, I'm not sure I can say there's a problem here.

The one run without the patch is more or less between the two runs with
the patch.

And doing this many runs takes ages, so I'm not tempted to either make
the runs longer or do more of them.

Lemme try on a 4 node box though, who knows.




Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Jirka Hladky

On 07/31/2014 05:57 PM, Peter Zijlstra wrote:

On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote:

On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:

On Tue, 29 Jul 2014 13:24:05 +0800
Aaron Lu  wrote:


FYI, we noticed the below changes on

git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure 
task_numa_migrate() checks the preferred node")

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
 94500 ~ 3%     +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
 67745 ~ 4%      +64.1%     111174 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
162245 ~ 3%      +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local

Hi Aaron,

Jirka Hladky has reported a regression with that changeset as
well, and I have already spent some time debugging the issue.

Let me see if I can still find my SPECjbb2005 copy to see what that
does.

Jirka, what kind of setup were you seeing SPECjbb regressions?

I'm not seeing any on 2 sockets with a single SPECjbb instance, I'll go
check one instance per socket now.



Peter, I'm seeing regressions for

SINGLE SPECjbb instance for number of warehouses being the same as total 
number of cores in the box.


Example: 4 NUMA node box, each CPU has 6 cores => biggest regression is 
for 24 warehouses.


See the attached snapshot.

Jirka


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Peter Zijlstra
On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
> > On Tue, 29 Jul 2014 13:24:05 +0800
> > Aaron Lu  wrote:
> > 
> > > FYI, we noticed the below changes on
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > > commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure 
> > > task_numa_migrate() checks the preferred node")
> > > 
> > > ebe06187bf2aec1  a43455a1d572daf7b730fe12e
> > > ---------------  -------------------------
> > >  94500 ~ 3%     +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
> > >  67745 ~ 4%      +64.1%     111174 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
> > > 162245 ~ 3%      +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local
> > 
> > Hi Aaron,
> > 
> > Jirka Hladky has reported a regression with that changeset as
> > well, and I have already spent some time debugging the issue.
> 
> Let me see if I can still find my SPECjbb2005 copy to see what that
> does.

Jirka, what kind of setup were you seeing SPECjbb regressions?

I'm not seeing any on 2 sockets with a single SPECjbb instance, I'll go
check one instance per socket now.






Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Peter Zijlstra
On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
> On Tue, 29 Jul 2014 13:24:05 +0800
> Aaron Lu  wrote:
> 
> > FYI, we noticed the below changes on
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure 
> > task_numa_migrate() checks the preferred node")
> > 
> > ebe06187bf2aec1  a43455a1d572daf7b730fe12e
> > ---------------  -------------------------
> >  94500 ~ 3%     +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
> >  67745 ~ 4%      +64.1%     111174 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
> > 162245 ~ 3%      +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local
> 
> Hi Aaron,
> 
> Jirka Hladky has reported a regression with that changeset as
> well, and I have already spent some time debugging the issue.

So assuming those numbers above are the difference in
numa_hint_local_faults, the report is actually a significant
_improvement_, not a regression.

On my IVB-EP I get similar numbers; using:

  PRE=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
  perf bench sched messaging -g 24 -t -p -l 6
  POST=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
  echo $((POST-PRE))


tip/master+origin/master        tip/master+origin/master-a43455a1d57

local      total                local      total
faults     time                 faults     time

19971      51.384               10104      50.838
17193      50.564                9116      50.208
13435      49.057                8332      51.344
23794      50.795                9954      51.364
20255      49.463                9598      51.258

18929.6    50.2526              9420.8     51.0024
 3863.61    0.96                 717.78     0.49

So that patch improves both local faults and runtime. Its good (even
though for the runtime we're still inside stdev overlap, so ideally I'd
do more runs).


Now I also did a run with the proposed patch, NUMA_SCALE/8 variant, and
that slightly reduces both again:

tip/master+origin/master+patch

local   total
faults  time

21296   50.541
12771   50.54
13872   52.224
23352   50.85
16516   50.705

17561.4 50.972
4613.32 0.71

So for hackbench a43455a1d57 is good and the proposed patch is making
things worse.

Let me see if I can still find my SPECjbb2005 copy to see what that
does.
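
For anyone wanting to repeat the PRE/POST measurement above outside the shell one-liner, a small sketch that reads a named counter out of /proc/vmstat (assumes Linux; the benchmark invocation itself is left as a placeholder):

#include <stdio.h>
#include <string.h>

/* Return the current value of a /proc/vmstat counter, or -1 on error. */
static long long vmstat_read(const char *name)
{
        char key[64];
        long long val;
        FILE *f = fopen("/proc/vmstat", "r");

        if (!f)
                return -1;

        while (fscanf(f, "%63s %lld", key, &val) == 2) {
                if (!strcmp(key, name)) {
                        fclose(f);
                        return val;
                }
        }
        fclose(f);
        return -1;
}

int main(void)
{
        long long pre = vmstat_read("numa_hint_faults_local");

        /* ... run the benchmark here, e.g. perf bench sched messaging ... */

        long long post = vmstat_read("numa_hint_faults_local");
        printf("numa_hint_faults_local delta: %lld\n", post - pre);
        return 0;
}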




Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Aaron Lu
On Thu, Jul 31, 2014 at 10:33:30AM +0200, Peter Zijlstra wrote:
> On Wed, Jul 30, 2014 at 10:14:25AM +0800, Aaron Lu wrote:
> > 118881 ~ 0%      +1.2%     120325 ~ 0%  ivb42/hackbench/50%-threads-pipe
> 
> What kind of IVB is that EP or EX (or rather, how many sockets)? Also
> what arguments to hackbench do you use?
> 

2 sockets EP.

The cmdline is:
/usr/bin/hackbench -g 24 --threads --pipe -l 6

Regards,
Aaron


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Peter Zijlstra
On Wed, Jul 30, 2014 at 10:14:25AM +0800, Aaron Lu wrote:
> 118881 ~ 0%      +1.2%     120325 ~ 0%  ivb42/hackbench/50%-threads-pipe

What kind of IVB is that EP or EX (or rather, how many sockets)? Also
what arguments to hackbench do you use?





Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Aaron Lu
On Thu, Jul 31, 2014 at 02:22:55AM -0400, Rik van Riel wrote:
> 
> On 07/31/2014 01:04 AM, Aaron Lu wrote:
> > On Wed, Jul 30, 2014 at 10:25:03AM -0400, Rik van Riel wrote:
> >> On 07/29/2014 10:14 PM, Aaron Lu wrote:
> >>> On Tue, Jul 29, 2014 at 04:04:37PM -0400, Rik van Riel wrote:
>  On Tue, 29 Jul 2014 10:17:12 +0200 Peter Zijlstra
>   wrote:
>  
> >> +#define NUMA_SCALE 1000
> >> +#define NUMA_MOVE_THRESH 50
> > 
> > Please make that 1024, there's no reason not to use power
> > of two here. This base 10 factor thing annoyed me no end
> > already, its time for it to die.
>  
>  That's easy enough.  However, it would be good to know
>  whether this actually helps with the regression Aaron found
>  :)
> >>> 
> >>> Sorry for the delay.
> >>> 
> >>> I applied the last patch and queued the hackbench job to the
> >>> ivb42 test machine for it to run 5 times, and here is the
> >>> result(regarding the proc-vmstat.numa_hint_faults_local
> >>> field): 173565 201262 192317 198342 198595 avg: 192816
> >>> 
> >>> It seems it is still very big than previous kernels.
> >> 
> >> It looks like a step in the right direction, though.
> >> 
> >> Could you try running with a larger threshold?
> >> 
>  +++ b/kernel/sched/fair.c
>  @@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct numa_group *group, int nid)
>  
>   /*
>    * These return the fraction of accesses done by a particular task, or
>  - * task group, on a particular numa node.  The group weight is given a
>  - * larger multiplier, in order to group tasks together that are almost
>  - * evenly spread out between numa nodes.
>  + * task group, on a particular numa node.  The NUMA move threshold
>  + * prevents task moves with marginal improvement, and is set to 5%.
>    */
>  +#define NUMA_SCALE 1024
>  +#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100)
> >> 
> >> It would be good to see if changing NUMA_MOVE_THRESH to 
> >> (NUMA_SCALE / 8) does the trick.
> > 
> > With your 2nd patch and the above change, the result is:
> > 
> > "proc-vmstat.numa_hint_faults_local": [ 199708, 209152, 200638, 
> > 187324, 196654 ],
> > 
> > avg: 198695
> 
> OK, so it is still a little higher than your original 162245.

The original number is 94500 for ivb42 machine, the 162245 is the sum
of the two numbers above it that are tested on two machines - one is the
number for ivb42 and one is for lkp-snb01. Sorry if that is not clear.

And for the numbers I have given with your patch applied, they are all
for ivb42 alone.

> 
> I guess this is to be expected, since the code will be more
> successful at placing a task on the right node, which results
> in the task scanning its memory more rapidly for a little bit.
> 
> Are you seeing any changes in throughput?

The throughput has almost no change. Your 2nd patch with the scale
changed shows a decrease of 0.1% compared to your original commit that
triggered the report, and that original commit shows an increase of 1.2%
compared to its parent commit.

Regards,
Aaron
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Rik van Riel
On Thu, 31 Jul 2014 13:04:54 +0800
Aaron Lu  wrote:

> On Wed, Jul 30, 2014 at 10:25:03AM -0400, Rik van Riel wrote:
> > On 07/29/2014 10:14 PM, Aaron Lu wrote:

> > >> +#define NUMA_SCALE 1024
> > >> +#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100)
> > 
> > It would be good to see if changing NUMA_MOVE_THRESH to
> > (NUMA_SCALE / 8) does the trick.

FWIW, running with NUMA_MOVE_THRESH set to (NUMA_SCALE / 8)
seems to resolve the SPECjbb2005 regression on my system.

I will run some more sanity tests later today...

-- 
All rights reversed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Rik van Riel
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 07/31/2014 01:04 AM, Aaron Lu wrote:
> On Wed, Jul 30, 2014 at 10:25:03AM -0400, Rik van Riel wrote:
>> On 07/29/2014 10:14 PM, Aaron Lu wrote:
>>> On Tue, Jul 29, 2014 at 04:04:37PM -0400, Rik van Riel wrote:
 On Tue, 29 Jul 2014 10:17:12 +0200 Peter Zijlstra
  wrote:
 
>> +#define NUMA_SCALE 1000 +#define NUMA_MOVE_THRESH 50
> 
> Please make that 1024, there's no reason not to use power
> of two here. This base 10 factor thing annoyed me no end
> already, its time for it to die.
 
 That's easy enough.  However, it would be good to know
 whether this actually helps with the regression Aaron found
 :)
>>> 
>>> Sorry for the delay.
>>> 
>>> I applied the last patch and queued the hackbench job to the
>>> ivb42 test machine for it to run 5 times, and here is the
>>> result(regarding the proc-vmstat.numa_hint_faults_local
>>> field): 173565 201262 192317 198342 198595 avg: 192816
>>> 
>>> It seems it is still very big than previous kernels.
>> 
>> It looks like a step in the right direction, though.
>> 
>> Could you try running with a larger threshold?
>> 
 +++ b/kernel/sched/fair.c @@ -924,10 +924,12 @@ static inline
 unsigned long group_faults_cpu(struct numa_group *group, int
 nid)
 
 /* * These return the fraction of accesses done by a
 particular task, or - * task group, on a particular numa
 node.  The group weight is given a - * larger multiplier, in
 order to group tasks together that are almost - * evenly
 spread out between numa nodes. + * task group, on a
 particular numa node.  The NUMA move threshold + * prevents
 task moves with marginal improvement, and is set to 5%. */ 
 +#define NUMA_SCALE 1024 +#define NUMA_MOVE_THRESH (5 *
 NUMA_SCALE / 100)
>> 
>> It would be good to see if changing NUMA_MOVE_THRESH to 
>> (NUMA_SCALE / 8) does the trick.
> 
> With your 2nd patch and the above change, the result is:
> 
> "proc-vmstat.numa_hint_faults_local": [ 199708, 209152, 200638, 
> 187324, 196654 ],
> 
> avg: 198695

OK, so it is still a little higher than your original 162245.

I guess this is to be expected, since the code will be more
successful at placing a task on the right node, which results
in the task scanning its memory more rapidly for a little bit.

Are you seeing any changes in throughput?

- -- 
All rights reversed
-BEGIN PGP SIGNATURE-
Version: GnuPG v1

iQEcBAEBAgAGBQJT2eC/AAoJEM553pKExN6DIFMH/23LsoEJ8cUqMTdWUzhXesEb
TW0yncraZ6tDkGHopTU4oFmck93XUUVSJRVjLC3lxvxAIdWt8M4GCbWN8RD1yicX
Ii9s18+2r2vkc30gkIgh2yahaqQUun9sUkuaQ4BaKlbP+hwQzB3OfU1GjR7iStFE
t04krgCAL+xL63H/4mN0Y9ZjOBUz2QYbkspS21+oEWKkFY2FyyQn+hOSnA6lSvqy
o7v4tmC8jtRXsQY+hfy1aOtMUZO5sRcYHOttlxgjE5MbnW/whhsC+oB7cWw646St
LhvhhIykl/g2Bz+E3KbfnREGn5OO7NmEhv3am2Dj5XsNHnEfxYJH/m/aTA4az/s=
=/IeV
-END PGP SIGNATURE-
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Peter Zijlstra
On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
 On Tue, 29 Jul 2014 13:24:05 +0800
 Aaron Lu aaron...@intel.com wrote:
 
  FYI, we noticed the below changes on
  
  git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
  commit a43455a1d572daf7b730fe12eb747d1e17411365 (sched/numa: Ensure 
  task_numa_migrate() checks the preferred node)
  
  ebe06187bf2aec1  a43455a1d572daf7b730fe12e  
  ---  -  
   94500 ~ 3%+115.6% 203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
   67745 ~ 4% +64.1% 74 ~ 5%  
  lkp-snb01/hackbench/50%-threads-socket
  162245 ~ 3% +94.1% 314885 ~ 6%  TOTAL 
  proc-vmstat.numa_hint_faults_local
 
 Hi Aaron,
 
 Jirka Hladky has reported a regression with that changeset as
 well, and I have already spent some time debugging the issue.

So assuming those numbers above are the difference in
numa_hint_local_faults, the report is actually a significant
_improvement_, not a regression.

On my IVB-EP I get similar numbers; using:

  PRE=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
  perf bench sched messaging -g 24 -t -p -l 6
  POST=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
  echo $((POST-PRE))


tip/master+origin/master        tip/master+origin/master-a43455a1d57

local       total               local       total
faults      time                faults      time

19971       51.384              10104       50.838
17193       50.564               9116       50.208
13435       49.057               8332       51.344
23794       50.795               9954       51.364
20255       49.463               9598       51.258

18929.6     50.2526             9420.8      51.0024     (avg)
 3863.61     0.967               717.78      0.49       (stdev)

So that patch improves both local faults and runtime. It's good (even
though for the runtime we're still inside stdev overlap, so ideally I'd
do more runs).
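
Something like the following (just a sketch, reusing the exact recipe
above) would collect a handful of samples and print mean/stdev:

  N=10
  for i in `seq $N`; do
    PRE=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
    perf bench sched messaging -g 24 -t -p -l 6 > /dev/null
    POST=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
    echo $((POST-PRE))
  done | awk '{ s += $1; ss += $1*$1; n++ }
              END { m = s/n; printf "mean %.1f  stdev %.1f\n", m, sqrt((ss - n*m*m)/(n-1)) }'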


Now I also did a run with the proposed patch, NUMA_SCALE/8 variant, and
that slightly reduces both again:

tip/master+origin/master+patch

local   total
faults  time

21296   50.541
12771   50.54
13872   52.224
23352   50.85
16516   50.705

17561.4 50.972
4613.32 0.71

So for hackbench a43455a1d57 is good and the proposed patch is making
things worse.

Let me see if I can still find my SPECjbb2005 copy to see what that
does.


pgpbXbCxJdleb.pgp
Description: PGP signature


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Peter Zijlstra
On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote:
 On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
  On Tue, 29 Jul 2014 13:24:05 +0800
  Aaron Lu aaron...@intel.com wrote:
  
   FYI, we noticed the below changes on
   
   git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
   commit a43455a1d572daf7b730fe12eb747d1e17411365 (sched/numa: Ensure 
   task_numa_migrate() checks the preferred node)
   
   ebe06187bf2aec1  a43455a1d572daf7b730fe12e  
   ---  -  
94500 ~ 3%+115.6% 203711 ~ 6%  
   ivb42/hackbench/50%-threads-pipe
67745 ~ 4% +64.1% 74 ~ 5%  
   lkp-snb01/hackbench/50%-threads-socket
   162245 ~ 3% +94.1% 314885 ~ 6%  TOTAL 
   proc-vmstat.numa_hint_faults_local
  
  Hi Aaron,
  
  Jirka Hladky has reported a regression with that changeset as
  well, and I have already spent some time debugging the issue.
 
 Let me see if I can still find my SPECjbb2005 copy to see what that
 does.

Jirka, on what kind of setup were you seeing SPECjbb regressions?

I'm not seeing any on 2 sockets with a single SPECjbb instance; I'll go
check one instance per socket now.




pgpX4ciWQErN1.pgp
Description: PGP signature


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Jirka Hladky

On 07/31/2014 05:57 PM, Peter Zijlstra wrote:

On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote:

On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:

On Tue, 29 Jul 2014 13:24:05 +0800
Aaron Lu aaron...@intel.com wrote:


FYI, we noticed the below changes on

git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
commit a43455a1d572daf7b730fe12eb747d1e17411365 (sched/numa: Ensure 
task_numa_migrate() checks the preferred node)

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---  -
  94500 ~ 3%+115.6% 203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
  67745 ~ 4% +64.1% 74 ~ 5%  
lkp-snb01/hackbench/50%-threads-socket
 162245 ~ 3% +94.1% 314885 ~ 6%  TOTAL 
proc-vmstat.numa_hint_faults_local

Hi Aaron,

Jirka Hladky has reported a regression with that changeset as
well, and I have already spent some time debugging the issue.

Let me see if I can still find my SPECjbb2005 copy to see what that
does.

Jirka, what kind of setup were you seeing SPECjbb regressions?

I'm not seeing any on 2 sockets with a single SPECjbb instance, I'll go
check one instance per socket now.



Peter, I'm seeing regressions for a SINGLE SPECjbb instance when the
number of warehouses is the same as the total number of cores in the
box.


Example: a 4 NUMA node box where each CPU has 6 cores => the biggest
regression is for 24 warehouses.


See the attached snapshot.

Jirka


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Peter Zijlstra
On Thu, Jul 31, 2014 at 06:16:26PM +0200, Jirka Hladky wrote:
 On 07/31/2014 05:57 PM, Peter Zijlstra wrote:
 On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote:
 On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
 On Tue, 29 Jul 2014 13:24:05 +0800
 Aaron Lu aaron...@intel.com wrote:
 
 FYI, we noticed the below changes on
 
 git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
 commit a43455a1d572daf7b730fe12eb747d1e17411365 (sched/numa: Ensure 
 task_numa_migrate() checks the preferred node)
 
 ebe06187bf2aec1  a43455a1d572daf7b730fe12e
 ---  -
   94500 ~ 3%+115.6% 203711 ~ 6%  
  ivb42/hackbench/50%-threads-pipe
   67745 ~ 4% +64.1% 74 ~ 5%  
  lkp-snb01/hackbench/50%-threads-socket
  162245 ~ 3% +94.1% 314885 ~ 6%  TOTAL 
  proc-vmstat.numa_hint_faults_local
 Hi Aaron,
 
 Jirka Hladky has reported a regression with that changeset as
 well, and I have already spent some time debugging the issue.
 Let me see if I can still find my SPECjbb2005 copy to see what that
 does.
 Jirka, what kind of setup were you seeing SPECjbb regressions?
 
 I'm not seeing any on 2 sockets with a single SPECjbb instance, I'll go
 check one instance per socket now.
 
 
 Peter, I'm seeing regressions for
 
 SINGLE SPECjbb instance for number of warehouses being the same as total
 number of cores in the box.
 
 Example: 4 NUMA node box, each CPU has 6 cores = biggest regression is for
 24 warehouses.

IVB-EP: 2 node, 10 cores, 2 thread per core:

tip/master+origin/master:

 Warehouses   Thrput
  4   196781
  8   358064
 12   511318
 16   589251
 20   656123
 24   710789
 28   765426
 32   787059
 36   777899
   * 40   748568

Throughput  18258   

 Warehouses   Thrput
  4   201598
  8   363470
 12   512968
 16   584289
 20   605299
 24   720142
 28   776066
 32   791263
 36   776965
   * 40   760572

Throughput  18551   


tip/master+origin/master-a43455a1d57

   SPEC scores  
  
 Warehouses   Thrput
  4   198667
  8   362481
 12   503344
 16   582602
 20   647688
 24   731639
 28   786135
 32   794124
 36   774567
   * 40   757559

Throughput  18477  


Given that there's fairly large variance between the two runs with the
commit in, I'm not sure I can say there's a problem here.

The one run without the patch is more or less between the two runs with
the patch.

And doing this many runs takes ages, so I'm not tempted to either make
the runs longer or do more of them.

Lemme try on a 4 node box though, who knows.


pgpM70i9W_6Xw.pgp
Description: PGP signature


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Jirka Hladky

On 07/31/2014 06:27 PM, Peter Zijlstra wrote:

On Thu, Jul 31, 2014 at 06:16:26PM +0200, Jirka Hladky wrote:

On 07/31/2014 05:57 PM, Peter Zijlstra wrote:

On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote:

On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:

On Tue, 29 Jul 2014 13:24:05 +0800
Aaron Lu aaron...@intel.com wrote:


FYI, we noticed the below changes on

git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
commit a43455a1d572daf7b730fe12eb747d1e17411365 (sched/numa: Ensure 
task_numa_migrate() checks the preferred node)

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---  -
  94500 ~ 3%+115.6% 203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
  67745 ~ 4% +64.1% 74 ~ 5%  
lkp-snb01/hackbench/50%-threads-socket
 162245 ~ 3% +94.1% 314885 ~ 6%  TOTAL 
proc-vmstat.numa_hint_faults_local

Hi Aaron,

Jirka Hladky has reported a regression with that changeset as
well, and I have already spent some time debugging the issue.

Let me see if I can still find my SPECjbb2005 copy to see what that
does.

Jirka, what kind of setup were you seeing SPECjbb regressions?

I'm not seeing any on 2 sockets with a single SPECjbb instance, I'll go
check one instance per socket now.



Peter, I'm seeing regressions for

SINGLE SPECjbb instance for number of warehouses being the same as total
number of cores in the box.

Example: 4 NUMA node box, each CPU has 6 cores = biggest regression is for
24 warehouses.

IVB-EP: 2 node, 10 cores, 2 thread per core:

tip/master+origin/master:

  Warehouses   Thrput
   4   196781
   8   358064
  12   511318
  16   589251
  20   656123
  24   710789
  28   765426
  32   787059
  36   777899
* 40   748568
 
Throughput  18258


  Warehouses   Thrput
   4   201598
   8   363470
  12   512968
  16   584289
  20   605299
  24   720142
  28   776066
  32   791263
  36   776965
* 40   760572
 
Throughput  18551



tip/master+origin/master-a43455a1d57

SPEC scores
  Warehouses   Thrput
   4   198667
   8   362481
  12   503344
  16   582602
  20   647688
  24   731639
  28   786135
  32   794124
  36   774567
* 40   757559
 
Throughput  18477



Given that there's fairly large variance between the two runs with the
commit in, I'm not sure I can say there's a problem here.

The one run without the patch is more or less between the two runs with
the patch.

And doing this many runs takes ages, so I'm not tempted to either make
the runs longer or do more of them.

Lemme try on a 4 node box though, who knows.


IVB-EP: 2 nodes, 10 cores, 2 threads per core
=> on such a system, I would run only 20 warehouses as the maximum
(number of nodes * number of PHYSICAL cores).


The kernels you have tested show the following results:
656123/605299/647688


I'm doing 3 iterations (3 runs) to get some statistics. To speed up the 
test significantly please do the run with 20 warehouses only
(or in general with #warehouses ==  number of nodes * number of PHYSICAL 
cores)
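
(A quick way to compute that number on a given box - assuming lscpu's
usual field names, and that nodes == sockets here:)

  nodes=`lscpu | awk -F: '/^NUMA node\(s\)/ { gsub(/[ \t]/, "", $2); print $2 }'`
  cores=`lscpu | awk -F: '/^Core\(s\) per socket/ { gsub(/[ \t]/, "", $2); print $2 }'`
  echo "warehouses: $((nodes * cores))"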


Jirka
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Peter Zijlstra
On Thu, Jul 31, 2014 at 06:39:05PM +0200, Jirka Hladky wrote:
 I'm doing 3 iterations (3 runs) to get some statistics. To speed up the test
 significantly please do the run with 20 warehouses only
 (or in general with #warehouses ==  number of nodes * number of PHYSICAL
 cores)

Yeah, went and did that on my 4 node machine; it's got a ton more cores, but I
matched the warehouses to it:

-a43455a1d57        tip/master

 979996.47          1144715.44
 876146             1098499.07
1058974.18          1019499.38
1055951.59          1139405.22
 970504.01          1099659.09

 988314.45          1100355.64          (avg)
  75059.546179565     50085.7473975167  (stdev)

So for 5 runs, tip/master (which includes the offending patch) wins hands down.

Each run is 2 minutes.


pgpqMzlZNWcUU.pgp
Description: PGP signature


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Davidlohr Bueso
On Thu, 2014-07-31 at 12:42 +0200, Peter Zijlstra wrote:
 On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
  On Tue, 29 Jul 2014 13:24:05 +0800
  Aaron Lu aaron...@intel.com wrote:
  
   FYI, we noticed the below changes on
   
   git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
   commit a43455a1d572daf7b730fe12eb747d1e17411365 (sched/numa: Ensure 
   task_numa_migrate() checks the preferred node)
   
   ebe06187bf2aec1  a43455a1d572daf7b730fe12e  
   ---  -  
94500 ~ 3%+115.6% 203711 ~ 6%  
   ivb42/hackbench/50%-threads-pipe
67745 ~ 4% +64.1% 74 ~ 5%  
   lkp-snb01/hackbench/50%-threads-socket
   162245 ~ 3% +94.1% 314885 ~ 6%  TOTAL 
   proc-vmstat.numa_hint_faults_local
  
  Hi Aaron,
  
  Jirka Hladky has reported a regression with that changeset as
  well, and I have already spent some time debugging the issue.
 
 So assuming those numbers above are the difference in
 numa_hint_local_faults, the report is actually a significant
 _improvement_, not a regression.
 
 On my IVB-EP I get similar numbers; using:
 
   PRE=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
   perf bench sched messaging -g 24 -t -p -l 6
   POST=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
   echo $((POST-PRE))
 
 
 tip/mater+origin/master   tip/master+origin/master-a43455a1d57
 
 local total   local   total
 faults  timefaults  time
 
 19971 51.384  10104   50.838
 17193 50.564  911650.208
 13435 49.057  833251.344
 23794 50.795  995451.364
 20255 49.463  959851.258
 
 18929.6   50.2526 9420.8  51.0024
 3863.61   0.96717.78  0.49
 
 So that patch improves both local faults and runtime. Its good (even
 though for the runtime we're still inside stdev overlap, so ideally I'd
 do more runs).
 
 
 Now I also did a run with the proposed patch, NUMA_SCALE/8 variant, and
 that slightly reduces both again:
 
 tip/master+origin/master+patch
 
 local total
 faults  time
 
 21296 50.541
 12771 50.54
 13872 52.224
 23352 50.85
 16516 50.705
 
 17561.4   50.972
 4613.32   0.71
 
 So for hackbench a43455a1d57 is good and the proposed patch is making
 things worse.

It also seems to be the case on an 8-socket, 80-core DL980:

tip/master baseline:
67276 169.590 [sec]
82400 188.406 [sec]
87827 201.122 [sec]
96659 228.243 [sec]
83180 192.422 [sec]

tip/master + a43455a1d57 reverted
36686 170.373 [sec]
52670 187.904 [sec]
55723 203.597 [sec]
41780 174.354 [sec]
36070 173.179 [sec]

Runtimes are pretty much all over the place, so I cannot really say
whether it's gotten slower or faster. However, on average, we nearly
double the amount of local hint faults with the commit in question.

After adding the proposed fix (NUMA_SCALE/8 variant), it goes down
again, closer to the numbers without a43455a1d57:

tip/master + patch
50591 175.272 [sec]
57858 191.969 [sec]
77564 215.429 [sec]
50613 179.384 [sec]
61673 201.694 [sec]

 Let me see if I can still find my SPECjbb2005 copy to see what that
 does.

I'll try to dig it up as well.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Aaron Lu
On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote:
 On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
  On Tue, 29 Jul 2014 13:24:05 +0800
  Aaron Lu aaron...@intel.com wrote:
  
   FYI, we noticed the below changes on
   
   git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
   commit a43455a1d572daf7b730fe12eb747d1e17411365 (sched/numa: Ensure 
   task_numa_migrate() checks the preferred node)
   
   ebe06187bf2aec1  a43455a1d572daf7b730fe12e  
   ---  -  
94500 ~ 3%+115.6% 203711 ~ 6%  
   ivb42/hackbench/50%-threads-pipe
67745 ~ 4% +64.1% 74 ~ 5%  
   lkp-snb01/hackbench/50%-threads-socket
   162245 ~ 3% +94.1% 314885 ~ 6%  TOTAL 
   proc-vmstat.numa_hint_faults_local
  
  Hi Aaron,
  
  Jirka Hladky has reported a regression with that changeset as
  well, and I have already spent some time debugging the issue.
 
 So assuming those numbers above are the difference in

Yes, they are.

It means that, for commit ebe06187bf2aec1, the number for
numa_hint_faults_local is 94500 on the ivb42 machine and 67745 on the
lkp-snb01 machine. The 3%, 4% following each number is the deviation of
the individual runs from their average (we usually run the test multiple
times to filter out outliers). We should probably remove that
percentage, as it causes confusion without a detailed explanation and
may not mean much to the commit author and others (if the deviation is
big enough, we should simply drop that result).

The percentage in the middle is the change between the two commits.
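
For example, the +115.6% above is just the relative change of the two
averages (quick sketch):

  echo 94500 203711 | awk '{ printf "%+.1f%%\n", ($2 - $1) / $1 * 100 }'   # +115.6%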

Another thing is the meaning of the numbers: it isn't that evident that
they refer to proc-vmstat.numa_hint_faults_local. Maybe something like
this is better?

ebe06187bf2aec1  a43455a1d572daf7b730fe12e  proc-vmstat.numa_hint_faults_local
---  -  -
 94500 +115.6% 203711   ivb42/hackbench/50%-threads-pipe
 67745  +64.1% 74   
lkp-snb01/hackbench/50%-threads-socket
162245  +94.1% 314885   TOTAL 

Regards,
Aaron

 numa_hint_local_faults, the report is actually a significant
 _improvement_, not a regression.
 
 On my IVB-EP I get similar numbers; using:
 
   PRE=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
   perf bench sched messaging -g 24 -t -p -l 6
   POST=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
   echo $((POST-PRE))
 
 
 tip/mater+origin/master   tip/master+origin/master-a43455a1d57
 
 local total   local   total
 faults  timefaults  time
 
 19971 51.384  10104   50.838
 17193 50.564  911650.208
 13435 49.057  833251.344
 23794 50.795  995451.364
 20255 49.463  959851.258
 
 18929.6   50.2526 9420.8  51.0024
 3863.61   0.96717.78  0.49
 
 So that patch improves both local faults and runtime. Its good (even
 though for the runtime we're still inside stdev overlap, so ideally I'd
 do more runs).
 
 
 Now I also did a run with the proposed patch, NUMA_SCALE/8 variant, and
 that slightly reduces both again:
 
 tip/master+origin/master+patch
 
 local total
 faults  time
 
 21296 50.541
 12771 50.54
 13872 52.224
 23352 50.85
 16516 50.705
 
 17561.4   50.972
 4613.32   0.71
 
 So for hackbench a43455a1d57 is good and the proposed patch is making
 things worse.
 
 Let me see if I can still find my SPECjbb2005 copy to see what that
 does.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Davidlohr Bueso
On Fri, 2014-08-01 at 10:03 +0800, Aaron Lu wrote:
 On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote:
  On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
   On Tue, 29 Jul 2014 13:24:05 +0800
   Aaron Lu aaron...@intel.com wrote:
   
FYI, we noticed the below changes on

git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
commit a43455a1d572daf7b730fe12eb747d1e17411365 (sched/numa: Ensure 
task_numa_migrate() checks the preferred node)

ebe06187bf2aec1  a43455a1d572daf7b730fe12e  
---  -  
 94500 ~ 3%+115.6% 203711 ~ 6%  
ivb42/hackbench/50%-threads-pipe
 67745 ~ 4% +64.1% 74 ~ 5%  
lkp-snb01/hackbench/50%-threads-socket
162245 ~ 3% +94.1% 314885 ~ 6%  TOTAL 
proc-vmstat.numa_hint_faults_local
   
   Hi Aaron,
   
   Jirka Hladky has reported a regression with that changeset as
   well, and I have already spent some time debugging the issue.
  
  So assuming those numbers above are the difference in
 
 Yes, they are.
 
 It means, for commit ebe06187bf2aec1, the number for
 num_hint_local_faults is 94500 for ivb42 machine and 67745 for lkp-snb01
 machine. The 3%, 4% following that number means the deviation of the
 different runs to their average(we usually run it multiple times to
 phase out possible sharp values). We should probably remove that
 percentage, as they cause confusion if no detailed explanation and may
 not mean much to the commit author and others(if the deviation is big
 enough, we should simply drop that result).
 
 The percentage in the middle is the change between the two commits.
 
 Another thing is the meaning of the numbers, it doesn't seem that
 evident they are for proc-vmstat.numa_hint_faults_local. Maybe something
 like this is better?

Instead of removing info, why not document what each piece of data
represents, or add headers to the table, etc.?

Thanks,
Davidlohr

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-30 Thread Aaron Lu
On Wed, Jul 30, 2014 at 10:25:03AM -0400, Rik van Riel wrote:
> On 07/29/2014 10:14 PM, Aaron Lu wrote:
> > On Tue, Jul 29, 2014 at 04:04:37PM -0400, Rik van Riel wrote:
> >> On Tue, 29 Jul 2014 10:17:12 +0200
> >> Peter Zijlstra  wrote:
> >>
>  +#define NUMA_SCALE 1000
>  +#define NUMA_MOVE_THRESH 50
> >>>
> >>> Please make that 1024, there's no reason not to use power of two here.
> >>> This base 10 factor thing annoyed me no end already, its time for it to
> >>> die.
> >>
> >> That's easy enough.  However, it would be good to know whether
> >> this actually helps with the regression Aaron found :)
> > 
> > Sorry for the delay.
> > 
> > I applied the last patch and queued the hackbench job to the ivb42 test
> > machine for it to run 5 times, and here is the result(regarding the
> > proc-vmstat.numa_hint_faults_local field):
> > 173565
> > 201262
> > 192317
> > 198342
> > 198595
> > avg:
> > 192816
> > 
> > It seems it is still very big than previous kernels.
> 
> It looks like a step in the right direction, though.
> 
> Could you try running with a larger threshold?
> 
> >> +++ b/kernel/sched/fair.c
> >> @@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct 
> >> numa_group *group, int nid)
> >>  
> >>  /*
> >>   * These return the fraction of accesses done by a particular task, or
> >> - * task group, on a particular numa node.  The group weight is given a
> >> - * larger multiplier, in order to group tasks together that are almost
> >> - * evenly spread out between numa nodes.
> >> + * task group, on a particular numa node.  The NUMA move threshold
> >> + * prevents task moves with marginal improvement, and is set to 5%.
> >>   */
> >> +#define NUMA_SCALE 1024
> >> +#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100)
> 
> It would be good to see if changing NUMA_MOVE_THRESH to
> (NUMA_SCALE / 8) does the trick.

With your 2nd patch and the above change, the result is:

"proc-vmstat.numa_hint_faults_local": [
  199708,
  209152,
  200638,
  187324,
  196654
  ],

avg:
198695

Regards,
Aaron
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-30 Thread Rik van Riel
On 07/29/2014 10:14 PM, Aaron Lu wrote:
> On Tue, Jul 29, 2014 at 04:04:37PM -0400, Rik van Riel wrote:
>> On Tue, 29 Jul 2014 10:17:12 +0200
>> Peter Zijlstra  wrote:
>>
 +#define NUMA_SCALE 1000
 +#define NUMA_MOVE_THRESH 50
>>>
>>> Please make that 1024, there's no reason not to use power of two here.
>>> This base 10 factor thing annoyed me no end already, its time for it to
>>> die.
>>
>> That's easy enough.  However, it would be good to know whether
>> this actually helps with the regression Aaron found :)
> 
> Sorry for the delay.
> 
> I applied the last patch and queued the hackbench job to the ivb42 test
> machine for it to run 5 times, and here is the result(regarding the
> proc-vmstat.numa_hint_faults_local field):
> 173565
> 201262
> 192317
> 198342
> 198595
> avg:
> 192816
> 
> It seems it is still very big than previous kernels.

It looks like a step in the right direction, though.

Could you try running with a larger threshold?

>> +++ b/kernel/sched/fair.c
>> @@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct 
>> numa_group *group, int nid)
>>  
>>  /*
>>   * These return the fraction of accesses done by a particular task, or
>> - * task group, on a particular numa node.  The group weight is given a
>> - * larger multiplier, in order to group tasks together that are almost
>> - * evenly spread out between numa nodes.
>> + * task group, on a particular numa node.  The NUMA move threshold
>> + * prevents task moves with marginal improvement, and is set to 5%.
>>   */
>> +#define NUMA_SCALE 1024
>> +#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100)

It would be good to see if changing NUMA_MOVE_THRESH to
(NUMA_SCALE / 8) does the trick.

I will run the same thing here with SPECjbb2005.
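
(For the record, that change would raise the cutoff from roughly 5% to
12.5% of the scale - quick sanity arithmetic:)

  echo $((5 * 1024 / 100))   # current patch: 51 out of 1024, ~5%
  echo $((1024 / 8))         # suggested:    128 out of 1024, 12.5%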

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-29 Thread Aaron Lu
On Tue, Jul 29, 2014 at 04:04:37PM -0400, Rik van Riel wrote:
> On Tue, 29 Jul 2014 10:17:12 +0200
> Peter Zijlstra  wrote:
> 
> > > +#define NUMA_SCALE 1000
> > > +#define NUMA_MOVE_THRESH 50
> > 
> > Please make that 1024, there's no reason not to use power of two here.
> > This base 10 factor thing annoyed me no end already, its time for it to
> > die.
> 
> That's easy enough.  However, it would be good to know whether
> this actually helps with the regression Aaron found :)

Sorry for the delay.

I applied the last patch and queued the hackbench job to the ivb42 test
machine for it to run 5 times, and here is the result(regarding the
proc-vmstat.numa_hint_faults_local field):
173565
201262
192317
198342
198595
avg:
192816

It seems it is still much bigger than with previous kernels.

BTW, to highlight changes, we only include metrics that have changed a
lot in the report; metrics that don't show up in the report did not
change much. But just in case, here is the throughput metric for commit
a43455a1d (compared to its parent):

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
    118881 ~ 0%      +1.2%     120325 ~ 0%  ivb42/hackbench/50%-threads-pipe
     78410 ~ 0%      +0.6%      78857 ~ 0%  lkp-snb01/hackbench/50%-threads-socket
    197292 ~ 0%      +1.0%     199182 ~ 0%  TOTAL hackbench.throughput

Feel free to let me know if you need more information.

Thanks,
Aaron

> 
> ---8<---
> 
> Subject: sched,numa: prevent task moves with marginal benefit
> 
> Commit a43455a1d57 makes task_numa_migrate() always check the
> preferred node for task placement. This is causing a performance
> regression with hackbench, as well as SPECjbb2005.
> 
> Tracing task_numa_compare() with a single instance of SPECjbb2005
> on a 4 node system, I have seen several thread swaps with tiny
> improvements. 
> 
> It appears that the hysteresis code that was added to task_numa_compare
> is not doing what we needed it to do, and a simple threshold could be
> better.
> 
> Aaron, does this patch help, or am I barking up the wrong tree?
> 
> Reported-by: Aaron Lu 
> Reported-by: Jirka Hladky 
> Signed-off-by: Rik van Riel 
> ---
>  kernel/sched/fair.c | 24 +++-
>  1 file changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4f5e3c2..9bd283b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct 
> numa_group *group, int nid)
>  
>  /*
>   * These return the fraction of accesses done by a particular task, or
> - * task group, on a particular numa node.  The group weight is given a
> - * larger multiplier, in order to group tasks together that are almost
> - * evenly spread out between numa nodes.
> + * task group, on a particular numa node.  The NUMA move threshold
> + * prevents task moves with marginal improvement, and is set to 5%.
>   */
> +#define NUMA_SCALE 1024
> +#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100)
> +
>  static inline unsigned long task_weight(struct task_struct *p, int nid)
>  {
>   unsigned long total_faults;
> @@ -940,7 +942,7 @@ static inline unsigned long task_weight(struct 
> task_struct *p, int nid)
>   if (!total_faults)
>   return 0;
>  
> - return 1000 * task_faults(p, nid) / total_faults;
> + return NUMA_SCALE * task_faults(p, nid) / total_faults;
>  }
>  
>  static inline unsigned long group_weight(struct task_struct *p, int nid)
> @@ -948,7 +950,7 @@ static inline unsigned long group_weight(struct 
> task_struct *p, int nid)
>   if (!p->numa_group || !p->numa_group->total_faults)
>   return 0;
>  
> - return 1000 * group_faults(p, nid) / p->numa_group->total_faults;
> + return NUMA_SCALE * group_faults(p, nid) / p->numa_group->total_faults;
>  }
>  
>  bool should_numa_migrate_memory(struct task_struct *p, struct page * page,
> @@ -1181,11 +1183,11 @@ static void task_numa_compare(struct task_numa_env 
> *env,
>   imp = taskimp + task_weight(cur, env->src_nid) -
> task_weight(cur, env->dst_nid);
>   /*
> -  * Add some hysteresis to prevent swapping the
> -  * tasks within a group over tiny differences.
> +  * Do not swap tasks within a group around unless
> +  * there is a significant improvement.
>*/
> - if (cur->numa_group)
> - imp -= imp/16;
> + if (cur->numa_group && imp < NUMA_MOVE_THRESH)
> + goto unlock;
>   } else {
>   /*
>* Compare the group weights. If a task is all by
> @@ -1205,6 +1207,10 @@ static void task_numa_compare(struct task_numa_env 
> *env,
>   goto unlock;
>  
>   

Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-29 Thread Rik van Riel
On Tue, 29 Jul 2014 10:17:12 +0200
Peter Zijlstra  wrote:

> > +#define NUMA_SCALE 1000
> > +#define NUMA_MOVE_THRESH 50
> 
> Please make that 1024, there's no reason not to use power of two here.
> This base 10 factor thing annoyed me no end already, its time for it to
> die.

That's easy enough.  However, it would be good to know whether
this actually helps with the regression Aaron found :)

---8<---

Subject: sched,numa: prevent task moves with marginal benefit

Commit a43455a1d57 makes task_numa_migrate() always check the
preferred node for task placement. This is causing a performance
regression with hackbench, as well as SPECjbb2005.

Tracing task_numa_compare() with a single instance of SPECjbb2005
on a 4 node system, I have seen several thread swaps with tiny
improvements. 

It appears that the hysteresis code that was added to task_numa_compare
is not doing what we needed it to do, and a simple threshold could be
better.

Aaron, does this patch help, or am I barking up the wrong tree?

Reported-by: Aaron Lu 
Reported-by: Jirka Hladky 
Signed-off-by: Rik van Riel 
---
 kernel/sched/fair.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4f5e3c2..9bd283b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct 
numa_group *group, int nid)
 
 /*
  * These return the fraction of accesses done by a particular task, or
- * task group, on a particular numa node.  The group weight is given a
- * larger multiplier, in order to group tasks together that are almost
- * evenly spread out between numa nodes.
+ * task group, on a particular numa node.  The NUMA move threshold
+ * prevents task moves with marginal improvement, and is set to 5%.
  */
+#define NUMA_SCALE 1024
+#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100)
+
 static inline unsigned long task_weight(struct task_struct *p, int nid)
 {
unsigned long total_faults;
@@ -940,7 +942,7 @@ static inline unsigned long task_weight(struct task_struct 
*p, int nid)
if (!total_faults)
return 0;
 
-   return 1000 * task_faults(p, nid) / total_faults;
+   return NUMA_SCALE * task_faults(p, nid) / total_faults;
 }
 
 static inline unsigned long group_weight(struct task_struct *p, int nid)
@@ -948,7 +950,7 @@ static inline unsigned long group_weight(struct task_struct 
*p, int nid)
if (!p->numa_group || !p->numa_group->total_faults)
return 0;
 
-   return 1000 * group_faults(p, nid) / p->numa_group->total_faults;
+   return NUMA_SCALE * group_faults(p, nid) / p->numa_group->total_faults;
 }
 
 bool should_numa_migrate_memory(struct task_struct *p, struct page * page,
@@ -1181,11 +1183,11 @@ static void task_numa_compare(struct task_numa_env *env,
imp = taskimp + task_weight(cur, env->src_nid) -
  task_weight(cur, env->dst_nid);
/*
-* Add some hysteresis to prevent swapping the
-* tasks within a group over tiny differences.
+* Do not swap tasks within a group around unless
+* there is a significant improvement.
 */
-   if (cur->numa_group)
-   imp -= imp/16;
+   if (cur->numa_group && imp < NUMA_MOVE_THRESH)
+   goto unlock;
} else {
/*
 * Compare the group weights. If a task is all by
@@ -1205,6 +1207,10 @@ static void task_numa_compare(struct task_numa_env *env,
goto unlock;
 
if (!cur) {
+   /* Only move if there is a significant improvement. */
+   if (imp < NUMA_MOVE_THRESH)
+   goto unlock;
+
/* Is there capacity at our destination? */
if (env->src_stats.has_free_capacity &&
!env->dst_stats.has_free_capacity)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-29 Thread Peter Zijlstra
On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
> Subject: sched,numa: prevent task moves with marginal benefit
> 
> Commit a43455a1d57 makes task_numa_migrate() always check the
> preferred node for task placement. This is causing a performance
> regression with hackbench, as well as SPECjbb2005.
> 
> Tracing task_numa_compare() with a single instance of SPECjbb2005
> on a 4 node system, I have seen several thread swaps with tiny
> improvements. 
> 
> It appears that the hysteresis code that was added to task_numa_compare
> is not doing what we needed it to do, and a simple threshold could be
> better.
> 
> Reported-by: Aaron Lu 
> Reported-by: Jirka Hladky 
> Signed-off-by: Rik van Riel 
> ---
>  kernel/sched/fair.c | 24 +++-
>  1 file changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4f5e3c2..bedbc3e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct 
> numa_group *group, int nid)
>  
>  /*
>   * These return the fraction of accesses done by a particular task, or
> - * task group, on a particular numa node.  The group weight is given a
> - * larger multiplier, in order to group tasks together that are almost
> - * evenly spread out between numa nodes.
> + * task group, on a particular numa node.  The NUMA move threshold
> + * prevents task moves with marginal improvement, and is set to 5%.
>   */
> +#define NUMA_SCALE 1000
> +#define NUMA_MOVE_THRESH 50

Please make that 1024, there's no reason not to use a power of two here.
This base 10 factor thing annoyed me no end already, it's time for it to
die.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-29 Thread Rik van Riel
On Tue, 29 Jul 2014 13:24:05 +0800
Aaron Lu  wrote:

> FYI, we noticed the below changes on
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure 
> task_numa_migrate() checks the preferred node")
> 
> ebe06187bf2aec1  a43455a1d572daf7b730fe12e  
> ---  -  
>  94500 ~ 3%+115.6% 203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
>  67745 ~ 4% +64.1% 74 ~ 5%  
> lkp-snb01/hackbench/50%-threads-socket
> 162245 ~ 3% +94.1% 314885 ~ 6%  TOTAL 
> proc-vmstat.numa_hint_faults_local

Hi Aaron,

Jirka Hladky has reported a regression with that changeset as
well, and I have already spent some time debugging the issue.

I added tracing code to task_numa_compare() and saw a number
of thread swaps with tiny improvements.

Does preventing those help your workload, or am I barking up
the wrong tree again?  (I have been looking at this for a while...)

---8<---

Subject: sched,numa: prevent task moves with marginal benefit

Commit a43455a1d57 makes task_numa_migrate() always check the
preferred node for task placement. This is causing a performance
regression with hackbench, as well as SPECjbb2005.

Tracing task_numa_compare() with a single instance of SPECjbb2005
on a 4 node system, I have seen several thread swaps with tiny
improvements. 

It appears that the hysteresis code that was added to task_numa_compare
is not doing what we needed it to do, and a simple threshold could be
better.
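
To make the difference concrete, a quick userspace sketch (same integer
math as the patch below, on the 0..1000 scale with the 50 threshold;
illustration only, not kernel code):

  for imp in 10 30 49 50 100 300; do
    hyst=$((imp - imp / 16))                                  # old: imp -= imp/16
    verdict=`[ $imp -lt 50 ] && echo reject || echo accept`   # new: imp < NUMA_MOVE_THRESH
    echo "imp=$imp  after hysteresis: $hyst  threshold: $verdict"
  done

With the hysteresis only ~6% is shaved off, so a swap with an imp of 10
(a 1% gain) still goes through; the threshold rejects it outright.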

Reported-by: Aaron Lu 
Reported-by: Jirka Hladky 
Signed-off-by: Rik van Riel 
---
 kernel/sched/fair.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4f5e3c2..bedbc3e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct 
numa_group *group, int nid)
 
 /*
  * These return the fraction of accesses done by a particular task, or
- * task group, on a particular numa node.  The group weight is given a
- * larger multiplier, in order to group tasks together that are almost
- * evenly spread out between numa nodes.
+ * task group, on a particular numa node.  The NUMA move threshold
+ * prevents task moves with marginal improvement, and is set to 5%.
  */
+#define NUMA_SCALE 1000
+#define NUMA_MOVE_THRESH 50
+
 static inline unsigned long task_weight(struct task_struct *p, int nid)
 {
unsigned long total_faults;
@@ -940,7 +942,7 @@ static inline unsigned long task_weight(struct task_struct 
*p, int nid)
if (!total_faults)
return 0;
 
-   return 1000 * task_faults(p, nid) / total_faults;
+   return NUMA_SCALE * task_faults(p, nid) / total_faults;
 }
 
 static inline unsigned long group_weight(struct task_struct *p, int nid)
@@ -948,7 +950,7 @@ static inline unsigned long group_weight(struct task_struct 
*p, int nid)
if (!p->numa_group || !p->numa_group->total_faults)
return 0;
 
-   return 1000 * group_faults(p, nid) / p->numa_group->total_faults;
+   return NUMA_SCALE * group_faults(p, nid) / p->numa_group->total_faults;
 }
 
 bool should_numa_migrate_memory(struct task_struct *p, struct page * page,
@@ -1181,11 +1183,11 @@ static void task_numa_compare(struct task_numa_env *env,
imp = taskimp + task_weight(cur, env->src_nid) -
  task_weight(cur, env->dst_nid);
/*
-* Add some hysteresis to prevent swapping the
-* tasks within a group over tiny differences.
+* Do not swap tasks within a group around unless
+* there is a significant improvement.
 */
-   if (cur->numa_group)
-   imp -= imp/16;
+   if (cur->numa_group && imp < NUMA_MOVE_THRESH)
+   goto unlock;
} else {
/*
 * Compare the group weights. If a task is all by
@@ -1205,6 +1207,10 @@ static void task_numa_compare(struct task_numa_env *env,
goto unlock;
 
if (!cur) {
+   /* Only move if there is a significant improvement. */
+   if (imp < NUMA_MOVE_THRESH)
+   goto unlock;
+
/* Is there capacity at our destination? */
if (env->src_stats.has_free_capacity &&
!env->dst_stats.has_free_capacity)
--


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-29 Thread Peter Zijlstra
On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
> Subject: sched,numa: prevent task moves with marginal benefit
> 
> Commit a43455a1d57 makes task_numa_migrate() always check the
> preferred node for task placement. This is causing a performance
> regression with hackbench, as well as SPECjbb2005.
> 
> Tracing task_numa_compare() with a single instance of SPECjbb2005
> on a 4 node system, I have seen several thread swaps with tiny
> improvements.
> 
> It appears that the hysteresis code that was added to task_numa_compare
> is not doing what we needed it to do, and a simple threshold could be
> better.
> 
> Reported-by: Aaron Lu aaron...@intel.com
> Reported-by: Jirka Hladky jhla...@redhat.com
> Signed-off-by: Rik van Riel r...@redhat.com
> ---
>  kernel/sched/fair.c | 24 +++-
>  1 file changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4f5e3c2..bedbc3e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct 
> numa_group *group, int nid)
>  
>  /*
>   * These return the fraction of accesses done by a particular task, or
> - * task group, on a particular numa node.  The group weight is given a
> - * larger multiplier, in order to group tasks together that are almost
> - * evenly spread out between numa nodes.
> + * task group, on a particular numa node.  The NUMA move threshold
> + * prevents task moves with marginal improvement, and is set to 5%.
>   */
> +#define NUMA_SCALE 1000
> +#define NUMA_MOVE_THRESH 50

Please make that 1024, there's no reason not to use power of two here.
This base 10 factor thing annoyed me no end already, it's time for it to
die.
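
With a power-of-two scale, the 5% threshold from the patch above works
out to 51 out of 1024; a quick standalone check in plain C, assuming the
definitions used in the follow-up patch below:

/* Quick standalone check of the power-of-two variant of the threshold. */
#include <stdio.h>

#define NUMA_SCALE       1024
#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100)	/* integer math: 51 */

int main(void)
{
	printf("NUMA_SCALE=%d NUMA_MOVE_THRESH=%d (%.2f%% of the scale)\n",
	       NUMA_SCALE, NUMA_MOVE_THRESH,
	       100.0 * NUMA_MOVE_THRESH / NUMA_SCALE);
	return 0;
}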
--


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-29 Thread Rik van Riel
On Tue, 29 Jul 2014 10:17:12 +0200
Peter Zijlstra pet...@infradead.org wrote:

> > +#define NUMA_SCALE 1000
> > +#define NUMA_MOVE_THRESH 50
> 
> Please make that 1024, there's no reason not to use power of two here.
> This base 10 factor thing annoyed me no end already, it's time for it to
> die.

That's easy enough.  However, it would be good to know whether
this actually helps with the regression Aaron found :)

---8<---

Subject: sched,numa: prevent task moves with marginal benefit

Commit a43455a1d57 makes task_numa_migrate() always check the
preferred node for task placement. This is causing a performance
regression with hackbench, as well as SPECjbb2005.

Tracing task_numa_compare() with a single instance of SPECjbb2005
on a 4 node system, I have seen several thread swaps with tiny
improvements. 

It appears that the hysteresis code that was added to task_numa_compare
is not doing what we needed it to do, and a simple threshold could be
better.

Aaron, does this patch help, or am I barking up the wrong tree?

Reported-by: Aaron Lu aaron...@intel.com
Reported-by: Jirka Hladky jhla...@redhat.com
Signed-off-by: Rik van Riel r...@redhat.com
---
 kernel/sched/fair.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4f5e3c2..9bd283b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct 
numa_group *group, int nid)
 
 /*
  * These return the fraction of accesses done by a particular task, or
- * task group, on a particular numa node.  The group weight is given a
- * larger multiplier, in order to group tasks together that are almost
- * evenly spread out between numa nodes.
+ * task group, on a particular numa node.  The NUMA move threshold
+ * prevents task moves with marginal improvement, and is set to 5%.
  */
+#define NUMA_SCALE 1024
+#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100)
+
 static inline unsigned long task_weight(struct task_struct *p, int nid)
 {
unsigned long total_faults;
@@ -940,7 +942,7 @@ static inline unsigned long task_weight(struct task_struct 
*p, int nid)
if (!total_faults)
return 0;
 
-   return 1000 * task_faults(p, nid) / total_faults;
+   return NUMA_SCALE * task_faults(p, nid) / total_faults;
 }
 
 static inline unsigned long group_weight(struct task_struct *p, int nid)
@@ -948,7 +950,7 @@ static inline unsigned long group_weight(struct task_struct 
*p, int nid)
if (!p->numa_group || !p->numa_group->total_faults)
return 0;
 
-   return 1000 * group_faults(p, nid) / p->numa_group->total_faults;
+   return NUMA_SCALE * group_faults(p, nid) / p->numa_group->total_faults;
 }
 
 bool should_numa_migrate_memory(struct task_struct *p, struct page * page,
@@ -1181,11 +1183,11 @@ static void task_numa_compare(struct task_numa_env *env,
imp = taskimp + task_weight(cur, env->src_nid) -
  task_weight(cur, env->dst_nid);
/*
-* Add some hysteresis to prevent swapping the
-* tasks within a group over tiny differences.
+* Do not swap tasks within a group around unless
+* there is a significant improvement.
 */
-   if (cur->numa_group)
-   imp -= imp/16;
+   if (cur->numa_group && imp < NUMA_MOVE_THRESH)
+   goto unlock;
} else {
/*
 * Compare the group weights. If a task is all by
@@ -1205,6 +1207,10 @@ static void task_numa_compare(struct task_numa_env *env,
goto unlock;
 
if (!cur) {
+   /* Only move if there is a significant improvement. */
+   if (imp < NUMA_MOVE_THRESH)
+   goto unlock;
+
/* Is there capacity at our destination? */
if (env->src_stats.has_free_capacity &&
!env->dst_stats.has_free_capacity)

--


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-29 Thread Aaron Lu
On Tue, Jul 29, 2014 at 04:04:37PM -0400, Rik van Riel wrote:
> On Tue, 29 Jul 2014 10:17:12 +0200
> Peter Zijlstra pet...@infradead.org wrote:
> 
> > > +#define NUMA_SCALE 1000
> > > +#define NUMA_MOVE_THRESH 50
> > 
> > Please make that 1024, there's no reason not to use power of two here.
> > This base 10 factor thing annoyed me no end already, it's time for it to
> > die.
> 
> That's easy enough.  However, it would be good to know whether
> this actually helps with the regression Aaron found :)

Sorry for the delay.

I applied the last patch and queued the hackbench job to the ivb42 test
machine for it to run 5 times, and here is the result(regarding the
proc-vmstat.numa_hint_faults_local field):
173565
201262
192317
198342
198595
avg:
192816

It seems it is still much bigger than on previous kernels.
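
The percentages in these reports are plain relative changes against the
base commit; below is a quick standalone check of the numbers above,
using a hypothetical helper that is not part of the LKP tooling:

/* Hypothetical helper, not part of the LKP tooling: relative change
 * versus the base commit, as printed in the report tables. */
#include <stdio.h>

static double pct_change(double base, double patched)
{
	return 100.0 * (patched - base) / base;
}

int main(void)
{
	/* ivb42/hackbench/50%-threads-pipe, proc-vmstat.numa_hint_faults_local */
	printf("original report:      %+.1f%%\n", pct_change(94500, 203711)); /* ~ +115.6% */
	printf("5-run avg with patch: %+.1f%%\n", pct_change(94500, 192816)); /* ~ +104.0% */
	return 0;
}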

BTW, to highlight changes, we only include metrics that have changed a
lot in the report; metrics that don't show up in the report didn't
change much. But just in case, here is the throughput metric for
commit a43455a1d (compared to its parent):

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
    118881 ~ 0%      +1.2%     120325 ~ 0%  ivb42/hackbench/50%-threads-pipe
     78410 ~ 0%      +0.6%      78857 ~ 0%  lkp-snb01/hackbench/50%-threads-socket
    197292 ~ 0%      +1.0%     199182 ~ 0%  TOTAL hackbench.throughput

Feel free to let me know if you need more information.

Thanks,
Aaron

 
 ---8<---
 
 Subject: sched,numa: prevent task moves with marginal benefit
 
 Commit a43455a1d57 makes task_numa_migrate() always check the
 preferred node for task placement. This is causing a performance
 regression with hackbench, as well as SPECjbb2005.
 
 Tracing task_numa_compare() with a single instance of SPECjbb2005
 on a 4 node system, I have seen several thread swaps with tiny
 improvements. 
 
 It appears that the hysteresis code that was added to task_numa_compare
 is not doing what we needed it to do, and a simple threshold could be
 better.
 
 Aaron, does this patch help, or am I barking up the wrong tree?
 
 Reported-by: Aaron Lu aaron...@intel.com
 Reported-by: Jirka Hladky jhla...@redhat.com
 Signed-off-by: Rik van Riel r...@redhat.com
 ---
  kernel/sched/fair.c | 24 +++-
  1 file changed, 15 insertions(+), 9 deletions(-)
 
 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
 index 4f5e3c2..9bd283b 100644
 --- a/kernel/sched/fair.c
 +++ b/kernel/sched/fair.c
 @@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct 
 numa_group *group, int nid)
  
  /*
   * These return the fraction of accesses done by a particular task, or
 - * task group, on a particular numa node.  The group weight is given a
 - * larger multiplier, in order to group tasks together that are almost
 - * evenly spread out between numa nodes.
 + * task group, on a particular numa node.  The NUMA move threshold
 + * prevents task moves with marginal improvement, and is set to 5%.
   */
 +#define NUMA_SCALE 1024
 +#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100)
 +
  static inline unsigned long task_weight(struct task_struct *p, int nid)
  {
   unsigned long total_faults;
 @@ -940,7 +942,7 @@ static inline unsigned long task_weight(struct 
 task_struct *p, int nid)
   if (!total_faults)
   return 0;
  
 - return 1000 * task_faults(p, nid) / total_faults;
 + return NUMA_SCALE * task_faults(p, nid) / total_faults;
  }
  
  static inline unsigned long group_weight(struct task_struct *p, int nid)
 @@ -948,7 +950,7 @@ static inline unsigned long group_weight(struct 
 task_struct *p, int nid)
  if (!p->numa_group || !p->numa_group->total_faults)
   return 0;
  
 - return 1000 * group_faults(p, nid) / p->numa_group->total_faults;
 + return NUMA_SCALE * group_faults(p, nid) / p->numa_group->total_faults;
  }
  
  bool should_numa_migrate_memory(struct task_struct *p, struct page * page,
 @@ -1181,11 +1183,11 @@ static void task_numa_compare(struct task_numa_env 
 *env,
  imp = taskimp + task_weight(cur, env->src_nid) -
task_weight(cur, env->dst_nid);
   /*
 -  * Add some hysteresis to prevent swapping the
 -  * tasks within a group over tiny differences.
 +  * Do not swap tasks within a group around unless
 +  * there is a significant improvement.
*/
 - if (cur->numa_group)
 - imp -= imp/16;
 + if (cur->numa_group && imp < NUMA_MOVE_THRESH)
 + goto unlock;
   } else {
   /*
* Compare the group weights. If a task is all by
 @@ -1205,6 +1207,10 @@ static void task_numa_compare(struct task_numa_env 
 *env,
   goto unlock;
  
   if (!cur) {
 + /* Only move if there is a 

[LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-28 Thread Aaron Lu
FYI, we noticed the below changes on

git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure 
task_numa_migrate() checks the preferred node")

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
     94500 ~ 3%    +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
     67745 ~ 4%     +64.1%         74 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
    162245 ~ 3%     +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
    147474 ~ 3%     +70.6%     251650 ~ 5%  ivb42/hackbench/50%-threads-pipe
     94889 ~ 3%     +46.3%     138815 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
    242364 ~ 3%     +61.1%     390465 ~ 5%  TOTAL proc-vmstat.numa_pte_updates

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
    147104 ~ 3%     +69.5%     249306 ~ 5%  ivb42/hackbench/50%-threads-pipe
     94431 ~ 3%     +43.9%     135902 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
    241535 ~ 3%     +59.5%     385209 ~ 5%  TOTAL proc-vmstat.numa_hint_faults

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
       308 ~ 8%     +24.1%        382 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
       308 ~ 8%     +24.1%        382 ~ 5%  TOTAL numa-vmstat.node0.nr_page_table_pages

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
      1234 ~ 8%     +24.0%       1530 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
      1234 ~ 8%     +24.0%       1530 ~ 5%  TOTAL numa-meminfo.node0.PageTables

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
       381 ~ 6%     -17.9%        313 ~ 6%  lkp-snb01/hackbench/50%-threads-socket
       381 ~ 6%     -17.9%        313 ~ 6%  TOTAL numa-vmstat.node1.nr_page_table_pages

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
      1528 ~ 6%     -18.0%       1253 ~ 6%  lkp-snb01/hackbench/50%-threads-socket
      1528 ~ 6%     -18.0%       1253 ~ 6%  TOTAL numa-meminfo.node1.PageTables

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
     24533 ~ 2%     -16.2%      20560 ~ 3%  ivb42/hackbench/50%-threads-pipe
     13551 ~ 2%     -10.7%      12096 ~ 2%  lkp-snb01/hackbench/50%-threads-socket
     38084 ~ 2%     -14.2%      32657 ~ 3%  TOTAL proc-vmstat.numa_pages_migrated

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
     24533 ~ 2%     -16.2%      20560 ~ 3%  ivb42/hackbench/50%-threads-pipe
     13551 ~ 2%     -10.7%      12096 ~ 2%  lkp-snb01/hackbench/50%-threads-socket
     38084 ~ 2%     -14.2%      32657 ~ 3%  TOTAL proc-vmstat.pgmigrate_success

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
      3538 ~ 7%     +11.6%       3949 ~ 7%  lkp-snb01/hackbench/50%-threads-socket
      3538 ~ 7%     +11.6%       3949 ~ 7%  TOTAL numa-vmstat.node0.nr_anon_pages

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
     14154 ~ 7%     +11.6%      15799 ~ 7%  lkp-snb01/hackbench/50%-threads-socket
     14154 ~ 7%     +11.6%      15799 ~ 7%  TOTAL numa-meminfo.node0.AnonPages

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
      3511 ~ 7%     +11.0%       3898 ~ 7%  lkp-snb01/hackbench/50%-threads-socket
      3511 ~ 7%     +11.0%       3898 ~ 7%  TOTAL numa-vmstat.node0.nr_active_anon

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
     14044 ~ 7%     +11.1%      15597 ~ 7%  lkp-snb01/hackbench/50%-threads-socket
     14044 ~ 7%     +11.1%      15597 ~ 7%  TOTAL numa-meminfo.node0.Active(anon)

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
    187958 ~ 2%     +56.6%     294375 ~ 5%  ivb42/hackbench/50%-threads-pipe
    124490 ~ 2%     +35.0%     168004 ~ 4%  lkp-snb01/hackbench/50%-threads-socket
    312448 ~ 2%     +48.0%     462379 ~ 5%  TOTAL time.minor_page_faults

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
     11.47 ~ 1%      -2.8%      11.15 ~ 1%  ivb42/hackbench/50%-threads-pipe
     11.47 ~ 1%      -2.8%      11.15 ~ 1%  TOTAL turbostat.RAM_W

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
 3.649e+08 ~ 0%      -2.4%  3.562e+08 ~ 0%  lkp-snb01/hackbench/50%-threads-socket
 3.649e+08 ~ 0%      -2.4%  3.562e+08 ~ 0%  TOTAL time.involuntary_context_switches

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
   1924472 ~ 0%      -2.6%    1874425 ~ 0%  ivb42/hackbench/50%-threads-pipe
   1924472 ~ 0%      -2.6%    1874425 ~ 0%  
